Cloud Connected

Thoughts and Ideas from the Gitana Development Team

Introduction to Changeset Versioning

Cloud CMS provides you with content repositories that are powered by a “changeset” versioning model.  This is a powerful versioning model that you won’t find in most conventional CMS products.  It’s one of the reasons why Cloud CMS is such a great platform for collaboration!

Document-level Versioning

A lot of legacy CMS products feature document-level versioning.  With document-level versioning, when you make a change to a document, the system simply increments a version counter.  You end up with multiple versions of your document.

It might look something like the following:

We all have or had an awesome grandparent who knew how to cook something good. For a recipe stored in a Microsoft Word file, the document-versioning model works pretty well!

Problems with Document-level Versioning

That said, there are some major drawbacks.

  1. Desktop Documents Only.  Document-level versioning is really only good for desktop documents (like Microsoft Office files) where everything (all of your nested images, fonts, etc.) is contained within a single file.

    That’s why Dropbox uses file-level versioning.  It makes sense for people who work almost exclusively with desktop documents.
     
  2. No way to handle Sets of Changes.  If you’re working on mobile applications, web sites, or just about any project outside the back office, your content will be spread over multiple files.

    Think about a web site.  A web site might have hundreds or thousands of files - things like HTML, CSS, JS, image files and much more.  When you publish a web site, you really want to version the full set of files all at once so that you can push, pull and roll back updates to your web site.
     
  3. Bottlenecks.  If you’ve ever worked with Microsoft SharePoint or any document-versioning CMS, then you’re aware of the bottlenecks that get introduced when two people want to work on something at the same time.  Either they both make changes (and you have to manually merge them together) or one person locks the file and the other person sits on their hands.

    Most products that feature document-level versioning do so simply because it’s easy to implement.  However, it leaves your business users with extremely limited tools for collaboration.  This makes collaboration frustrating as it cuts off people’s initiative, creativity and productivity.
     
  4. No ability to scale.  Okay, so let’s suppose now that you want to scale your content ingestion and production capabilities out to the broader world.  You might want to pull in content from Twitter, Facebook or Quora in real-time.  And let a broad community collaborate together…

    Nah, forget it.  With document-level versioning, that’d be like giving everyone a phone and telling them to call each other.

    And then only giving them one phone line.

Changeset Versioning

Fortunately, this problem has been solved.  The solution comes out of the source control world and it is known as distributed “changeset versioning”.

If you’ve ever used Git, Mercurial or any modern source control software, then you’re already familiar with the concept.  It’s been around for a while and has become extremely popular since it enables folks to work unimpeded, fully distributed and without any of the headaches of file locking and so forth.

It should be noted.  Cloud CMS is the only Content Management System to offer changeset versioning.  We’re it.  Why?  I suppose because it is hard to implement.  

And maybe because everyone else is busy chasing the desktop document problem.  However, if you’ve ever tried to build a web or mobile app or tried consuming social content from Twitter, Facebook, LinkedIn, etc… well, then you know it’s all about JSON, XML, object relationships, lots of composite documents, highly concurrent writes and reads and so on!

Only your sales person will believe that a document-versioning system could be used for that purpose!

Changeset Versioning: The Basics

This article by no means intends to provide a Master’s thesis on how changeset versioning works.  However, let’s delve into the basics!

Let’s start with writing, editing and deleting content.  

When you write content into the Cloud CMS repository, your content gets stored on a “changeset”.  A changeset is a lot like a transparency (from the old transparency projector days).  This is a see-through sheet of plastic that you write on with one of those Sharpie pens.  The projector projects whatever you write up onto the screen.

The cool thing about transparencies is that you can layer them, one on top of the other.  What ends up getting projected is the composite of everything layered together.

So when you write content, the repository basically gets a new transparency and puts your content onto it.

If you make a change, it gets out another transparency, writes your change and layers it on top.

It also does this if you delete something.  It gets out a new transparency, masks (or covers up) your content so that it appears deleted.  

However, your content isn’t really deleted.  It is safe and tucked away somewhere in the stack of transparencies.  It’s just been hidden by the top-most transparency!

You can write as many things onto a changeset (transparency) as you want.  Cloud CMS manages the changesets for you, keeps them in a nice stack and lets you roll back changes if you make a mistake anywhere along the way.
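To make the transparency analogy a bit more concrete, here is a minimal sketch in Python.  It is not how Cloud CMS is implemented; the `Branch` class, the `DELETED` marker and the method names are all made up for illustration.  It only shows the idea: writes and deletes layer a new changeset on top, reads look down from the top, and a rollback peels layers off.

```python
from typing import Any, Dict, List, Optional

# Hypothetical marker: a node "masked" (deleted) on a changeset.
DELETED = object()


class Branch:
    """A stack of changesets; each one maps node id -> content or DELETED."""

    def __init__(self) -> None:
        self.changesets: List[Dict[str, Any]] = []

    def commit(self, changes: Dict[str, Any]) -> int:
        """Layer a new changeset on top of the stack and return its index."""
        self.changesets.append(dict(changes))
        return len(self.changesets) - 1

    def read(self, node_id: str, upto: Optional[int] = None) -> Any:
        """Look down from the top (or from changeset `upto`) for the node."""
        top = len(self.changesets) - 1 if upto is None else upto
        for changeset in reversed(self.changesets[: top + 1]):
            if node_id in changeset:
                value = changeset[node_id]
                return None if value is DELETED else value
        return None  # the node was never written

    def rollback(self, to: int) -> None:
        """Discard every changeset layered above index `to`."""
        del self.changesets[to + 1:]


branch = Branch()
v0 = branch.commit({"recipe": {"title": "Pierogi", "serves": 4}})  # write
v1 = branch.commit({"recipe": {"title": "Pierogi", "serves": 6}})  # edit
v2 = branch.commit({"recipe": DELETED})                            # delete
```

After the third commit, `branch.read("recipe")` finds nothing, yet `branch.read("recipe", upto=v1)` still sees the edited recipe, and `branch.rollback(v1)` brings it back to the top of the stack.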

Changeset Versioning: Branches and Merges

As noted, Cloud CMS manages your changesets for you.  The “stack” of changesets is known as a Branch.  As you add more changesets to the branch, the length of the branch gets longer (just like the stack of transparencies gets thicker).

A read operation simply pulls information out of the repository.  A write or a delete adds a new changeset.  Consider the branch shown below.  The reading operation just peeks at the branch looking down from the top.  The writing operation adds a new changeset.

With just a single branch, you can still get into the situation where two people want to change the same file at the same time.  Cloud CMS lets you lock the object and all that kind of thing if you want.  Or, you can create new branches so that everyone can work together at the same time and on the same things.

It kind of looks like this:

Here we have two workspaces.  Each workspace has its own branch which was stemmed off of the Master Branch at changeset V5.  The first user works on Branch A and the second user works on Branch B.  Both Branch A and Branch B have a common ancestor (changeset V5 in the Master Branch).

This allows both users to do whatever they want without stepping on each other’s toes. They can update documents, delete things and create new content.  At any time, they can push and pull changes between their workspace and any other workspace.  This gives them a way to preview what other people are working on and merge their work into their own branches.  They can also merge back to the Master Branch.

Cloud CMS provides an elegant merge algorithm that walks the changeset history tree from the common ancestor on up.  It uses a JSON differencing algorithm to allow for JSON property-level conflicts (as opposed to document level conflicts).  And it provides content model and scriptable policy validation for the merged result.
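To illustrate what a property-level (rather than document-level) conflict means, here is a small three-way merge sketch in Python.  It is an illustration only and not the Cloud CMS merge algorithm: the function name is hypothetical, it only compares top-level properties, and it does not walk any changeset history.  It compares each side against the common ancestor, property by property.

```python
from typing import Any, Dict, List, Tuple

_MISSING = object()  # marks a property that is absent from a document


def merge_properties(
    base: Dict[str, Any], ours: Dict[str, Any], theirs: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Three-way merge of top-level JSON properties against a common ancestor."""
    merged: Dict[str, Any] = {}
    conflicts: List[str] = []
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b = base.get(key, _MISSING)
        o = ours.get(key, _MISSING)
        t = theirs.get(key, _MISSING)
        if o == t:
            value = o  # both sides agree (or neither side changed it)
        elif o == b:
            value = t  # only their side changed this property
        elif t == b:
            value = o  # only our side changed this property
        else:
            conflicts.append(key)  # both sides changed it differently
            value = o  # keep ours for now; a person or policy must decide
        if value is not _MISSING:
            merged[key] = value
    return merged, conflicts


base = {"title": "Home", "body": "Hello"}
ours = {"title": "Home Page", "body": "Hello"}     # we changed the title
theirs = {"title": "Home", "body": "Hello world"}  # they changed the body
merged, conflicts = merge_properties(base, ours, theirs)
```

Here the two edits touch different properties of the same document, so they merge cleanly.  With document-level versioning, the same pair of edits would have been a conflict.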

The result is a highly collaborative experience that encourages your users to experiment and take a shot at contributing without the worry of blocking others or screwing up the master content.

In a future blog, we’ll cover the details of how branching and merging works.  Our approach is one that did not seek to reinvent the wheel but rather ride on top of the wonderful innovation that has already occurred over the last decade within source control tools like Mercurial, Git and Bazaar.

OAuth2, Clients and Authentication Grants

One of the things that I really like about our approach to server authorization is that we’ve elected to get completely behind the OAuth2 specification.

Cloud CMS provides support for all of the OAuth2 flows.  We provide an authorization and resource server so that you can separate concerns and perform the full three-legged “auth code” flow.  Or you can simplify things and use something like a “password” or “implicit” flow depending on the security environment of your application.
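As a rough illustration of the simpler “password” flow, the Python sketch below builds (but does not send) a token request in the shape that the OAuth2 specification (RFC 6749) describes: a form-encoded POST with the client key/secret in an HTTP Basic header.  The token URL is a placeholder and the function name is made up; check the Cloud CMS documentation for the real endpoint.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, not the real Cloud CMS token URL.
TOKEN_URL = "https://auth.example.com/oauth/token"


def build_password_grant_request(
    client_key: str, client_secret: str, username: str, password: str
) -> Request:
    """Build an OAuth2 'password' grant token request without sending it."""
    credentials = f"{client_key}:{client_secret}".encode("utf-8")
    headers = {
        # The client authenticates itself with HTTP Basic auth.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("ascii")
    return Request(TOKEN_URL, data=body, headers=headers, method="POST")


request = build_password_grant_request("my-key", "my-secret", "app", "s3cret")
```

Passing `request` to `urllib.request.urlopen` would perform the exchange.  Since the password travels in the request body, this only makes sense over HTTPS.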

For environments like HTML5/JS, we continue to recommend (rather strongly) that you employ a full untrusted “auth code” handshake.

When we started out, we briefly dabbled with OAuth 1.0 before realizing that it was tedious to use (much less implement).  Signatures needed to be computed and passed along with every request.  This meant that if you wanted to serve assets out of the repository directly, information would have to be either encoded within URLs or it would need to be available as a cookie.  And if you were going to store a cookie, then you had to implement some kind of server-side token registry with expiration and refresh in order to offer any kind of assurance of fidelity.  However, to do so meant that we’d be drifting from the spec and doing our own thing.

Fortunately, lots of other vendors were noticing the same shortcomings of OAuth 1.0 (not really shortcomings, per se, but definitely things that were not within the bounds of the specification to address).  OAuth2 represents a best effort by Facebook, Twitter, LinkedIn and a whole host of other vendors to address many of the issues we found we needed to deal with.

Cloud CMS - Clients

We decided early on that we wanted platform owners to be able to create as many OAuth2 “client” key/secret combinations as they wanted.  That way, they could provision client/keys on a per-application basis.  Or they could have a single client/key service a whole bunch of applications.

Furthermore, if for any reason a client/key combination were compromised (as in, some hacker out there figured out your client secret), you could monitor this, identify it and then shut down the client.  Create a new client key/secret, issue it and away you go.

Thus, we let you manage as many client key/secrets as you’d like.

In addition, we let you define on a per client key/secret basis what kinds of features or flows you’d like to enable.  You might restrict certain clients from participating in an untrusted client flow (like the “implicit” flow).  You can just toggle this stuff on and off.
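Conceptually, you can picture this as a small registry of clients, each with its own set of enabled flows and an on/off switch.  The Python sketch below is only a mental model; the class names, fields and flow names are assumptions and not the Cloud CMS data model.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Client:
    key: str
    secret: str
    allowed_flows: Set[str] = field(default_factory=set)
    enabled: bool = True


class ClientRegistry:
    """Hypothetical registry holding any number of client key/secret pairs."""

    def __init__(self) -> None:
        self._clients: Dict[str, Client] = {}

    def register(self, client: Client) -> None:
        self._clients[client.key] = client

    def shut_down(self, key: str) -> None:
        """Disable a compromised client; other clients are unaffected."""
        self._clients[key].enabled = False

    def may_use(self, key: str, secret: str, flow: str) -> bool:
        """True if the client exists, is enabled, and may use this flow."""
        client = self._clients.get(key)
        if client is None or not client.enabled:
            return False
        # Constant-time comparison of the presented secret.
        same_secret = hmac.compare_digest(client.secret.encode(), secret.encode())
        return same_secret and flow in client.allowed_flows


registry = ClientRegistry()
registry.register(Client("web-key", "web-secret", {"authorization_code"}))
registry.register(Client("ios-key", "ios-secret", {"authorization_code", "password"}))
```

In this sketch the web client cannot use the “implicit” or “password” flows at all, and calling `registry.shut_down("ios-key")` cuts off the iOS client without touching the web client.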

Cloud CMS - Authentication Grants

Another feature that we wanted to implement is what we call “authentication grants”.  These are basically alternative “username/password” combinations that you can grant to a principal running on a specified client.  Sounds kind of complicated, right?

The idea is that you often have a mobile app that just wants to sign on to Cloud CMS as a user.  You set up the user ahead of time.  The user might be called “app”.  To sign on, you need to send username and password information over the wire.

Anytime you send password information over the wire, there is a risk.  We rely on HTTPS (SSL), so the risk in transport (i.e. over the network) is minimized.  OAuth2 requires SSL for this very reason.

However, there is still a risk in the application code itself.  What if someone could crack it open and see what password you set up?  Fortunately, this isn’t very easy to do with compiled application code like native iOS, Android or Appcelerator Titanium code.

But there is a really big problem if you’re using HTML5/JS running in the browser.  Basically, anything running in a browser is completely snoop-able.  You don’t have to be a really proficient “hacker” to open up the source code of the application and find the embedded passwords in the <script></script> blocks.

We provide a few tools to protect against this.

One is to create an Authentication Grant object which provisions alternative credentials (a key and secret) that can be used to authenticate as the “app” user against a specific client key/secret combination (and only that combination).  That way, if a snooper were to figure out the Authentication Grant secret, they’d still only be able to use it for a) the “app” user and b) for that specific client.

Thus, if you detect foul play, you can shut down the Authentication Grant.  Worst case, they gain wrongful access to that one user on that one application.  The upside is that they can’t gain greater access to your platform.  They’re constrained, you can detect it, and shut it down.
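Here is a minimal Python sketch of that constraint.  The names and fields are hypothetical and this is not the Cloud CMS implementation.  It just shows that a grant’s key/secret resolves to one principal, and only when it is presented together with the one client it was issued for.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AuthenticationGrant:
    key: str
    secret: str
    principal: str   # the user this grant signs on as, e.g. "app"
    client_key: str  # the only client this grant may be used with
    enabled: bool = True


def resolve_principal(
    grants: Dict[str, AuthenticationGrant],
    grant_key: str,
    grant_secret: str,
    client_key: str,
) -> Optional[str]:
    """Return the principal for a valid grant, or None if it is rejected."""
    grant = grants.get(grant_key)
    if grant is None or not grant.enabled:
        return None
    if grant.client_key != client_key:
        return None  # right credentials, wrong client
    if not hmac.compare_digest(grant.secret.encode(), grant_secret.encode()):
        return None
    return grant.principal


grants = {"g-key": AuthenticationGrant("g-key", "g-secret", "app", "web-key")}
```

A snooper who lifts `g-key`/`g-secret` out of the page source can only ever become “app”, and only through `web-key`.  Flipping `enabled` to `False` ends even that.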

Another tool is to specify valid Domain URLs from which token-requesting authorization calls are allowed to arrive.  You can constrain those who wish to authenticate for a given Client or Authentication Grant so that they must arrive from a specified set of Domain URLs.  That way, if someone tries to use your Authentication Grant username/password for a completely different application, it won’t work.

This works but it’s also pretty easy to trick.  HTTP headers can be manipulated and things like that.  However, it’s a good safeguard and yet another way that you can detect foul play.
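A check like this might look roughly like the Python sketch below.  It assumes the calling domain is read from the request’s `Origin` header and compared by host name, which is our simplification for illustration and not a description of how Cloud CMS performs the check.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def origin_allowed(origin_header: Optional[str], allowed_domains: Iterable[str]) -> bool:
    """Accept a token request only if its Origin host is on the allow list."""
    if not origin_header:
        return False  # no Origin header at all: reject
    parts = urlsplit(origin_header)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    allowed = {domain.lower() for domain in allowed_domains}
    # Exact host match, so "app.example.com.evil.io" does not slip through.
    return parts.hostname in allowed


ALLOWED = ["app.example.com", "kiosk.example.com"]  # hypothetical domains
```

Because a non-browser client can send any header it likes, a check like this is a tripwire rather than a lock.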

Finally, we offer auto-provisioning of Client keys and Authentication Grant keys running in an implicit “untrusted” capacity.  If you use any of the Gitana drivers, you can elect to have the appropriate client/auth keys sent to you when you connect.  These can then change once per application deployment or even once per connection.  This is really the most foolproof way to bolster security for HTML5/JS applications.

It should also be mentioned that in all cases we fully support OAuth2 refresh tokens and expiration of access tokens.  So all the while, the client code must re-assert the validity of its tokens which offers all kinds of chances to detect and stop tokens from being tampered with.
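The drivers handle this for you, but the general pattern is easy to sketch.  The Python below is a generic illustration with made-up names and is not the Gitana driver code: before each call, the client checks whether its access token is about to expire and, if so, trades the refresh token for a new pair.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # absolute time in seconds since the epoch


class TokenManager:
    """Hands out a valid access token, refreshing it shortly before expiry."""

    def __init__(
        self,
        tokens: TokenSet,
        refresh: Callable[[str], TokenSet],  # does the actual refresh call
        clock: Callable[[], float] = time.time,
        skew: float = 30.0,  # refresh this many seconds early
    ) -> None:
        self._tokens = tokens
        self._refresh = refresh
        self._clock = clock
        self._skew = skew

    def access_token(self) -> str:
        if self._clock() >= self._tokens.expires_at - self._skew:
            # Each refresh is a fresh round trip to the authorization server,
            # which is another opportunity for it to reject a bad token.
            self._tokens = self._refresh(self._tokens.refresh_token)
        return self._tokens.access_token
```

The `refresh` callable is where a real client would POST a `refresh_token` grant to the token endpoint.  Passing it in keeps the sketch free of network code.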

Cloud CMS supports OAuth2 for everything in its REST API and provides convenience functions with all of its drivers so that connecting and working with Cloud CMS is completely transparent.  You normally don’t have to deal with any of the OAuth2 capabilities under the hood.  But, should you need to, we’ve really given you a good engine so that you can ensure the security of your mobile applications.

Semiotic Systems and Web Site Design

During my freshman year of college, I took a class in semiotic systems.  It was a French Literature class and was also part of the Women’s Studies program.  I was one of five males in the class, all of whom, like me, were engineering students.  And, like me, they were all in over their heads.  Surrounded by women who were much smarter than we were.  Yet, we soldiered on.

The class proved to be very interesting as it dealt with “semiotic systems” which, at the time, was a completely new field for me.  We studied the role of symbolism and inference in marketing and specifically how it related to women.  Essentially, the course provided an in-depth study of the perception of females in advertisements, poetry, literature, fairy tales, commercials and the like. 

It was my first crash course on the subject.  So it was pretty new to me.  I later learned, of course, that those systems basically form the backbone for all marketing.  Since the dawn of time.

While working on Cloud CMS, we’ve thought intensely about the brand we want to develop and the market that we’re going after.  For us, the key intangible that we make possible is the sense of “customer touch” or “interaction” that can now be realized by having information flow sensibly both to and from the customer.  Intelligibly.  And in real-time.

We know that businesses work on content and push it out to the audience.  Through social streams, crowd-sourcing and interactivity through applications, the audience now pushes information back at the business.  Until recently, the latter channel didn’t exist.  But today it absolutely does exist.  And it’s growing.

Cloud CMS brings this together into a focal point around the notion of “customer touch”.  We also think of it in terms of “social resonance”.  These are exciting concepts to us that we feel paint the vision of what we’re all about.

We’re very serious about this vision and producing the technology to enable it.  For us, the branding exercise is more about setting our course and building the platform to get there.  We hate overselling.  We don’t like companies that manipulate advertising just to make a buck.  We don’t want to be a company that lies.  We want to say what we’re about and then stick to it.

So, while working on the web site, we’ve been looking for imagery that shares this vision and gets people excited about what it means.  This is, after all, where the world is heading.  We want to share that, find the people who, like us, are excited about it and then go there all together.

I’m very happy with the work we’ve done on the web site so far.  However, I’m also a little surprised at how the focal point in our imagery seems to center, so far, around the figure of the female hand.  Or more specifically, the female finger touching a virtual surface (i.e. an application running on an iPad or a kiosk running in a store).

This wasn’t intentional.  It really wasn’t.  It seemed to just work.  

However, the lessons from my freshman year Women’s Studies course are not forgotten.  The implications around female touch are there.  The submissive finger posture, the bestowing of female attributes upon inanimate objects.  I see that in the imagery.  I know we’re not trying to manipulate symbolism and yet… it seems we are.

I suppose if my French Lit teacher were to browse to our web site, she might find her way to this blog entry.  And, then, I might have a chance to explain that we’re still a work in progress.  We may adjust the imagery if we end up deeming it manipulative.  

One thing is for sure.  I know my professor would point out that I only got a B+ in the class.  Perhaps if I’d studied a little more, I’d be able to write an intelligible blog post about my perceptions on the matter.