Transfer

Cloud CMS provides a universal import and export facility that lets you transfer your data in and out of Cloud CMS installations. It also allows you to copy or move data from one Cloud CMS installation to another.

Everything in Cloud CMS is portable in this respect. You are always free to export your data and download it. You're also free to upload your data and import it.

This universal transfer service is very well suited for:

  • Backing up your content or projects
  • Moving your content or projects between environments (for example, from QA to Production)
  • Reusing interesting content sets across projects

Archives

The Archive is the storage format for the transfer service. When you export content, it is written into an Archive. An Archive is a ZIP file that contains a manifest and some descriptive metadata.

Archives are stored within Vaults by default.

When you export something in Cloud CMS, the operation runs asynchronously as a background job. When it finishes, an Archive is placed into a Vault. You can then download that Archive.

Similarly, you can upload an Archive to a Vault. Once the Archive is uploaded, you can import it.

Archives are identified by three primary properties:

  • groupId - a namespace for your archive
  • artifactId - the name of the archive
  • versionId - the version of the archive

An Archive ZIP filename is constructed from these parts like this:

{groupId}-{artifactId}-{versionId}.zip

For example, suppose you work for Acme Corporation and you're exporting backups of a project named Web Site. You might choose the following:

  • groupId - com.acme
  • artifactId - website
  • versionId - 1.0.0

When the Archive is exported, the ZIP file will be named:

com.acme-website-1.0.0.zip

How it Works

Let's take a look at how export and import work.

Exporting

You may export any object or datastore from Cloud CMS. You may also choose to export multiple objects or datastores at once. These are known as source dependencies.

The export process walks over these source dependencies and considers each one. For each, it looks to see if there are any other dependencies that should be exported to satisfy the needs of a holistic export. For example, if you were to export a Domain, the exporter checks to see whether the Domain has any Users or Groups. If so, those Users and Groups are added as sub-dependencies of the source dependency. The exporter then dives down into those sub-dependencies to see if they, in turn, have further dependencies.

At the end of the day, the exporter produces an export graph: a set of dependency chains and their relationships. The exporter's goal is to capture enough information in the generated Archive for it to be useful to a later import.

The export process also features the ability to export binary attachments, ACLs, Access Control policies and other aspects for every dependency it finds.

When exporting, you may specify options to limit the range of content that is included. You may specify start and end modification dates (for any object or data store) and also ranges of changesets (for content node exports).
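
For example, a rolling export that only captures recent changes might supply a configuration along these lines (the property names are documented in the API section below; the date values here are placeholder epoch millisecond timestamps):

{
    "startDate": 1704067200000,
    "endDate": 1706745600000
}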

All of this ends up becoming part of the Archive. The dependency graph structure is stored in a top-level document called manifest.json. The manifest fully describes the contents of the ZIP and can be used by external processes to parse and piece those contents back together (if you wish to traverse the file set manually).

Nodes and Associations

There are some special considerations given to content nodes (which is fitting, given that Cloud CMS is a Content Management System). Here are a few things to consider:

  1. The exporter will export both nodes and associations. Associations will be exported in cases where it makes sense for the holistic representation of the content structure.

  2. The exporter will walk the a:owned and a:child associations of any exported nodes and will include the node on the other end of the association in addition to the node being exported. As such, if you have complex structures that are connected via Owned or Child relationships, the entirety of the complex structure will be exported together.

As an example, suppose you have a Book definition (my:book), a Page definition (my:page) and a Has Page association definition (my:has-page) that extends from a:owned.

You might create a book like this:

Book 1
    -> Has Page 1
        Page 1
    -> Has Page 2
        Page 2
    -> Has Page 3
        Page 3

If you export Book 1, the following dependencies would be exported in total:

Node - Book 1
Node - Page 1
Node - Page 2
Node - Page 3
Owned Association - Has Page 1
Owned Association - Has Page 2
Owned Association - Has Page 3
  3. You may optionally specify the contentIncludeFolders setting. If this option is set to true, the export may include the folder chains leading up to the source dependency or dependencies.

For example, suppose you have the following folder hierarchy:

/Images
    /TCL
        /Roku
            65R615.png
                

If you were to export the file 65R615.png using the contentIncludeFolders option, the following dependencies would be exported in total:

Node - Images
Node - TCL
Node - Roku
Node - 65R615.png
Child Association (Root -> Images)
Child Association (Images -> TCL)
Child Association (TCL -> Roku)
Child Association (Roku -> 65R615.png)

If you were to then import the resulting Archive into a branch, the folder structures would be merged with the target. You'd see all the folders that you'd expect on the target after import.

Importing

You can import an Archive into any Cloud CMS installation. This may be the same Cloud CMS installation from which you exported or it may be an entirely different Cloud CMS installation (perhaps on your laptop or perhaps in a remote data center somewhere else in the world). Archives are intended to be wholly inclusive so that you can move them from installation to installation and things should "just work".

The import process looks at the incoming Archive's manifest.json and considers all of the dependencies that are contained in the Archive. It also considers the import target and looks at the datastores and objects that already exist. It compares the contents of the Archive with any existing content and figures out how to fit and stitch the incoming content into the target. This includes merging properties, overwrites, collision detection, substitution of IDs and much more.

Collision Detection

The import process compares the contents of the incoming Archive with the target and its existing objects. One of its primary concerns is discovering any existing content that may collide with the incoming data set. To determine this, the importer looks at:

  1. The _doc of the dependency. If a target datastore or object has the same _doc, then it is considered to be a collision.

  2. If the incoming object specifies an _existing object, then that object is used as a query to discover any potential collisions on the target (see the sketch below). You must be careful when using _existing to ensure that it will resolve to a result set of size 1.
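
As an illustrative sketch, an incoming object might carry an _existing query like the following (the title and slug properties are hypothetical; _existing simply holds whatever query should uniquely match the corresponding object on the target):

{
    "title": "Home Page",
    "slug": "home-page",
    "_existing": {
        "slug": "home-page"
    }
}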

In addition, if you're importing Nodes or Associations, the following is also checked for collision:

  1. The _qname of the dependency. If a target node or association has the same _qname, then it is considered to be a collision.

  2. The path of the node. If a target node exists at the same path, then it is considered to be a collision.

When collisions are detected, they are handled automatically according to the import strategy and settings described below.

Strategies

You may specify an Import Strategy to determine the strategy used by the importer to handle IDs from the Archive. The following strategies are supported:

  • CLONE
  • COPY_EVERYTHING

CLONE

The CLONE strategy is the default strategy. With this strategy, all of the IDs in the incoming Archive are retained. This means that if you import the same thing twice into an empty target, the first time will result in the creation of the data set and the second time will result in a merge of the data set.

With CLONE, you don't have an option to produce copies of things. The IDs are fixed and retained.

The CLONE strategy is good for replication scenarios (such as replicating content across data centers) or incremental backup. It is also good for some publishing scenarios when you're publishing across branches or across projects.

COPY_EVERYTHING

The COPY_EVERYTHING strategy will result in a new ID being generated for every imported dependency. If you have 1000 items in the Archive, all 1000 items will receive new IDs in the target. In addition, Cloud CMS will perform ID substitution across the entire imported structure, resolving ID references in relators, links, associations and more.

With COPY_EVERYTHING, you can stamp out multiple copies of things as many times as you want. Each copy is uniquely ID'd and its substructure is preserved albeit with new IDs as well.

The COPY_EVERYTHING strategy is good for scenarios where you want to duplicate things and you expect the duplicated content to maintain no future relationship with the original content. This applies to some deployment scenarios and actions such as Copy.

Nodes and Associations

When importing a node or the contents of a branch (multiple nodes), the import process provides transactional commits. This means that if your Archive contains multiple nodes or associations, those dependencies will all import within a single transaction. If anything fails, the entire transaction rolls back.

This transactional behavior of nodes and associations and branches is distinct from how the rest of the importer works.

If you were to import a Domain, for example, and it had 100 users, those users would import along with the Domain. If the 51st User failed to import, the import job on the whole would fail but you'd still be left with a Domain that has 50 users (the 51st having failed and errored out the job).

However, if you import a set of 100 nodes and associations, those dependencies would import in a single transaction. If the 51st item failed to import, the import job would fail completely and everything would be rolled back. At the end of the day, 0 of the 100 nodes and associations would be present in your target branch.

As noted in the section on Collision Detection, the import of Nodes comes with some automatic support for collision detection based on QName and path. You may opt to use the copyOnExisting import setting to have the import create a copy instead of merging (which is the default behavior).

If copyOnExisting is set true and you're importing a folder structure like this:

/Images
    /TCL
        /Roku
            65R615.png

And the path /Images/TCL/Roku/65R615.png already exists on the target, your import will yield the following in the target branch:

/Images
    /TCL
        /Roku
            65R615.png
            65R615.png (Copy 1)

API

This section provides API-level details on how to use the transfer service.

Export

To export an Archive, you will generally want to POST to the URL of a resource with /export appended. For example, you can read a Node like this:

GET /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}

To export the Node, you might do:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}/export

To export a Branch, you might do:

POST /repositories/{repositoryId}/branches/{branchId}/export

And to export a Project, you might do:

POST /projects/{projectId}/export

In all of these cases, the following request parameters must be provided:

  • group - the Archive group ID
  • artifact - the Archive artifact ID
  • version - the Archive version
  • vault - the ID of the Vault where the Archive should be created
  • schedule - should be ASYNCHRONOUS

In addition, the POST payload should be a JSON object that provides the export configuration. This is a key/value map.

The following may be specified:

  • startDate (long) - Only include objects whose modification date is after this time (epoch millis). This is used for partial or rolling exports.
  • endDate (long) - Only include objects whose modification date is before this time (epoch millis). This is used for partial or rolling exports.
  • includeACLs (boolean, default: true) - Whether to include ACLs during export.
  • includeTeams (boolean, default: true) - Whether to include Teams during export.
  • includeTeamMembers (boolean, default: true) - Whether to include Team Members during export.
  • includeActivities (boolean, default: true) - Whether to include Activities during export.
  • includeBinaries (boolean, default: true) - Whether to include Binaries for data stores during export.
  • includeAttachments (boolean, default: true) - Whether to include Attachments for Attachables during export.
  • includeRoles (boolean, default: true) - Whether to include any custom Roles during export.

For exports of repositories, the following apply:

  • startChangeset (text) - Only include data that is relevant to changesets at or beyond the specified changeset.
  • endChangeset (text) - Only include data that is relevant to changesets at or before the specified changeset.
  • selectedBranchIds (array) - Allows you to specify the branches that should be exported.

For exports of branches or nodes, the following apply:

  • tipChangesetOnly (boolean, default: false) - Exports only the tip view of the content, not preserving the changeset history but instead only copying what the user sees.
  • endChangeset (text) - Only include data that is relevant to changesets at or before the specified changeset.

For exports of nodes, the following apply:

  • contentIncludeFolders (boolean, default: false) - Whether to include all parent folders for an exported node. This allows the exact folder structure to be likewise imported on the target.
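
Putting this together, a minimal export request for a branch might look like the following (placeholder IDs and Archive coordinates, with the request parameters assumed to be passed on the query string):

POST /repositories/{repositoryId}/branches/{branchId}/export?group=com.acme&artifact=website&version=1.0.0&vault={vaultId}&schedule=ASYNCHRONOUS

{
    "includeACLs": true,
    "includeAttachments": true,
    "tipChangesetOnly": false
}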

Export Response

An export produces an export job that runs in the background. Your API call will return with a response like this:

{
    "_doc": "{jobId}"
}

You can then poll for completion of the job:

GET /jobs/{jobId}

The state field will be any of the following:

NONE
WAITING
RUNNING
FINISHED
ERROR
PAUSED
AWAITING

It will settle at either FINISHED if the job completes successfully or ERROR if there is a problem.

The job will have some interesting properties on it:

  • archiveGroup (text) - The Group ID of the Archive.
  • archiveArtifact (text) - The Artifact ID of the Archive.
  • archiveVersion (text) - The Version ID of the Archive.
  • vaultId (text) - The ID of the Vault where the Archive is stored.
  • configuration (object) - The export configuration settings being utilized by the job.
  • sources (array) - An array of objects which describe the source data stores or objects being exported.
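
For example, a completed export job might look roughly like this (a trimmed, illustrative sketch; real job documents carry additional fields):

{
    "_doc": "{jobId}",
    "state": "FINISHED",
    "archiveGroup": "com.acme",
    "archiveArtifact": "website",
    "archiveVersion": "1.0.0",
    "vaultId": "{vaultId}",
    "configuration": { ... },
    "sources": [ ... ]
}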

When the job completes, you can download the archive like this:

GET /vaults/{vaultId}/archives/download

And pass the following request parameters:

  • groupId
  • artifactId
  • versionId

These should match the values recorded on the job.
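
For example, using the Archive coordinates from the earlier export (and assuming the parameters are passed on the query string):

GET /vaults/{vaultId}/archives/download?groupId=com.acme&artifactId=website&versionId=1.0.0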

Import

To import, you must first have an Archive in a vault.

Let's assume you have an Archive on disk. The first step is to upload the Archive to a vault.

POST /vaults/{vaultId}/archives

Upload your ZIP file to that endpoint using either a direct ZIP payload or a multi-part POST. You should get back something like:

{
    "contentType": "application/zip",
    "length": 12345,
    "objectId": "{objectId}",
    "groupId": "{groupId}",
    "artifactId": "{artifactId}",
    "versionId": "{versionId}",
    "_doc": "{id}"
}

You should then check to see if the Archive is ready. Typically, Cloud CMS performs some antivirus scanning or other processing of the ZIP file upon upload and so it may take a few seconds. You can poll like this:

GET /vaults/{vaultId}/archives/{archiveId}

You may get back a 404 until the Archive is available.
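
Once the Archive has been processed and is available, that same GET should return the Archive descriptor, similar in shape to the upload response above (placeholder values):

{
    "contentType": "application/zip",
    "length": 12345,
    "groupId": "com.acme",
    "artifactId": "website",
    "versionId": "1.0.0",
    "_doc": "{archiveId}"
}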

Import Target

When importing, you need to decide what you're importing into. Here are some examples:

  • Data Stores (Repositories, Domains, etc) are imported into Platforms
  • Branches are imported into Repositories
  • Nodes are imported into Branches

Thus, to import a Node into a Branch, you can make an API call like this:

POST /repositories/{repositoryId}/branches/{branchId}/import

The following request parameters must be provided:

  • group - the Archive group ID
  • artifact - the Archive artifact ID
  • version - the Archive version
  • vault - the ID of the Vault where the Archive resides
  • schedule - should be ASYNCHRONOUS

In addition, the POST payload should be a JSON object that provides the import configuration. This is a key/value map.

The following may be specified:

  • strategy (string, default: CLONE) - Either `CLONE` or `COPY_EVERYTHING`. Use `CLONE` to keep the same IDs on import. Use `COPY_EVERYTHING` to generate all new IDs for all entries on every import.
  • childrenOnly (boolean, default: false) - Whether to skip the import of the top-most thing and only import the children. This is primarily used for branches. Using this, you can import a branch export into a target branch (by specifying `childrenOnly` so that the nodes inside the branch are what get imported).
  • includeACLs (boolean, default: true) - Whether to include ACLs during import.
  • includeTeams (boolean, default: true) - Whether to include Teams during import.
  • includeTeamMembers (boolean, default: true) - Whether to include Team Members during import.
  • includeRoles (boolean, default: true) - Whether to include Roles during import.
  • includeActivities (boolean, default: true) - Whether to include Activities during import.
  • includeBinaries (boolean, default: true) - Whether to include Binaries for data stores during import.
  • includeAttachments (boolean, default: true) - Whether to include Attachments for Attachables during import.

For node imports (or branch imports with childrenOnly), the following applies:

  • copyOnExisting (boolean, default: false) - If a collision is detected for a node in a folder, instead of overwriting, add a suffix to the file indicating that it is a copy.
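
Putting this together, an import of a branch-level Archive into a target branch might look like this (placeholder IDs and Archive coordinates, with the request parameters assumed to be passed on the query string):

POST /repositories/{repositoryId}/branches/{branchId}/import?group=com.acme&artifact=website&version=1.0.0&vault={vaultId}&schedule=ASYNCHRONOUS

{
    "strategy": "CLONE",
    "childrenOnly": true,
    "copyOnExisting": false
}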

Import Response

An import produces an import job that runs in the background. Your API call will return with a response like this:

{
    "_doc": "{jobId}"
}

You can then poll for completion of the job:

GET /jobs/{jobId}

The state field will be any of the following:

NONE
WAITING
RUNNING
FINISHED
ERROR
PAUSED
AWAITING

It will settle at either FINISHED if the job completes successfully or ERROR if there is a problem.

The job will have some interesting properties on it:

  • archiveGroup (text) - The Group ID of the Archive.
  • archiveArtifact (text) - The Artifact ID of the Archive.
  • archiveVersion (text) - The Version ID of the Archive.
  • vaultId (text) - The ID of the Vault where the Archive is stored.
  • configuration (object) - The import configuration settings being utilized by the job.
  • targets (array) - An array of objects which describe the target data stores or objects that the job is importing into.
  • imports (array) - An array of objects which describe the newly created or updated data stores or objects produced by the import process.