Transfer
Cloud CMS provides a universal import and export facility that lets you transfer your data in and out of Cloud CMS
installations. It also allows you to copy or move data from one Cloud CMS installation to another.
Everything in Cloud CMS is portable in this respect. You are always free to export your data and download it. You're
also free to upload your data and import it.
This universal transfer service is very well suited for:
- Backing up your content or projects
- Moving your content or projects between environments (for example, from QA to Production)
- Reusing interesting content sets across projects
Archives
The Archive is the storage format for the transfer service. When you export content, it is written into an Archive.
An Archive is a ZIP file containing a manifest and some metadata.
Archives are stored within Vaults by default.
When you export something in Cloud CMS, that operation will run in the background (asynchronously, as a background
job). When it finishes, an Archive will be placed into a Vault. You can then download that Archive.
Similarly, you can upload an Archive to a Vault. Once the Archive is uploaded, you can import it.
Archives are identified by three primary properties:
- groupId - a namespace for your archive
- artifactId - the name of the archive
- versionId - the version of the archive
An Archive ZIP filename is constructed from these parts like this:
{groupId}-{artifactId}-{versionId}.zip
For example, suppose you work for Acme Corporation and you're exporting backups of a project named Web Site. You
might choose the following:
- groupId - com.acme
- artifactId - website
- versionId - 1.0.0
When the Archive is exported, the ZIP file will be named:
com.acme-website-1.0.0.zip
How it Works
Let's take a look at how export and import work.
Exporting
You may export any object or datastore from Cloud CMS. You may also choose to export multiple objects or datastores.
These are known as source dependencies.
The export process walks over these source dependencies and considers each one. For each, it looks to see if there are any
other dependencies that should be exported to satisfy the needs of a holistic export. For example, if you were to export
a Domain, the exporter checks to see if the Domain has any Users or Groups. If so, those Users and Groups are added
as sub-dependencies of the source dependency. The exporter then dives down into those sub-dependencies to see if they,
in turn, have further dependencies.
At the end of the day, the exporter produces an export graph. The export graph consists of a set of dependency chains
and their relationships. The exporter's goal is to put enough information into the generated Archive so that it will be
useful to a later import.
The export process also features the ability to export binary attachments, ACLs, Access Control policies and other
aspects for every dependency it finds.
When exporting, you may specify options to limit the range of content that is included. You may specify start and
end modification dates (for any object or data store) and also ranges of changesets (for content node exports).
All of this ends up becoming part of the Archive. The dependency graph structure is stored in a top-level document
called manifest.json
. The manifest is wholly descriptive of the contents of the ZIP and can be used by external
processes to parse and piece together the contents of the ZIP (if you wish to traverse the file set manually).
Nodes and Associations
There are some special considerations given to content nodes (which is fitting, given that Cloud CMS is a Content
Management System). Here are a few things to consider:
- The exporter will export both nodes and associations. Associations will be exported for cases where it makes sense
for the holistic representation of the content structure.
- The exporter will walk the a:owned and a:child associations of any exported nodes and will include the connecting
node in addition to the connector node. As such, if you have complex structures that are connected via Owned or
Child relationships, the entirety of the complex structure will be exported together.
As an example, suppose you have a Book definition (my:book), a Page definition (my:page) and a Has Page
association definition (my:has-page) that extends from a:owned.
You might create a book like this:
Book 1
-> Has Page 1
Page 1
-> Has Page 2
Page 2
-> Has Page 3
Page 3
If you export Book 1, the following dependencies would be exported in total:
Node - Book 1
Node - Page 1
Node - Page 2
Node - Page 3
Owned Association - Has Page 1
Owned Association - Has Page 2
Owned Association - Has Page 3
- The exporter may optionally specify the contentIncludeFolders setting. If this option is set to true, the export
may include folder chains leading up to the source dependency or dependencies.
For example, suppose you have the following folder hierarchy:
/Images
/TCL
/Roku
65R615.png
If you were to export the file 65R615.png using the contentIncludeFolders option, the following dependencies
would be exported in total:
Node - Images
Node - TCL
Node - Roku
Node - 65R615.png
Child Association (Root -> Images)
Child Association (Images -> TCL)
Child Association (TCL -> Roku)
Child Association (Roku -> 65R615.png)
If you were to then import the resulting Archive into a branch, the folder structures would be merged with the target.
You'd see all the folders that you'd expect on the target after import.
Importing
You can import an Archive into any Cloud CMS installation. This may be the same Cloud CMS installation from which
you exported or it may be an entirely different Cloud CMS installation (perhaps on your laptop or perhaps in a remote
data center somewhere else in the world). Archives are intended to be wholly inclusive so that you can move them
from installation to installation and things should "just work".
The import process looks at the incoming Archive's manifest.json
and considers all of the dependencies that are
contained in the Archive. It also considers the import target and looks at the datastores and objects that already
exist. It compares the contents of the Archive with any existing content and figures out how to fit and stitch the
incoming content into the target. This includes merging properties, overwrites, collision detection, substitution of
IDs and much more.
Collision Detection
The import process compares the contents of the incoming Archive with the target and its existing objects. One of its
fundamental concerns is the discovery of any content that may collide with the incoming data set. To
determine this, the importer looks at:
- The _doc of the dependency. If a target datastore or object has the same _doc, then it is considered to be
a collision.
- If the incoming object specifies an _existing object, then that object is used as a query to discover any potential
collisions on the target. You must be careful when using _existing to ensure that it will resolve to a result set of
size 1.
In addition, if you're importing Nodes or Associations, the following is also checked for collision:
- The _qname of the dependency. If a target node or association has the same _qname, then it is considered to
be a collision.
- The path of the node. If a target node exists at the same path, then it is considered to be a collision.
When collisions are detected, they are automatically handled.
Strategies
You may specify an Import Strategy to determine how the importer handles IDs from the Archive.
The following strategies are supported:
CLONE
COPY_EVERYTHING
CLONE
The CLONE
strategy is the default strategy. With this strategy, all of the IDs in the incoming Archive are retained.
This means that if you import the same thing twice into an empty target, the first time will result in the creation
of the data set and the second time will result in a merge of the data set.
With CLONE, you don't have an option to produce copies of things. The IDs are fixed and retained.
The CLONE
strategy is good for replication scenarios (such as replicating content across data centers) or
incremental backup. It is also good for some publishing scenarios when you're publishing across branches or
across projects.
COPY_EVERYTHING
The COPY_EVERYTHING
strategy will result in a new ID being generated for every import dependency. If you have 1000
items in the Archive, all 1000 items will receive new IDs in the target. In addition, Cloud CMS will perform ID
substitution across the entire imported structure, applying the ID adjustments to relators, links, associations
and more.
With COPY_EVERYTHING, you can stamp out multiple copies of things as many times as you want. Each copy is uniquely
ID'd and its substructure is preserved albeit with new IDs as well.
The COPY_EVERYTHING
strategy is good for scenarios where you want to duplicate things and you expect the duplicated
content to maintain no future relationship with the original content. This applies to some deployment scenarios and
actions such as Copy.
Nodes and Associations
When importing a node or the contents of a branch (multiple nodes), the import process provides transactional commits.
This means that if your Archive contains multiple nodes or associations, those dependencies will import all within a
single transaction. If anything fails, the entire transaction rolls back.
This transactional behavior of nodes and associations and branches is distinct from how the rest of the importer works.
If you were to import a Domain, for example, and it had 100 users, those users would import along with the Domain. If
the 51st User failed to import, the import job on the whole would fail but you'd still be left with a Domain that
has 50 users (the 51st having failed and errored out the job).
However, if you import a set of 100 nodes and associations, those dependencies would import in a single transaction.
If the 51st item failed to import, the import job would fail completely and everything would be rolled back. At the
end of the day, 0 of the 100 nodes and associations would be present in your target branch.
As noted in the section on Collision Detection, the import of Nodes comes with some automatic support for collision
detection based on QName and path. You may opt to use the copyOnExisting
import setting to have the import create
a copy instead of merging (which is the default behavior).
If copyOnExisting is set to true and you're importing a folder structure like this:
/Images
/TCL
/Roku
65R615.png
And if the path /Images/TCL/Roku/65R615.png already exists on the target, then your import will yield the following in the
target branch:
/Images
/TCL
/Roku
65R615.png
65R615.png (Copy 1)
API
This section provides API-level details on how to use the transfer service.
Export
To export an archive, you will generally want to POST to the URL of a resource with /export appended
at the end.
For example, you can read a Node like this:
GET /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}
To export the Node, you might do:
POST /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}/export
To export a Branch, you might do:
POST /repositories/{repositoryId}/branches/{branchId}/export
And to export a Project, you might do:
POST /projects/{projectId}/export
In all of these cases, the following request parameters must be provided:
- group - the Archive group ID
- artifact - the Archive artifact ID
- version - the Archive version
- vault - the ID of the Vault where the Archive should be created
- schedule - should be ASYNCHRONOUS
In addition, the POST payload should be a JSON object that provides the export configuration.
This is a key/value map.
The following may be specified:
Property | Type | Default | Description |
---|---|---|---|
startDate | long | | Only include objects whose modification date is after this time (epoch millis). This is used for partial or rolling exports. |
endDate | long | | Only include objects whose modification date is before this time (epoch millis). This is used for partial or rolling exports. |
includeACLs | boolean | true | Whether to include ACLs during export. |
includeTeams | boolean | true | Whether to include Teams during export. |
includeTeamMembers | boolean | true | Whether to include Team members during export. |
includeActivities | boolean | true | Whether to include Activities during export. |
includeBinaries | boolean | true | Whether to include Binaries for data stores during export. |
includeAttachments | boolean | true | Whether to include Attachments for Attachables during export. |
includeRoles | boolean | true | Whether to include any custom Roles during export. |
For exports of repositories, the following apply:
Property | Type | Default | Description |
---|---|---|---|
startChangeset | text | | Only include data that is relevant to changesets at or beyond the specified changeset. |
endChangeset | text | | Only include data that is relevant to changesets at or before the specified changeset. |
selectedBranchIds | array | | Allows you to specify the branches that should be exported. |
For exports of branches or nodes, the following apply:
Property | Type | Default | Description |
---|---|---|---|
tipChangesetOnly | boolean | false | Exports only the tip view of the content, not preserving the changeset history but instead only copying what the user sees. |
endChangeset | text | | Only include data that is relevant to changesets at or before the specified changeset. |
For exports of nodes, the following apply:
Property | Type | Default | Description |
---|---|---|---|
contentIncludeFolders | boolean | false | Whether to include all parent folders for an exported node. This allows the exact folder structure to be likewise imported on the target. |
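Putting the pieces together, here is a sketch of what a node export request might look like. It assumes the request parameters above are passed on the query string, and the group, artifact, version and vault values are purely illustrative:
POST /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}/export?group=com.acme&artifact=website&version=1.0.0&vault={vaultId}&schedule=ASYNCHRONOUS
{
  "includeAttachments": true,
  "contentIncludeFolders": true
}
This would produce an Archive named com.acme-website-1.0.0.zip in the specified Vault, containing the exported node, its attachments and its parent folder chain.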
Export Response
An export produces an export job that runs in the background. Your API call will return with a response like this:
{
"_doc": "{jobId}"
}
You can then poll for completion of the job:
GET /jobs/{jobId}
The state field will be any of the following:
NONE
WAITING
RUNNING
FINISHED
ERROR
PAUSED
AWAITING
It will settle at either FINISHED
if the job completes successfully or ERROR
if there is a problem.
The job will have some interesting properties on it:
Property | Type | Description |
---|---|---|
archiveGroup | text | The Group ID of the Archive. |
archiveArtifact | text | The Artifact ID of the Archive. |
archiveVersion | text | The Version ID of the Archive. |
vaultId | text | The ID of the Vault where the archive is stored. |
configuration | object | The export configuration settings being utilized by the job. |
sources | array | An array of objects which describe the source data stores or objects being exported. |
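For illustration, a completed export job might look roughly like this (the values are hypothetical and only the properties discussed above are shown):
{
  "_doc": "{jobId}",
  "state": "FINISHED",
  "archiveGroup": "com.acme",
  "archiveArtifact": "website",
  "archiveVersion": "1.0.0",
  "vaultId": "{vaultId}",
  "configuration": { ... },
  "sources": [ ... ]
}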
When the job completes, you can download the archive like this:
GET /vaults/{vaultId}/archives/download
And pass the following request parameters:
- groupId
- artifactId
- versionId
These should match the values on the job.
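For example, assuming those parameters are passed on the query string, a download of the earlier example archive might look like this:
GET /vaults/{vaultId}/archives/download?groupId=com.acme&artifactId=website&versionId=1.0.0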
Import
To import, you must first have an Archive in a vault.
Let's assume you have an Archive on disk. The first step is to upload the Archive to a vault.
POST /vaults/{vaultId}/archives
Upload your ZIP file to that endpoint using either a direct ZIP payload or a multi-part POST.
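As a rough sketch of the direct ZIP payload variant (treat the exact headers as illustrative rather than definitive):
POST /vaults/{vaultId}/archives
Content-Type: application/zip
<binary contents of com.acme-website-1.0.0.zip>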
You should get back something like:
{
"contentType": "application/zip",
"length": 12345,
"objectId": "{objectId},
"groupId": "{groupId}",
"artifactId": "{artifactId}",
"versionId": "{versionId}",
"_doc": "{id}
}
You should then check to see if the Archive is ready. Typically, Cloud CMS performs some antivirus scanning or other
processing of the ZIP file upon upload and so it may take a few seconds. You can poll like this:
GET /vaults/{vaultId}/archives/{archiveId}
You may get back a 404 until the Archive is available.
Import Target
When importing, you need to decide what you're importing into. Here are some examples:
- Data Stores (Repositories, Domains, etc) are imported into Platforms
- Branches are imported into Repositories
- Nodes are imported into Branches
Thus, to import a Node into a Branch, you can make an API call like this:
POST /repositories/{repositoryId}/branches/{branchId}/import
The following request parameters must be provided:
- group - the Archive group ID
- artifact - the Archive artifact ID
- version - the Archive version
- vault - the ID of the Vault where the Archive is located
- schedule - should be ASYNCHRONOUS
In addition, the POST payload should be a JSON object that provides the import configuration.
This is a key/value map.
The following may be specified:
Property | Type | Default | Description |
---|---|---|---|
strategy | string | CLONE | Either `CLONE` or `COPY_EVERYTHING`. Use `CLONE` to keep the same IDs on import. Use `COPY_EVERYTHING` to generate all new IDs for all entries on every import. |
childrenOnly | boolean | false | Whether to skip the import of the top-most thing and only import the children. This is primarily used for branches. Using this, you can import a branch export into a target branch (by specifying `childrenOnly` so that the nodes inside the branch are what get imported). |
includeACLs | boolean | true | Whether to include ACLs during import. |
includeTeams | boolean | true | Whether to include Teams during import. |
includeTeamMembers | boolean | true | Whether to include Team Members during import. |
includeRoles | boolean | true | Whether to include Roles during import. |
includeActivities | boolean | true | Whether to include Activities during import. |
includeBinaries | boolean | true | Whether to include Binaries for data stores during import. |
includeAttachments | boolean | true | Whether to include Attachments for Attachables during import. |
For node imports (or branch imports with childrenOnly), the following applies:
Property | Type | Default | Description |
---|---|---|---|
copyOnExisting | boolean | false | If a collision is detected for a node in a folder, instead of overwriting, add a suffix to the file indicating that it is a copy. |
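Putting this together, here is a sketch of a branch import that brings in only the nodes from a branch export. It assumes the request parameters are passed on the query string, and the values are illustrative:
POST /repositories/{repositoryId}/branches/{branchId}/import?group=com.acme&artifact=website&version=1.0.0&vault={vaultId}&schedule=ASYNCHRONOUS
{
  "strategy": "CLONE",
  "childrenOnly": true,
  "copyOnExisting": true
}
With this configuration, nodes that collide on path are not overwritten; the incoming files are imported as suffixed copies alongside the existing ones.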
Import Response
An import produces an import job that runs in the background. Your API call will return with a response like this:
{
"_doc": "{jobId}"
}
You can then poll for completion of the job:
GET /jobs/{jobId}
The state field will be any of the following:
NONE
WAITING
RUNNING
FINISHED
ERROR
PAUSED
AWAITING
It will settle at either FINISHED
if the job completes successfully or ERROR
if there is a problem.
The job will have some interesting properties on it:
Property | Type | Description |
---|---|---|
archiveGroup | text | The group id of the archive. |
archiveArtifact | text | The artifact id of the archive. |
archiveVersion | text | The version id of the archive. |
vaultId | text | The vault id of the archive. |
configuration | object | The import configuration settings being utilized by the job. |
targets | array | An array of objects which describe the target data stores or objects that the job is importing into. |
imports | array | An array of objects which describe the newly created or updated data stores or objects that are a result or product of the import process. |