Binary Files

Cloud CMS lets you upload any kind of desktop file. The system automatically detects the type of content you upload, then inspects and processes it to provide the following services:

  • Automatic antivirus scanning to detect malicious files
  • Extraction of metadata properties from the payload body and headers
  • Generation of thumbnails from the content (using image transformation)
  • Detection and extraction of text for full-text search
  • Execution of custom rules to further process the file and trigger event handlers such as calling out to web hook endpoints, launching workflows, sending emails or running custom server-side scripts

The result is a new content item that retains the original binary file and also gains the benefits of the series of operations listed above. The original binary file is stored as the default attachment on the newly created content item and can be retrieved at any time.

Whenever the default attachment changes, the steps listed above run again. Your editorial team simply works with files while these services run automatically in the background.

For more information on Attachments, please see the section on Attachments below or check out our documentation on Content Attachments.

Binary Storage Providers

Cloud CMS stores binary files in a Binary Storage provider, which on-premise installations can configure or adjust to suit your company's needs.

As noted in the previous section, a series of services runs on the binary file when it is uploaded. Once that processing is complete, Cloud CMS sends the file to the configured Binary Storage provider.

Note: If you're using Cloud CMS on SaaS, we use Amazon S3 to store the binary file.

The following Binary Storage Providers are available:

  • MongoDB GridFS
  • Amazon S3
  • Local file system

Binary files are stored using directory structures (key prefixes) that allow for fast object retrieval from any of these systems. These storage paths are optimized for retrieval and write speed and are subject to implementation changes.

API retrieval of these files, on the other hand, uses a simple filename convention. You simply retrieve the binary resource without having to worry about how the back-end storage is managed.
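The actual storage layout is internal to Cloud CMS and subject to change, so the following is only an illustration of the general idea: a caller works with a plain filename, while the storage layer derives a prefixed key from it. The function name `storage_key` and the two-level SHA-1 prefix scheme are assumptions made for this sketch, not the product's real layout.

```python
import hashlib


def storage_key(datastore_id: str, filename: str) -> str:
    """Map a logical filename to a hypothetical prefixed storage key.

    Illustration only: a two-level hash prefix spreads objects across many
    "directories" (key prefixes) so no single prefix grows too large, while
    callers keep addressing the binary by its simple filename.
    """
    digest = hashlib.sha1(f"{datastore_id}/{filename}".encode("utf-8")).hexdigest()
    # Shape of the result: "<2 hex chars>/<2 hex chars>/<datastore_id>/<filename>"
    return f"{digest[:2]}/{digest[2:4]}/{datastore_id}/{filename}"
```

Because the key is derived deterministically from the filename, the API can always locate the object again without the caller ever seeing the prefix.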

MongoDB GridFS is the default storage mechanism and remains enabled for Docker installations.

The public cloud (SaaS) uses Amazon S3, which we recommend for all production installations that you intend to scale elastically across multiple API server instances.

The local file system implementation can be used for single-server development boxes.

Every binary retains its content type, length and data stream. The filename may optionally be stored with the binary as well.
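As a rough mental model, the information each stored binary carries can be pictured as a small record. The class name `BinaryRecord` and its `from_bytes` helper are hypothetical and only mirror the fields named above; they are not a Cloud CMS type.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class BinaryRecord:
    """Hypothetical model of what a stored binary carries."""

    content_type: str               # MIME type, e.g. "application/pdf"
    length: int                     # size of the payload in bytes
    stream: BinaryIO                # the binary payload itself
    filename: Optional[str] = None  # optional; may be absent

    @classmethod
    def from_bytes(
        cls, data: bytes, content_type: str, filename: Optional[str] = None
    ) -> "BinaryRecord":
        # Length is computed from the payload so it always matches the stream.
        return cls(content_type, len(data), io.BytesIO(data), filename)
```

For example, `BinaryRecord.from_bytes(b"hello", "text/plain")` yields a record with a length of 5 and no filename.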

Data Store Binaries

Every data store in Cloud CMS supports straight binary storage. If you have the CREATE_SUBOBJECTS permission against the data store, you can store binaries within it.

For example, you might upload a file to a Repository like this:

POST /repositories/{repositoryId}/files/{filename}

The Content-Type header and body of the HTTP POST are read and written into the binary object identified by filename.
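The following sketch builds such an upload request with Python's standard library. It assumes an API base URL of `https://api.cloudcms.com` (adjust for an on-premise installation) and an OAuth2 bearer token obtained elsewhere; the helper name `build_upload_request` is hypothetical.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.cloudcms.com"  # assumption: adjust for your installation


def build_upload_request(
    repository_id: str, filename: str, data: bytes, content_type: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST that stores `data` under `filename`."""
    # Percent-encode path segments so filenames with spaces or slashes stay intact.
    repo = urllib.parse.quote(repository_id, safe="")
    name = urllib.parse.quote(filename, safe="")
    url = f"{API_BASE}/repositories/{repo}/files/{name}"
    return urllib.request.Request(
        url,
        data=data,  # becomes the stored binary stream
        method="POST",
        headers={
            "Content-Type": content_type,  # stored as the binary's content type
            "Authorization": f"Bearer {access_token}",  # assumption: bearer auth
        },
    )
```

Sending the request is then a matter of passing it to `urllib.request.urlopen`.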

You can then retrieve the file like this:

GET /repositories/{repositoryId}/files/{filename}

Or delete it like this:

DELETE /repositories/{repositoryId}/files/{filename}
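Retrieval and deletion differ only in the HTTP method, so one helper can build both. As before, the base URL, bearer-token authentication and the name `build_file_request` are assumptions for this sketch.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.cloudcms.com"  # assumption: adjust for your installation


def build_file_request(
    method: str, repository_id: str, filename: str, access_token: str
) -> urllib.request.Request:
    """Build a GET (download) or DELETE request for a data store binary."""
    if method not in ("GET", "DELETE"):
        raise ValueError(f"unsupported method: {method}")
    repo = urllib.parse.quote(repository_id, safe="")
    name = urllib.parse.quote(filename, safe="")
    url = f"{API_BASE}/repositories/{repo}/files/{name}"
    # No body is sent for either method; only the auth header is needed.
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {access_token}"}
    )
```

When a GET request built this way is opened with `urllib.request.urlopen`, `resp.read()` returns the binary payload and `resp.headers.get_content_type()` reports the content type that was stored at upload time.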

Attachments

In addition to raw data store storage, Cloud CMS also supports per-object storage called "attachments". Attachments are similar in concept to email attachments in that they're additional binary payloads attached to a JSON object. They aren't part of the JSON data itself but rather ancillary binary parts that ride alongside it.

The Attachments API is exposed as a URL suffix on the object's own resource path. Uploads and downloads work the same way as for data store binaries, and HTTP headers are handled identically.

For example, to upload an attachment to a content node, you'd do this:

POST /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}/attachments/{attachmentId}

Here, attachmentId identifies the attachment. The only way to retrieve this attachment afterward is to scope the request to the node, as in:

GET /repositories/{repositoryId}/branches/{branchId}/nodes/{nodeId}/attachments/{attachmentId}
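Since the attachment URL is just the node's URL with an `/attachments/{attachmentId}` suffix, a client can build both requests from one URL helper. The base URL, bearer-token authentication and all function names below are assumptions for this sketch.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.cloudcms.com"  # assumption: adjust for your installation


def attachment_url(repository_id: str, branch_id: str, node_id: str, attachment_id: str) -> str:
    """Node URL plus the /attachments/{attachmentId} suffix."""
    r, b, n, a = (
        urllib.parse.quote(p, safe="")
        for p in (repository_id, branch_id, node_id, attachment_id)
    )
    return f"{API_BASE}/repositories/{r}/branches/{b}/nodes/{n}/attachments/{a}"


def build_attachment_upload(repository_id, branch_id, node_id, attachment_id,
                            data: bytes, content_type: str, access_token: str):
    # Same upload mechanics as a data store binary: body plus Content-Type header.
    return urllib.request.Request(
        attachment_url(repository_id, branch_id, node_id, attachment_id),
        data=data,
        method="POST",
        headers={"Content-Type": content_type,
                 "Authorization": f"Bearer {access_token}"},
    )


def build_attachment_download(repository_id, branch_id, node_id, attachment_id,
                              access_token: str):
    # Retrieval must be scoped to the node that owns the attachment.
    return urllib.request.Request(
        attachment_url(repository_id, branch_id, node_id, attachment_id),
        method="GET",
        headers={"Authorization": f"Bearer {access_token}"},
    )
```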

Under the hood, attachments are simply binary files stored in the same GridFS, S3 or file storage systems. For attachments, the filenames are generated dynamically so as to associate the underlying file with the object it is attached to. Generating filenames this way helps ensure there are no collisions and that storage is optimal.
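The real naming scheme is internal to Cloud CMS. The sketch below only illustrates why generated names avoid collisions: two nodes can each have an attachment called `default`, yet their stored files never clash because the owning object's ID is part of the name. The function `attachment_storage_name` and its format are hypothetical.

```python
import hashlib
import json


def attachment_storage_name(object_id: str, attachment_id: str) -> str:
    """Derive a hypothetical, collision-resistant storage filename."""
    # JSON-encoding the pair keeps ("a_b", "c") and ("a", "b_c") distinct.
    payload = json.dumps([object_id, attachment_id]).encode("utf-8")
    digest = hashlib.sha256(payload).hexdigest()
    # Prefixing with the object ID keeps the file associated with its owner.
    return f"{object_id}/{digest}"
```

The name is deterministic, so the same node and attachment ID always resolve to the same stored file.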

For more information on attachments, please read about Attachments.