Extractable

QName: f:extractable

Indicates that a node should have metadata properties extracted from its primary binary attachment. When this feature is applied to a node, Cloud CMS looks at the attachment's MIME type and treats the binary object as a potential source of metadata properties. Those properties are extracted and set onto the document's JSON before the write completes.

Extraction is triggered at write time whenever the binary attachment is updated. The feature does not guarantee that the JSON document will receive property values. It only marks the document as a candidate to receive them, provided metadata can be extracted from the binary attachment.

Cloud CMS understands many MIME types out of the box, including Microsoft Office document formats, PDF, and several others.
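The trigger rule above can be modeled as a small predicate. The following is a minimal, hypothetical sketch and not Cloud CMS code. The function name and the `SUPPORTED_MIME_TYPES` sample are assumptions; the engine's real list of understood types is not enumerated on this page. The sketch shows that extraction is attempted only when the node carries `f:extractable`, the attachment has changed, and the MIME type is recognized. Even then, the node is only a candidate.

```python
# Hypothetical model of the f:extractable trigger rule; not the Cloud CMS implementation.
# SUPPORTED_MIME_TYPES is an assumed sample, not the extractor engine's actual list.
SUPPORTED_MIME_TYPES = {
    "application/pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "image/jpeg",
}


def should_attempt_extraction(node: dict, attachment_changed: bool, mime_type: str) -> bool:
    """Return True if this write should try to extract metadata from the attachment."""
    if "f:extractable" not in node.get("_features", {}):
        return False  # the node is not marked as extractable
    if not attachment_changed:
        return False  # extraction only runs when the binary attachment is updated
    # A recognized MIME type makes the node a candidate; the extractor may still find nothing.
    return mime_type.lower() in SUPPORTED_MIME_TYPES
```

For example, `should_attempt_extraction({"_features": {"f:extractable": {}}}, True, "image/jpeg")` returns `True`. The same call with `attachment_changed=False` returns `False`.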

Configuration

Extracted Properties

Which properties are extracted depends on the kind of file and on what the extractor engine is able to find within it.

The following core properties will be extracted for all file types (if available):

  • altitude
  • comments
  • contributor
  • created
  • creator
  • creatorTool
  • description
  • format
  • identifier
  • keywords
  • language
  • latitude
  • longitude
  • metadataDate
  • modified
  • modifier
  • originalResourceName
  • printDate
  • publisher
  • rating
  • relation
  • rights
  • source
  • title
  • type

The following properties will be extracted for image file types (if available):

  • imageHeight
  • imageWidth
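The lists above amount to a per-type filter. The following is a hypothetical Python sketch of that filter, not the extractor's code. It assumes that the raw extractor output is a flat dictionary, that "image file types" means MIME types beginning with `image/`, and that "if available" means missing or empty values are omitted. The function name `select_extracted` is illustrative; only the property names come from this page.

```python
# Property names are taken from the lists above; the filtering logic is a hypothetical sketch.
CORE_PROPERTIES = {
    "altitude", "comments", "contributor", "created", "creator", "creatorTool",
    "description", "format", "identifier", "keywords", "language", "latitude",
    "longitude", "metadataDate", "modified", "modifier", "originalResourceName",
    "printDate", "publisher", "rating", "relation", "rights", "source", "title", "type",
}
IMAGE_PROPERTIES = {"imageHeight", "imageWidth"}


def select_extracted(raw: dict, mime_type: str) -> dict:
    """Keep only documented properties that the extractor actually produced."""
    allowed = set(CORE_PROPERTIES)
    if mime_type.startswith("image/"):  # assumption: "image file types" = image/* MIME types
        allowed |= IMAGE_PROPERTIES
    # "If available": properties that are missing or empty are simply left out.
    return {k: v for k, v in raw.items() if k in allowed and v not in (None, "")}
```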

Title and Description

If a node does not have a title property at the time of the extraction and the extracted properties contain the title core property, the title will be applied to the node.

Similarly, if a node does not have a description property at the time of the extraction and the extracted properties contain the description core property, the description will be applied to the node.
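The fallback rule for title and description can be written as a short function. This is a sketch under one stated assumption: "does not have a property" means the key is absent, so an existing empty string would be left alone. The function name is hypothetical. Existing values on the node always win over extracted ones.

```python
def apply_title_and_description(node: dict, extracted: dict) -> dict:
    """Copy title/description from extracted metadata only where the node lacks them."""
    updated = dict(node)  # work on a copy so the caller's document is not mutated
    for key in ("title", "description"):
        # Assumption: "does not have" = key absent; existing values are never overwritten.
        if key not in updated and key in extracted:
            updated[key] = extracted[key]
    return updated
```

For instance, a node `{"title": "a"}` combined with extracted `{"title": "b", "description": "d"}` keeps its own title and gains only the description.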

Example: JPEG image

Suppose we have a content type that has the f:extractable feature as a mandatory feature. It might look like this:

{
    "title": "Image",
    "type": "object",
    "_qname": "my:image",
    "properties": {
        "title": {
            "title": "Title",
            "type": "string"
        }
    },
    "mandatoryFeatures": {
        "f:extractable": {}
    }
}
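If you already have a type definition as JSON, you can mark it extractable by adding the same `mandatoryFeatures` entry shown above. The following is a minimal sketch that does this in Python. The helper name `make_extractable` is hypothetical. The sketch only edits the dictionary and does not cover how the definition is submitted to Cloud CMS.

```python
import json


def make_extractable(type_definition: dict) -> dict:
    """Return a copy of a content type definition with f:extractable as a mandatory feature."""
    updated = json.loads(json.dumps(type_definition))  # deep copy via a JSON round-trip
    features = updated.setdefault("mandatoryFeatures", {})
    features.setdefault("f:extractable", {})  # empty feature config, as in the type above
    return updated


if __name__ == "__main__":
    image_type = {"title": "Image", "type": "object", "_qname": "my:image"}
    print(json.dumps(make_extractable(image_type), indent=4))
```

Because `setdefault` is used, an existing `f:extractable` configuration on the type is preserved rather than replaced.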

Suppose that we now create a my:image instance with a default attachment: a JPEG image (MIME type image/jpeg) named chuño.jpg.

As soon as we save the node, the default attachment is inspected and metadata is extracted from it. As a result, our JSON ends up looking something like this:

{
    "title": "dog.jpeg",
    "_features": {
        "f:extractable": {
            "extracted": {
                "default": {
                    "title": "chuño.jpg",
                    "description": "Chuño the wonder dog",
                    "imageHeight": "768",
                    "imageWidth": "768",
                    "created": "2015-01-18T23:31:01",
                    "modified": "2015-05-08T12:44:45"
                }
            }
        }
    }
}
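Client code often needs to read these values back from the saved node. The following sketch navigates the `_features` → `f:extractable` → `extracted` path shown in the example. It assumes the key under `extracted` (here `default`) is the attachment ID, an inference from the example. The example also shows `imageWidth` and `imageHeight` as strings, so the sketch parses them to integers. The function names are hypothetical.

```python
from typing import Optional, Tuple


def read_extracted(node: dict, attachment_id: str = "default") -> dict:
    """Return the metadata extracted for one attachment, or {} if none was recorded."""
    feature = node.get("_features", {}).get("f:extractable", {})
    return feature.get("extracted", {}).get(attachment_id, {})


def image_dimensions(metadata: dict) -> Optional[Tuple[int, int]]:
    """Parse (width, height); the example stores these values as strings."""
    try:
        return int(metadata["imageWidth"]), int(metadata["imageHeight"])
    except (KeyError, ValueError):
        return None  # not an image, or the dimensions were not available
```

For the node above, `image_dimensions(read_extracted(node))` returns `(768, 768)`.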