Custom Indexes
Cloud CMS internally maintains indexes to improve the speed of your data lookups across all data store types. In most
cases, there is no need to concern yourself with these as they deliver optimal performance.
With respect to content repositories, however, Cloud CMS allows you to define custom database indexes on a
per-branch basis. These custom indexes add to the default set that Cloud CMS automatically maintains.
Each branch and/or snapshot maintains its own "tip" collection that provides a concise master view
of the branch looking from the head back to the root with all deletes normalized and removed. Each branch also
maintains a transaction collection used during the course of a running transaction so that data operations always have
a consistent view of the world.
These collections provide default, sensible indexes for achieving the things that Cloud CMS is typically used for such
as lookup for associations or quick graph traversal. In most cases, these indexes suffice just fine.
However, for large volumes of data or frequently used custom queries that reach into nested sub-objects, you will
want to provide custom indexes at the branch level so that queries against these branch collections run faster and
perform efficiently for your apps.
Indexes
An index is essentially a precomputed list of entries, sorted in such a way that lookups are fast.
Consider the White Pages. The White Pages is a printed phone catalog that lets you find a phone number for somebody if
you know their name. Each entry in the catalog consists of:
- firstName
- lastName
- phoneNumber
If the White Pages didn't have any indexes, then the printed entries would be in random order. The first entry
might start with X and the next one with A and the next one with M. There would be no sensible order and so finding
people would be exhausting. You'd literally have to go through the entries one at a time!
Fortunately, the good people who print the White Pages have indexed their data. They index primarily on lastName,
then firstName and then phoneNumber. The index orders things so that you can find people more quickly if you know
their last name. You just flip to the correct first letter and work your way down.
By having an index in place, the lookup time for an entry is much, much faster.
Lookup time also grows only very slowly as more and more entries are added.
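The White Pages idea can be sketched in a few lines of Python. This is only an illustration of a sorted index versus a full scan, not a description of how Cloud CMS implements its indexes. The sample entries and the function names scan_lookup and index_lookup are made up for the example.

```python
import bisect

# Hypothetical sample entries, deliberately in no particular order.
entries = [
    {"lastName": "Xavier", "firstName": "Ann", "phoneNumber": "555-0101"},
    {"lastName": "Adams", "firstName": "Bob", "phoneNumber": "555-0102"},
    {"lastName": "Miller", "firstName": "Cat", "phoneNumber": "555-0103"},
    {"lastName": "Adams", "firstName": "Al", "phoneNumber": "555-0104"},
]

def scan_lookup(last_name, first_name):
    """Without an index: check every entry, one at a time."""
    for entry in entries:
        if entry["lastName"] == last_name and entry["firstName"] == first_name:
            return entry["phoneNumber"]
    return None

# The "index": (lastName, firstName, phoneNumber) keys kept in sorted order.
index = sorted((e["lastName"], e["firstName"], e["phoneNumber"]) for e in entries)

def index_lookup(last_name, first_name):
    """With an index: binary search to the first key that could match."""
    pos = bisect.bisect_left(index, (last_name, first_name))
    if pos < len(index) and index[pos][:2] == (last_name, first_name):
        return index[pos][2]
    return None
```

Both functions return the same answer. The difference is that scan_lookup may touch every entry, while index_lookup halves the remaining search range at each step.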
Indexes are defined using a simple convention. An index has a name and a set of properties that form the
key for the index.
For the White Pages example, the index might look like this:
"primary" -> {
    "lastName": 1,
    "firstName": 1
}
The index is named primary and it sorts first by "lastName" (ascending) and then by "firstName" (ascending).
The value 1 is used to indicate ascending order and the value -1 is used to indicate descending order.
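To make the 1 and -1 convention concrete, here is a minimal Python sketch that orders records according to an index key. It only illustrates the ordering that such a key describes. The helper name compare_by_key and the sample people are assumptions for the example and are not part of Cloud CMS.

```python
from functools import cmp_to_key

def compare_by_key(key_spec):
    """Build a sort key from an index key such as {"lastName": 1, "firstName": -1}."""
    def compare(a, b):
        # Properties are compared in the order they appear in the key.
        for prop, direction in key_spec.items():
            if a[prop] != b[prop]:
                order = -1 if a[prop] < b[prop] else 1
                return order * direction  # a direction of -1 flips to descending
        return 0
    return cmp_to_key(compare)

# Hypothetical sample records.
people = [
    {"lastName": "Smith", "firstName": "Zoe"},
    {"lastName": "Adams", "firstName": "Bob"},
    {"lastName": "Smith", "firstName": "Amy"},
]

ascending = sorted(people, key=compare_by_key({"lastName": 1, "firstName": 1}))
mixed = sorted(people, key=compare_by_key({"lastName": 1, "firstName": -1}))
```

With both directions set to 1 the two Smiths come out as Amy then Zoe. Changing firstName to -1 keeps the last names ascending but lists Zoe before Amy.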
You may also use dot-delimited properties to index on nested structures. Here is an example of the same index
where firstName and lastName are child properties of a person object:
"primary" -> {
    "person.lastName": 1,
    "person.firstName": 1
}
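As a rough illustration of what a dot-delimited property means, the Python sketch below walks a nested document to pull out the values an index key refers to. The helpers resolve and index_key are hypothetical, and treating a missing property as None is an assumption made for the example rather than documented Cloud CMS behavior.

```python
def resolve(document, dotted_path):
    """Follow a dot-delimited path such as "person.lastName" into nested dicts."""
    value = document
    for part in dotted_path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # assumption: a missing property is treated as None
        value = value[part]
    return value

def index_key(document, key_spec):
    """Extract the tuple of values an index would sort this document by."""
    return tuple(resolve(document, path) for path in key_spec)

# Hypothetical document with a nested person object.
doc = {"person": {"firstName": "Ada", "lastName": "Lovelace"}}
key = index_key(doc, {"person.lastName": 1, "person.firstName": 1})
```

Here key holds the last name followed by the first name, in the same order as the properties listed in the index.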
Branch Collections
Cloud CMS provides an API to create, drop and list custom indexes that you've created per branch. These indexes can
also be created via the Cloud CMS user interface.
When you create or drop indexes, Cloud CMS will apply those changes right away. New indexes will be created and a
background process will begin to index your data. In this way, you may have many custom indexes, each serving the
purpose of optimizing your lookup times for different queries.
Suppose we define an Article within our repository branch. It might look like this:
{
    "title": "Article",
    "type": "object",
    "properties": {
        "author": {
            "type": "string"
        },
        "location": {
            "type": "string"
        },
        "body": {
            "type": "string"
        }
    }
}
Suppose now that our application wants to run a query like this:
{
    "author": {
        "$in": ["joe", "bob", "frank", "laura"]
    }
}
This will work fine with reasonable volumes of data. But if you have hundreds of thousands or millions of content
items, the fact that Cloud CMS does not automatically index the author field will prove to be problematic.
Your queries will still work fine but you will notice them getting slower over time as more data is added.
To solve this, let's add a custom index for the author field.
Using the user interface, go to the Project Settings within your project.
Then select Indexes and click the button to Create a Custom Index.
Give the index the name author1. The index name must be lowercase and simple. No spaces, nothing fancy.
It just has to be unique. You cannot have two indexes on a branch with the same name.
And then define the index like this:
{
    "author": 1
}
And click Create to save. That's it. You've now added an index where content is sorted by author.
All of your content will be indexed to take advantage of this new index right away.
The index automatically sorts the author property in ascending order and in doing so, helps Cloud CMS
to optimize your queries to run faster. Essentially, the idea is that Cloud CMS no longer needs to look through all
of your content items one at a time. It has a fast index for the author field and so it can run your
query much faster.
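The effect of the author index on the $in query can be pictured with a small Python sketch. It is a simplified model under stated assumptions, not the actual Cloud CMS or database implementation. The node ids, the author_index list and the find_by_authors function are all hypothetical.

```python
import bisect

# Hypothetical content items: (node id, author).
nodes = [("n1", "laura"), ("n2", "zed"), ("n3", "bob"), ("n4", "joe"), ("n5", "bob")]

# The index: (author, node id) pairs kept in ascending author order.
author_index = sorted((author, node_id) for node_id, author in nodes)

def find_by_authors(authors):
    """Answer {"author": {"$in": authors}} with one short range scan per author."""
    matches = []
    for author in authors:
        # Binary search to the first index entry for this author.
        pos = bisect.bisect_left(author_index, (author,))
        # Walk forward only while the author still matches.
        while pos < len(author_index) and author_index[pos][0] == author:
            matches.append(author_index[pos][1])
            pos += 1
    return matches

result = find_by_authors(["joe", "bob", "frank", "laura"])
```

Only the index entries for the requested authors are visited. The item written by zed is never examined, and the lookup for frank ends immediately because no entry matches.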
Composite indexes are also possible. You might want to query for authors in specific cities. Suppose you have
hundreds of thousands or millions of documents, each with an author at a specific location. You could define the
index on the branch like this:
{
    "author": 1,
    "location": 1
}
You could then run dual-key queries like this more efficiently:
{
    "author": {
        "$in": ["joe", "bob", "frank", "laura"]
    },
    "location": {
        "$regex": "san",
        "$options": "i"
    }
}
This would find everything written by Joe, Bob, Frank and Laura in, say, San Diego, San Francisco, San Lucas and
Santa Monica. And, well, many other Sans! If you can think of any. Which at the moment, frankly, I can't!
(Editor's note: San Marcos, San Juan, Santa Clara, Santiago...)
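One way to picture how a composite index helps with the dual-key query is the Python sketch below. It assumes a simplified model in which the index narrows the search by author and the location pattern is then tested against the values stored in the index key. The sample articles, composite_index and find are hypothetical, and the real query planner may behave differently.

```python
import bisect
import re

# Hypothetical articles: (node id, author, location).
articles = [
    ("a1", "joe", "San Diego"),
    ("a2", "joe", "Boston"),
    ("a3", "laura", "Santa Monica"),
    ("a4", "zed", "San Francisco"),
    ("a5", "bob", "Austin"),
]

# Composite index: keys sorted by author first, then by location.
composite_index = sorted(
    (author, location, node_id) for node_id, author, location in articles
)

def find(authors, location_pattern):
    """Narrow by author using the sort order, then test location on the index key."""
    regex = re.compile(location_pattern, re.IGNORECASE)  # mirrors "$options": "i"
    matches = []
    for author in authors:
        pos = bisect.bisect_left(composite_index, (author,))
        while pos < len(composite_index) and composite_index[pos][0] == author:
            _, location, node_id = composite_index[pos]
            if regex.search(location):
                matches.append(node_id)
            pos += 1
    return matches

result = find(["joe", "bob", "frank", "laura"], "san")
```

In this model the article by zed in San Francisco is never visited because zed is not among the requested authors, and the location check runs only over the few entries that belong to the requested authors.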