What is a Data Product Platform?

About this Article
Published October 10 2025
By Michael Uzquiano
Tags: data product, governance, rag

Gitana allows you to package data from your existing, messy data sources into first-class data products that can be shared, reused and trained to deliver precise answers that solve your business problems.

Gitana changes the game by giving your data the same lifecycle discipline as traditional products. Your data is treated the way software developers treat source code. It is versioned, validated, approved, governed and measured -- so that teams can package up the truth and make it available to the business.

With that internal data truth in hand, Gitana further allows you to generate and train externally-facing data sets which provide answers to specific customers, problems and domains.

These externally-facing data sets are sometimes referred to as "data as a product". Today, they're very useful for defining the high-performing corpus text required to power the best retrieval-augmented generation (RAG) applications.

What is a Data Product?

A data product is a curated, trustworthy, and well-described data asset that solves a specific business problem. It has a defined schema including properties, metadata and relational information. In Gitana, these data products are stored in a graph that captures the full set of relationships and the multi-level, nested structure inherent in a complex schema.

A data product also includes quality checks such as validation logic, business rules and policies that govern the quality and consistency of the data. Access control policies let you define ownership and the roles that teams or groups within your organization play in managing the product.
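To make this concrete, here is a minimal Python sketch of a data product that carries its own schema and quality checks. The `DataProduct` class, its fields and the `price_is_positive` rule are hypothetical names chosen for illustration; they are not Gitana's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical illustration only: none of these names come from Gitana.
# A rule inspects a record and returns an error message, or None if it passes.
Rule = Callable[[dict[str, Any]], Optional[str]]


@dataclass
class DataProduct:
    name: str
    owner: str                      # team accountable for the product
    schema: dict[str, type]         # property name -> expected Python type
    rules: list[Rule] = field(default_factory=list)

    def validate(self, record: dict[str, Any]) -> list[str]:
        """Return a list of problems; an empty list means the record passes."""
        problems = []
        for prop, expected in self.schema.items():
            if prop not in record:
                problems.append(f"missing property: {prop}")
            elif not isinstance(record[prop], expected):
                problems.append(f"{prop} should be {expected.__name__}")
        if problems:
            # Business rules assume a well-formed record, so stop here.
            return problems
        for rule in self.rules:
            message = rule(record)
            if message:
                problems.append(message)
        return problems


def price_is_positive(record: dict[str, Any]) -> Optional[str]:
    return None if record["price"] > 0 else "price must be positive"


shoes = DataProduct(
    name="running-shoes",
    owner="catalog-team",
    schema={"sku": str, "price": float},
    rules=[price_is_positive],
)
```

Calling `shoes.validate(record)` on an incoming record gives a governance reviewer a plain list of problems to act on before the change is approved.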

In Gitana, data products are stored in a Git-like versioning repository that features multi-object, commit-level versioning with branches, tags and fork/merge mechanics. Data products can be worked on in parallel and teams are free to push and pull changes between workspaces. The best ideas are encouraged to percolate upward into the final result.
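The following toy model sketches what multi-object commits, forks and merges mean in practice. It is an assumption-laden illustration: the `Repository` class and its methods are hypothetical, and the merge shown here simply lets the source branch win, which is far simpler than a real merge with conflict handling.

```python
# Hypothetical toy model of commit-level versioning; not Gitana's implementation.
class Repository:
    def __init__(self):
        # Each branch is a list of snapshots; a snapshot maps object id -> content.
        self.branches = {"main": [{}]}

    def head(self, branch):
        """Return the latest snapshot on a branch."""
        return self.branches[branch][-1]

    def commit(self, branch, changes):
        """Record one commit that may touch many objects at once."""
        snapshot = {**self.head(branch), **changes}
        self.branches[branch].append(snapshot)

    def fork(self, source, name):
        """Create a new branch that starts with the source branch's history."""
        self.branches[name] = list(self.branches[source])

    def merge(self, source, target):
        """Naive merge: objects from the source branch overwrite the target's."""
        merged = {**self.head(target), **self.head(source)}
        self.branches[target].append(merged)


repo = Repository()
repo.commit("main", {"shoe-1": {"name": "Trail"}, "shoe-2": {"name": "Road"}})
repo.fork("main", "draft")
repo.commit("draft", {"shoe-1": {"name": "Trail X"}})
main_before_merge = repo.head("main")["shoe-1"]["name"]
repo.merge("draft", "main")
main_after_merge = repo.head("main")["shoe-1"]["name"]
```

Work on the `draft` branch stays isolated until it is merged, which is the point at which a review or approval step would naturally sit.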

Every change to every object is captured in an auditable change history. Data observability is provided for the full lifetime of an object, from its inception to its disposition (i.e. deletion or transition to archival). Interfaces are provided to support discovery, retrieval and consumption of the data product. These include APIs for query, search, traversal, GraphQL, vector search and more.

Data Products are curated by a governance team that inspects changes and signs off on adjustments. This includes changes that flow in from data ingestion processes.

Ultimately, a Data Product is an internal capture of valuable business information. It may be discovered or reused within the business as needed for internal business decisions.

What is "Data as a Product"?

The high-quality, internal data products described in the previous section are useful to your business in many ways. For example, they play an important role in terms of making business decisions.

That said, they are also the critical components needed to build new, externally facing data sets that could be shipped to customers or used to support customer applications and customer decision-making.

Consider a RAG (retrieval-augmented generation) application. This application receives a question from a customer. It then must construct a prompt to send to an LLM that encapsulates the question asked but also provides supporting documents from its corpus.
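As a rough sketch of that flow, the Python below picks supporting documents from a small corpus and assembles a prompt. The function names are hypothetical, and the word-overlap `score` is a deliberately simple stand-in for the vector search a real RAG application would use.

```python
import re


def words(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, document: str) -> int:
    """Count the distinct words shared by the question and the document."""
    return len(words(question) & words(document))


def build_prompt(question: str, corpus: list, top_k: int = 2) -> str:
    """Select the top_k best-matching documents and wrap them in a prompt."""
    ranked = sorted(corpus, key=lambda doc: score(question, doc), reverse=True)
    context = "\n".join(f"- {doc}" for doc in ranked[:top_k])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


corpus = [
    "Trail shoes with firm arch support suit runners with plantar fasciitis.",
    "Our store hours are 9 to 5.",
    "Durable running shoes use carbon rubber outsoles.",
]
prompt = build_prompt("Which running shoes are durable?", corpus, top_k=1)
```

The resulting `prompt` string is what would be sent to the LLM. The quality of the answer depends heavily on what is in `corpus`.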

A naive approach to generating this corpus is to simply take your raw data and chunk it for retrieval by the DB (usually a vector DB) that supports the RAG application. In practice, this approach invariably fails because your raw data hasn't been cleaned, sanitized, or enriched. Quite simply, it isn't trustworthy data.

On the other hand, your data products are trustworthy. You could already improve the corpus greatly by relying only on trusted data products to generate it.

However, your data products are also intended for an internal audience. They may contain information that you wouldn't want to see become part of the RAG application's corpus. This may include sensitive personal information, part numbers, pricing details and more.
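One simple way to picture this step is a filter that strips internal-only properties before a record reaches the corpus. The field names and the `to_corpus_record` helper below are hypothetical; a real data product would define its own rules for what counts as internal.

```python
# Hypothetical property names that should never leave the business.
INTERNAL_ONLY = {"customer_email", "part_number", "wholesale_price"}


def to_corpus_record(record: dict, internal_only: set = INTERNAL_ONLY) -> dict:
    """Return a copy of the record without its internal-only properties."""
    return {key: value for key, value in record.items() if key not in internal_only}


internal_record = {
    "name": "Trail Runner",
    "description": "Cushioned shoe with firm arch support.",
    "part_number": "TR-0042",
    "wholesale_price": 41.50,
    "customer_email": "buyer@example.com",
}
public_record = to_corpus_record(internal_record)
```

A block-list like this mirrors the wording above. An allow-list of properties that are explicitly approved for release is a reasonable alternative when new internal fields may appear over time.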

To solve this problem, it is important to ask: what information does the customer-facing RAG application need? That is to say, what types of answers do you need to provide? What types of questions are going to be asked?

By knowing this, you can provide the RAG application with a corpus that is generated and structured to closely match a given question. If the customer were to ask:

What are the most durable running shoes for people with plantar fasciitis?

You could pre-emptively generate your corpus to provide this answer. In fact, you could provide many variations of this answer for different types of scenarios (e.g. for women, for 4th graders, in Hawaii, etc.).
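A small sketch of how such variations might be enumerated is shown below. The `plan_corpus_entries` function and its dictionary layout are assumptions made for illustration. The answers are left empty on purpose, since they would come from your curated data products rather than from this code.

```python
from itertools import product


def plan_corpus_entries(base_question: str, audiences: list, locations: list) -> list:
    """List one planned corpus entry per audience and location combination."""
    plan = []
    for audience, location in product(audiences, locations):
        plan.append({
            "question": f"{base_question} ({audience}, {location})",
            "audience": audience,
            "location": location,
            "answer": None,  # to be filled from trusted data products
        })
    return plan


plan = plan_corpus_entries(
    "What are the most durable running shoes for people with plantar fasciitis?",
    audiences=["women", "4th graders"],
    locations=["Hawaii"],
)
```

Each planned entry becomes a slot in the corpus that can be answered, reviewed and approved before it is released.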

In that sense, the corpus itself is something your company can generate and release as a product in its own right. We refer to this as "Data as a Product".

Data as a Product is a reflection of your internal data that has been constrained and calibrated for the precise needs of the target application or customer. It provides optimal answers to deliver the greatest possible end-user experience and most positive customer impact.

For these externally-facing "data as a product" offerings, success is measured by the delivery of customer value, target SLAs (service-level agreements), revenue and compliance.

The Gitana Data Product Platform

Gitana provides a platform for managing the ingestion, governance, training and production of data products. It also provides the ability to incrementally generate and deploy the external, customer-facing datasets that power your live RAG applications.

Our Data Product Platform includes:

  • Lifecycle management: from ingestion to deployment
  • Quality and trust: automated checks, lineage, and versioning
  • Governance and safety: security, auditing, and policy enforcement
  • Collaboration: human-in-the-loop review and approvals
  • Delivery: consistent, measurable distribution to apps and customers

Get started today with a free trial.

We'd love to be a part of the success of your business.