A significant hurdle for many companies is effective and useful data management. It can be difficult to comb through vast amounts of data to find meaningful information, it’s often siloed, queries can be time consuming and it’s commonly outdated or inaccurate. A data warehouse can easily contain hundreds of terabytes, growing day by day. And only a fraction of the data stored is relevant to any given business division, department or product line. Enter the data mart.
In the data mart model, marketing, sales, finance and each operational business unit has access to the segment of the data warehouse and the information that is most relevant to them. Instead of live queries run against mammoth tables, queries run through a data mart are tight and targeted — and fast. Just as important, data marts provide frameworks to deliver actionable reports more rapidly.
The result: Groups within a business can make more efficient use of the data that helps them run.
What Is a Data Mart?
At its core, a data mart is a subject-oriented subset of a data warehouse. While still maintaining the value of a data warehouse, a data mart enables a company to serve business divisions and product lines with access to the data that is relevant to their individual operations.
However, a data mart is not simply access to raw data. Instead, data marts help turn raw data into actionable information quickly to help business units succeed. They accomplish this by providing pre-built summaries and queries specifically designed for the internal business leaders in the department or business unit to which the data in the mart applies.
- A data mart is a focused subset of a data warehouse designed to present actionable information quickly to a specific department, business unit or product line.
- Data marts blend data from a variety of sources — owned and licensed — to answer specific business questions.
- Performance is critical with data marts. Slow queries not only impact user experience but also cost more resources.
Data Marts Defined
Data marts are like little “storefronts of data” that align with a department within a larger business. The human resources department might have a data mart that draws from the company’s data warehouse, but because it includes only information about employees — salaries, locations, performance, chosen health and retirement benefits — ad hoc queries, reports and analytics all happen much faster than if run against the full data warehouse. A finance department data mart might contain all financial information for the entire firm; sales might have a data mart that enables lightning-fast pipeline analysis, showing which products are selling best and which are selling worst.
Data marts maintain the integrity and value of an overall data warehouse strategy while exposing business divisions and product lines to only the data relevant to their operations. And as the cost of database technologies continues to decline, data-driven business insights from data warehouses and data marts are becoming more accessible to small and midsize companies.
Reasons to Use a Data Mart
Any company that is managing a large enough set of data that they use the term “data warehouse” will likely benefit from deploying a data mart strategy. Here are three key reasons many companies choose to use data marts:
- Regional offices can be more responsive to their local market by having access to data relevant to local customers. Yet, data they enter into records — sales call summaries and marketing data — is fed into the central data warehouse.
- Specific departments can have rapid access to critical information with a data mart. Instead of relying on queries against all data, their data mart can combine live and scheduled queries against only the relevant subset of data.
- Ad hoc queries will generate different results depending on the parameters used. This can lead to confusion for critical parts of the business. Data marts can make sure the entire division is looking at the same set of data based on the same parameters.
Data Mart Use Cases
Since data marts are so efficient at providing department-level data, they can be useful for many different departments around an organization.
A shipping department, for example, can use a data mart and a dashboard to keep track of the time it takes for an order to go from placement to delivery. In this case, the data mart interacts with internal sales data (order placed) and combines it with carrier data (order delivered) to show how long the process takes. Similarly, the shipping department’s data mart would delve into process data collected by the department to help pinpoint ways to make shipping more efficient.
Sales might be the most data-driven part of any organization. A sales data mart could combine week-over-week, month-over-month and year-over-year data for an easy and efficient view, say, of the performance of call centers vs. an internal sales team. A sales data mart can also tap into results from the shipping department data mart to show order deliveries and returns.
Meanwhile, the marketing department data mart can track the flow of inbound leads by channel and product, help manage a nurturing program and report on how many leads ultimately converted to customers.
Who Uses Data Marts? And How?
Any company that uses key performance indicators (KPIs) to measure success or that has already collected enough data to be using a data warehouse will likely find that a data mart strategy can help them be more competitive. Some of the characteristics of the types of companies that would benefit from a data mart include the following.
- Companies with long sales cycles can use a data mart to help different departments align on the needs of the sales process.
- Companies selling complex, bespoke solutions can use data marts to manage complex proposals so when the business is won they can more quickly move into implementation.
- Companies with many years of data can use a data mart to help departments focus on current data while still maintaining the historical information.
- Companies with tight profit margins can use a data mart to reveal efficiencies that will lead to more profit, while also analyzing current processes to preserve existing profit.
- Companies with large product sets will find that data marts enable them to manage all those products better by focusing each product group on its individual mission.
3 Types of Data Marts
There are fundamentally three types of data marts: dependent, independent and a combination of the two called hybrid. What they have in common is that all three present a subject-specific set of data to the business teams that will benefit most from that particular dataset.
Data warehouse pioneers Ralph Kimball and Bill Immon defined the first two. The Immon dependent model advocates that data should flow into a central data warehouse and then be segmented into individual data marts. In contrast, Kimball proposed the independent model, in which data flows directly into individual data marts and then is aggregated into a data warehouse. With hybrid data marts, organizations take the best of each model and apply it to different departments according to their requirements.
Dependent data mart:
A Dependent data mart is built on top of a central data warehouse. Practically speaking, the data warehouse controls all the data. All data sources, including licensed third-party data, is loaded first into the central data warehouse and then the subset of the data that is needed is propagated out to the data mart.
An advantage of this model is that the bulk of data administration happens in the central repository, requiring less technical know-how at the data mart level. Also, critical issues like technology, data and storage (including backups) are handled centrally. Conversely, if the data warehouse goes offline for any reason — planned or unplanned — dependent data marts can go down with them.
Independent data mart:
Independent data marts do not rely on a central data warehouse. In fact, this model proposes that a central data warehouse does not necessarily even need to exist. Instead, independent data marts can be set up to act as standalone entities — mini data warehouses focused on the needs of specific divisions. An advantage to this model is that individual business units can run the data mart that suits them best.
Of course, with this independence comes the need for technical administrative expertise at each data mart. Plus, if data will need to be aggregated across data marts — for executive-level reporting, for instance — you will need to construct queries that access multiple data marts. Therefore, with independent data marts it becomes more important to institute an organizational taxonomy that provides standard naming for tables and fields — or a thesaurus that maps tables and fields among all the company’s data marts — to make cross-data-mart reports easier to generate.
Hybrid data mart:
Some organizations find it practical to consider a hybrid model where some data marts are dependent on a central warehouse and some exist on their own. For example, it might be more efficient to use this model as a transitionary step for new data marts. New subject-specific data sources might be easier to deploy as independent data marts. Once they have proven their value, they can be deployed through to the central data warehouse, if needed. Or the hybrid model might be a good path to integrate acquisitions. The acquiring company could treat the subsidiary’s data as an independent data mart while planning to make it conform to the business’s own data policies later, and then integrate it.
A significant consideration is the human resources required to run the hybrid model. It requires technical administrative expertise at both the central data warehouse and at the data mart level.
Which type of data mart a business chooses depends on a lot of factors, including how the company is organized. These four questions can help an organization determine which type suits it best:
- Are business units managed as standalone entities or are they considered part of a whole?
- Does company culture encourage information sharing or do individual business units hoard knowledge?
- Are individual business units staffed to handle the intricacies of data management?
- Similarly, is the central database team committed to respond rapidly to the needs of discreet business units or do bureaucracy and egos interfere?
Data Mart vs. Data Warehouse
While a data warehouse is a repository for all the data that helps a business run, a data mart is a condensed subset of business data designed for a specific purpose, business unit or department. Data marts draw on fewer, more specialized data sources. A data mart strategy might not need to include a data warehouse. Instead, the data warehouse might be the aggregate of all your data marts.
Data Mart vs. Data Lake
Typically, the raw data in data lakes has a lot less structure and has yet to be cleaned and normalized. Data marts, on the other hand, are the result of highly structured, cleaned and normalized data. More importantly, though, is that data marts are designed to provide specific solutions to individual groups while data lakes are meant for more open-ended analyses — even unanticipated ones.
Data Mart vs. Database
A database is a foundational element of an organization’s data management technology. It stores data the organization owns and often third-party licensed data in a way that can be retrieved via queries. Structured query language (SQL) is the most prevalent way that data is output from a database. A database might feed multiple data marts and, depending on the size of your data set and your data strategy, a data mart might draw from more than one database.
Comparing 4 Data Management Elements
|Data Mart||Data Lake||Data Warehouse||Database|
|Cleaned and normalized data||Y||N||Y||Y|
|Answers specific questions||Y||N||N||N|
Structure of a Data Mart
There are three schema-level and interrelated data architectures for data marts: star, snowflake and denormalized tables.
The star structure is the most straightforward of the three, and thus reduces the complexity of deploying data marts. In the star structure, business-level data is broken out into tables of facts (for example, sales data). These tables interact with relevant dimensions. For example, a sales facts table may relate directly to a dimension table that lists products.
To visualize the schema and understand where the star label comes from, look at a representation of a sales facts table with at least four attributes: date, location, product and quantity. A “location” dimension lists all available stores and connects to the facts table via a store identifier. Similarly, a dimension of available products provides a centralized and official list of all products. These are connected to the sales facts table via a product identifier. As these dimensions blossom out, you start to see a star pattern where a central table interacts with a single one-dimensional layer of related tables.
Alternatively, and perhaps more realistically, consider the case where a star-structured data mart contains dimension tables that are themselves subject to further dimensions. Using the “location” example, it seems reasonable that a location table might be subject to a geography dimension that connects stores to regions via a geography identifier. If this is the way the data warehouse is structured and the data mart’s requirement is to provide a report showing store location and region information, then you’re using a snowflake model.
The star and snowflake structures require report queries to go through sometimes extensive “joins” to connect data in multiple tables or even multiple databases. Depending on how much data is involved, these joins will reduce the responsiveness of the reports. For reports based on scheduled queries, this isn’t a user-facing problem but the length of time it takes for live queries may frustrate business leaders. It could also be a significant resource hog.
An alternative is to use denormalized tables, eliminating the joins and thus performing queries more efficiently. A denormalized tables structure brings together all the data needed for a data mart report into one table which will produce faster queries and will likely generate redundant data. While this redundant data makes inserts and updates more expensive, the bet with denormalized tables is that the efficiencies of queries outweigh those costs.
Advantages of Data Marts
The biggest advantage of data marts is efficiency, both in terms of costs and data access. Data marts cost much less to deploy than a data warehouse and access to data is much faster because data marts refer to smaller datasets. Queries into a central data warehouse can be long and arcane as they negotiate with irrelevant data. A well-constructed data mart strategy can provide business unit and departmental leaders with very fast access to the data they need.
What’s more, frequently accessed data is not necessarily frequently updated. Some queries that were previously conducted live can be presented as scheduled queries in a data mart. This gives team members access to the information they need while using only a small fraction of the computing resources previously used. But even live-updated data is delivered more efficiently through a data mart simply because it is drawn from a more focused set of data.
Another advantage of data marts is they can be independent of each other, so an outage at the central data warehouse does not have to effect individual data marts.
And when a data mart includes licensed third-party data, a key advantage is that the license cost should be lower because the user base for the data is smaller than if it were in a data warehouse.
Disadvantages of Data Marts
Since a data mart contains only the data needed by a single business group, it does not on its own provide visibility into the broader set of data a business might need. Similarly, in an independent data mart model that excludes a central data warehouse, the business may not have ready access to cross-data-mart reporting for certain kinds of high-level reports.
Some additional disadvantages:
- Data marts are not necessarily the right solution for every group and too many data marts will become difficult to manage.
- A data mart strategy that automatically propagates data might result in unbudgeted expenses. Data and technology licensing fees are a large part of the cost for any data strategy; if a company licenses marketing data for the 10 people in its marketing department, but data is propagated out to other data marts, it might be on the hook for additional fees.
- If deploying independent data marts, often-overlooked details like field name syntax could suddenly become critically important to align at a high level. Otherwise, you could run into trouble when building reports to run across data marts.
6 Steps to Implement a Data Mart
In broad strokes, there are six specific phases to deploying a data mart strategy.
Comprehensive requirement-gathering helps to plan out the design that will result in the logical, physical and technical characteristics of a strong data mart deployment. Work with the teams who will use the data mart in the organization’s business units, departments and product lines to determine what specific kinds of data they need to access and the terms under which they need access. Use this information to plan out what kinds of data will appear in which data marts.
Designing data mart strategy
The most important step in creating a data mart strategy is to determine the organization’s business goals and strategy, which will be manifested in the data mart design. During this phase, most organizations decide on their data mart architecture and make other decisions that will have a long-lasting impact on how the data marts are used. If the organization has a data warehouse, review the existing data warehouse schema as well as licensed third-party data to determine criteria that will enable data to flow properly in the new data mart plan. At the same time, consider how the existing data warehouse technology and architecture can support your data mart needs. Some modifications will likely need to be made, both in the system itself as well as the licensing. When designing data marts for far-flung regional branches, consider potential service interruptions. Not all parts of the world have excellent connectivity; the need to keep people working might dictate key elements of data mart design.
Constructing data mart architecture
This phase is where you make decisions and purchases to create and deploy the physical and logical structures of the data mart architecture. Determine what database to use. If the data mart strategy is based on existing data warehouse technology, plan out and deploy needed modifications. Pay special attention to existing licenses, users’ data access needs and speeds and service reliability to avoid unplanned expenses down the road. An important consideration at this phase is the interface that business teams will use to access data — user experience is important to make sure business teams don’t struggle to access the information. Though it may seem early, this is also a good time to consider future administrative activities. Make a specific plan for logging and analyzing user activity and access statistics (including load and response times). This will be enormously helpful when fielding support questions. Backups and redundancy are also critical to build into your system.
Populating data mart architecture
During this phase you execute the data-flow plan between the data warehouse, if you’re using one, and individual data marts. If there’s no data warehouse, then execute data flow from the appropriate sources. Questions to consider include: Where is the data that you need to include (owned as well as licensed)? What are the terms (both legal and technical) in which that data is available? What fields will be used to join and connect disparately sourced datasets? How will data be cleaned and normalized? Look at what metadata and indices will help make queries more efficient and help less technical users speak intelligently about that they’re asking. Also, identify what components of your data mart will require live querying versus which can be scheduled out.
Accessing the data marts
This is the phase when the planned subsets of your overall data warehouse can first be accessed. Set up specific queries and reports so that they can be accessed through the data mart interface. Some reports will be the product of a scheduled task, others will be based on live queries and some will be ad-hoc. Run a limited pilot to a defined set of users. This is a good opportunity to make sure users can access the information they need and that back-end systems and feeds are working according to plan. Consider how the data mart structure responds to failure, such as if access to the data warehouse fails. Make sure business leaders get informative error messages. At this stage it’s important to document everything to help administrators know how different data mart components are generated so that future modifications can be more efficient. Keep the documentation up to date.
Managing data marts
As with any technology deployment, managing data marts is an ongoing process, from user-access issues to support queries. Monitor performance and usage statistics for each data mart. Are all the reports being accessed? If data marts are not being used, it’s important to figure out why. It might be that business teams need training or that their needs were not fully defined during requirements gathering. Establish a launch period during which reports and queries can be tuned to make sure they’re giving the department or business unit the data they need. The length of this period depends on the complexity of the deployment but make sure it’s a finite period — and is widely communicated. Periodically review how third-party data is flowing into the system to make sure the data appears properly in the individual data marts. Test backup and failover processes to make sure they will work when needed. And, of course, monitor security issues such as access coming from outside your virtual private network (VPN).
Best Practices for Implementing Data Marts
There are several important best practices to keep in mind when developing a data mart strategy, many of which may also apply to a company’s broader data strategy.
Define the scope. Spend the time needed to define the core scope and policies around which data marts need to exist and what elements they should contain. This upfront effort cascades through the entire process and helps ensure that any questions that come up are answered according to the same set of rules.
Think about scalability. Experience tells us the volume of data entering the system will increase over time — sometimes dramatically. It’s important to account for this in the data mart design.
Remember responsiveness. Nothing abuses resources and hurts user experience like slow query execution. Architect for speed and set up systems to identify slow queries. Remember, today’s fast queries may be slow by tomorrow’s standards, so ongoing monitoring is important, too.
Data Mart Example
To illustrate how an effective data mart can help a business unit, consider a marketing department tracking campaign performance. Not only do they want to know what sales are generated, but also how campaign exposure led prospects to become customers.
The marketing department data mart will draw data from the web analytics platform to measure responsiveness to campaigns and trace activity on the site. It also taps remarketing and ad data to identify potential customers’ previous activity. Then it connects that data to top-level sales data such as revenue generated by campaigns and other campaigns to which prospects and customers reacted.
Instead of drawing this data from the entire history stored in the data warehouse, this marketing department needs only to access data from the past two years, enough to enable them to see trends and adjust for seasonal variables.
Meanwhile, the data mart combines arcane marketing codes with a product identifier table to create user-readable labels that make the report easier to understand.
Physical vs. Cloud vs. Virtualized Data Marts
There are many ways to execute the construction of a data mart. Data must move from independent sources, a central data warehouse or a data lake into discreet and independent data marts. These data marts, in turn, might exist in on-premises hardware or could be accessed through a cloud architecture.
An alternative to actually moving data around is to use virtualized (or virtual) data marts. With virtualized data marts, the data is not physically moved but instead accessed through virtual tables that have limited access to the central data warehouse. Virtualized data marts — which can be cloud-based or housed on-premises — don’t require the overhead or failure risk of actually moving data.
As with any cloud deployment, a key consideration when evaluating vendors is to make sure that the data you own is not irretrievably locked into that system. Make sure there is a portability plan in place for if or when you need to change vendors.
Future of Data Marts
Organizations are relying more and more on data and, with that, the need to efficiently use and gain insight from that data is also rising. In fact, company spending on big data products (hardware and software combined) will nearly double between 2020 and 2027. And by 2027, global spending on software for managing data will reach 45% of the data budget, up from 20% in 2014.
An issue that will grow with this is data quality. Companies surveyed in 2020 estimated that the average per-company cost of poor data quality approaches $13 million per year. And 27% of respondents said that new demand for self-service was the biggest challenge they face in data management.
Artificial intelligence tools and automated systems will help to mitigate the problem of data quality. Since data marts access only a subject-specific segment of a company’s overall data assets, data marts will be a key tool to help companies make effective use of the onslaught of information.
Data marts are a logical and increasingly important step in companies’ effort to turn raw data into actionable information that specific parts of a business can use to improve performance. Since data marts require extensive planning and definition, they tend to reduce ad-hoc querying and help put an entire department on the same page.
Data Mart FAQs
What is meant by data mart?
When people use the term data mart, they are referring to a focused set of data that is designed to answer the data needs of a particular part of a larger organization.
What is data mart with example?
A data mart is a subject-specific subset of a company’s data that focuses on the needs of a specific part of an organization. For example, a marketing data mart would include portions of the overall data that relate to marketing such as leads gathered, campaign results and conversions. But the data mart for marketing excludes unrelated data such as shipping information, company finances, employee salaries, etc.
Does my data mart strategy require having a central data warehouse?
No, data marts can be constructed as standalone, subject-specific, independent data marts that do not require a central data warehouse. Keep in mind, though, that there will likely be a need for executive-level reports that run across data marts.
How is a data mart different than a data warehouse?
The key difference between a data mart and a data warehouse is that data marts run queries against subsets of data from the warehouse. For example, a marketing department needs to access only specific data and may only need data from the past two years. A data warehouse, on the other hand, is a central repository for all information and could extend back many, many years. Because the data in a mart is a subset, queries can run faster and more efficiently.
Should every department have their own data mart?
While data marts provide a simplified view of data from a broader data warehouse, it’s important to be judicious in deploying them. Organizations have to decide how broadly to deploy data marts, but since there is some overhead with every data mart it’s not good data policy for every department to have their own data mart.
We already use BI tools to report on data. Do we need data marts, too?
Data marts are efficient because they contain only subsets of data required by different departments. Likewise, BI tools also become more efficient when querying a limited set of data. If a central data warehouse is not too big, BI tools accessing the data may be sufficient. But data warehouses will grow, so having a strategy in place to segment data to increase query efficiency is still a good idea.
What is data mart and its types?
A data mart is a subject-specific set of data designed for a particular department. There are three types of data marts. “Dependent” data marts are populated from a central data repository. “Independent” data marts are standalone entities and might or might not be attached to a central data warehouse. “Hybrid” data marts enable an organization to have both dependent and independent data marts.
Why do we need data mart?
A data mart is a good solution for a data-driven organization that has a large central data warehouse. With data marts, different departments use data and resources more efficiently because they only access the data that relates to them.