Deep data analysis has become indispensable to businesses. To remain competitive, companies of all sizes rely on analytics tools to glean insights from disparate data, monitor their KPIs and provide reports to support sound decision-making. Underpinning all these efforts are data warehouses, specialized computer systems designed to store data efficiently, securely and quickly deliver simultaneous query results to data analysts and business decision-makers.
What Is a Data Warehouse?
A data warehouse is a computer system designed to store and analyze large amounts of structured or semi-structured data. It serves as a central repository, accessible to authorized business users who rely on analysis to make better-informed decisions. A data warehouse is a key component of most business intelligence (BI) strategies.
Data is routinely transformed and loaded into a data warehouse from various transactional systems, relational databases and other sources. Data engineers and scientists, business analysts and decision-makers access the data using BI tools, as well as other analytics applications like machine learning, and use it to populate dashboards and generate reports.
- To remain competitive, companies must base business decisions on data; to do so effectively, a data warehouse must power analysis.
- Data warehouses are used to analyze and obtain business insights from large amounts of historical data.
- Today’s data warehouses complement relational databases and work with data from multiple sources.
- The future of the data warehouse is in the cloud.
Data Warehouses Defined
Data warehouses are computer systems that used to store, perform queries on and analyze large amounts of historical data, which often come from multiple sources. Over time, it builds a historical record that can be invaluable to data scientists and business analysts. And because data entering a data warehouse goes through a series of cleaning and prepping processes, the data stored is of a high quality. Thus, the data warehouse’s records are often considered an organization’s definitive source of accurate data.
Data warehouses usually include:
- Automated, secure data pipelines connecting management systems to the warehouse
- Software to clean and prep data
- Data and metadata management tools
- A semantic layer to restructure data for fast, complex analytics and queries
- Analytics and reporting software ranging from basic to highly advanced
- Multiple options for presenting information meaningfully
Video: What Is a Data Warehouse?
How Does a Data Warehouse Work?
A data warehouse transforms relational data and other data source into multidimensional schemas for the sole purpose of analyzing. During this transformation, metadata is created to add speed to queries and searches. A semantic layer rests on top of this data layer, to organize and map complex data into familiar business language like ‘product’ or ‘customer’ so analysts can quickly build analyses without needing to know database table names. Finally, an analytics layer rests on top of the semantic layer to give authorized users access to the data, visualize it and interpret it.
What Are Data Warehouses Used For?
A data warehouse is used to analyze many different types of business data in a non-production environment. Using a data warehouse instead allows the operational databases to continue to record transactions and support the business. Companies use data warehouses to discover patterns, trends, outliers and other relationships in their data that develop over time.
Other major advantages of a data warehouse are that it can analyze data from multiple sources and extract data from different types of storage systems. It also safeguards the integrity of a company’s data by allowing businesspeople to query it without accidentally altering or disturbing a production environment in any way.
When to Use a Data Warehouse
While there a myriad of good reasons to use a data warehouse, these four stand out:
If you need to analyze data from different sources. For instance, if you want to track the activity of your most valuable customers, you may need to combine payment information from your credit card processor with financial information from your accounting system, along with the data these customers generate when they use your product or service. This is much easier when the data from all three sources can be stored and manipulated in one central location.
If you need to separate your analytical data from your transactional data. Your analysts, for example, may want to collect and analyze the data from a production application’s activity logs, but they don’t want to risk disrupting the business’s operations by working directly in the application’s database. They can sidestep that possibility by automatically sending the data to a data warehouse that's designed for complex querying and working on it there.
If your original data source is not suitable for querying. For example, the vast majority of BI tools do not work well with NoSQL databases. To work with this data, analysts must first transfer it to a data warehouse, where their BI apps can access it.
If you need to speed up your queries. When your transactional data resides in hundreds of thousands of rows, standard queries will be incredibly slow. It's far more efficient to use a data warehouse to create summary tables that aggregate the data, which can then be queried much faster. This will also unburden your transactional database so that it, too, will perform better.
Data Warehouses versus Data Lakes
A data warehouse can analyze vast amounts of relational data from many different sources, including transactional systems, operational databases and line-of-business applications. This can amount to hundreds of gigabytes and even petabytes (trillions of bytes) of data. Since the data is highly curated, it can serve as the company’s gold standard or definitive version of information. Common applications include BI analytics and graphic visualizations.
A data lake, on the other hand, can be used to analyze all different types of data, including both structured (such as the data found in a relational database) and unstructured (such as the bits and bytes that comprise a video, a text message or a social media posting). This may also include raw data that has not been scrubbed, deduped or curated. Common data lake applications include machine learning, data discovery, big data analysis and profiling.
Data Warehouses versus Databases
Databases are geared to create a record of transactions as they occur. They capture data “as is” from a single source, such as a credit card processing system. They do this continuously, in real time, as the transactions are processed.
Data warehouses, in comparison, are designed to perform analytics on vast amounts of data from many different sources. As opposed to registering individual data entries at top speed, data warehouses are optimized to rapidly query large volumes of that data after it has been recorded.
Data Warehouses versus Data Marts
A data mart is a subset of a data warehouse that is dedicated to the needs of a specific function or business unit, like finance, marketing or sales. A data mart is smaller and more specialized than a full-fledged data warehouse, and it aggregates data from fewer sources. It can be set up as a separate, discrete system or as part of a larger data warehouse.
How Do Data Warehouses, Databases and Data Lakes Work Together?
Many businesses use a combination of databases, data lakes and data warehouses to store and analyze their data. The data may be recorded in their operational databases and then fed to their data warehouses for further analysis.
But not all of their data comes from a structured database that stores data in a tabular format.
Some applications, like big data analytics, full text search and machine learning, can make use of unstructured data, such as phone calls and handwritten notes. This kind of data is captured and fed into a company’s data lake, where it can be prepped for further analysis in the data warehouse.
Functions of a Data Warehouse
A data warehouse is specially designed to perform data analytics. This usually entails sorting through large amounts of data from different sources in order to ferret out different trends and relationships captured by the data. It has two core functions:
To process and manage data so that it can be used for advanced analytics and reporting purposes.
To store data so that it can be tapped by a variety of BI and analytics programs.
Taken together, these basic functions allow a wide range of analytics tools to integrate various kinds of data from many different sources and then examine it to answer questions, spot business trends and predict future performance.
Types of Data Warehouses
Originally, all data warehouses were on-premises, but, like other information technology, they are rapidly migrating into the cloud. Here is a look at the options and what each one has to offer.
On-premises data warehouse.
With an on-premises approach, all the hardware and software required is purchased, licensed, deployed and maintained by the business that makes use of them. This approach is still in use and offers organizations several advantages:
- Complete control over the entire system, including who gets to use it and how.
- The ability to customize and tailor the data warehouse for specific requirements and to optimize its performance.
Data warehouse appliance.
One type of on-premises data warehouse is a data warehouse appliance. These self-contained hardware devices enable companies to more easily scale their data warehouse infrastructure to support their business analytics requirements as they grow and expand. However, these appliances, as well as on-premises systems in general, are being replaced as companies of all sizes migrate to the newest type of data warehouse.
Cloud data warehouses.
Like all cloud-based applications, cloud data warehouses don’t require an organization to purchase or maintain any hardware or software. A business simply pays for the subscription, storage space and computing power it needs at a given time. Expanding the capacity of a cloud data warehouse is a simple matter of adding more cloud resources; there’s no need to employ people to administer or maintain the underlying technology infrastructure since these tasks are handled by the cloud service provider.
Taking a cloud-based approach to data warehousing offers a company numerous benefits. These include:
Scalability: With a cloud data warehouse, capacity can be dialed up or down depending on the business’s ever-changing requirements.
Reduced costs: With a cloud data warehouse, there are no physical servers to buy or set up, reducing upfront costs and potentially lowering total cost of ownership. A business pays only for the storage and CPU time it needs. With this pay-as-you-go approach, the business no longer needs to make any capital expenditures, including for extra capacity to handle peaks in demand. And the cloud provider handles all the system maintenance, administration and upgrade requirements, so those costs are minimized as well.
Wide range of applications: The top-tier cloud data warehouses offer numerous related cloud services, such as identity and access management and full suites of data analytics tools. This helps to reduce software development costs.
Security: While security concerns have been cited in cases against migrating to the cloud, enhanced security is actually a benefit. Top-tier cloud service providers invest in security controls far beyond what most companies can match.
High availability: This is another area in which cloud service providers invest heavily. A guaranteed service level of 99.9% availability is common for cloud data warehouses, and their ability to replicate data across different geographic regions within the cloud environment means a business will still be able to access its data even in the event of a local system failure.
Reduced time to market: Given all of the above, using a cloud data warehouse means fewer delays and impediments. An organization’s business analysts and data scientists can obtain insights more quickly — which, in turn, means faster turnarounds and reduced lead times for new products and services.
Data Warehouse Architecture
The design or architecture of a data warehouse typically consists of three tiers:
The analytics layer is the user-facing front-end that presents the results of an analysis using data visualization tools.
The semantic layer consists of the analytics engine used to access and analyze the data.
The data layer of the architecture is the database server, where data is transformed, loaded, managed and stored.
While those three layers remain consistent, the architecture of any individual data warehouse usually include modifications specific to a company’s needs. Starting with the basics, all data warehouses include a central database to store metadata, summary data and raw data. That’s the repository that takes in data and is accessed by business decision makers for analysis. Additional approaches build on this simple architecture, including:
Simple with a staging area: Some data warehouses add a staging area where data is prepped before it enters the central warehouse repository.
Hub and spoke: This approach provides a company’s various business units with data marts containing data subsets specific to them, making their analyses faster and more efficient.
Sandboxes: Virtual sandboxes are secure computing spaces that allow companies to safely “play” with their data. Here, anything goes — data need not conform to the rules that govern their data warehouse’s central repository.
Data Warehouse Schema
All data warehouses are based on a schema, which is a type of blueprint or logical description of how the data is organized. It includes the name and description of the different kinds of records that the warehouse holds. There are three basic models:
In a star schema, data tables are one-dimensional — that is, each table contains data describing a single attribute, such as time, location or units sold.
A snowflake schema is more complex but also takes up less storage space and is easier to maintain. Its data tables are multidimensional; instead of a single attribute, they are subdivided into additional tables that provide related attributes. So, for example, a table on sales might include a location attribute, which is linked to another table that provides further details, such as city and street. The location’s table city entry may also be linked to yet another table, which holds data about the state or province and country where the city is located.
A constellation schema, also known as a galaxy schema, is something of a cross between the star and snowflake in that it can contain data tables that are both one- and multidimensional.
Benefits of a Data Warehouse
The primary or overarching benefit of a data warehouse is that it allows a company to analyze large amounts of assorted types of data and maintain a historical record of it. More specifically, the benefits of a data warehouse include the ability to:
- Consolidate data from many different sources.
- Maintain the highest levels of data quality, consistency and accuracy.
- Perform data analysis without disrupting the organization’s transactional databases and the business operations they support.
- Conduct historical data analyses.
- And, based on the above, help the business make better decisions.
Disadvantages of a Data Warehouse
As many benefits as they offer, data warehouses also have some drawbacks. Some of the chief concerns are:
Data warehouses can be costly to scale and don’t excel at handling raw, unstructured or complex data.
Operating a data warehouse may burden parts of the organization with extra work. The data that’s needed typically has to be generated by the IT team for each line of business or departmental function. This can be as simple as duplicating the data from an existing database, but it can also involve gathering additional data from customers, employees or other sources.
A data warehouse can be a big IT project and suck up a lot of resources. If not done well, it may not get enough use to justify the effort and expense.
Consolidating so much valuable data in one place poses a security risk. A data warehouse that leaks customer data can become a privacy compliance — and public relations — nightmare.
The structure of the data warehouse may limit its flexibility and utility to the organization. The data sets tend to be static and are often days or weeks old by the time they are actually used, plus the way in which the data is organized may restrict the types of queries that can be performed. Also, data warehouses are notoriously difficult to fine-tune for faster processing and queries.
Data Warehouse Examples
Here’s how data warehousing is often used to support business operations in three different industrial sectors:
Within the finance and insurance sector, data warehouses are used to analyze customer and market trends. Two lines of business where data warehouses play a major role are foreign currency exchange (forex) and stock trading, since small variables can lead to huge losses in these markets. For finance and trading applications, data warehouses usually employ real-time data streaming.
In the retail sector, data warehouses are primarily used with BI and forecasting applications. Examples include tracking product performance, determining optimal pricing, evaluating promotional strategies and analyzing customer buying patterns.
Data warehouses have become pervasive throughout the healthcare sector, where they are widely used to forecast treatment outcomes, track and analyze population trends, share data with insurance providers, as well as to collect and analyze research data.
History of Data Warehousing
As computer systems became more ubiquitous and complex, and the amount of data that they processed began to multiply, requirements to store, access and analyze that data became much more demanding. The earliest efforts to warehouse data more efficiently began in response to this and date back to the time when mainframes ruled the data-processing world and microprocessor-based personal computers had yet to be invented.
Here are some of the key milestones in the evolution of the data warehouse:
1960 – In a joint research project, Dartmouth University and General Mills developed some of the earliest concepts for data tables and schemas.
1970 – Nielsen and IRI introduced data marts for retail sales.
1983 – Teradata introduced a database management system specifically designed for decision support.
The contemporary concept of a data warehouse emerged in the late 1980s, when IBMers Paul Murphy and Barry Devlin developed the Business Data Warehouse. However, it is William Inmon who is credited with being the father of the data warehouse for first elaborating on the concept and linking it to the notion of a “Corporate Information Factory.”
Future of Data Warehouses
The future of the data warehouse is in the cloud. Successful outcomes with big data and data analytics are whetting the corporate world’s appetite for more data. By locating it within cloud computing services, a business can cost-effectively scale its data warehouse capacity to keep pace with its ever-growing analytics requirements.
Moreover, a company with its data warehouse in the cloud will no longer have to concern itself with keeping its analytics software up-to-date — a major concern with on-premises data warehouses. That concern will completely evaporate once responsibility is handed off to a service provider. For these and other reasons — including enhanced security and lower start-up costs — cloud-based data warehouse implementations will become de rigueur.
A New Data Warehouse Is Born for Today’s Data-Centric Businesses
Businesses today cannot remain competitive without leveraging their data. Companies of all types and sizes rely on data-driven insights to stay current with their offerings and relevant to their customers. To take full advantage of their data and extract all the insights they can, companies need a lower cost, simpler-to-deploy, easier-to-use, cloud-based data warehouse.
Enter the NetSuite Analytics Warehouse: a new cloud-based data warehouse based on Oracle Autonomous Data Warehouse and Oracle Analytics Cloud technology but optimized for use with NetSuite’s business applications served from the cloud. The NetSuite Analytics Warehouse comes prebuilt to automatically transform and visualize NetSuite application data into formats for the data warehouse, where it can be combined and analyzed together with data from multiple external sources to yield more powerful business insights. It will perform queries quickly and provide increased flexibility for data analysts and business decision makers to slice and dice their data to meet a variety of needs.
As commerce increasingly shifts to the digital realm, businesses must equip everyone from product engineers to sales managers with data insights that help them perform their jobs more effectively and engage in the kind of data analyses that leads to innovative work that pushes a business forward. Otherwise, they will simply fall behind those organizations that do. Thus, well-designed data warehouses that provide the foundation for business intelligence have become a necessity for organizations of all sizes.
Data Warehouse FAQs
What is a data warehouse used for?
A data warehouse can be used to analyze many different types of business data without the limitations of a conventional database. Unlike most relational databases, it can analyze data from multiple sources and extract data from different types of storage systems. It also safeguards the integrity of a company’s data by allowing users to query it without accidentally altering or disturbing it in any way.
What is an example of a data warehouse?
In the retail industry, data warehouses are used for forecasting and to provide business intelligence. Uses include tracking product performance, determining optimal pricing, evaluating promotional strategies and analyzing customer buying patterns.
What is the data warehousing process?
A data warehouse centralizes and consolidates large amounts of data from multiple sources. Over time, it builds a historical record that can be invaluable to data scientists and business analysts. The data stored is of the highest quality and the data warehouse’s records are often considered definitive, serving as an organization’s “single source of truth.”
Many businesses use a combination of databases, data lakes and data warehouses to store and analyze their data. The data may be recorded in their operational databases and then fed to their data warehouses for further analysis.