Beta Draft: 2012-07-16

1 Introduction to Data Warehousing Concepts

As the person responsible for administering, designing, and implementing a data warehouse, you also oversee the overall operation of Oracle data warehousing and maintenance of its efficient performance within your organization.

This section contains the following topics:

About This Guide

Oracle Database 2 Day + Data Warehousing Concepts and Design Guide teaches you how to perform common day-to-day tasks necessary to implement and administer a data warehouse. The goal of this guide is to introduce you to the data warehousing solutions available in Oracle Database. In addition, this guide introduces you to a broad range of the data warehousing solutions available from Oracle.

What This Guide Is Not

Oracle Database 2 Day + Data Warehousing Concepts and Design Guide is not an exhaustive discussion of implementing a data warehouse on Oracle. The objective for this guide is to describe big picture issues in data warehousing, not step-by-step tasks. As the Guide is concept-focused, it does not present code examples.

For complete conceptual information about these features and detailed instructions for using them, see the appropriate Oracle documentation as follows:

What Is a Data Warehouse?

A data warehouse is a database designed to enable business intelligence activities: it exists to help users understand and enhance their organization's performance. Given that every organization faces pressure to improve performance, data warehouses are found everywhere. They can be seen in every ___domain from business to government to nonprofit. Just as they span the spectrum of domains, data warehouses range immensely in their data size. You can find giant multinational firms storing hundreds of terabytes of data, and you can also see local school systems with just gigabytes. Similarly, the user base for data warehouses varies tremendously in size. A large enterprise may have tens of thousands of users, while a small firm has a handful.

The modern world of the internet means that data is shared ever more widely, so enterprises today will often give carefully controlled data warehouse access to customers, partners and suppliers. Therefore the users of a data warehouse may be internal or external, local or on the other side of the globe, and far more numerous than the enterprise's own employees. It is a certainty that data warehouse audiences will grow in size.

To achieve their goal of enhanced business intelligence, the data warehouse works with data collected from multiple sources. The source data may come from internally developed systems, purchased applications, third-party data syndicators and other sources. It may involve transactions, production, marketing, human resources and more. In today's world of big data, the data may be many billions of individual clicks on web sites or the massive data streams from sensors built into complex machinery.

Data warehouses are distinct from online transaction processing (OLTP) systems. With a data warehouse you separate analysis workload from transaction workload. Thus data warehouses are very much read-oriented systems. They have a far higher amount of data reading versus writing and updating. This enables far better analytical performance and avoids impacting your transaction systems. A data warehouse system can be optimized to consolidate data from many sources to achieve a key goal: it becomes your organization's "single source of truth". There is great value in having a consistent source of data that all users can look to; it prevents many disputes and enhances decision-making efficiency.

A data warehouse usually stores many months or years of data to support historical analysis. The data in a data warehouse is typically loaded through an extraction, transformation, and loading (ETL) process from multiple data sources. Modern data warehouses are moving toward an extract, load, transformation (ELT) architecture in which all or most data transformation is performed on the database that hosts the data warehouse. It is important to note that defining the ETL process is a very large part of the design effort of a data warehouse. Similarly, the speed and reliability of ETL operations are the foundation of the data warehouse once it is up and running.

Users of the data warehouse perform data analyses that are often time-related. Examples include consolidation of last year's sales figures, inventory analysis, and profit by product and by customer. But time-focused or not, users want to "slice and dice" their data however they see fit and a well-designed data warehouse will be flexible enough to meet those demands. Users will sometimes need highly aggregated data, and other times they will need to drill down to details. More sophisticated analyses include trend analyses and data mining, which use existing data to forecast trends or predict futures. The data warehouse acts as the underlying engine used by middleware business intelligence environments that serve reports, dashboards and other interfaces to end users.

Although the discussion above has focused on the term "data warehouse", there are two other important terms that need to be mentioned. These are the data mart and the operation data store (ODS).

A data mart serves the same role as a data warehouse, but it is intentionally limited in scope. It may serve one particular department or line of business. The advantage of a data mart versus a data warehouse is that it can be created much faster due to its limited coverage. However, data marts also create problems with inconsistency. It takes tight discipline to keep data and calculation definitions consistent across data mart. This problem has been widely recognized, so data marts exist in two styles. Independent data marts are those which are fed directly from source data. They can turn into islands of inconsistent information. Dependent data marts are fed from an existing data warehouse. Dependent data marts can avoid the problems of inconsistency, but they require that an enterprise-level data warehouse already exist.

Operational data stores exist to support daily operations. The ODS data is cleaned and validated, but it is not historically deep: it may be just the data for the current day. Rather than support the historically rich queries that a data warehouse can handle, the ODS gives data warehouses a place to get access to the most current data, which has not yet been loaded into the data warehouse. The ODS may also be used as a source to load the data warehouse. As data warehousing loading techniques have become more advanced, data warehouses may have less need for ODS as a source for loading data. Instead, constant trickle-feed systems can load the data warehouse in near real time.

The Key Characteristics of a Data Warehouse

The key characteristics of a data warehouse are as follows:

  • Data is structured for simplicity of access and high-speed query performance

  • End users are time-sensitive and desire speed-of-thought response times

  • Large amounts of historical data are used

  • Queries often retrieve large amounts of data, perhaps many thousands of rows

  • Both predefined and ad hoc queries are common

  • The data load involves multiple sources and transformations

In general, fast query performance with high data throughput is the key to a successful data warehouse.

Common Oracle Data Warehousing Tasks

As an Oracle data warehousing administrator or designer, you can expect to be involved in the following tasks:

  • Configuring an Oracle database for use as a data warehouse

  • Designing data warehouses

  • Performing upgrades of the database and data warehousing software to new releases

  • Managing schema objects, such as tables, indexes, and materialized views

  • Managing users and security

  • Developing routines used for the extraction, transformation, and loading (ETL) processes

  • Creating reports based on the data in the data warehouse

  • Backing up the data warehouse and performing recovery when necessary

  • Monitoring the data warehouse's performance and taking preventive or corrective action as required

In a small-to-midsize data warehouse environment, you might be the sole person performing these tasks. In large, enterprise environments, the job is often divided among several DBAs and designers, each with their own specialty, such as database security or database tuning.

These tasks are illustrated in:

Concepts Illustrated in This Guide

This guide illustrates the following concepts:

  1. Describes the Oracle Data Warehouse Reference Architecture.

    See Chapter 2, "Data Warehouse Reference Architecture".

  2. Describes the ideas behind logical design.

    See Chapter 3, "Data Warehousing Logical Design".

  3. Describes the ideas behind physical design.

    See Chapter 4, "Data Warehousing Physical Design".

  4. Define the ideas behind using Exadata.

    See Chapter 5, "Introduction to Exadata".

  5. Describes the ideas behind ODI.

    See Chapter 6, "ODI".

  6. Describes the ideas behind Information Access.

    See Chapter 7, "Information Access".