
5 Introduction to Exadata

The topics discussed in this chapter include:

  • Balanced Hardware Configuration

  • About Orion

  • About Disk Layout

  • About Oracle Exadata

  • Exadata Database Machine Architecture

  • Database Server Software

  • Exadata Smart Scan Processing

  • Storage Indexing

  • Offload of Data Mining Model Scoring

  • Exadata Smart Memory Scans

  • About Hybrid Columnar Compression

  • Exadata Smart Flash Cache Features

  • I/O Resource Management with Exadata

  • Quality of Service (QoS) Management with Exadata

  • Conclusion

Balanced Hardware Configuration

Regardless of the design or implementation of a data warehouse, the initial key to good performance lies in the hardware configuration used. Many data warehouse operations are based upon large table scans and other I/O-intensive operations, which move vast quantities of data. To achieve optimal performance, the hardware configuration must be sized end to end to sustain this level of throughput. This type of hardware configuration is called a balanced system. In a balanced system, all components - from the CPU to the disks - are orchestrated to work together to guarantee the maximum possible I/O throughput. This chapter begins by describing the key concepts of balanced systems, and then presents the Oracle Exadata Database Machine, an engineered system available in multiple models that has been designed from the start to achieve that balance. The material on the Oracle Exadata Database Machine covers much more than balanced system topics: it presents the full range of software and hardware features that Exadata provides to bring maximum value to the enterprise.

To create a balanced system, you must first understand how much throughput capacity is required for your system. There is no general answer to the question of how much I/O a system requires, because the answer depends heavily on the workload running on the system. Third normal form data warehouses require high I/O throughput, since their dominant query types, such as hash joins, rely on good table scan performance. Star schema data warehouses, on the other hand, require a high rate of random I/O operations, because their queries rely more on index access paths such as bitmap indexes. Exadata provides both the I/O throughput and the random I/O capacity to handle the most demanding third normal form and star schema data warehouses.

Two approaches to sizing a system can help you determine the I/O demand of the application. One is to define lower and upper bounds for the I/O demand. The other is to size the I/O according to the number and speed of the cores used in the system.

Begin by considering the lower and upper I/O demand boundaries. A query operation can generally be divided into two parts: reading rows and processing rows. Reading rows is I/O intensive (or buffer cache intensive), while processing rows is CPU intensive. Some queries are more I/O intensive and some are more CPU intensive, so it is necessary to determine which type dominates the intended workload. I/O-intensive queries define the upper boundary of the required I/O throughput, while CPU-intensive queries define the lower boundary. A typical system's workload falls somewhere between the two extremes. To determine the throughput requirements of a system, analyze whether the workload contains more I/O-intensive or more CPU-intensive queries. This is influenced by query complexity in terms of join operations, and the calculations must also take into account the number of concurrent users.

Separately from user queries, you need to understand the workload created by ETL operations. During these operations, write performance must also be considered. When the workload of a data warehouse is understood, it is possible to make a rough estimate of the amount of I/O bandwidth that will be required. However, data warehouses by their very nature often have unpredictable and ad hoc workloads, so a simpler method of estimating I/O requirements is frequently needed.

The alternative approach is to size the throughput requirements solely on the number of CPU cores in your system and on how much throughput each individual CPU or core in your configuration can drive. Both pieces of information can be determined from an existing system. If you are sizing a potential system, you should research the value of MB/second of I/O throughput per core. This is an essential planning number for designing a balanced system: all subsequent critical components on the I/O path - the host bus adapters (HBAs), the network infrastructure, the switches, the disk controllers, and the disks - have to be sized appropriately, and the starting point is knowing the MB/second of I/O throughput per core. Enterprise data warehousing on Oracle uses Real Application Clusters (Oracle RAC) systems, so this discussion is in terms of multi-node systems where each node can access all disks. All of the following need to be in balance to prevent bottlenecks:

  • CPU cores - quantity and speed impact all the other calculations

  • HBAs - number and speed of HBAs depend on cores

  • Node Interconnect - I/O capacity depends on cores

  • Switches - depend on HBA quantity and speed

  • Disk Controllers - quantity and speed influenced by HBAs

  • Disk Arrays - number and speed influenced by controllers

As your starting point, multiply the number of cores per node by the MB/second of I/O per core. The result is the amount of I/O each node can drive. Each node therefore needs enough HBA resources to support that I/O level: the HBA quantity multiplied by the HBA throughput must meet the I/O level the cores can drive.
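For example (the figures here are purely illustrative; substitute values measured on your own hardware), assume each node has 8 cores and each core can drive 200 MB/second of I/O:

  8 cores per node x 200 MB/second per core = 1,600 MB/second per node

If each HBA can sustain roughly 400 MB/second, each node then needs at least 1,600 / 400 = 4 HBAs so that the HBAs do not become the bottleneck.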

Memory per core is another important consideration for system configuration. The memory per core must be adequate to handle memory-intensive operations such as large sorts. In turn, the memory must be allocated appropriately among the memory pools of the Oracle Database. Exadata systems provide generous amounts of memory, with multiple gigabytes per core, to handle the most demanding tasks.

See Also:

Oracle Database Performance Tuning Guide for further information

Now consider the network switches. An important way to size switches is to consider the I/O required to perform a full table scan using all nodes of your Oracle RAC system. Since each node has one HBA connected to each switch, multiply the number of nodes by the throughput of each HBA to find the total I/O demand on each switch. Every switch must be capable of handling the calculated I/O level from the nodes, and the same throughput is needed on the disk-facing side of the switch. Exadata optimizes performance by connecting servers to storage with InfiniBand switches.

The next stage of I/O sizing is the disk arrays: the disk controllers and disks must also be in line with the calculated throughput requirement. There is an important trade-off between disk size and I/O throughput. It is tempting to use fewer, larger disks to reduce storage costs. However, fewer disks mean fewer disk controllers and less I/O throughput for the disk array, and inadequate disk array throughput is a guarantee of performance bottlenecks. Do not be misled by large values for I/O operations per second (IOPS) or disk capacity (TB): IOPS and TB figures are no substitute for the essential value of I/O throughput.

Along with the I/O subsystem throughput, the cluster interconnect is another key area to size for Oracle RAC systems. The interconnect must be capable of I/O throughput equal to the number of cores in the cluster (the sum of cores across all nodes) times the MB/second per core. This large I/O requirement has two important justifications: to avoid bottlenecks and to scale linearly for operations involving inter-node parallel execution. The scenario for inter-node parallel execution is as follows:

  • Once data is read from disk for a given query, it resides in process memory on one of the nodes in the cluster.

  • If another process on a different node requires some or all of that data to complete the query, the data is passed over the interconnect rather than being read again from disk.

  • If the interconnect bandwidth is not equal to the disk I/O bandwidth, it becomes a major bottleneck for the query, and scalability is compromised.
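Continuing the illustrative figures used earlier (8 cores per node at an assumed 200 MB/second per core), a four-node cluster would need an interconnect capable of approximately:

  4 nodes x 8 cores x 200 MB/second per core = 6,400 MB/second

As before, these numbers are assumptions for the sake of the arithmetic; use the per-core throughput measured on your own system.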

The sizing requirement for the interconnect highlights the need to use the highest-throughput technology available. Currently, InfiniBand is a wise choice for the interconnect; other approaches such as Gigabit Ethernet require many more network interface cards per node to achieve the necessary bandwidth. InfiniBand also provides a better solution for large-scale systems because it consumes less CPU per message sent and received. Exadata is built on an optimized InfiniBand architecture that is used for both the interconnect and the storage system.

About Orion

Note that I/O validation should occur before an Oracle database is installed and data is loaded. In too many cases, poor I/O performance is first noticed after the database has been created, and often the only way to reconfigure the storage is to rebuild the database. Much time and energy can therefore be saved with a simple I/O performance test when the I/O system is first configured. Orion is a tool Oracle provides that mimics a typical database workload in order to calibrate throughput. Use Orion to verify the maximum achievable throughput, even if a database has already been installed.
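As a simple illustration (the test name and disk count are placeholders; check the Orion documentation for the options available in your release), a basic calibration run might look like the following, where mytest.lun is a text file listing the LUNs to test:

  orion -run simple -testname mytest -num_disks 8

Orion writes summary and per-load-level results to output files named after the test.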

The types of supported I/O workloads are as follows:

  • Small and random

  • Large and sequential

  • Large and random

  • Mixed workloads

For each type of workload, Orion can run tests at different levels of I/O load to measure performance metrics such as MB per second, I/O per second, and I/O latency. You can run different I/O simulations depending upon which type of system you plan to build. Examples are the following:

  • Daily workloads when users or applications query the system

  • The data load when users may or may not access the system

  • Index and materialized view builds

  • Backup operations

See Also:

Oracle Database Performance Tuning Guide for further information

About Disk Layout

Once you have confirmed that the hardware configuration has been set up as a balanced system that can sustain your required throughput, you need to focus on your disk layout.

In effect, the disk layout must also be balanced to the needs of the data warehouse. One of the key problems seen in existing data warehouse implementations is poor disk design. Often a large Enterprise Data Warehouse (EDW) can be found residing on the same disk array as one or more other applications. This design choice often occurs because the EDW does not generate the number of IOPS needed to saturate the disk array. However, there is a more important consideration than IOPS: the EDW does fewer, larger I/Os than other applications, and those I/Os can easily exceed the disk array's throughput capabilities in terms of gigabytes per second. Ideally, your data warehouse should reside on its own storage array(s).

The storage subsystem for a data warehouse should be simple, efficient, highly available, and easy to scale out. One of the easiest ways to achieve this is to apply the S.A.M.E. (Stripe and Mirror Everything) methodology, which spreads data across the full storage subsystem. S.A.M.E. can be implemented at the hardware level or by using Oracle Automatic Storage Management (ASM), which provides file system and volume manager capabilities built into the Oracle Database.
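For illustration (the disk group name and disk paths are hypothetical), an ASM disk group implementing S.A.M.E. with mirroring could be created as follows:

  CREATE DISKGROUP data NORMAL REDUNDANCY
    DISK '/dev/rdsk/disk01',
         '/dev/rdsk/disk02',
         '/dev/rdsk/disk03',
         '/dev/rdsk/disk04';

ASM then stripes database files evenly across all disks in the group and mirrors each extent on a different disk, so no manual volume layout is required.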

The rest of this chapter describes the Oracle Exadata Database Machine, an engineered system available in various models that reflects these balanced configuration concepts throughout its design. The material that follows covers much more than balanced configuration, because the Oracle Exadata Database Machine offers a very large set of features, both hardware and software, that support outstanding performance. These features make the Oracle Exadata Database Machine an excellent platform for all kinds of tasks in the realms of both OLTP and data warehousing.

About Oracle Exadata

Oracle Exadata Database Machine provides an optimal solution for all database workloads, ranging from scan-intensive data warehouse applications to highly concurrent online transaction processing (OLTP) applications. With its combination of smart Oracle Exadata Storage Server Software, complete and intelligent Oracle Database software, and the latest industry-standard hardware components, Oracle Exadata Database Machine delivers extreme performance in a highly available, highly secure environment. Oracle provides unique clustering and workload management capabilities, so Oracle Exadata Database Machine is well-suited for consolidating multiple databases into a single grid. Delivered as a complete, pre-optimized, and pre-configured package of software, servers, and storage, Oracle Exadata Database Machine is fast to implement and ready to tackle your large-scale business applications.

The Oracle Exadata Database Machine is an easy-to-deploy solution for hosting the Oracle Database that delivers the highest levels of database performance available. The Exadata Database Machine is a "cloud in a box" composed of database servers, Oracle Exadata Storage Servers, an InfiniBand fabric for storage networking, and all the other components required to host an Oracle Database. It delivers outstanding I/O and SQL processing performance for online transaction processing (OLTP), data warehousing (DW), and consolidation of mixed workloads. Extreme performance is delivered for all types of database applications by leveraging a massively parallel grid architecture using Real Application Clusters and Exadata storage. The Database Machine and Exadata storage deliver breakthrough analytic and I/O performance, are simple to use and manage, and provide mission-critical availability and reliability.

The Exadata Storage Server is an integral component of the Exadata Database Machine, and its extreme performance is delivered by several unique features of the product. Exadata storage provides database-aware storage services, such as the ability to offload database processing from the database server to storage, and provides this transparently to SQL processing and database applications. Hence, only the data requested by the application is returned, rather than all the data in the queried tables. Exadata Smart Flash Cache dramatically accelerates Oracle Database processing by speeding I/O operations. The flash provides intelligent caching of database objects to avoid physical I/O operations and speeds database logging; the Oracle Database on the Database Machine is the first flash-enabled database. Exadata storage also provides an advanced compression technology, Hybrid Columnar Compression, that typically provides 10x or higher levels of data compression, boosting the effective data transfer rate by an order of magnitude. The Oracle Exadata Database Machine is the world's most secure database machine: building on the security capabilities of the Oracle Database, Exadata storage provides the ability to query fully encrypted databases with near-zero overhead at hundreds of gigabytes per second. The combination of these and many other features of the product is the basis of the outstanding performance of the Exadata Database Machine.

The Exadata Storage Expansion Rack enables the growth of Exadata storage capacity and bandwidth for X2-2 and X2-8 Exadata Database Machines. It is designed for database deployments that require very large amounts of data beyond what is included in an Exadata Database Machine, and that do not require additional database processing power. Standard Exadata Storage Servers, and their supporting infrastructure, are packaged together in the Exadata Storage Expansion Rack to provide an easy-to-deploy extension of the Exadata storage configuration in an Exadata Database Machine. All the benefits and capabilities of Exadata storage are available and realized when using an Exadata Storage Expansion Rack.

The Exadata Database Machine has also been designed to work with, or independently of, the Oracle Exalogic Elastic Cloud. The Exalogic Elastic Cloud provides the best platform to run Oracle's Fusion Middleware and Oracle's Fusion applications. The combination of Exadata and Exalogic is a complete hardware and software engineered solution that delivers high performance for all enterprise applications, including Oracle E-Business Suite, Siebel, and PeopleSoft applications.

Exadata Database Machine Architecture

Figure 5-1 illustrates a simplified schematic of a typical Database Machine Half Rack deployment. Two Oracle Databases are shown: one Real Application Clusters (Oracle RAC) database deployed across three database servers, and one single-instance database deployed on the remaining database server in the Half Rack. (Of course, all four database servers could be used for a single four-node Oracle RAC cluster.) The Oracle RAC database might be a production database, and the single-instance database might be for test and development. Both databases share the seven Exadata cells in the Half Rack, but they would have separate Oracle homes to maintain software independence. All the components for this configuration, including the database servers, Exadata cells (storage servers), InfiniBand switches, and other support hardware, are housed in the Database Machine.

Figure 5-1 Exadata Machine Architecture


The Database Machine uses a state-of-the-art InfiniBand interconnect between the servers and storage. Each database server and Exadata cell has dual-port Quad Data Rate (QDR) InfiniBand connectivity for high availability. Each InfiniBand link provides 40 gigabits per second of bandwidth, many times more than traditional storage or server networks. Further, Oracle's interconnect protocol uses direct data placement (direct memory access, DMA) to ensure very low CPU overhead by moving data directly from the wire to database buffers with no extra data copies. The InfiniBand network has the flexibility of a LAN network with the efficiency of a SAN. By using an InfiniBand network, Oracle ensures that the network will not bottleneck performance. The same InfiniBand network also provides a high performance cluster interconnect for the Oracle RAC database nodes.

Oracle Exadata is architected to scale out to any level of performance. To achieve higher performance and greater storage capacity, additional database servers and Exadata cells are added to the configuration, for example, in a Half Rack to Full Rack upgrade. As more Exadata cells are added to the configuration, storage capacity and I/O performance increase near linearly. No cell-to-cell communication is ever done or required in an Exadata configuration.

When using Exadata, much SQL processing is offloaded from the database server to the Exadata cells. Exadata enables function shipping from the database instance to the underlying storage, in addition to providing traditional block-serving services to the database. One of the unique things Exadata storage does, compared to traditional storage, is return only the rows and columns that satisfy the database query rather than the entire table being queried. Exadata pushes SQL processing as close to the data (or disks) as possible and gets all the disks operating in parallel. This reduces CPU consumption on the database server, consumes much less bandwidth moving data between database servers and storage servers, and returns a query result set rather than entire tables. Eliminating data transfers and database server workload can greatly benefit data warehousing queries that traditionally become bandwidth and CPU constrained. Eliminating data transfers can also significantly benefit online transaction processing (OLTP) systems that often include large batch and report processing operations.

Exadata is totally transparent to the application using the database. The exact same Oracle Database 12c Release 1 that runs on traditional systems runs on the Database Machine, but on the Database Machine it runs faster. Existing SQL statements, whether ad hoc or in packaged or custom applications, are unaffected and do not require any modification when Exadata storage is used. The offload processing and bandwidth advantages of the solution are delivered without any modification to the application, and all features of the Oracle Database are fully supported with Exadata. Exadata works equally well with single-instance and Real Application Clusters deployments of the Oracle Database. Features such as Oracle Data Guard, Oracle Recovery Manager (RMAN), Oracle GoldenGate, and other database tools are administered the same way with or without Exadata. Users and database administrators can leverage the same tools and knowledge they are familiar with today, because these work just as they do with traditional non-Exadata storage.

Database Server Software

Oracle Database 12c Release 1 has been significantly enhanced to take advantage of Exadata storage. The Exadata software is optimally divided between the database servers and the Exadata cells. The database servers and Exadata Storage Server Software communicate using iDB, the Intelligent Database protocol. iDB is implemented in the database kernel and transparently maps database operations to Exadata-enhanced operations. iDB implements a function shipping architecture in addition to the traditional data block shipping provided by the database. iDB is used to ship SQL operations down to the Exadata cells for execution and to return query result sets to the database kernel. Instead of returning database blocks, Exadata cells return only the rows and columns that satisfy the SQL query. Like existing I/O protocols, iDB can also directly read and write ranges of bytes to and from disk, so when offload processing is not possible, Exadata operates like a traditional storage device for the Oracle Database. When offload is feasible, however, the intelligence in the database kernel enables, for example, table scans to be passed down to execute on the Exadata Storage Server, so that only the requested data is returned to the database server.

iDB is built on the industry standard Reliable Datagram Sockets (RDSv3) protocol and runs over InfiniBand. ZDP (Zero-loss Zero-copy Datagram Protocol), a zero-copy implementation of RDS, is used to eliminate unnecessary copying of blocks. Multiple network interfaces can be used on the database servers and Exadata cells. This is an extremely fast low-latency protocol that minimizes the number of data copies required to service I/O operations.

Oracle Automatic Storage Management (ASM) is used as the file system and volume manager for Exadata. ASM virtualizes the storage resources and provides the advanced volume management and file system capabilities of Exadata. Striping database files evenly across the available Exadata cells and disks results in a uniform I/O load across all the storage hardware. The ability of ASM to perform non-intrusive resource allocation, and reallocation, is a key enabler of the shared grid storage capabilities of Exadata environments. The disk mirroring provided by ASM, combined with hot-swappable Exadata disks, ensures that the database can tolerate the failure of individual disk drives. Data is mirrored across cells to ensure that the failure of a cell will not result in loss of data or inhibit data accessibility. This massively parallel architecture delivers unbounded scalability and high availability.

The Database Resource Manager (DBRM) feature of the Oracle Database has been enhanced for use with Exadata. DBRM lets the user define and manage intra- and inter-database I/O bandwidth in addition to CPU, undo, degree of parallelism, active sessions, and the other resources it manages. This allows the sharing of storage between databases without fear of one database monopolizing the I/O bandwidth and impacting the performance of the other databases sharing the storage. Consumer groups are allocated a percentage of the available I/O bandwidth, and the DBRM ensures these targets are delivered. This is implemented by the database tagging each I/O with the database and consumer group that issued it, which provides a complete view of the I/O priorities through the entire I/O stack. The intra-database consumer group I/O allocations are defined and managed at the database server. The inter-database I/O allocations are defined within the software in the Exadata cell and managed by the I/O Resource Manager. The Exadata cell software ensures that inter-database I/O resources are managed and properly allocated within, and between, databases. Overall, DBRM ensures each database receives its specified amount of I/O resources and user-defined SLAs are met.

Two features of the Oracle Database that are offered exclusively on the Exadata Database Machine are Oracle Database Quality of Service (QoS) Management and the QoS Management Memory Guard feature. QoS Management allows system administrators to directly manage the service levels of applications hosted on Oracle Exadata Database Machines. Using a policy-based architecture, QoS Management correlates accurate run-time performance and resource metrics, analyzes this data with its expert system to identify bottlenecks, and produces recommended resource adjustments to meet and maintain performance objectives under dynamic load conditions. Should sufficient resources not be available, QoS Management preserves the more business-critical objectives at the expense of the less critical ones. In conjunction with Cluster Health Monitor, QoS Management's Memory Guard detects nodes that are at risk of failure due to memory over-commitment. It responds by automatically preventing new connections, thus preserving existing workloads, and restores connectivity once sufficient memory is again available.

Exadata Smart Scan Processing

With traditional, non-iDB aware storage, all database intelligence resides in the database software on the server. To illustrate how SQL processing is performed in this architecture, an example of a table scan is shown in Figure 5-2.

Figure 5-2 Exadata Scan Processing


The client issues a SELECT statement with a predicate to filter and return only rows of interest. The database kernel maps this request to the file and extents containing the table being scanned. The database kernel issues the I/O to read the blocks. All the blocks of the table being queried are read into memory. Then SQL processing is done against the raw blocks, searching for the rows that satisfy the predicate. Lastly, the rows are returned to the client.

As is often the case with large queries, the predicate filters out most of the rows read. Yet all the blocks from the table need to be read, transferred across the storage network, and copied into memory. Many more rows are read into memory than are required to complete the requested SQL operation. This generates a large number of data transfers, which consume bandwidth and impact application throughput and response time.

Integrating database functionality within the storage layer of the database stack allows queries, and other database operations, to be executed much more efficiently. Implementing database functionality as close to the hardware as possible, in the case of Exadata at the disk level, can dramatically speed database operations and increase system throughput.

With Exadata storage, database operations are handled much more efficiently. Queries that perform table scans can be processed within Exadata storage, with only the required subset of data returned to the database server. Row filtering, column filtering, and some join processing (among other functions) are performed within the Exadata storage cells. When this takes place, only the relevant and required data is returned to the database server.

Figure 5-3 below illustrates how a table scan operates with Exadata storage.

Figure 5-3 Exadata Scan and Storage


The client issues a SELECT statement with a predicate to filter and return only rows of interest. The database kernel determines that Exadata storage is available, constructs an iDB command representing the SQL command issued, and sends it to the Exadata storage. The CELLSRV component of the Exadata software scans the data blocks to identify those rows and columns that satisfy the SQL issued. Only the rows satisfying the predicate, and only the requested columns, are returned to the database server. The database kernel consolidates the result sets from across the Exadata cells. Lastly, the rows are returned to the client.

Smart scans are transparent to the application and no application or SQL changes are required.

The SQL EXPLAIN PLAN statement shows when Exadata smart scan is used. Returned data is fully consistent and transactional and rigorously adheres to the Oracle Database consistent read functionality and behavior. If a cell fails during a smart scan, the uncompleted portions of the smart scan are transparently routed to another cell for completion. Smart scans properly handle the complex internal mechanisms of the Oracle Database, including uncommitted data and locked rows, chained rows, compressed tables, national language processing, date arithmetic, regular expression searches, materialized views, and partitioned tables.
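As an illustrative sketch (the table and column names are hypothetical), a smart scan shows up in an execution plan as a storage-aware full scan with a storage() filter predicate:

  EXPLAIN PLAN FOR
    SELECT customer_name FROM calls WHERE amount > 200;

  SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

  | Id | Operation                  | Name  |
  ------------------------------------------
  |  1 | TABLE ACCESS STORAGE FULL  | CALLS |

     storage("AMOUNT">200)

The TABLE ACCESS STORAGE FULL operation and the storage() predicate indicate that the scan and its filter are eligible for offload to the Exadata cells.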

The Oracle Database and Exadata server cooperatively execute various SQL statements. Moving SQL processing off the database server frees server CPU cycles and eliminates a massive amount of bandwidth consumption, which is then available to better service other requests. SQL operations run faster, and more of them can run concurrently, because of reduced contention for I/O bandwidth. The following sections look at the various SQL operations that benefit from the use of Exadata.

Smart Scan Predicate Filtering

Exadata enables predicate filtering for table scans. Only the rows requested are returned to the database server rather than all rows in a table.

Smart Scan Column Filtering

Exadata provides column filtering, also called column projection, for table scans. Only the columns requested are returned to the database server rather than all columns in a table.
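As a simple combined illustration of predicate and column filtering (the table and columns are hypothetical), consider:

  SELECT customer_name
  FROM   calls
  WHERE  amount > 200;

With smart scan, the Exadata cells return only the customer_name values for the rows whose amount exceeds 200, rather than shipping every block of the calls table to the database server.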

Smart Scan Join Processing

Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom Filters, which are a very efficient probabilistic method to determine whether a row is a member of the desired result set.
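For example (the schema is illustrative), in a star-style join such as the following, the filter on the small dimension table can be converted into a Bloom filter and shipped to the cells, so the large fact table scan returns only probable matches:

  SELECT SUM(s.amount_sold)
  FROM   sales s, products p
  WHERE  s.prod_id = p.prod_id
  AND    p.prod_category = 'Electronics';

When this offload occurs, the storage predicate in the execution plan typically shows a SYS_OP_BLOOM_FILTER function applied to the join key.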

Smart Scan Processing of Encrypted Tablespaces and Columns

Smart Scan offload processing of Encrypted Tablespaces (TSE) and Encrypted Columns is supported in Exadata storage. This enables increased performance when accessing the most confidential data in the enterprise.

Storage Indexing

Storage Indexes are a very powerful capability provided in Exadata storage that helps avoid I/O operations. The Exadata Storage Server Software creates and maintains a Storage Index (that is, metadata about the database objects) in the Exadata cell. The Storage Index keeps track of minimum and maximum values of columns for tables stored on that cell. When a query specifies a WHERE clause, but before any I/O is done, the Exadata software examines the Storage Index to determine if rows with the specified column value exist in the cell, by comparing the column value to the minimum and maximum values maintained in the Storage Index. If the column value is outside the minimum and maximum range, scan I/O for that query is avoided. Many SQL operations run dramatically faster because large numbers of I/O operations are automatically replaced by a few lookups. To minimize operational overhead, Storage Indexes are created and maintained transparently and automatically by the Exadata Storage Server Software.
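As a sketch (the table, column, and value are hypothetical), a query such as:

  SELECT * FROM orders WHERE order_id = 12345;

can skip every storage region whose recorded minimum and maximum for order_id exclude the value 12345. The database statistic cell physical IO bytes saved by storage index reports how much scan I/O was avoided in this way.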

Offload of Data Mining Model Scoring

Data mining model scoring is offloaded to Exadata, making the Database Machine an even better and more performant platform for data analysis. All data mining scoring functions (for example, PREDICTION_PROBABILITY) are offloaded to Exadata for processing. This not only speeds warehouse analysis, but also reduces database server CPU consumption and the I/O load between the database server and Exadata storage.

Other Exadata Smart Scan Processing

Two other database operations that are offloaded to Exadata are incremental database backups and tablespace creation. The speed and efficiency of incremental database backups have been significantly enhanced with Exadata. The granularity of change tracking in the database is much finer when Exadata storage is used: changes are tracked at the individual Oracle block level rather than at the level of a large group of blocks. This results in less I/O bandwidth being consumed for backups and faster-running backups.

With Exadata, the create file operation is also executed much more efficiently. For example, when issuing a CREATE TABLESPACE statement, instead of operating synchronously with each block of the new tablespace being formatted in server memory and written to storage, an iDB command is sent to Exadata instructing it to create the tablespace and format the blocks. Host memory usage is reduced and I/O associated with the creation and formatting of the tablespace blocks is offloaded. The I/O bandwidth saved with these operations means more bandwidth is available for other business critical work.

Exadata Smart Memory Scans

Exadata can provide all the performance benefits of in-memory databases, flash, and high performance storage in a single integrated solution. Using Oracle Database 11.2.0.3, one can execute in-memory parallel queries against table data in the buffer cache while simultaneously offloading the query to the Exadata storage, if necessary, to gain additional performance. While the data throughput of Exadata from a combination of disk and flash is more than sufficient for most applications, applications that can leverage even more data throughput can run at over 200 GB/sec using this in-memory query capability. With Hybrid Columnar Compression, it is possible to store more table data in memory and thus achieve higher effective scan bandwidths. This combination of in-memory parallel query and smart Exadata storage gives Exadata all the benefits of an in-memory solution without losing the cost and capacity benefits of disk and flash.

About Hybrid Columnar Compression

Compressing data can provide a dramatic reduction in the storage consumed by large databases. Exadata provides a very advanced compression capability called Hybrid Columnar Compression (HCC). Hybrid Columnar Compression enables the highest levels of data compression and provides enterprises with tremendous cost savings and performance improvements due to reduced I/O. Average storage savings can range from 10x to 15x depending on how HCC is used. With average savings of 10x, IT managers can drastically reduce, and often eliminate, their need to purchase new storage for several years. For example, a 100 terabyte database achieving 10x storage savings would occupy only 10 terabytes of physical storage. With 90 terabytes of storage now available, IT organizations can delay storage purchases for a significant amount of time.

HCC is a new method for organizing data within a database block. As the name implies, this technology uses a combination of both row and columnar methods for storing data. This hybrid, best-of-both-worlds approach achieves the compression benefits of columnar storage while avoiding the performance shortfalls of a pure columnar format. A logical construct called the compression unit is used to store a set of Hybrid Columnar-compressed rows. When data is loaded, column values are detached from the set of rows, ordered, grouped together, and then compressed. After the column data for a set of rows has been compressed, it is stored in a compression unit.
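As a brief sketch (the table names are hypothetical), Hybrid Columnar Compression is enabled with the COMPRESS FOR clause; QUERY LOW and QUERY HIGH favor warehouse scan performance, while ARCHIVE LOW and ARCHIVE HIGH maximize compression for rarely accessed data:

  CREATE TABLE sales_history
    COMPRESS FOR QUERY HIGH
    AS SELECT * FROM sales;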

Smart Scan processing of HCC data is provided, and column projection and filtering are performed within Exadata. Queries run directly on Hybrid Columnar Compressed data and do not require the data to be decompressed: data that is needed only to evaluate a query predicate does not have to be decompressed, and only the columns and rows being returned to the client are decompressed in memory. The decompression process takes place on the Exadata cell in order to maximize performance and offload processing from the database server. Given the typical tenfold compression of Hybrid Columnar Compressed tables, this effectively increases the I/O rate ten-fold compared to uncompressed data.

Exadata Smart Flash Cache Features

Oracle has implemented a smart flash cache directly in the Oracle Exadata Storage Server. The Exadata Smart Flash Cache holds frequently accessed data in very fast flash storage, while most of the data is kept in very cost-effective disk storage. This happens automatically, without the user having to take any action. The Exadata Smart Flash Cache is smart because it knows when to avoid caching data that will never be reused or will not fit in the cache. The Oracle Database and Exadata storage also allow the user to provide directives at the database table, index, and segment level to ensure that specific data is retained in flash. Tables can be moved in and out of flash with a simple command, without the need to move the table to different tablespaces, files, or LUNs, as you would have to do with traditional storage that uses flash disks.
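For example (the table name is hypothetical), pinning an object in flash uses the CELL_FLASH_CACHE storage attribute:

  ALTER TABLE hot_orders STORAGE (CELL_FLASH_CACHE KEEP);

Setting the attribute back to DEFAULT restores normal caching behavior, and NONE excludes the object from the flash cache entirely.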

The Exadata Smart Flash Cache is also used to reduce the latency of log write I/O, eliminating performance bottlenecks that might occur due to database logging. The time to commit user transactions is very sensitive to the latency of log writes, and many performance-critical database algorithms, such as space management and index splits, are also very sensitive to log write latency. Exadata storage already speeds up log writes by using the battery-backed DRAM cache in the disk controller. Writes to the disk controller cache are normally very fast, but they can become slower during periods of high disk I/O. Smart Flash Logging takes advantage of the flash memory in Exadata storage to speed up log writes.

Flash memory has very good average write latency, but it has occasional slow outliers that can be one or two orders of magnitude slower than the average. The idea of Exadata Smart Flash Logging is to perform redo writes simultaneously to both flash memory and the disk controller cache, and to complete the write when the first of the two completes. This gives Exadata the best of both worlds: Smart Flash Logging improves user transaction response time and increases overall database throughput for I/O-intensive workloads by accelerating performance-critical database algorithms.

Smart Flash Logging handles all crash and recovery scenarios without requiring any additional or special administrator intervention beyond what would normally be needed for recovery of the database from redo logs. From a DBA perspective, the system behaves in a completely transparent manner, and the DBA need not be concerned that flash is being used as a temporary store for redo. The only behavioral difference is consistently low latencies for redo log writes.

I/O Resource Management with Exadata

With traditional storage, creating a shared storage grid is hampered by the inability to prioritize the work of the various jobs and users consuming I/O bandwidth from the storage subsystem. The same problem occurs when multiple databases share the storage subsystem. The DBRM and I/O resource management capabilities of Exadata storage can prevent one class of work, or one database, from monopolizing disk resources and bandwidth, and ensure that user-defined SLAs are met when using Exadata storage. The DBRM enables the coordination and prioritization of I/O bandwidth consumed between databases, and between different users and classes of work. By tightly integrating the database with the storage environment, Exadata is aware of what types of work are running and how much I/O bandwidth each consumes. Users can therefore have the Exadata system identify various types of workloads, assign priorities to these workloads, and ensure the most critical workloads get priority.

In data warehousing, or mixed workload environments, you may want to ensure different users and tasks within a database are allocated the correct relative amount of I/O resources. For example, you may want to allocate 70% of I/O resources to interactive users on the system and 30% of I/O resources to batch reporting jobs. This is simple to enforce using the DBRM and I/O resource management capabilities of Exadata storage.

An Exadata administrator can create a resource plan that specifies how I/O requests should be prioritized. This is accomplished by putting the different types of work into service groupings called consumer groups. Consumer groups can be defined by a number of attributes, including the username, client program name, function, or length of time the query has been running. Once these consumer groups are defined, the user can set a hierarchy determining which consumer groups get precedence in I/O resources and how much of the I/O resource each consumer group receives. This hierarchy determining I/O resource prioritization can be applied simultaneously to both intra-database operations (that is, operations occurring within a database) and inter-database operations (that is, operations occurring among various databases).
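As a hedged sketch of the intra-database case (the plan and consumer group names are hypothetical; on Exadata the same management percentages govern I/O as well as CPU), the 70%/30% split described above might be defined with the DBMS_RESOURCE_MANAGER package:

  BEGIN
    DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
    DBMS_RESOURCE_MANAGER.CREATE_PLAN(
      plan => 'DAYTIME_PLAN', comment => 'Favor interactive work');
    DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
      consumer_group => 'INTERACTIVE', comment => 'Interactive users');
    DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP(
      consumer_group => 'BATCH', comment => 'Batch reporting jobs');
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
      plan => 'DAYTIME_PLAN', group_or_subplan => 'INTERACTIVE',
      comment => '70% of resources', mgmt_p1 => 70);
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
      plan => 'DAYTIME_PLAN', group_or_subplan => 'BATCH',
      comment => '30% of resources', mgmt_p1 => 30);
    -- Every plan must include a directive for OTHER_GROUPS.
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
      plan => 'DAYTIME_PLAN', group_or_subplan => 'OTHER_GROUPS',
      comment => 'Everything else', mgmt_p2 => 100);
    DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
    DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
  END;
  /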

When Exadata storage is shared between multiple databases, you can also prioritize the I/O resources allocated to each database, preventing one database from monopolizing disk resources and bandwidth and ensuring user-defined SLAs are met.

Consolidating multiple databases onto a single Exadata Database Machine is a cost-saving solution for customers. With Exadata Storage Server Software 11.2.2.3 and above, the Exadata I/O Resource Manager (IORM) can be used to enable or disable the use of flash for the different databases running on the Database Machine, empowering customers to reserve flash for the most performance-critical databases.
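As an illustrative sketch (the database names are hypothetical, and the attributes available depend on the Exadata Storage Server Software release), an inter-database IORM plan that reserves flash for a critical database might be set on each cell with CellCLI:

  CellCLI> ALTER IORMPLAN                                                 -
             dbPlan=((name=prod, level=1, allocation=70, flashCache=on),  -
                     (name=test, level=1, allocation=30, flashCache=off), -
                     (name=other, level=2, allocation=100))

Here the prod database receives 70% of disk I/O at the first allocation level and keeps access to the Smart Flash Cache, while test is prevented from consuming flash.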

In essence, the Exadata I/O Resource Manager solves a challenge that traditional storage technology does not address: creating a shared grid storage environment with the ability to balance and prioritize the work of multiple databases and users sharing the storage subsystem. Exadata I/O resource management ensures that user-defined SLAs are met for multiple databases sharing Exadata storage, so that each database or user gets the correct share of disk bandwidth to meet business objectives.

Quality of Service (QoS) Management with Exadata

Oracle Exadata QoS Management is an automated, policy-based product that monitors the workload requests of an entire system. It manages the resources that are shared across applications and adjusts the system configuration to keep the applications running at the performance levels your business requires. It responds gracefully to changes in system configuration and demand, thus avoiding oscillations in the performance levels of your applications.

Oracle Exadata QoS Management monitors the performance of each work request on a target system. It starts tracking a work request from the time the request asks for a database connection through a database service. The amount of time required to complete a work request, or the response time (also known as the end-to-end response time, or round-trip time), is the time from when the request for data is initiated to when the request is completed. By accurately measuring the two components of response time (the time spent using resources and the time spent waiting to use resources), QoS Management can quickly detect bottlenecks in the system. It then makes recommendations to reallocate resources to relieve a bottleneck, thus preserving or restoring service levels. System administrators are alerted to the need for this reallocation, and it can be implemented with a simple button click on the QoS Management dashboard. Full details of the projected performance impact of this action on the entire cluster are also provided. Finally, an audit log of all actions and policy changes is maintained, along with historical system performance graphs.

Oracle Exadata QoS Management manages the resources on your system so that:

  • When sufficient resources are available to meet the demand, business-level performance requirements for your applications are met, even if the workload changes.

  • When sufficient resources are not available to meet the demand, Oracle Exadata QoS Management attempts to satisfy the more critical business performance requirements at the expense of less critical performance requirements.

  • When load conditions severely exceed capacity, resources remain available.

Conclusion

Businesses today increasingly need to leverage a unified database platform to enable the deployment and consolidation of all applications onto one common infrastructure. Whether for OLTP, DW, or mixed workloads, a common infrastructure delivers the efficiencies and reusability the datacenter needs, and makes in-house grid computing a reality. Building or using custom special-purpose systems for different applications is wasteful and expensive. The need to process more data increases every day, while corporations are also finding their IT budgets being squeezed. Examining the total cost of ownership (TCO) for IT software and hardware leads to choosing a common high performance infrastructure for deployments of all applications. By incorporating the Exadata-based Database Machine into the IT infrastructure, companies will:

  • Accelerate database performance and be able to do much more in the same amount of time.

  • Handle change and growth in scalable and incremental steps by consolidating deployments onto a common infrastructure.

  • Deliver mission-critical data availability and protection.