The topics discussed in this chapter include:
This chapter provides a high-level introduction to Oracle Data Integrator Enterprise Edition. Because the material is intended for readers new to ODI, it covers more than just the data warehousing features of the product and it emphasizes business value rather than technical details.Oracle is a leader in the data integration market, with the industry's most comprehensive fully-integrated offering in data integration, including Oracle Data Integrator Enterprise Edition, Oracle GoldenGate, Oracle Enterprise Data Quality. Oracle's data integration solutions provide continuous access to timely, trusted, and heterogeneous data across the enterprise to support both analytical and operational data integration.
Oracle Data Integrator provides a complete set of components for designing, deploying, and managing data integration processes. Data integration processes move and transform data from source data servers to target data servers, using an Extract-Load-Transform approach that eliminates the need for a transformation engine by delegating all transformations to the source and target data servers.
Capabilities included in Oracle Data Integrator Enterprise Edition 11g:
High performance E-LT capabilities integrated with Oracle GoldenGate enable fast and efficient loading and transformation of real-time data into a data warehouse
Better productivity with mapping wizards. Quick-Editor, generated code simulation and error table management
Strong usability and manageability through integrations to Oracle JDeveloper and Oracle Enterprise Manager
Increased performance by expressing commonality between different versions of the same source application with shortcuts and release tags
Variable tracking feature to determine the actual values of variables and sequences that were used during a session
Enhanced support for online analytical processing (OLAP), SAP and new APIs that can be embedded directly within custom applications
Direct integration into Oracle WebLogic and Oracle Coherence deployments, for high availability and resilience
Invocation of an Oracle Enterprise Data Quality (Datanomic) job
See Also:
Oracle Fusion Middleware Developer's Guide for Oracle Data Integrator for a detailed discussion of ODIThe ODI platform integrates in the broader Fusion Middleware platform and becomes a key component of this stack. ODI provides its run-time components as Java EE applications, enhanced to fully leverage the capabilities of the Oracle WebLogic Application Server or IBM WebSphere Application Server. ODI components include exclusive features for Enterprise-Scale Deployments, high availability, scalability, and hardened security. The run-time architecture is illustrated in Figure 6-1.
Where:
Repository
The repository stores the data integration design-time objects, the run-time objects (data integration scenarios to run) and the sessions corresponding to execution instances of these design-time and run-time objects. The architecture of the repository is designed to allow several separated environments that exchange metadata and scenarios (for example, Development, Test, Maintenance, and Production environments). The repository also acts as a version control system where objects are archived and assigned a version number.
The ODI Repository is composed of one Master Repository and several Work Repositories. Objects developed or configured through the user interfaces are stored in one of these repository types.
Run-time agents
There are the standalone agent or Java EE agents deployed in an application server, which run the sessions. The agent connects the repository as well as the source and target data servers when processing the data integration sessions. The agent also provides a web service for starting and monitoring scenario executions from third-party SOA-enabled applications.
User Interfaces:
ODI Studio
Administrators, Developers, and Operators use the ODI Studio to access the repositories. This Fusion Client Platform (FCP) based UI is used for administering the infrastructure (security and topology), reverse-engineering the metadata, developing projects, scheduling, operating, and monitoring executions. ODI Studio provides four Navigators for managing the different aspects and steps of an ODI integration project; Topology Navigator, Designer Navigator, Operator Navigator, and Security Navigator.
ODI Console
Web services and data services: ODI can be integrated seamlessly in a SOA in several ways:
A dedicated public web service component provides operations to list the contexts and scenarios available. To use operations from this web service, you must first install and configure this component in a Java EE container.
Data Services are specialized
Data Sources Connection Pool: The Data Services generated by ODI do not contain connection information for sources and targets. Instead, they make use of data sources defined with the Web Services container or on the application server. These JDBC data sources contain connection properties required to access data, and must correspond to data servers already defined within the ODI topology.
The data sources are used by ODI Java EE Agents to connect to a data server defined in the ODI topology. This allows the Java EE agent to benefit from the data sources and connection pooling features available on the application server. Connection pooling allows reusing connections across several sessions.
The Smart Export and Import feature is a lightweight and consistent export and import mechanism to move ODI objects between repositories when working with multiple environments such as Development, Quality Assurance, and Production. An object is automatically exported with all its object dependencies and a set of customizable object matching rules and actions is applied during the import.
Oracle Data Integrator Enterprise Edition addresses multiple enterprise data integration needs:
Data Warehousing and Business Intelligence-by executing high-volume, high performance loading of data warehouses, data marts, On Line Analytical Processing (OLAP) cubes, and analytical applications. It transparently handles incremental loads and slowly changes dimensions, manages data integrity and consistency, and analyzes data lineage.
Service-Oriented Architecture-by calling on external services for data integration and by deploying data services and transformation services that can be seamlessly integrated within an SOA infrastructure. It adds support for high volume, high-performance bulk data processing to an existing service-oriented architecture.
Master Data Management (MDM)-by providing a comprehensive data synchronization infrastructure for customers who build their own data hubs, work with packaged MDM solutions, or coordinate hybrid MDM systems with integrated SOA process analytics and Business Process Execution Language (BPEL) compositions.
Migration-by providing efficient bulk load of historical data (including complex transformations) from existing systems to new ones. It continues to seamlessly synchronize data for as long as the two systems coexist.
Oracle Data Integrator Enterprise Edition's Extract, Load, Transform (E-LT) architecture leverages disparate relational database management systems (RDBMS) engines to process and transform the data. This approach optimizes performance and scalability and lowers overall solution costs. Instead of relying on a separate, conventional ETL transformation server, Oracle Data Integrator Enterprise Edition's E-LT architecture generates native code for disparate RDBMS engines (SQL, bulk loader scripts, for example). E-LT architecture extracts data from sources, loads it into a target, and transforms it using the database power. By leveraging existing databases and database expertise, Oracle Data Integrator Enterprise Edition provides unparalleled efficiency and lower cost of ownership. By reducing network traffic and transforming data in the database containing the target tables, E-LT architecture delivers the highest possible performance.
Oracle Data Integrator Enterprise Edition 11g includes a JDeveloper-based integrated development environment (IDE) called the Oracle Data Integrator Enterprise Edition Studio. This client is designed to dramatically increase the developer's productivity and make it easy to implement advanced features in data loading and transformation. Features like "quick-edit" support mass-updates, and intuitive and accessible keyboard navigation. In addition, Oracle Data Integrator Enterprise Edition follows a declarative design model, which simplifies common data integration design and deployment use cases, shortening implementation times. Data integration designers describe source and target data formats and data integration processes. The business user or the developer can focus on describing what to do, not how to do it. Oracle Data Integrator Enterprise Edition generates, deploys and manages the code required to implement those processes across the various source and target systems.
Oracle Data Integrator Enterprise Edition integrates to Oracle Fusion Middleware as a platform. Specifically, Oracle Data Integrator Enterprise Edition provides its run-time components as Java EE applications, enhanced to fully leverage the capabilities of Oracle WebLogic and Oracle Coherence. Oracle Data Integrator Enterprise Edition components include exclusive features for Enterprise-Scale Deployments, high availability, scalability, and hardened security.
High-Availability (HA) and Scalability is fully supported via clustered deployments for Java EE components. Oracle Data Integrator Enterprise Edition components deployed in WebLogic Server benefit from the capabilities of clustering for scalability, including JDBC connection pooling and load balancing. In addition to the cluster-inherited HA capabilities, the run-time agent also supports a connection retry mechanism to transparently recover sessions running in repositories that are stored in HA-capable database engines such as Oracle RAC.
Oracle Data Integrator Enterprise Edition simplifies complex data-centric deployments by improving visibility and control with a unified set of management interfaces. The Oracle Data Integrator Console leverages the Oracle Application Development Framework (ADF) and Ajax Framework for a rich user experience. Using this console, production users can set up an environment, export and import the repositories, manage run-time operations, monitor the sessions, diagnose the errors, browse design-time artifacts and generate lineage reports. In addition, this interface integrates seamlessly with the Oracle Enterprise Manager Fusion Middleware Control and allows administrators to monitor from a single screen not only their data integration components but their other Fusion Middleware components as well.
Knowledge Modules are at the core of the Oracle Data Integrator Enterprise Edition architecture. They make all Oracle Data Integrator Enterprise Edition processes modular, flexible, and extensible. In addition, Oracle Data Integrator Enterprise Edition provides heterogeneous support for third party platforms, data-sources, and data warehousing appliances.
Knowledge Modules implement the actual data flows and define the templates for generating code across the multiple systems involved in each process. Knowledge Modules are generic, because they allow data flows to be generated regardless of the transformation rules. And they are highly specific, because the code they generate and the integration strategy they implement are finely tuned for a given technology.
Oracle Data Integrator Enterprise Edition provides a comprehensive library of Knowledge Modules, which can be tailored to implement existing best practices.
By helping companies capture and reuse technical expertise and best practices, Oracle Data Integrator Enterprise Edition's Knowledge Module framework reduces the cost of ownership. It also enables metadata-driven extensibility of product functionality to meet the most demanding data integration challenges.
Sample of available modules are:
Oracle Database
Generic SQL
Hypersonic SQL
IBM DB2/400, DB2 UDB
Informix
JD Edwards Enterprise One
JMS
Microsoft Access
Microsoft SQL
Netezza
Oracle E-Business Suite
Oracle Enterprise Service Bus
Oracle GoldenGate
Oracle Hyperion Essbase, Financial Management, Planning
Oracle OLAP
Oracle PeopleSoft
Oracle Siebel CRM
SAP ERP & BW
Sybase ASE
Sybase IQ
Teradata
Oracle Data Integrator (ODI) helps facilitate the loading of a Data Warehouse using specialized Knowledge Modules. We note two of these briefly below.
Incremental Update can also be referred to as Type 1 Slowly Changing Dimension (see "About Star Schemas" for information on slowly changing dimensions). Using Incremental Update Knowledge Modules, ODI will automatically perform insert and update operations based on a unique Update key defined by the end users.
Oracle Data Integrator provides Knowledge Modules that implement a Type 2 Slowly Changing Dimension for a Data Warehouse. The Data Integration developers will simply choose this Knowledge Module in a list to implement this complex integration strategy.
These two knowledge modules are just a small example of how ODI helps developers rapidly design data integration processes to load a Data Warehouse without worrying about defining every single step of a complex integration process.
Oracle Data Integrator is often used for populating very large data warehouses. In these use cases, it is common to have thousands of tables being populated using hundreds of ODI scenarios (scenarios are executable objects stored in ODI work repositories). The execution of these scenarios has to be organized in such a way that the data throughput from the sources to the target is the most efficient within the batch window. Load Plans help the user organizing the execution of scenarios in a hierarchy of sequential and parallel steps for these types of use cases.
A Load Plan is an executable object in Oracle Data Integrator that can contain a hierarchy of steps that can be executed conditionally, in parallel or in series. Load Plan allow setting and using variables at multiple levels. They also support exception handling strategies in the event of a scenario ending in error. A Load Plan can be modified in production environments and steps can be enabled or disabled according to the production needs.
Compared to other architectures (manual coding and traditional ETL), ODI mixes the best of both E-LT and Business-rules:
Productivity/Maintenance
The business-rules driven approach delivers greater productivity as developers simply need to concentrate on the "What" without caring about the "How". They define SQL expressions for the business rules, and ODI Knowledge Modules generate the entire set of SQL operations needed to achieve these rules.
When a change needs to be made in operational logic (such as "creating a backup copy of every target table before loading the new records"), it is simply applied in the appropriate Knowledge Module and it automatically impacts the hundreds of interfaces already developed. With a traditional ETL approach, such a change would have necessitated opening every job and manually adding the new steps, increasing the risk of mistakes and inconsistency.
Flexibility and a shallow learning curve are ensured by leveraging the RDBMS' latest features.
With a centralized repository that describes all the metadata of the sources and targets and a single unified and comprehensive graphical interface, maintenance is greatly optimized as cross-references between objects can be queried at any time. This gives the developers and the business users a single entry point for impact analysis and data lineage ("What is used where?", "Which sources populate which targets?" etc.)
In the ODI repository, the topology of the infrastructure is defined in detail, and moving objects between different execution contexts (Development, Testing, QA, Production, etc.) is straightforward. With a powerful version control repository, several teams can work on the same project within different release stages, with guaranteed consistency of deliverables.
With a centralized framework for Data Quality, developers spend less time on defining technical steps, and more time on the specification of data quality rules. This helps to build a consistent and standardized Data Warehouse.
High Performance
The E-LT architecture leverages the power of all the features of in-place databases engines. ODI generates pure set-oriented SQL optimized for each RDBMS which can take advantage of advanced features such as parallel processing or other advanced features.
Native database utilities can be invoked by the ODI Knowledge Modules provided.
When data from the target database is referenced - table lookups for example - it does not need to be extracted from the database, into an engine. It remains where it is, and it is processed by database engine.
Low Cost
Oracle Data Integrator doesn't require a dedicated server. The loads and transformations are carried out by the RDBMS.
With its business-rule driven E-LT architecture, Oracle Data Integrator is a powerful solution for moving beyond the worlds of manual coding and traditional ETL to achieve greater productivity, faster performance and lower cost.