PREPROCESSOR

Caution:

There are security implications to consider when using the PREPROCESSOR clause. See Oracle Database Security Guide for more information.

If the file you want to load contains data records that are not in a format supported by the ORACLE_LOADER access driver, then use the PREPROCESSOR clause to specify a user-supplied preprocessor program that will execute for every data file. Note that the program specification must be enclosed in a shell script if it uses arguments (see the description of "file_spec").

The preprocessor program converts the data to a record format supported by the access driver and then writes the converted record data to standard output (stdout), which the access driver reads as input. The syntax of the PREPROCESSOR clause is as follows:

directory_spec

Specifies the directory object containing the name of the preprocessor program to execute for every data file. The user accessing the external table must have the EXECUTE privilege for the directory object that is used. If directory_spec is omitted, then the default directory specified for the external table is used.

Caution:

For security reasons, Oracle strongly recommends that a separate directory, not the default directory, be used to store preprocessor programs. Do not store any other files in the directory in which preprocessor programs are stored.

The preprocessor program must reside in a directory object, so that access to it can be controlled for security reasons. The OS system manager must create a directory corresponding to the directory object and must verify that OS-user ORACLE has access to that directory. DBAs must ensure that only approved users are allowed access to the directory object associated with the directory path. Although multiple database users can have access to a directory object, only those with the EXECUTE privilege can run a preprocessor in that directory. No existing database user with read-write privileges to a directory object will be able to use the preprocessing feature. DBAs can prevent preprocessors from ever being used by never granting the EXECUTE privilege to anyone for a directory object.

See Also:

Oracle Database SQL Language Reference for information about granting the EXECUTE privilege

file_spec

The name of the preprocessor program. It is appended to the path name associated with the directory object that is being used (either the directory_spec or the default directory for the external table). The file_spec cannot contain an absolute or relative directory path.

If the preprocessor program requires any arguments (for example, gunzip -c), then you must specify the program name and its arguments in an executable shell script (or on Windows operating systems, in a batch (.bat) file). Shell scripts and batch files have certain requirements, as discussed in the following sections.

It is important to verify that the correct version of the preprocessor program is in the operating system directory.

The following is an example of specifying the PREPROCESSOR clause without using a shell or batch file:

SQL> CREATE TABLE xtab (recno varchar2(2000))
     2    ORGANIZATION EXTERNAL (
     3    TYPE ORACLE_LOADER
     4    DEFAULT DIRECTORY data_dir
     5    ACCESS PARAMETERS (
     6    RECORDS DELIMITED BY NEWLINE
     7    PREPROCESSOR execdir:'zcat'
     8    FIELDS (recno char(2000)))
     9    LOCATION ('foo.dat.gz'))
   10    REJECT LIMIT UNLIMITED;
Table created.

Using Shell Scripts With the PREPROCESSOR Clause on Linux Operating Systems

  • The shell script must reside in directory_spec.

  • The full path name must be specified for system commands such as gunzip.

  • The preprocessor shell script must have EXECUTE permissions.

  • The data file listed in the external table LOCATION clause should be referred to by $1.

The following example shows how to specify a shell script on the PREPROCESSOR clause when creating an external table.

SQL> CREATE TABLE xtab (recno varchar2(2000))
     2    ORGANIZATION EXTERNAL (
     3    TYPE ORACLE_LOADER
     4    DEFAULT DIRECTORY data_dir
     5    ACCESS PARAMETERS (
     6    RECORDS DELIMITED BY NEWLINE
     7    PREPROCESSOR execdir:'uncompress.sh'
     8    FIELDS (recno char(2000)))
     9    LOCATION ('foo.dat.gz'))
   10    REJECT LIMIT UNLIMITED;
Table created.

Using Batch Files With The PREPROCESSOR Clause on Windows Operating Systems

  • The batch file must reside in directory_spec.

  • The full path name must be specified for system commands such as gunzip.

  • The preprocessor batch file must have EXECUTE permissions.

  • The first line of the batch file should contain @echo off. The reason for this requirement is that when the batch file is run, the default is to display the commands being executed, which has the unintended side-effect of the echoed commands being treated as input to the external table access driver.

  • To represent the input from the ___location clause, %1 should be used. (Note that this differs from Linux-style shell scripts where the ___location clause is referenced by $1.)

  • A full path should be specified to any executables in the batch file (sed.exe in the following example). Note also that the MKS Toolkit may not exist on all Windows installations so commands such as sed.exe may not be available.

    The batch file used on Windows must have either a .bat or .cmd extension. Failure to do so (e.g. trying to specify the preprocessor script as sed.sh) will result in the following error:

    SQL> select * from foo ;
    select * from foo
    *
    ERROR at line 1:
    
    ORA-29913: error in executing ODCIEXTTABLEFETCH callout
    ORA-29400: data cartridge error
    KUP-04095: preprocessor command
    C:/Temp\sed.sh encountered error
    "CreateProcess Failure for Preprocessor:
    C:/Temp\sed.sh, errorcode: 193

The following is a simple example of using a batch file with the external table PREPROCESSOR option on Windows. In this example a batch file uses the stream editor (sed.exe) utility to perform a simple transformation of the input data.

create table deptxt
(
   deptno char(2),
   dname char(14),
   loc char(13)
)
organization external
(
type ORACLE_LOADER
default directory def_dir1
access parameters
(
   records delimited by newline
   badfile 'deptxt.bad'
   logfile 'deptxt.log'
   preprocessor exec_dir:'sed.bat'
   fields terminated by ','
   missing field values are null
)
   ___location ('deptxt.dat')
)
reject limit unlimited ;

select * from deptxt ;

Where deptxt.dat contains:

10,ACCOUNTING,NEW YORK
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON

The preprocessor program sed.bat has the following content:

@echo off
c:/mksnt/mksnt/sed.exe -e 's/BOSTON/CALIFORNIA/' %1

The PREPROCESSOR option passes the input data (deptxt.dat) to sed.bat. If you then select from the deptxt table, the results show that the LOC column in the last row, which used to be BOSTON, is now CALIFORNIA.

SQL> select * from deptxt ;

DE DNAME          LOC
-- -------------- -------------
1a ACCOUNTING     NEW YORK
20 RESEARCH       DALLAS
30 SALES          CHICAGO
40 OPERATIONS     CALIFORNIA

4 rows selected.

Using Parallel Processing with the PREPROCESSOR Clause

External tables treats each data file specified on the LOCATION clause as a single granule. To make the best use of parallel processing with the PREPROCESSOR clause, the data to be loaded should be split into multiple files (granules). This is because external tables limits the degree of parallelism to the number of data files present. For example, if you specify a degree of parallelism of 16, but have only 10 data files, then in effect the degree of parallelism is 10 because 10 slave processes will be busy and 6 will be idle. It is best to not have any idle slave processes. So if you do specify a degree of parallelism, then ideally it should be no larger than the number of data files so that all slave processes are kept busy.

See Also:

Restrictions When Using the PREPROCESSOR Clause

  • The PREPROCESSOR clause is not available on databases that use the Oracle Database Vault feature.

  • The PREPROCESSOR clause does not work in conjunction with the COLUMN TRANSFORMS clause.