Caution:
There are security implications to consider when using the PREPROCESSOR
clause. See Oracle Database Security Guide for more information.
If the file you want to load contains data records that are not in a format supported by the ORACLE_LOADER
access driver, then use the PREPROCESSOR
clause to specify a user-supplied preprocessor program that will execute for every data file. Note that the program specification must be enclosed in a shell script if it uses arguments (see the description of "file_spec").
The preprocessor program converts the data to a record format supported by the access driver and then writes the converted record data to standard output (stdout), which the access driver reads as input. The syntax of the PREPROCESSOR
clause is as follows:
directory_spec
Specifies the directory object containing the name of the preprocessor program to execute for every data file. The user accessing the external table must have the EXECUTE
privilege for the directory object that is used. If directory_spec
is omitted, then the default directory specified for the external table is used.
Caution:
For security reasons, Oracle strongly recommends that a separate directory, not the default directory, be used to store preprocessor programs. Do not store any other files in the directory in which preprocessor programs are stored.
The preprocessor program must reside in a directory object, so that access to it can be controlled for security reasons. The OS system manager must create a directory corresponding to the directory object and must verify that OS-user ORACLE has access to that directory. DBAs must ensure that only approved users are allowed access to the directory object associated with the directory path. Although multiple database users can have access to a directory object, only those with the EXECUTE
privilege can run a preprocessor in that directory. No existing database user with read-write privileges to a directory object will be able to use the preprocessing feature. DBAs can prevent preprocessors from ever being used by never granting the EXECUTE
privilege to anyone for a directory object.
See Also:
Oracle Database SQL Language Reference for information about granting the EXECUTE
privilege
file_spec
The name of the preprocessor program. It is appended to the path name associated with the directory object that is being used (either the directory_spec
or the default directory for the external table). The file_spec
cannot contain an absolute or relative directory path.
If the preprocessor program requires any arguments (for example, gunzip -c
), then you must specify the program name and its arguments in an executable shell script (or on Windows operating systems, in a batch (.bat) file). Shell scripts and batch files have certain requirements, as discussed in the following sections.
It is important to verify that the correct version of the preprocessor program is in the operating system directory.
The following is an example of specifying the PREPROCESSOR
clause without using a shell or batch file:
SQL> CREATE TABLE xtab (recno varchar2(2000)) 2 ORGANIZATION EXTERNAL ( 3 TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY data_dir 5 ACCESS PARAMETERS ( 6 RECORDS DELIMITED BY NEWLINE 7 PREPROCESSOR execdir:'zcat' 8 FIELDS (recno char(2000))) 9 LOCATION ('foo.dat.gz')) 10 REJECT LIMIT UNLIMITED; Table created.
Using Shell Scripts With the PREPROCESSOR Clause on Linux Operating Systems
The shell script must reside in directory_spec
.
The full path name must be specified for system commands such as gunzip
.
The preprocessor shell script must have EXECUTE permissions.
The data file listed in the external table LOCATION
clause should be referred to by $1
.
The following example shows how to specify a shell script on the PREPROCESSOR
clause when creating an external table.
SQL> CREATE TABLE xtab (recno varchar2(2000)) 2 ORGANIZATION EXTERNAL ( 3 TYPE ORACLE_LOADER 4 DEFAULT DIRECTORY data_dir 5 ACCESS PARAMETERS ( 6 RECORDS DELIMITED BY NEWLINE 7 PREPROCESSOR execdir:'uncompress.sh' 8 FIELDS (recno char(2000))) 9 LOCATION ('foo.dat.gz')) 10 REJECT LIMIT UNLIMITED; Table created.
Using Batch Files With The PREPROCESSOR Clause on Windows Operating Systems
The batch file must reside in directory_spec
.
The full path name must be specified for system commands such as gunzip
.
The preprocessor batch file must have EXECUTE permissions.
The first line of the batch file should contain @echo off
. The reason for this requirement is that when the batch file is run, the default is to display the commands being executed, which has the unintended side-effect of the echoed commands being treated as input to the external table access driver.
To represent the input from the ___location clause, %1
should be used. (Note that this differs from Linux-style shell scripts where the ___location clause is referenced by $1
.)
A full path should be specified to any executables in the batch file (sed.exe
in the following example). Note also that the MKS Toolkit may not exist on all Windows installations so commands such as sed.exe
may not be available.
The batch file used on Windows must have either a .bat
or .cmd
extension. Failure to do so (e.g. trying to specify the preprocessor script as sed.sh) will result in the following error:
SQL> select * from foo ; select * from foo * ERROR at line 1: ORA-29913: error in executing ODCIEXTTABLEFETCH callout ORA-29400: data cartridge error KUP-04095: preprocessor command C:/Temp\sed.sh encountered error "CreateProcess Failure for Preprocessor: C:/Temp\sed.sh, errorcode: 193
The following is a simple example of using a batch file with the external table PREPROCESSOR
option on Windows. In this example a batch file uses the stream editor (sed.exe) utility to perform a simple transformation of the input data.
create table deptxt ( deptno char(2), dname char(14), loc char(13) ) organization external ( type ORACLE_LOADER default directory def_dir1 access parameters ( records delimited by newline badfile 'deptxt.bad' logfile 'deptxt.log' preprocessor exec_dir:'sed.bat' fields terminated by ',' missing field values are null ) ___location ('deptxt.dat') ) reject limit unlimited ; select * from deptxt ; Where deptxt.dat contains: 10,ACCOUNTING,NEW YORK 20,RESEARCH,DALLAS 30,SALES,CHICAGO 40,OPERATIONS,BOSTON
The preprocessor program sed.bat
has the following content:
@echo off c:/mksnt/mksnt/sed.exe -e 's/BOSTON/CALIFORNIA/' %1
The PREPROCESSOR
option passes the input data (deptxt.dat
) to sed.bat
. If you then select from the deptxt
table, the results show that the LOC
column in the last row, which used to be BOSTON
, is now CALIFORNIA
.
SQL> select * from deptxt ; DE DNAME LOC -- -------------- ------------- 1a ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATIONS CALIFORNIA 4 rows selected.
External tables treats each data file specified on the LOCATION
clause as a single granule. To make the best use of parallel processing with the PREPROCESSOR
clause, the data to be loaded should be split into multiple files (granules). This is because external tables limits the degree of parallelism to the number of data files present. For example, if you specify a degree of parallelism of 16, but have only 10 data files, then in effect the degree of parallelism is 10 because 10 slave processes will be busy and 6 will be idle. It is best to not have any idle slave processes. So if you do specify a degree of parallelism, then ideally it should be no larger than the number of data files so that all slave processes are kept busy.
See Also:
Oracle Database VLDB and Partitioning Guide for more information about granules of parallelism