Missing Value Treatment in Oracle Data Mining

Missing value treatment depends on the algorithm and on the nature of the data (categorical or numerical, sparse or missing at random). The following table summarizes the treatment for each case.

Note:

Oracle Data Mining performs the same missing value treatment whether or not Automatic Data Preparation is being used.

Table 3-2 Missing Value Treatment by Algorithm

Columns: Missing Data | EM, GLM, NMF, k-Means, SVD, SVM | DT, MDL, NB, OC | Apriori

NUMERICAL missing at random

EM, GLM, NMF, k-Means, SVD, SVM: The algorithm replaces missing numerical values with the mean. For Expectation Maximization (EM), the replacement occurs only in columns that are modeled with Gaussian distributions.

DT, MDL, NB, OC: The algorithm handles missing values naturally as missing at random.

Apriori: The algorithm interprets all missing data as sparse.

CATEGORICAL missing at random

EM, GLM, NMF, k-Means, SVD, SVM: Generalized Linear Models (GLM), Non-Negative Matrix Factorization (NMF), k-Means, and Support Vector Machine (SVM) replace missing categorical values with the mode. Singular Value Decomposition (SVD) does not support categorical data. EM does not replace missing categorical values; it treats NULLs as a distinct value with its own frequency count.

DT, MDL, NB, OC: The algorithm handles missing values naturally as missing at random.

Apriori: The algorithm interprets all missing data as sparse.

NUMERICAL sparse

EM, GLM, NMF, k-Means, SVD, SVM: The algorithm replaces sparse numerical data with zeros.

DT, MDL, NB, OC: O-Cluster (OC) does not support nested data and therefore does not support sparse data. Decision Tree (DT), Minimum Description Length (MDL), and Naive Bayes (NB) replace sparse numerical data with zeros.

Apriori: The algorithm handles sparse data.

CATEGORICAL sparse

EM, GLM, NMF, k-Means, SVD, SVM: All algorithms except SVD replace sparse categorical data with zero vectors. SVD does not support categorical data.

DT, MDL, NB, OC: O-Cluster does not support nested data and therefore does not support sparse data. DT, MDL, and NB replace sparse categorical data with the special value DM$SPARSE.

Apriori: The algorithm handles sparse data.
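
As a rough illustration of the replacement rules that Table 3-2 lists for GLM, NMF, k-Means, and SVM, the following Python sketch fills values that are missing at random with the mean (numerical) or the mode (categorical), and fills sparse numerical data with zeros. It is not Oracle's implementation. The function names, the kind argument, and the use of None to stand in for NULL are assumptions made for this example only.

```python
from statistics import mean, mode


def impute_missing_at_random(values, kind):
    """Replace None with the mean (numerical) or the mode (categorical).

    Hypothetical helper: 'kind' is either "numerical" or "categorical".
    """
    present = [v for v in values if v is not None]
    if not present:
        # Nothing to estimate a replacement from; return the column unchanged.
        return list(values)
    fill = mean(present) if kind == "numerical" else mode(present)
    return [fill if v is None else v for v in values]


def fill_sparse_numerical(values):
    """Replace None with zero, as described for sparse numerical data."""
    return [0 if v is None else v for v in values]


# Illustrative columns (made-up data).
ages = [30, None, 50, 40]
colors = ["red", None, "blue", "red"]
item_counts = [3, None, None, 7]

print(impute_missing_at_random(ages, "numerical"))
print(impute_missing_at_random(colors, "categorical"))
print(fill_sparse_numerical(item_counts))
```

The two functions are kept separate because the same NULL leads to different replacements depending on whether the column is treated as missing at random or as sparse.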