Base definition of a runnable Consumer. Consumers are responsible for persisting data to disk
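A minimal sketch of what a runnable consumer could look like. It assumes the consumer drains a `java.util.concurrent.BlockingQueue` (the timeout settings mention blocking queues) and stops when it receives a sentinel "poison pill" object. The class name `QueueConsumer`, the sentinel approach, and the `persist`/`finish` hooks are hypothetical, not the project's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical base class (not the project's real Consumer): drains a queue and
// persists each item until it receives the sentinel "poison pill" object.
abstract class QueueConsumer<T> implements Runnable {
    private final BlockingQueue<T> queue;
    private final T poisonPill; // compared by identity, so use a dedicated instance

    QueueConsumer(BlockingQueue<T> queue, T poisonPill) {
        this.queue = queue;
        this.poisonPill = poisonPill;
    }

    /** Writes one item to this consumer's target (file, table, ...). */
    protected abstract void persist(T item) throws Exception;

    /** Called once after the last item; flush and close resources here. */
    protected void finish() throws Exception {}

    @Override
    public void run() {
        try {
            T item;
            while ((item = queue.take()) != poisonPill) { // take() blocks until data arrives
                persist(item);
            }
            finish();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag and exit
        } catch (Exception e) {
            throw new RuntimeException(e);      // surface persistence failures to the executor
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        List<String> written = new ArrayList<>();
        String eof = new String("EOF"); // distinct object used only as the sentinel
        Runnable consumer = new QueueConsumer<String>(queue, eof) {
            @Override
            protected void persist(String row) { written.add(row); }
        };
        queue.add("row1");
        queue.add("row2");
        queue.add(eof);
        consumer.run();
        System.out.println(written); // [row1, row2]
    }
}
```

In a real run the consumer would be submitted to an executor while the deduper keeps feeding the queue; calling `run()` directly here just keeps the demo single-threaded.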
Configuration for a deduper process
Parses the config settings that define the format of CSV output. By default, a CSV output file uses the 'txt' extension
and a comma as the delimiter
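A sketch of how such settings might be parsed with defaults, assuming the config is available as a `java.util.Properties` object. The property keys `csv.extension` and `csv.delimiter` and the class name `CsvOutputSettings` are hypothetical; only the defaults ('txt' and a comma) come from the description.

```java
import java.util.Properties;

// Hypothetical sketch: the keys "csv.extension" and "csv.delimiter" are assumed;
// only the defaults ('txt' and a comma) come from the description above.
final class CsvOutputSettings {
    static final String DEFAULT_EXTENSION = "txt";
    static final String DEFAULT_DELIMITER = ",";

    final String extension;
    final String delimiter;

    CsvOutputSettings(Properties config) {
        // getProperty(key, default) falls back when the key is absent
        this.extension = config.getProperty("csv.extension", DEFAULT_EXTENSION);
        this.delimiter = config.getProperty("csv.delimiter", DEFAULT_DELIMITER);
    }

    /** Output file name for a target, e.g. "dupes" becomes "dupes.txt" by default. */
    String fileName(String baseName) {
        return baseName + "." + extension;
    }

    public static void main(String[] args) {
        CsvOutputSettings defaults = new CsvOutputSettings(new Properties());
        System.out.println(defaults.fileName("dupes") + " delimited by '" + defaults.delimiter + "'");
        // dupes.txt delimited by ','
    }
}
```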
Creates and writes out duplicate data to a CSV target. The duplicate target is defined in config
Creates and writes out the hash values produced by a deduper process to a CSV file defined in config
Defines output information for CSV target data based on the JNDI name and JNDI context
Parser for JNDI entries that are configured for CSV output; parses the corresponding values in config
Creates and writes out "deduped" data to a CSV target. The target is defined in config
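A sketch of writing rows to a CSV target with a configurable delimiter. The class `CsvRowWriter` is hypothetical, and so is the escaping: it quotes fields in the style of RFC 4180 (quote a field containing the delimiter, a quote, or a line break, and double embedded quotes), which the description does not confirm the deduper does.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.List;

// Hypothetical helper: one delimiter-separated line per row. The RFC 4180-style
// quoting is an assumption; the description does not say how fields are escaped.
final class CsvRowWriter {
    private final Writer out;
    private final String delimiter;

    CsvRowWriter(Writer out, String delimiter) {
        this.out = out;
        this.delimiter = delimiter;
    }

    void writeRow(List<String> values) throws IOException {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) line.append(delimiter);
            line.append(escape(values.get(i)));
        }
        out.write(line.append('\n').toString());
    }

    // Quote a field only if it contains the delimiter, a quote, or a line break;
    // embedded quotes are doubled. A null value is written as an empty field.
    private String escape(String v) {
        if (v == null) return "";
        boolean quote = v.contains(delimiter) || v.contains("\"")
                || v.contains("\n") || v.contains("\r");
        return quote ? "\"" + v.replace("\"", "\"\"") + "\"" : v;
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        new CsvRowWriter(sw, ",").writeRow(List.of("42", "Smith, Jane"));
        System.out.print(sw); // 42,"Smith, Jane"
    }
}
```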
Dedupes data based on config settings
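One way to picture the core dedupe step, assuming a duplicate is defined by the configured hash columns and the first occurrence of each key is kept as target data. `SimpleDeduper` is a hypothetical in-memory sketch. For readability it stores the key column values themselves; an implementation that persists MD5 hashes would more likely store a hash of those values to bound memory.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory dedupe: the first row for each key goes to the target,
// later rows with the same key go to the dupes list.
final class SimpleDeduper {
    final List<List<String>> target = new ArrayList<>();
    final List<List<String>> dupes = new ArrayList<>();
    private final Set<List<String>> seenKeys = new HashSet<>();
    private final int[] hashColumns; // column indexes that define a duplicate

    SimpleDeduper(int... hashColumns) {
        this.hashColumns = hashColumns.clone();
    }

    void accept(List<String> row) {
        List<String> key = new ArrayList<>(hashColumns.length);
        for (int c : hashColumns) {
            key.add(row.get(c));
        }
        if (seenKeys.add(key)) { // add() returns false when the key was already seen
            target.add(row);
        } else {
            dupes.add(row);
        }
    }

    public static void main(String[] args) {
        SimpleDeduper d = new SimpleDeduper(0); // dedupe on the first column only
        d.accept(List.of("1", "alice"));
        d.accept(List.of("2", "bob"));
        d.accept(List.of("1", "alice again"));
        System.out.println(d.target.size() + " kept, " + d.dupes.size() + " duplicate");
        // 2 kept, 1 duplicate
    }
}
```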
Consumer for processing and persisting target data, i.e. "deduped" data
Consumer for processing and persisting duplicate data
Summary of a dedupe operation: the total recordCount, the hashColumns used, the columnsFound in the actual source
query, the dupeCount, the distinctDupeCount, and the dupes found during the dedupe process.
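The description does not define `dupeCount` and `distinctDupeCount` precisely. A plausible reading, stated here as an assumption: `dupeCount` counts every occurrence after the first, and `distinctDupeCount` counts how many distinct keys occur more than once. The hypothetical `DedupeCounts` class below computes both from a list of per-row hashes under that reading.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: counts from a list of per-row hashes. Assumes dupeCount counts
// every repeat occurrence and distinctDupeCount counts keys seen more than once.
final class DedupeCounts {
    final long recordCount;
    final long dupeCount;
    final long distinctDupeCount;

    DedupeCounts(List<String> rowHashes) {
        Map<String, Integer> occurrences = new HashMap<>();
        for (String h : rowHashes) {
            occurrences.merge(h, 1, Integer::sum); // tally each hash
        }
        long dupes = 0;
        long distinct = 0;
        for (int n : occurrences.values()) {
            if (n > 1) {
                dupes += n - 1; // every occurrence after the first is a dupe
                distinct++;     // one entry per duplicated hash
            }
        }
        this.recordCount = rowHashes.size();
        this.dupeCount = dupes;
        this.distinctDupeCount = distinct;
    }

    public static void main(String[] args) {
        DedupeCounts c = new DedupeCounts(List.of("a", "b", "a", "c", "a", "b"));
        System.out.println(c.recordCount + " " + c.dupeCount + " " + c.distinctDupeCount); // 6 3 2
    }
}
```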
Consumer for processing and persisting MD5 hash data
Represents a simple duplicate value found by the deduper.
Definition for writing out duplicate data to a target flat file or SQL table
Settings for the Execution Service timeout. Once a deduper has published all data to the blocking queues of its
consumers, the consumers have a dynamic timeout (60 seconds by default) in which to finish persisting data
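A sketch of the shutdown step, assuming the consumers run on a standard `java.util.concurrent.ExecutorService`: `shutdown()` stops new submissions while already-submitted consumers keep running, and `awaitTermination` waits for them up to the timeout. The class `ConsumerShutdown` and the choice to call `shutdownNow()` on timeout are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: once all data is queued, stop accepting tasks and give
// the running consumers up to timeoutSeconds to finish (60 by default).
final class ConsumerShutdown {
    static final long DEFAULT_TIMEOUT_SECONDS = 60;

    /** Returns true if every consumer finished before the timeout expired. */
    static boolean awaitConsumers(ExecutorService pool, long timeoutSeconds)
            throws InterruptedException {
        pool.shutdown(); // already-submitted consumers keep running
        if (pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
            return true;
        }
        pool.shutdownNow(); // interrupt consumers that overran the timeout
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("consumer A done"));
        pool.submit(() -> System.out.println("consumer B done"));
        System.out.println("finished in time: " + awaitConsumers(pool, DEFAULT_TIMEOUT_SECONDS));
    }
}
```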
A utility library for file operations
Utility class for hashing methods
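Since the hash consumer persists MD5 hashes, a hashing helper might look like the sketch below. It uses the JDK's `java.security.MessageDigest`, where "MD5" is an algorithm every Java platform must support. The class name `HashUtilSketch`, the UTF-8 encoding, and the lowercase-hex output are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: MD5 of a string's UTF-8 bytes as 32 lowercase hex chars.
final class HashUtilSketch {
    static String md5Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // byte formatted as unsigned two-digit hex
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```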
Definition for writing out the hash values of rows found in the source data
Represents hashed data created by the deduper
A hash source JNDI entity, used to configure a specific set of existing hashes to "dedupe" against
Defines output information for target data based on the JNDI name and JNDI context
Representation of a data sample, showing the comma-delimited sampleString and the associated sampleHash for that
sample string
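A sketch of how a sample could pair its comma-delimited string with a hash, assuming the hash is computed over the joined string itself. `DataSample` is hypothetical, and the hash function is passed in (for example an MD5 hex helper) so the sketch does not presume which hashing method the project uses.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical: a sample is the comma-joined column values plus the hash of
// that string. The hash function is injected (e.g. an MD5 hex helper).
final class DataSample {
    final String sampleString;
    final String sampleHash;

    DataSample(List<String> columnValues, Function<String, String> hasher) {
        this.sampleString = String.join(",", columnValues); // comma-delimited, as described
        this.sampleHash = hasher.apply(sampleString);       // hash covers the joined string
    }

    public static void main(String[] args) {
        // String.hashCode stands in for a real digest here, purely for illustration
        DataSample s = new DataSample(List.of("42", "alice", "NYC"),
                v -> Integer.toHexString(v.hashCode()));
        System.out.println(s.sampleString + " -> " + s.sampleHash);
    }
}
```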
A source JNDI entity
Creates a SQL table for persisting duplicate data. The table is configured using the dupesJndi
contained in the associated context
Creates a SQL table for persisting hashed data rows. The table is configured using the hashJndi
contained in the associated context
Defines output information for SQL duplicate data based on the JNDI name and JNDI context
Defines output information for SQL hash data based on the JNDI name and JNDI context
Defines output information for SQL target data based on the JNDI name and JNDI context
Creates and writes out "deduped" data to a SQL table. targetName is the table name in the javax.sql.DataSource
configured in the targetJndi of the associated context. varcharPadding is an optional number of extra bytes added to
varchar fields when the target needs them to be wider than the sizes extracted from the source.
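A sketch of how varcharPadding could widen the target columns when the table is created. The class `TargetTableDdl`, its method, and the generated DDL shape are hypothetical. Note that the description speaks of bytes, while whether `VARCHAR(n)` counts bytes or characters depends on the database. Identifiers are concatenated without quoting or validation, which real code should not do with untrusted names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical DDL builder: each varchar column gets its source width plus
// varcharPadding. Identifiers are not quoted or validated in this sketch.
final class TargetTableDdl {
    static String createTable(String targetName, Map<String, Integer> sourceWidths,
                              int varcharPadding) {
        StringJoiner columns = new StringJoiner(", ", "CREATE TABLE " + targetName + " (", ")");
        for (Map.Entry<String, Integer> col : sourceWidths.entrySet()) {
            int width = col.getValue() + varcharPadding; // widen beyond the extracted size
            columns.add(col.getKey() + " VARCHAR(" + width + ")");
        }
        return columns.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> widths = new LinkedHashMap<>(); // keeps column order
        widths.put("name", 10);
        widths.put("city", 20);
        System.out.println(createTable("people", widths, 5));
        // CREATE TABLE people (name VARCHAR(15), city VARCHAR(25))
    }
}
```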
Definition for writing out deduped data to a target flat file or SQL table
Base definition for writing out output data