Batch Mode
Luden supports batching of incoming events for data warehouse and cloud file storage destinations. Batching reduces the number of requests to the destination, improves performance, and, for some data warehouses, reduces processing costs.
Luden collects incoming events into batch files for a period tied to the log file rotation period. It then runs uploader jobs that read the accumulated files and load the events into the destination in the most efficient way. Read more about the batch file Directories structure.
Configuration
Batch mode can be configured for each destination separately in the destinations section of the server's YAML configuration file using the mode parameter. It is enabled by default for all destinations that support it. In the Configurator UI, the 'Mode' selector is available in the destination setup form.
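As an illustration, a destination entry with batch mode might look like the following. Only the mode parameter is taken from this document; the destination name, type, and surrounding fields are hypothetical placeholders:

```yaml
destinations:
  my_warehouse:        # hypothetical destination name
    type: redshift     # hypothetical destination type
    mode: batch        # enables batching; the default for supporting destinations
```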
Pipeline
First, an event is written to the current log file in the events/incoming directory. Log files are rotated once every N minutes (5 by default) and processed in a separate thread.
Log file processing:
- Get all unprocessed logs (all files in events/incoming that are not in process)
- Multiplex records to destinations
For each destination: evaluate the table_name_template expression to get a destination table name. If the result is an empty string, skip this destination. If evaluation fails, the event is written to events/failed.
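This routing step can be sketched as follows. The function, the dict-based event/destination shapes, and the use of Python string formatting in place of Luden's real template engine are all assumptions for illustration:

```python
def route_event(event, destinations, failed_log):
    """Evaluate each destination's table_name_template for one event."""
    routed = []
    for dest in destinations:
        try:
            # e.g. template "events_{event_type}" -> "events_pageview"
            table_name = dest["table_name_template"].format(**event)
        except KeyError as err:
            # evaluation failed: the event goes to events/failed
            failed_log.append({"event": event, "error": str(err)})
            continue
        if table_name == "":
            # empty result: skip this destination
            continue
        routed.append((dest["name"], table_name))
    return routed
```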
For each destination/table pair:
- Check the status in the log's status file. If a pair has already been processed, ignore it
- Apply the LookupEnrichment step
- Apply the Transformation and MappingStep (get a BatchHeader)
- Maintain an up-to-date BatchHeader in memory. If a new field appears, add it with its type to the BatchHeader
- On type conflict: apply Type Promotion
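The BatchHeader maintenance and Type Promotion steps above can be sketched like this. The three-step promotion chain (int → float → string) is an assumed illustration; Luden's real promotion rules may differ:

```python
# Assumed promotion chain for illustration: int widens to float, float to string.
PROMOTION_ORDER = ["int", "float", "string"]

def promote(a, b):
    """Return the wider of two types on the promotion chain."""
    return max(a, b, key=PROMOTION_ORDER.index)

def update_batch_header(header, event_fields):
    """Merge one event's field -> type map into the running BatchHeader."""
    for field, typ in event_fields.items():
        if field not in header:
            header[field] = typ  # new field: add it with its type
        elif header[field] != typ:
            header[field] = promote(header[field], typ)  # conflict: promote
    return header
```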
Once batch objects and the BatchHeader are prepared, proceed to table patching. For each destination/table pair:
- Map the BatchHeader into a Table structure with SQL column types that depend on the destination type, and primary keys (if configured)
- Get the Table structure from the destination
- Acquire the destination lock (using a distributed lock service)
- Compare the two Table structures (from the destination and from the data)
- Maintain the primary key (if configured)
- If a column is missing, run ALTER TABLE
- If a column is present in the table but missing in the BatchHeader, ignore it
- Release the destination lock
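The comparison step above can be sketched as a simple one-way schema diff. The table name, column maps, and generated SQL text are illustrative assumptions, not Luden's actual output:

```python
def diff_tables(table_name, db_columns, data_columns):
    """Return ALTER TABLE statements for columns present in the data's
    Table structure but missing in the destination table.
    Columns present only in the destination are ignored."""
    statements = []
    for name, sql_type in data_columns.items():
        if name not in db_columns:
            statements.append(
                f"ALTER TABLE {table_name} ADD COLUMN {name} {sql_type}"
            )
    return statements
```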
Depending on the destination, either bulk-insert objects into the destination with explicit typecasts (if configured in a JavaScript Transformation), or write them with JSON/CSV serialization to cloud storage and execute the destination's load command.
On success, update the log status file and mark the destination/table pair as OK (otherwise mark it as FAILED). If all pairs are marked as OK, rotate the log file to events/archive.
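The status bookkeeping can be sketched as below. The JSON layout of the status file and the "destination/table" key format are assumptions; only the OK/FAILED marking and the "archive only when all pairs are OK" rule come from the text above:

```python
import json

def mark_pair(status_json, destination, table, ok):
    """Mark one destination/table pair as OK or FAILED in the status map."""
    status = json.loads(status_json)
    status[f"{destination}/{table}"] = "OK" if ok else "FAILED"
    return json.dumps(status)

def ready_for_archive(status_json):
    """The log file is rotated to events/archive only when every pair is OK."""
    status = json.loads(status_json)
    return bool(status) and all(v == "OK" for v in status.values())
```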