Datameer Artifact Retention Policy
Better understand the retention policy options for Datameer artifacts.
The retention policy in Datameer lets you configure the following options:
- Keep last N results (regardless of their age).
- Purge results older than N days.
- Purge results older than N days, but keep last N results.
- Never delete historical data.
- ExportOnly (for Workbooks).
The corresponding configuration is stored in the dap_job_configuration table under the columns min_keep_count and expire_time_days.
The possible combinations of values in these columns (respectively) are:
N / NULL - Keep last N results (Purge results older than N days is empty).
NULL / N - Purge results older than N days (Keep last N results is empty).
N / N - Purge results older than N days, but keep last N results.
NULL / NULL - Never remove historical data.
0 / 1 - Export Only.
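The mapping above can be sketched as a small helper. This is only an illustration: the function name describe_retention is hypothetical, and the column order (min_keep_count first, expire_time_days second) is an assumption based on the column aliases used in the query in this article.

```python
from typing import Optional


def describe_retention(min_keep_count: Optional[int],
                       expire_time_days: Optional[int]) -> str:
    """Translate a (min_keep_count, expire_time_days) pair into a policy label.

    None stands for SQL NULL. The column order is an assumption; this helper
    is not part of Datameer itself.
    """
    if min_keep_count is None and expire_time_days is None:
        return "Never remove historical data"
    # The special pair 0 / 1 is listed as Export Only in the table above.
    if min_keep_count == 0 and expire_time_days == 1:
        return "Export Only"
    if expire_time_days is None:
        return f"Keep last {min_keep_count} results"
    if min_keep_count is None:
        return f"Purge results older than {expire_time_days} days"
    return (f"Purge results older than {expire_time_days} days, "
            f"but keep last {min_keep_count} results")


if __name__ == "__main__":
    for pair in [(5, None), (None, 30), (5, 30), (None, None), (0, 1)]:
        print(pair, "->", describe_retention(*pair))
```

A helper like this could be applied to rows read from dap_job_configuration to make the raw column values easier to review.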
The following query lists all artifacts together with their retention policy. Add a WHERE clause to filter the results as needed.
SELECT
    dap_job_configuration.id ConfID,
    dap_file.name Name,
    CASE dap_file.extension
        WHEN 'IMPORT_LINK_JOB_EXTENSION' THEN 'Data Link'
        WHEN 'IMPORT_JOB_EXTENSION' THEN 'Import Job'
        WHEN 'WORKBOOK_EXTENSION' THEN 'Workbook'
        WHEN 'EXPORT_JOB_EXTENSION' THEN 'Export Job'
    END Type,
    permission.owner Owner,
    dap_file.creation_date CreationTime,
    dap_job_configuration.min_keep_count KeptResults,
    dap_job_configuration.expire_time_days PurgeResultsAfter,
    dap_file.id FileID
FROM dap_job_configuration
JOIN dap_file ON dap_job_configuration.dap_file__id = dap_file.id
JOIN permission ON dap_file.permission_fk = permission.id;
With the Append with sliding time window option, you can set the following retention policies:
- Use only the Expire after field. Datameer keeps records ingested by an import job during the last N days/weeks/months; older records are removed. For example, if you set 5 days and run an import job daily that ingests 10 records each day, Datameer keeps only the 50 most recent records each time. At the moment, this option is affected by a bug reported under DAP-36437 (the workaround is to set the Keep last N results parameter together with Expire after).
- Use only the Keep last N results field. Datameer keeps records ingested by the import job during the last N executions, regardless of time. For example, if you set Keep last N results to 5, Datameer keeps data imported during the last 5 job executions, regardless of whether the job ran 5 times in 1 hour or in 1 week. Note that executions that don't import any records also count. If no new data is added to the source table and the import job is executed 5 times, no records will be stored at that point.
- Use both the Expire after and Keep last N results parameters at the same time. This gives additional flexibility and ensures that N results are still stored even if some of them have expired, e.g. if you pause ingestion but still want to use previously ingested data.
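The three sliding-window variants above can be illustrated with a simplified model. This is a sketch under stated assumptions, not Datameer's actual implementation: the names retained_runs and total_records are hypothetical, expiry is counted in whole days only, and a run is assumed to survive if it is either unexpired or among the last N executions.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Run = Tuple[date, int]  # (execution date, number of records ingested)


def retained_runs(runs: List[Run], today: date,
                  expire_days: Optional[int] = None,
                  keep_last: Optional[int] = None) -> List[Run]:
    """Return the runs whose records survive under the simplified model."""
    ordered = sorted(runs, key=lambda run: run[0])
    if expire_days is None and keep_last is None:
        return ordered  # no retention limits: everything is kept
    kept = []
    for index, (run_date, count) in enumerate(ordered):
        # Among the last `keep_last` executions (empty runs count too).
        is_recent_run = (keep_last is not None
                         and index >= len(ordered) - keep_last)
        # Inside the expiry window (boundary handling is an assumption).
        is_unexpired = (expire_days is not None
                        and (today - run_date).days < expire_days)
        if is_recent_run or is_unexpired:
            kept.append((run_date, count))
    return kept


def total_records(runs: List[Run]) -> int:
    """Sum the record counts of the given runs."""
    return sum(count for _, count in runs)


if __name__ == "__main__":
    today = date(2024, 1, 31)
    # Ten daily executions, 10 records each.
    daily = [(today - timedelta(days=d), 10) for d in range(10)]
    print(total_records(retained_runs(daily, today, expire_days=5)))
    print(total_records(retained_runs(daily, today, keep_last=5)))
```

With ten daily runs of 10 records each, both calls in the usage section yield 50 records, matching the example in the first bullet. Passing both expire_days and keep_last models the third bullet: if ingestion was paused long enough for every run to expire, the last keep_last runs are still retained.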