Datameer Artifact Retention Policy

Goal

Better understand the retention policy options for Datameer artifacts.

Learn 

The retention policy in Datameer lets you configure the following options:

  1. Keep last N results (regardless of their age).
  2. Purge results older than N days.
  3. Purge results older than N days, but keep last N results.
  4. Never delete historical data.
  5. ExportOnly (for Workbooks).


The corresponding configuration is stored in the dap_job_configuration table, in the columns min_keep_count and expire_time_days.

The possible combinations of values in these columns (min_keep_count / expire_time_days) are:

N / NULL - Keep last N results (the "Purge results older than N days" field is empty).
NULL / N - Purge results older than N days (the "Keep last N results" field is empty).
N / M - Purge results older than M days, but keep the last N results.
NULL / NULL - Never remove historical data.
0 / 1 - Export Only.
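To make the mapping concrete, here is a small Python sketch that turns a pair of column values into the policy names listed above. It is an illustration only: the function name classify_retention is hypothetical and not part of Datameer. The Export Only pair is checked first because 0 / 1 would otherwise match the generic "N / M" combination.

```python
from typing import Optional


def classify_retention(min_keep_count: Optional[int],
                       expire_time_days: Optional[int]) -> str:
    """Map dap_job_configuration column values to a retention policy label.

    Hypothetical helper mirroring the combinations listed in the article.
    """
    # Export Only (0 / 1) must be tested before the generic "N / M" case.
    if min_keep_count == 0 and expire_time_days == 1:
        return "Export Only"
    if min_keep_count is None and expire_time_days is None:
        return "Never remove historical data"
    if expire_time_days is None:
        return f"Keep last {min_keep_count} results"
    if min_keep_count is None:
        return f"Purge results older than {expire_time_days} days"
    return (f"Purge results older than {expire_time_days} days, "
            f"but keep last {min_keep_count} results")


if __name__ == "__main__":
    for row in [(5, None), (None, 30), (3, 7), (None, None), (0, 1)]:
        print(row, "->", classify_retention(*row))
```

For example, a configuration row with min_keep_count = 3 and expire_time_days = 7 is reported as "Purge results older than 7 days, but keep last 3 results".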


The following query lists all artifacts together with their retention policy. Add a WHERE clause to filter the results.

SELECT
  dap_job_configuration.id ConfID,
  dap_file.name Name,
  CASE dap_file.extension
    WHEN 'IMPORT_LINK_JOB_EXTENSION' THEN 'Data Link'
    WHEN 'IMPORT_JOB_EXTENSION' THEN 'Import Job'
    WHEN 'WORKBOOK_EXTENSION' THEN 'Workbook'
    WHEN 'EXPORT_JOB_EXTENSION' THEN 'Export Job'
  END Type,
  permission.owner Owner,
  dap_file.creation_date CreationTime,
  -- min_keep_count = "Keep last N results", expire_time_days = "Purge results older than N days"
  dap_job_configuration.min_keep_count KeptResults,
  dap_job_configuration.expire_time_days PurgeResultsAfter,
  dap_file.id FileID
FROM
  dap_job_configuration
  JOIN dap_file ON dap_job_configuration.dap_file__id = dap_file.id
  JOIN permission ON dap_file.permission_fk = permission.id;
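The query above can be tried end to end against a small in-memory SQLite database. This is a sketch under stated assumptions: the three tables are a hypothetical mock that contains only the columns the query references (the real Datameer tables have more), the sample rows are invented, and the helper names build_demo_db and never_purged are illustrative. The example WHERE clause keeps only artifacts whose policy is "Never remove historical data" (both columns NULL).

```python
import sqlite3

# Hypothetical minimal mock: only the columns used by the query are created.
SCHEMA = """
CREATE TABLE permission (id INTEGER PRIMARY KEY, owner TEXT);
CREATE TABLE dap_file (id INTEGER PRIMARY KEY, name TEXT, extension TEXT,
                       creation_date TEXT, permission_fk INTEGER);
CREATE TABLE dap_job_configuration (id INTEGER PRIMARY KEY, dap_file__id INTEGER,
                                    min_keep_count INTEGER,
                                    expire_time_days INTEGER);
"""

# Invented sample rows for the demo.
SAMPLE_DATA = """
INSERT INTO permission VALUES (1, 'alice'), (2, 'bob');
INSERT INTO dap_file VALUES
  (10, 'sales_import', 'IMPORT_JOB_EXTENSION', '2020-01-01', 1),
  (11, 'sales_wb', 'WORKBOOK_EXTENSION', '2020-01-02', 2),
  (12, 'raw_link', 'IMPORT_LINK_JOB_EXTENSION', '2020-01-03', 1);
INSERT INTO dap_job_configuration VALUES
  (100, 10, 5, NULL),
  (101, 11, NULL, NULL),
  (102, 12, NULL, NULL);
"""

# The article's query plus an example WHERE clause.
QUERY = """
SELECT
  dap_job_configuration.id ConfID,
  dap_file.name Name,
  CASE dap_file.extension
    WHEN 'IMPORT_LINK_JOB_EXTENSION' THEN 'Data Link'
    WHEN 'IMPORT_JOB_EXTENSION' THEN 'Import Job'
    WHEN 'WORKBOOK_EXTENSION' THEN 'Workbook'
    WHEN 'EXPORT_JOB_EXTENSION' THEN 'Export Job'
  END Type,
  permission.owner Owner,
  dap_file.creation_date CreationTime,
  dap_job_configuration.min_keep_count KeptResults,
  dap_job_configuration.expire_time_days PurgeResultsAfter,
  dap_file.id FileID
FROM dap_job_configuration
JOIN dap_file ON dap_job_configuration.dap_file__id = dap_file.id
JOIN permission ON dap_file.permission_fk = permission.id
WHERE dap_job_configuration.min_keep_count IS NULL
  AND dap_job_configuration.expire_time_days IS NULL
ORDER BY ConfID;
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the mock schema in memory and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executescript(SAMPLE_DATA)
    return conn


def never_purged(conn: sqlite3.Connection) -> list:
    """Return artifacts configured to never remove historical data."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in never_purged(build_demo_db()):
        print(row)
```

Swapping the WHERE clause (for example, `WHERE dap_job_configuration.min_keep_count = 0 AND dap_job_configuration.expire_time_days = 1` for Export Only) filters for any other combination from the list of column values.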

 

When an import job uses the Append with sliding time window option, you can set the following retention policies:

  • Use only the Expire after field. Datameer keeps the records ingested by an import job during the last N days/weeks/months and removes older records. For example, if you set Expire after to 5 days and run an import job daily that ingests 10 records each day, Datameer keeps only the 50 most recent records. At the time of writing, this option is affected by a bug reported as DAP-36437; as a workaround, set the Keep last N results parameter together with Expire after.

  • Use only the Keep last N results field. Datameer keeps the records ingested by the import job during the last N executions, regardless of time. For example, if you set Keep last N results to 5, Datameer keeps the data imported during the last 5 job executions, whether the job ran 5 times in 1 hour or in 1 week. Note that executions that import no records also count. If no new data is added to the source table and the import job runs 5 times, no records are retained at that point.

  • Use both Expire after and Keep last N results at the same time. This gives additional flexibility: it ensures that the last N results are still retained even if some of them have expired, e.g. if you pause ingestion but still want to use previously ingested data.
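The three options above can be approximated with a simplified Python model of a single import job. This is a hypothetical sketch, not Datameer's implementation: the Execution class and retained_records function are invented for illustration, time is an integer day counter, and combining both fields is modeled as keeping a run if it is either inside the Expire after window or among the last N runs, which is how the third option reads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Execution:
    day: int      # day the import job ran (hypothetical integer timeline)
    records: int  # records ingested by this execution (may be 0)


def retained_records(executions: List[Execution], today: int,
                     expire_after_days: Optional[int] = None,
                     keep_last_n: Optional[int] = None) -> int:
    """Count records still kept under a simplified retention model."""
    ordered = sorted(executions, key=lambda e: e.day)
    if expire_after_days is None and keep_last_n is None:
        # Neither field set: never remove historical data.
        return sum(e.records for e in ordered)
    kept = set()
    if expire_after_days is not None:
        # Runs that happened within the last `expire_after_days` days.
        kept |= {i for i, e in enumerate(ordered)
                 if today - e.day < expire_after_days}
    if keep_last_n is not None:
        # The last N runs, counted even if they ingested nothing.
        kept |= set(range(max(0, len(ordered) - keep_last_n), len(ordered)))
    return sum(ordered[i].records for i in kept)


if __name__ == "__main__":
    daily = [Execution(day, 10) for day in range(1, 11)]
    print(retained_records(daily, today=10, expire_after_days=5))  # 50

    empty_tail = ([Execution(day, 10) for day in range(1, 4)]
                  + [Execution(day, 0) for day in range(4, 9)])
    print(retained_records(empty_tail, today=8, keep_last_n=5))  # 0

    paused = [Execution(day, 10) for day in range(1, 6)]
    print(retained_records(paused, today=30, expire_after_days=5))  # 0
    print(retained_records(paused, today=30, expire_after_days=5,
                           keep_last_n=3))  # 30
```

The first two calls reproduce the examples from the bullets (50 records with a 5-day window, 0 records after 5 empty runs). The last two show why combining the fields helps when ingestion is paused: Expire after alone drops everything, while adding Keep last N results still retains the last 3 runs.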