Pruning
ESS provides a pruning feature to perform hard deletes of soft-deleted resources and orphan data.
The pruning process operates in multiple iterations with configurable batch sizes. This approach processes smaller batches of resources per iteration, reducing peak system load and memory consumption while allowing fine-grained control over the total processing time through the maximum iterations parameter.
Pruning (Hard Deletes)
Prune consists of two Kubernetes CronJobs:
One to delete “prunable” resources. Prunable resources are resources that have been marked for deletion (i.e., soft-deleted) and are past their INRUPT_STORAGE_PRUNE_RETENTION_WINDOW.
One to delete orphan data. Orphan data are resource data without associated metadata.
Important: Pruning operations may negatively affect performance. If possible, schedule the CronJob to run at times when you can minimize its impact.
The batch iteration approach helps manage performance impact:
Smaller batch sizes reduce per-iteration resource consumption
The maximum iterations parameter controls total processing time per job run
If the maximum iterations limit is reached, a warning is logged and remaining items are processed in subsequent scheduled runs
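The batch iteration approach described above can be sketched as follows. This is a minimal illustration, not the ESS implementation: run_prune, prune_batch, and the in-memory backlog are hypothetical names, and the rule that a short batch means the work is finished is an assumption.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("prune_sketch")


def run_prune(prune_batch: Callable[[int], int], batch_size: int,
              max_iterations: int, iteration_delay_ms: int = 0) -> int:
    """Run up to max_iterations batches and return the total number of items pruned."""
    total = 0
    for iteration in range(1, max_iterations + 1):
        pruned = prune_batch(batch_size)
        total += pruned
        if pruned < batch_size:
            # Assumption: a batch smaller than the limit means nothing is left.
            return total
        if iteration < max_iterations and iteration_delay_ms > 0:
            # Pause between iterations to spread out the load.
            time.sleep(iteration_delay_ms / 1000)
    # The limit was hit while batches were still full, so more items may remain.
    logger.warning("Maximum iterations (%d) reached; remaining items wait for the next run",
                   max_iterations)
    return total


# Hypothetical backlog of 25 prunable items held in memory.
backlog = list(range(25))


def prune_batch(limit: int) -> int:
    batch = backlog[:limit]
    del backlog[:limit]
    return len(batch)


print(run_prune(prune_batch, batch_size=10, max_iterations=2))  # prints 20 and logs a warning
```

In this sketch the upper bound on work per run is batch_size multiplied by max_iterations, which is why the two settings together control both peak load and total processing time.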
To configure the Prune jobs, see Modify Prune Configuration.
Configuration
Configuration to Prune Soft-Deleted Resources
For soft-deleted resources, Prune has the following configurations:
INRUPT_STORAGE_PRUNE_RETENTION_WINDOW
Required.
Determines which soft-deleted resources are “prunable”.
Specify the value in a format supported by the Java Duration.parse() method.
INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE
Required.
Limits the number of results returned per iteration when querying the metadata.
Set to an integer value.
INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE
Required.
Set to 0 when pruning soft-deleted resources.
INRUPT_STORAGE_PRUNE_MAX_ITERATIONS
Optional.
Maximum number of batch iterations to execute per prune job run.
Default: 100. Kubernetes deployment default: 1000.
Set to an integer value.
INRUPT_STORAGE_PRUNE_ITERATION_DELAY_MS
Optional.
Delay in milliseconds between iterations to reduce system load.
Default: 20.
Set to 0 to disable the delay.
INRUPT_STORAGE_PRUNE_PRE_COMPLETION_DELAY_MS
Optional.
Delay in milliseconds before process completion to allow metrics scraping.
Default: 10000 (10 seconds).
COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE
Required.
Determines how long to keep the connection to the metadata database open.
Set to an integer value. Adjust the value to accommodate changes in
To configure the Prune jobs, see Modify Prune Configuration.
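Because the retention window is given in the Java Duration.parse() format while the STORAGEPRUNE000001 log message reports it in milliseconds, it can help to see how the two relate. The sketch below is a simplified Python stand-in, not Java's parser: it handles only the unsigned PnDTnHnMnS form with whole seconds, and the function name duration_to_millis and the sample values are illustrative assumptions.

```python
import re

# Simplified PnDTnHnMnS pattern; signs and fractional seconds are not handled.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$",
    re.IGNORECASE,
)


def duration_to_millis(text: str) -> int:
    """Convert a duration string such as "P30D" or "PT12H" to milliseconds."""
    match = _DURATION.match(text)
    # Reject strings with no components ("P", "PT") or a dangling "T" ("P1DT").
    if (match is None or not any(match.groupdict().values())
            or text.upper().endswith("T")):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: int(value) if value else 0
             for name, value in match.groupdict().items()}
    # A day is treated as exactly 24 hours.
    seconds = (parts["days"] * 86400 + parts["hours"] * 3600
               + parts["minutes"] * 60 + parts["seconds"])
    return seconds * 1000


print(duration_to_millis("P30D"))    # 2592000000
print(duration_to_millis("PT12H"))   # 43200000
print(duration_to_millis("P1DT6H"))  # 108000000
```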
Configuration to Prune Orphan Data
For orphan data, Prune has the following configurations:
INRUPT_STORAGE_PRUNE_RETENTION_WINDOW
Optional.
No impact on pruning orphan data.
INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE
Required.
Set to 0 when pruning orphan data.
INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE
Required.
Determines the maximum number of data identifiers selected per iteration by Prune during orphan data cleanup.
Set to an integer value.
INRUPT_STORAGE_PRUNE_MAX_ITERATIONS
Optional.
Maximum number of batch iterations to execute per prune job run.
Default: 100. Kubernetes deployment default: 1000.
Set to an integer value.
Shared with soft-deleted resource pruning configuration.
INRUPT_STORAGE_PRUNE_ITERATION_DELAY_MS
Optional.
Delay in milliseconds between iterations to reduce system load.
Default: 20.
Set to 0 to disable the delay.
Shared with soft-deleted resource pruning configuration.
INRUPT_STORAGE_PRUNE_PRE_COMPLETION_DELAY_MS
Optional.
Delay in milliseconds before process completion to allow metrics scraping.
Default: 10000 (10 seconds).
Shared with soft-deleted resource pruning configuration.
COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE
Required.
Determines how long to keep the connection to the metadata database open.
Adjust the value to accommodate changes to INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE.
To configure the Prune jobs, see Modify Prune Configuration.
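The two configurations mirror each other: each job sets its own batch size and zeroes out the other. A small pre-deployment check along the following lines can catch a swapped value. This is only a sketch: check_prune_env and the job labels "soft-deleted" and "orphan" are hypothetical, the sample values are illustrative, and requiring the active batch size to be positive is an assumption rather than a documented rule.

```python
from typing import List, Mapping, Optional

RETENTION = "INRUPT_STORAGE_PRUNE_RETENTION_WINDOW"
PRUNABLE = "INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE"
ORPHAN = "INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE"
TIMEOUT = "COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE"


def _as_int(env: Mapping[str, str], name: str, problems: List[str]) -> Optional[int]:
    """Return the variable as an int, or None after recording a problem."""
    raw = env.get(name)
    if raw is None:
        problems.append(f"{name} is required")
        return None
    try:
        return int(raw)
    except ValueError:
        problems.append(f"{name} must be an integer, got {raw!r}")
        return None


def check_prune_env(env: Mapping[str, str], job: str) -> List[str]:
    """Return a list of problems for a job labeled "soft-deleted" or "orphan"."""
    if job not in ("soft-deleted", "orphan"):
        raise ValueError(f"unknown job: {job!r}")
    problems: List[str] = []
    # Each job uses one batch size and sets the other one to 0.
    active, inactive = (PRUNABLE, ORPHAN) if job == "soft-deleted" else (ORPHAN, PRUNABLE)
    active_size = _as_int(env, active, problems)
    if active_size is not None and active_size <= 0:
        # Assumption: a batch size of 0 or less would leave this job with nothing to do.
        problems.append(f"{active} should be positive for the {job} job")
    inactive_size = _as_int(env, inactive, problems)
    if inactive_size is not None and inactive_size != 0:
        problems.append(f"{inactive} must be 0 for the {job} job")
    _as_int(env, TIMEOUT, problems)
    # The retention window only matters when pruning soft-deleted resources.
    if job == "soft-deleted" and not env.get(RETENTION):
        problems.append(f"{RETENTION} is required for the soft-deleted job")
    return problems


# Illustrative values only.
soft_deleted_env = {RETENTION: "P30D", PRUNABLE: "500", ORPHAN: "0", TIMEOUT: "60"}
print(check_prune_env(soft_deleted_env, "soft-deleted"))  # []
```

Passing the same mapping with the job label "orphan" reports two problems, one for each batch size, which is the mistake this check is meant to surface.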
Observability
Log messages for pruning jobs share a consistent pattern in which the messageId has the prefix STORAGEPRUNE.
For pruning jobs, the additional fields include:
an mdc (mapped diagnostic context) field that can be used for correlation;
various pruning metrics.
The following lists the various pruning metrics that appear in the log messages (listed by the messageId):
INFO-Level Messages
STORAGEPRUNE000001 (Pruning start process)
Fields:
retentionWindow
The value of the configured retention window in milliseconds.
STORAGEPRUNE000002 (Prune process completion)
Logged when the entire pruning process completes.
STORAGEPRUNE000003 (Pruner completed)
Fields:
pruner
The name of the pruner that completed (e.g., StorageMetadataPruner, StorageOrphanPruner, JdbcViewDefinitionPersistence, JdbcViewBindingPersistence).
count
The total number of items pruned by this pruner.
WARNING-Level Messages
STORAGEPRUNE000004 (Maximum iterations reached)
Fields:
pruner
The name of the pruner that reached the iteration limit.
maxIterations
The configured maximum iterations value.
This warning indicates that more items may remain to be pruned. They will be processed in subsequent scheduled job runs.
Prune emits Prometheus metrics with the following labeled names.
All of the following are prefixed with application_com_inrupt_storage_prune_.
{type="retentionWindow"}
The value of the configured retention window.
{type="prunableBatchSize"}
The value of the configured prunable batch size.
{type="orphanBatchSize"}
The value of the configured orphan batch size.
{type="resource", status="prunable"}
The number of Solid resource metadata entries found to fall outside the retention window.
{type="data", status="prunable"}
The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window.
{type="data", status="listed"}
The number of Solid resource data entries listed from S3.
{type="data", status="orphan"}
The number of Solid resource data entries (out of the total listed) found to lack corresponding listed metadata entries.
{type="data", status="deleted"}
The number of Solid resource data entries deleted from S3.
Pruner_findPrunable
Time (in milliseconds) taken to find prunable resource metadata (i.e., metadata associated with soft-deleted resources).
Pruner_listData
Time (in milliseconds) taken to list resource data.
Pruner_findOrphans
Time (in milliseconds) taken to identify orphan resource data.
Pruner_deleteData
Time (in milliseconds) taken to delete resource data.
Pruner_pruneMetadata
Time (in milliseconds) taken to delete resource metadata.
When OpenTelemetry is configured, the application emits a single span named prune with the following attributes.
retentionWindowMilliseconds
The value of the configured retention window.
prunableBatchSize
The value of the configured prunable batch size.
orphanBatchSize
The value of the configured orphan batch size.
resourceCount
The number of Solid resource metadata entries found to fall outside the retention window.
dataCount
The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window.