Pruning
Starting in version 2.1, ESS provides a pruning feature to perform hard deletes of soft-deleted resources and orphan data.
Pruning (Hard Deletes)
Prune consists of two Kubernetes CronJobs:
One to delete “prunable” resources. Prunable resources are resources that have been marked for deletion (i.e., soft-deleted) and are past the retention window set by INRUPT_STORAGE_PRUNE_RETENTION_WINDOW.
One to delete orphan data. Orphan data are resource data without associated metadata.
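To make the “prunable” rule concrete, here is a minimal Java sketch. It is not ESS code. It assumes a hypothetical ResourceMetadata record whose deletedAt timestamp is null for resources that were never soft-deleted. It treats a resource as prunable when its soft deletion is strictly older than "now minus the retention window". How ESS actually stores deletion times, and how it treats the exact boundary, is not specified on this page.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical illustration of the "prunable" rule: soft-deleted AND older than
 * the retention window. ResourceMetadata is an assumption, not the ESS data model.
 */
public class PrunableCheck {

    /** Hypothetical metadata entry; deletedAt is null when the resource is not soft-deleted. */
    record ResourceMetadata(String id, Instant deletedAt) {}

    static boolean isPrunable(ResourceMetadata m, Duration retentionWindow, Instant now) {
        if (m.deletedAt() == null) {
            return false; // never soft-deleted, so never pruned
        }
        // Anything soft-deleted before this cutoff is past the retention window.
        Instant cutoff = now.minus(retentionWindow);
        return m.deletedAt().isBefore(cutoff);
    }

    static List<String> prunableIds(List<ResourceMetadata> all, Duration retentionWindow, Instant now) {
        return all.stream()
                .filter(m -> isPrunable(m, retentionWindow, now))
                .map(ResourceMetadata::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-06-30T00:00:00Z");
        // Same format as INRUPT_STORAGE_PRUNE_RETENTION_WINDOW: an ISO-8601 duration.
        Duration window = Duration.parse("P7D");
        List<ResourceMetadata> all = List.of(
                new ResourceMetadata("a", null),                                   // live resource
                new ResourceMetadata("b", Instant.parse("2024-06-28T00:00:00Z")),  // soft-deleted 2 days ago
                new ResourceMetadata("c", Instant.parse("2024-06-01T00:00:00Z"))); // soft-deleted 29 days ago
        System.out.println(prunableIds(all, window, now)); // prints [c]
    }
}
```

With a 7-day window, only the resource that was soft-deleted 29 days ago qualifies. The one deleted 2 days ago is still retained and could, in principle, still be recovered.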
Important
Pruning operations may negatively affect performance. If possible, schedule the CronJobs to run at times when their impact is minimal. To configure the Prune jobs, see Modify Prune Configuration.
Configuration
Configuration to Prune Soft-Deleted Resources
For soft-deleted resources, Prune has the following configurations:
INRUPT_STORAGE_PRUNE_RETENTION_WINDOW
Required.
Determines which soft-deleted resources are “prunable”.
Specify the value in a format supported by the Java Duration.parse() method, that is, an ISO-8601 duration such as P30D (30 days) or PT12H (12 hours).
INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE
Required.
Limits the number of results returned when querying the metadata.
Set to an integer value.
INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE
Required.
Set to 0 when pruning soft-deleted resources.
COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE
Required.
Determines how long to keep the connection to the metadata database open.
Set to an integer value. Adjust the value to accommodate changes in
To configure the Prune jobs, see Modify Prune Configuration.
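To show how these settings fit together for the soft-delete job, here is a hypothetical Java pre-flight check. It is not part of ESS; the class name and the rule that the prunable batch size must be positive are assumptions. It parses the retention window with Duration.parse(), requires INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE to be 0, checks that the integer settings are integers, and returns the retention window in milliseconds. The values in main are placeholders, not recommended settings.

```java
import java.time.Duration;
import java.util.Map;

/**
 * Hypothetical pre-flight check for the CronJob that prunes soft-deleted resources.
 * Variable names come from the ESS documentation; the checks are assumptions.
 */
public class SoftDeletePruneConfigCheck {

    static long requireLong(Map<String, String> env, String name) {
        String raw = env.get(name);
        if (raw == null) {
            throw new IllegalArgumentException(name + " is required");
        }
        return Long.parseLong(raw.trim()); // throws NumberFormatException if not an integer
    }

    /** Validates the settings and returns the retention window in milliseconds. */
    static long validate(Map<String, String> env) {
        String window = env.get("INRUPT_STORAGE_PRUNE_RETENTION_WINDOW");
        if (window == null) {
            throw new IllegalArgumentException("INRUPT_STORAGE_PRUNE_RETENTION_WINDOW is required");
        }
        // Throws DateTimeParseException if the value is not a valid ISO-8601 duration.
        Duration retention = Duration.parse(window.trim());

        // Assumption: a soft-delete job with a batch size of 0 would select nothing.
        if (requireLong(env, "INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE") <= 0) {
            throw new IllegalArgumentException("INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE must be positive");
        }
        if (requireLong(env, "INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE") != 0) {
            throw new IllegalArgumentException("INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE must be 0 for this job");
        }
        requireLong(env, "COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE");
        return retention.toMillis();
    }

    public static void main(String[] args) {
        Map<String, String> env = Map.of(
                "INRUPT_STORAGE_PRUNE_RETENTION_WINDOW", "P30D",
                "INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE", "500",   // placeholder value
                "INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE", "0",
                "COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE", "30"); // placeholder value
        System.out.println(validate(env)); // prints 2592000000
    }
}
```

Passing System.getenv() instead of the sample map would run the same checks against a real container environment. P30D works out to 2,592,000,000 milliseconds.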
Configuration to Prune Orphan Data
For orphan data, Prune has the following configurations:
INRUPT_STORAGE_PRUNE_RETENTION_WINDOW
Optional.
No impact on pruning orphan data.
INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE
Required.
Set to 0 when pruning orphan data.
INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE
Required.
Determines the maximum number of data identifiers selected by Prune during orphan data cleanup.
Set to an integer value.
COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE
Required.
Determines how long to keep the connection to the metadata database open.
Adjust the value to accommodate changes to INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE.
To configure the Prune jobs, see Modify Prune Configuration.
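Orphan detection can be pictured as a set difference over a bounded batch. The following is a hypothetical Java illustration, not the ESS implementation; the method name and inputs are assumptions. It considers at most orphanBatchSize listed data identifiers and keeps those that have no metadata entry. It also shows why a batch size of 0 selects nothing, which is consistent with setting INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE to 0 for the soft-delete job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of orphan detection: from at most orphanBatchSize listed
 * data identifiers, keep those with no corresponding metadata entry.
 */
public class OrphanDetection {

    static List<String> findOrphans(List<String> listedDataIds, Set<String> idsWithMetadata, int orphanBatchSize) {
        List<String> orphans = new ArrayList<>();
        // Only the first orphanBatchSize identifiers are examined in one run (0 means none).
        int limit = Math.min(Math.max(orphanBatchSize, 0), listedDataIds.size());
        for (String id : listedDataIds.subList(0, limit)) {
            if (!idsWithMetadata.contains(id)) {
                orphans.add(id); // data without metadata is orphan data
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        List<String> listed = List.of("d1", "d2", "d3", "d4");
        Set<String> withMetadata = Set.of("d1", "d3");
        System.out.println(findOrphans(listed, withMetadata, 3)); // prints [d2]
    }
}
```

With a batch size of 3, d4 is not examined in this run even though it is also an orphan. A larger batch finds more orphans per run, but as the connection-timeout note above suggests, it may need a correspondingly adjusted timeout.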
Observability
The pruning jobs use JSON logging and emit the following info-level log entries. Each entry is listed by its message, followed by the fields it includes:
Starting the pruning logic
retentionWindowMilliseconds
The value of the configured retention window, in milliseconds.
prunableBatchSize
The value of the configured prunable batch size.
orphanBatchSize
The value of the configured orphan batch size.
s3AccessKeyId
The access key ID used to connect to S3.
s3BucketName
The bucket name used when retrieving S3 objects.
s3Region
The region used when retrieving S3 objects.
Found prunable resource metadata
resourceCount
The number of Solid resource metadata entries found to fall outside the retention window.
dataCount
The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window.
durationMilliseconds
Time taken to find prunable resource metadata.
Got some resource data identifiers
dataCount
The number of Solid resource data entries listed from S3.
durationMilliseconds
Time taken to list resource data.
Checked metadata persistence for orphan resource data identifiers
dataCount
The number of Solid resource data entries (out of the total listed) found to lack corresponding listed metadata entries.
durationMilliseconds
Time taken to identify orphan resource data.
Deleted resource data
dataCount
The number of Solid resource data entries deleted from S3.
durationMilliseconds
Time taken to delete resource data.
Pruned resource metadata
durationMilliseconds
Time taken to delete resource metadata.
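Because these entries are JSON, ordinary tooling can aggregate them. The following hypothetical Java helper totals dataCount across Deleted resource data entries. It assumes each entry is written as a single line that contains the message text and an unquoted numeric dataCount field. The exact field layout of ESS log output is not documented on this page, so verify it against your own logs. The sample lines in main are made up.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: sums dataCount over "Deleted resource data" entries.
 * Assumes one JSON object per line with dataCount as an unquoted number.
 */
public class PruneLogSummary {

    private static final Pattern DATA_COUNT = Pattern.compile("\"dataCount\"\\s*:\\s*(\\d+)");

    static long totalDeleted(List<String> logLines) {
        long total = 0;
        for (String line : logLines) {
            if (!line.contains("Deleted resource data")) {
                continue; // other entries also carry dataCount, so skip them
            }
            Matcher m = DATA_COUNT.matcher(line);
            if (m.find()) {
                total += Long.parseLong(m.group(1));
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up sample lines; real ESS entries may contain more fields.
        List<String> lines = List.of(
                "{\"message\":\"Got some resource data identifiers\",\"dataCount\":120,\"durationMilliseconds\":85}",
                "{\"message\":\"Deleted resource data\",\"dataCount\":40,\"durationMilliseconds\":310}",
                "{\"message\":\"Deleted resource data\",\"dataCount\":7,\"durationMilliseconds\":52}");
        System.out.println(totalDeleted(lines)); // prints 47
    }
}
```

Filtering on the message first matters because several entries share the dataCount field name but count different things: listed, orphaned, or deleted data.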
Prune emits the following Prometheus metrics. All metric names begin with the prefix application_com_inrupt_storage_prune_.
| Metric | Description |
|---|---|
| | The value of the |
| | The value of the |
| | The value of the |
| | The number of Solid resource metadata entries found to fall outside the retention window. |
| | The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window. |
| | The number of Solid resource data entries listed from S3. |
| | The number of Solid resource data entries (out of the total listed) found to lack corresponding listed metadata entries. |
| | The number of Solid resource data entries deleted from S3. |
| | Time (in milliseconds) taken to find prunable resource metadata (i.e., metadata associated with soft-deleted resources). |
| | Time (in milliseconds) taken to list resource data. |
| | Time (in milliseconds) taken to identify orphan resource data. |
| | Time (in milliseconds) taken to delete resource data. |
| | Time (in milliseconds) taken to delete resource metadata. |
When OpenTelemetry is configured, the application emits a single span named prune with the following attributes:
| Attribute | Description |
|---|---|
| | The value of the |
| | The value of the |
| | The value of the |
| | The number of Solid resource metadata entries found to fall outside the retention window. |
| | The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window. |