Pruning#
Starting in version 2.1, ESS provides a pruning feature to perform hard deletes of soft-deleted resources and orphan data.
Pruning (Hard Deletes)#
Prune consists of two Kubernetes CronJobs:
- One to delete “prunable” resources. Prunable resources are resources that have been marked for deletion (i.e., soft-deleted) and are past their INRUPT_STORAGE_PRUNE_RETENTION_WINDOW.
- One to delete orphan data. Orphan data are resource data without associated metadata.
Important
Pruning operations may negatively affect performance. If possible, schedule the CronJob to run at times when you can minimize its impact. To configure the Prune jobs, see Modify Prune Configuration.
Configuration#
Configuration to Prune Soft-Deleted Resources#
For soft-deleted resources, Prune has the following configurations:
- INRUPT_STORAGE_PRUNE_RETENTION_WINDOW: Required. Determines which soft-deleted resources are “prunable”. Specify the value in a format supported by the Java Duration.parse() method.
- INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE: Required. Limits the number of results returned when querying the metadata. Set to an integer value.
- INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE: Required. Set to 0 when pruning soft-deleted resources.
- COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE: Required. Determines how long to keep the connection to the metadata database open. Set to an integer value. Adjust the value to accommodate changes in
To configure the Prune jobs, see Modify Prune Configuration.
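The retention window is written as a duration string, while the pruning start log message reports it in milliseconds. The following is a minimal sketch, not part of ESS, that converts a simplified subset of the syntax accepted by Java's Duration.parse() (days, hours, minutes, seconds) to milliseconds. The function name `retention_window_ms` and the sample values are hypothetical, and the Java parser accepts forms (such as signed values) that this sketch does not.

```python
import re
from datetime import timedelta

# Simplified subset of the ISO-8601 syntax that Java's Duration.parse() accepts:
# optional days, then an optional time part with hours, minutes, and seconds.
# Signs and other edge cases handled by the Java parser are not covered here.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$",
    re.IGNORECASE,
)


def retention_window_ms(value: str) -> int:
    """Convert a duration string such as 'P30D' or 'PT12H' to milliseconds."""
    match = _DURATION.match(value.strip())
    groups = match.groupdict() if match else {}
    parts = {name: float(text) for name, text in groups.items() if text}
    if not parts:
        raise ValueError(f"unsupported duration: {value!r}")
    # Integer division by one millisecond avoids floating-point rounding.
    return timedelta(**parts) // timedelta(milliseconds=1)


if __name__ == "__main__":
    for example in ("P30D", "PT12H", "P1DT6H30M"):
        print(example, "->", retention_window_ms(example), "ms")
```

A check like this can help confirm that the value you intend to configure matches the retentionWindowMilliseconds value that appears in the logs.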
Configuration to Prune Orphan Data#
For orphan data, Prune has the following configurations:
- INRUPT_STORAGE_PRUNE_RETENTION_WINDOW: Optional. No impact on pruning orphan data.
- INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE: Required. Set to 0 when pruning orphan data.
- INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE: Required. Determines the maximum number of data identifiers selected by Prune during orphan data cleanup. Set to an integer value.
- COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE: Required. Determines how long to keep the connection to the metadata database open. Adjust the value to accommodate changes to INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE.
To configure the Prune jobs, see Modify Prune Configuration.
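Because the two jobs share the same variables but expect opposite batch sizes to be 0, it is easy to mix them up. The sketch below checks a set of environment values against the rules listed in the two configuration sections. The helper `check_prune_env` and the mode names "soft-deleted" and "orphan" are hypothetical and not part of ESS; the sketch only encodes what this page states.

```python
RETENTION = "INRUPT_STORAGE_PRUNE_RETENTION_WINDOW"
PRUNABLE = "INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE"
ORPHAN = "INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE"
TIMEOUT = "COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE"


def check_prune_env(env: dict, mode: str) -> list:
    """Return a list of problems for a job in mode 'soft-deleted' or 'orphan'."""
    if mode not in ("soft-deleted", "orphan"):
        raise ValueError(f"unknown mode: {mode!r}")
    problems = []

    def batch_size(name):
        # Both batch sizes are required and must be integers in either mode.
        raw = env.get(name)
        if raw is None:
            problems.append(f"{name} is required")
            return None
        try:
            return int(raw)
        except ValueError:
            problems.append(f"{name} must be an integer")
            return None

    prunable = batch_size(PRUNABLE)
    orphan = batch_size(ORPHAN)
    if TIMEOUT not in env:
        problems.append(f"{TIMEOUT} is required")

    if mode == "soft-deleted":
        # The retention window is required only when pruning soft-deleted resources.
        if RETENTION not in env:
            problems.append(f"{RETENTION} is required")
        if orphan not in (None, 0):
            problems.append(f"{ORPHAN} must be 0 when pruning soft-deleted resources")
    else:
        if prunable not in (None, 0):
            problems.append(f"{PRUNABLE} must be 0 when pruning orphan data")
    return problems


if __name__ == "__main__":
    # Hypothetical values for the orphan data job.
    sample = {PRUNABLE: "0", ORPHAN: "500", TIMEOUT: "60"}
    print(check_prune_env(sample, "orphan"))
```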
Observability#
As part of the ESS 2.2 Logging Enhancements, logging for Pruning jobs follows a consistent pattern in which the messageId has the prefix STORAGEPRUNE.
{
"timestamp":<value>,
"sequence":<value>,
"loggerClassName":<value>,
"loggerName":<value>,
"level":<value>,
"message": "STORAGEPRUNE<number>: <description>",
"threadName":<value>,
"threadId":<value>,
"hostName":<value>,
"processName":<value>,
"processId":<value>,
"messageId": "STORAGEPRUNE<number>"
// additional relevant fields, if any
}
For pruning jobs, the additional fields include:
- an mdc (mapped diagnostic context) field that can be used for correlation;
- various pruning metrics.
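Since every pruning log entry carries a messageId with the STORAGEPRUNE prefix, the entries can be separated from other output with a simple filter. The sketch below assumes the logs are available as one JSON object per line, which this page does not state; the function name `prune_entries` and the sample lines are hypothetical.

```python
import json

PREFIX = "STORAGEPRUNE"


def prune_entries(lines):
    """Yield parsed log entries whose messageId starts with STORAGEPRUNE."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip output that is not a JSON log entry
        if isinstance(entry, dict) and str(entry.get("messageId", "")).startswith(PREFIX):
            yield entry


if __name__ == "__main__":
    # Hypothetical log lines; real entries contain the full set of fields shown above.
    sample = [
        '{"level": "INFO", "messageId": "STORAGEPRUNE000001", "message": "STORAGEPRUNE000001: start"}',
        '{"level": "INFO", "messageId": "OTHER000001", "message": "unrelated"}',
        "plain text line",
    ]
    for entry in prune_entries(sample):
        print(entry["messageId"], entry["level"])
```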
The following lists the various pruning metrics that appear in the INFO level log messages (listed by the messageId):

STORAGEPRUNE000001 (associated with the pruning start process)

| Field | Description |
|---|---|
| retentionWindowMilliseconds | The value of the configured retention window. |
| prunableBatchSize | The value of the configured prunable batch size. |
| orphanBatchSize | The value of the configured orphan batch size. |

STORAGEPRUNE000002 (associated with finding prunable objects process)

| Field | Description |
|---|---|
| resourceCount | The number of Solid resource metadata entries found to fall outside the retention window. |
| contentCount | The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window. |
| durationMilliseconds | Time taken to find prunable resource metadata. |

STORAGEPRUNE000005 (associated with finding prunable resource data in persistence)

| Field | Description |
|---|---|
| resultCount | The number of Solid resource data entries listed from S3. |
| durationMilliseconds | Time taken to list resource data. |

STORAGEPRUNE000007 (associated with finding prunable orphan data in persistence)

| Field | Description |
|---|---|
| resultCount | The number of orphan data entries listed from S3. |
| durationMilliseconds | Time taken to list orphan data. |

STORAGEPRUNE000009 (associated with pruning/deletion process from persistence)

| Field | Description |
|---|---|
| resultCount | The number of Solid resource data entries deleted from S3. |
| durationMilliseconds | Time taken to delete resource data. |

STORAGEPRUNE000010 (associated with pruning/deletion process from metadata)

| Field | Description |
|---|---|
| durationMilliseconds | Time taken to delete resource metadata. |
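The per-message fields listed above can be collected into a small lookup to summarize a pruning run. The sketch below assumes the metric fields appear as top-level keys of each log entry; this page does not say where they are nested, so adjust the lookup if they sit elsewhere. The names `STAGE_FIELDS`, `stage_metrics`, and `total_duration_ms`, and the sample values, are hypothetical.

```python
# Metric fields documented for each INFO-level messageId.
STAGE_FIELDS = {
    "STORAGEPRUNE000001": ("retentionWindowMilliseconds", "prunableBatchSize", "orphanBatchSize"),
    "STORAGEPRUNE000002": ("resourceCount", "contentCount", "durationMilliseconds"),
    "STORAGEPRUNE000005": ("resultCount", "durationMilliseconds"),
    "STORAGEPRUNE000007": ("resultCount", "durationMilliseconds"),
    "STORAGEPRUNE000009": ("resultCount", "durationMilliseconds"),
    "STORAGEPRUNE000010": ("durationMilliseconds",),
}


def stage_metrics(entry: dict) -> dict:
    """Return the documented metric fields present in one log entry."""
    # Assumption: metrics are top-level keys of the parsed JSON entry.
    fields = STAGE_FIELDS.get(entry.get("messageId"), ())
    return {name: entry[name] for name in fields if name in entry}


def total_duration_ms(entries) -> int:
    """Sum durationMilliseconds over all entries that report it."""
    return sum(int(stage_metrics(e).get("durationMilliseconds", 0)) for e in entries)


if __name__ == "__main__":
    # Hypothetical entries, reduced to the fields used here.
    run = [
        {"messageId": "STORAGEPRUNE000002", "resourceCount": 40, "contentCount": 35, "durationMilliseconds": 120},
        {"messageId": "STORAGEPRUNE000009", "resultCount": 35, "durationMilliseconds": 800},
        {"messageId": "STORAGEPRUNE000010", "durationMilliseconds": 90},
    ]
    for entry in run:
        print(entry["messageId"], stage_metrics(entry))
    print("total:", total_duration_ms(run), "ms")
```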
Prune emits Prometheus metrics with the following labeled names. All of the following are prefixed with application_com_inrupt_storage_prune_.
|
The value of the |
|
The value of the |
|
The value of the |
|
The number of Solid resource metadata entries found to fall outside the retention window. |
|
The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window. |
|
The number of Solid resource data entries listed from S3. |
|
The number of Solid resource data entries (out of the total listed) found to lack corresponding listed metadata entries. |
|
The number of Solid resource data entries deleted from S3. |
|
Time (in milliseconds) taken to find prunable resource metadata (i.e., metadata associated with soft-deleted resources). |
|
Time (in milliseconds) taken to list resource data. |
|
Time (in milliseconds) taken to identify orphan resource data. |
|
Time (in milliseconds) taken to delete resource data. |
|
Time (in milliseconds) taken to delete resource metadata. |
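Because all Prune metrics share the application_com_inrupt_storage_prune_ prefix, they can be picked out of scraped output by name. The sketch below reads text in the Prometheus exposition format and keeps only samples with that prefix. It assumes sample lines carry no trailing timestamp, and it discards labels, so the last sample wins when a name repeats. The function name `prune_samples` and the metric suffixes in the example are hypothetical placeholders, not actual ESS metric names.

```python
PREFIX = "application_com_inrupt_storage_prune_"


def prune_samples(exposition_text: str) -> dict:
    """Map each Prune metric name (prefix removed) to its sample value."""
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the line is "name{labels} value" with no timestamp.
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name.startswith(PREFIX):
            samples[name[len(PREFIX):]] = float(value)
    return samples


if __name__ == "__main__":
    # Hypothetical scrape output; the suffixes are placeholders.
    text = "\n".join([
        "# TYPE application_com_inrupt_storage_prune_example_count gauge",
        "application_com_inrupt_storage_prune_example_count 12",
        'application_com_inrupt_storage_prune_example_duration{job="prune"} 345.0',
        "unrelated_metric 8",
    ])
    print(prune_samples(text))
```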
When OpenTelemetry is configured, the application emits a single span named prune with the following attributes.
|
The value of the |
|
The value of the |
|
The value of the |
|
The number of Solid resource metadata entries found to fall outside the retention window. |
|
The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window. |