Pruning

Starting in version 2.1, ESS provides a pruning feature to perform hard deletes of soft-deleted resources and orphan data.

Pruning (Hard Deletes)

Prune consists of two Kubernetes CronJobs:

  • One to delete “prunable” resources. Prunable resources are resources that have been marked for deletion (i.e., soft-deleted) and are past their INRUPT_STORAGE_PRUNE_RETENTION_WINDOW.

  • One to delete orphan data. Orphan data are resource data without associated metadata.
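The two selection rules above can be sketched as follows. This is an illustrative sketch only, not the ESS implementation: the function names, the `deleted_at` timestamp, and the assumption that the retention window is measured from the time of soft deletion are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Set


def is_prunable(deleted_at: Optional[datetime],
                retention_window: timedelta,
                now: datetime) -> bool:
    """Return True if a resource is soft-deleted and past its retention window.

    Assumption: the window is counted from the soft-deletion timestamp.
    """
    if deleted_at is None:
        # Resource was never marked for deletion, so it is never prunable.
        return False
    return now - deleted_at > retention_window


def find_orphans(data_ids: Set[str], metadata_data_ids: Set[str]) -> Set[str]:
    """Return data identifiers that no metadata entry refers to."""
    return data_ids - metadata_data_ids


# Hypothetical usage with a 7-day retention window.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
window = timedelta(days=7)
old_delete = datetime(2024, 1, 1, tzinfo=timezone.utc)
recent_delete = datetime(2024, 1, 5, tzinfo=timezone.utc)
print(is_prunable(old_delete, window, now))
print(is_prunable(recent_delete, window, now))
print(find_orphans({"a", "b", "c"}, {"a", "c"}))
```

Under these assumptions, only the resource deleted on 1 January is prunable, and `b` is the only orphan because no metadata entry references it.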

Important

Pruning operations may negatively affect performance. If possible, schedule the CronJobs to run at times when their impact is minimal. To configure the Prune jobs, see Modify Prune Configuration.

Configuration

Configuration to Prune Soft-Deleted Resources

For soft-deleted resources, Prune has the following configurations:

To configure the Prune jobs, see Modify Prune Configuration.

Configuration to Prune Orphan Data

For orphan data, Prune has the following configurations:

To configure the Prune jobs, see Modify Prune Configuration.

Observability

Pruning jobs use JSON logging and emit the following info log entries:

  • Starting the pruning logic

    retentionWindowMilliseconds

    The value of the configured retention window.

    prunableBatchSize

    The value of the configured prunable batch size.

    orphanBatchSize

    The value of the configured orphan batch size.

    s3AccessKeyId

    The access key ID used to connect to S3.

    s3BucketName

    The bucket name used when retrieving S3 objects.

    s3Region

    The region used when retrieving S3 objects.

  • Found prunable resource metadata

    resourceCount

    The number of Solid resource metadata entries found to fall outside the retention window.

    dataCount

    The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window.

    durationMilliseconds

    Time taken to find prunable resource metadata.

  • Got some resource data identifiers

    dataCount

    The number of Solid resource data entries listed from S3.

    durationMilliseconds

    Time taken to list resource data.

  • Checked metadata persistence for orphan resource data identifiers

    dataCount

    The number of Solid resource data entries (out of the total listed) found to lack corresponding listed metadata entries.

    durationMilliseconds

    Time taken to identify orphan resource data.

  • Deleted resource data

    dataCount

    The number of Solid resource data entries deleted from S3.

    durationMilliseconds

    Time taken to delete resource data.

  • Pruned resource metadata

    durationMilliseconds

    Time taken to delete resource metadata.
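Because the entries are JSON, they can be summarized with a short script. The sketch below assumes one JSON object per line, with the entry text in a `message` key and the fields listed above as top-level keys. The actual log layout is not specified here, so treat these key names (other than `durationMilliseconds`) as assumptions and adjust them to match your output.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def total_durations(lines: Iterable[str],
                    message_key: str = "message") -> Dict[str, float]:
    """Sum durationMilliseconds per log message across JSON log lines.

    Assumption: each line is a JSON object and `message_key` holds the
    entry text (for example, "Deleted resource data").
    """
    totals: Dict[str, float] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        duration = entry.get("durationMilliseconds")
        if isinstance(duration, (int, float)):
            totals[entry.get(message_key, "<unknown>")] += duration
    return dict(totals)


# Hypothetical sample lines; real log output may carry additional fields.
sample = [
    '{"message": "Deleted resource data", "dataCount": 3, "durationMilliseconds": 120}',
    '{"message": "Pruned resource metadata", "durationMilliseconds": 45}',
    '{"message": "Deleted resource data", "dataCount": 1, "durationMilliseconds": 30}',
    'not json',
]
print(total_durations(sample))
```

Totals like these make it easier to see which phase of a prune run dominates its running time, which can help when choosing a schedule.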