# Pruning

ESS provides a pruning feature to perform hard deletes of [soft-deleted resources](https://docs.inrupt.com/ess/pod-resources#resource-deletion) and [orphan data](https://docs.inrupt.com/ess/pod-resources#modification-of-resource-content).

The pruning process operates in multiple iterations with configurable batch sizes. This approach processes smaller batches of resources per iteration, reducing peak system load and memory consumption while allowing fine-grained control over the total processing time through the maximum iterations parameter.

## Pruning (Hard Deletes)

Prune consists of two [Kubernetes CronJobs](https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/):

* [One](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#pruning-soft-deleted-resources) to delete “prunable” resources. Prunable resources are resources that have been marked for deletion (i.e., soft-deleted) and are past their [**`INRUPT_STORAGE_PRUNE_RETENTION_WINDOW`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window).
* [One](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#pruning-orphan-data) to delete orphan data. Orphan data are resource data without associated metadata.

{% hint style="warning" %}
**Important**\
Pruning operations may negatively affect performance. If possible, schedule the CronJobs to run at times when their impact is smallest, such as periods of low system load.

The batch iteration approach helps manage performance impact:

* Smaller batch sizes reduce per-iteration resource consumption
* The maximum iterations parameter controls total processing time per job run
* If the maximum iterations limit is reached, a warning is logged and remaining items are processed in subsequent scheduled runs

To configure the Prune jobs, see [Modify Prune Configuration](https://docs.inrupt.com/ess/latest/installation/customize-configurations/customization-pod-maintenance/modify-prune).
{% endhint %}
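
The batch iteration behavior described above can be pictured with a small Python sketch. It is an illustration only, not the ESS implementation: `run_prune`, `fetch_batch`, and `delete_batch` are hypothetical names, and the early exit on a short batch is an assumption about how a drained backlog might be detected.

```python
import time
from typing import Callable, List, Tuple


def run_prune(
    fetch_batch: Callable[[int], List[str]],    # hypothetical: returns up to N prunable ids
    delete_batch: Callable[[List[str]], None],  # hypothetical: hard-deletes the given ids
    batch_size: int,
    max_iterations: int = 100,
    iteration_delay_ms: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bool]:
    """Prune in batches and return (items_pruned, limit_reached)."""
    pruned = 0
    for iteration in range(max_iterations):
        batch = fetch_batch(batch_size)  # at most batch_size items per iteration
        if not batch:
            return pruned, False  # nothing left to prune
        delete_batch(batch)
        pruned += len(batch)
        if len(batch) < batch_size:
            return pruned, False  # short batch: assume the backlog is drained
        if iteration_delay_ms > 0 and iteration < max_iterations - 1:
            sleep(iteration_delay_ms / 1000)  # pause between iterations
    # Iteration limit reached: more items may remain for the next scheduled run.
    print(f"WARNING: reached maximum iterations ({max_iterations})")
    return pruned, True
```

With a backlog of 25 items, a batch size of 10, and a limit of 2 iterations, the first run in this sketch prunes 20 items and reports that the limit was reached; the next run picks up the remaining 5.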

## Configuration

### Configuration to Prune Soft-Deleted Resources

For [soft-deleted resources](https://docs.inrupt.com/ess/pod-resources#resource-deletion), Prune has the following configurations:

* [**`INRUPT_STORAGE_PRUNE_RETENTION_WINDOW`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window)
  * Required.
  * Determines which soft-deleted resources are “prunable”.
  * Specify the value in a format supported by the Java [Duration.parse()](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/time/Duration.html#parse\(java.lang.CharSequence\)) method, for example `P7D` for seven days.
* [**`INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_prunable_batch_size)
  * Required.
  * Limits the number of results returned per iteration when querying the metadata.
  * Set to an integer value.
* [**`INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_orphan_batch_size)
  * Required.
  * Set to **`0`** when pruning soft-deleted resources.
* [**`INRUPT_STORAGE_PRUNE_MAX_ITERATIONS`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_max_iterations)
  * Optional.
  * Maximum number of batch iterations to execute per prune job run.
  * Default: `100`. Kubernetes deployment default: `1000`.
  * Set to an integer value.
* [**`INRUPT_STORAGE_PRUNE_ITERATION_DELAY_MS`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_iteration_delay_ms)
  * Optional.
  * Millisecond delay between iterations to reduce system load.
  * Default: `20`.
  * Set to `0` to disable the delay.
* [**`INRUPT_STORAGE_PRUNE_PRE_COMPLETION_DELAY_MS`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_pre_completion_delay_ms)
  * Optional.
  * Delay in milliseconds before process completion to allow metrics scraping.
  * Default: `10000` (10 seconds).
* [**`COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#com_inrupt_storage_metadata_jdbc_connectionlimiter_openconnection_timeout_value)
  * Required.
  * Determines how long to keep the connection to the metadata database open.
  * Set to an integer value. Adjust the value to accommodate changes in:
    * [**`INRUPT_STORAGE_PRUNE_RETENTION_WINDOW`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window)
    * [**`INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_prunable_batch_size)

To configure the Prune jobs, see [Modify Prune Configuration](https://docs.inrupt.com/ess/latest/installation/customize-configurations/customization-pod-maintenance/modify-prune).
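
The retention window is given as an ISO-8601 style duration string, while the start-of-run log message reports it in milliseconds. The following Python sketch shows how the two relate. It is a simplified stand-in, not Java's `Duration.parse()`: the hypothetical `duration_to_millis` helper handles only unsigned days, hours, minutes, and seconds, and does not cover everything the Java method accepts (signs, for instance).

```python
import re

# Subset of the ISO-8601 duration grammar: PnDTnHnMn.nS (unsigned values only).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_to_millis(text: str) -> int:
    """Convert a duration such as 'P7D' or 'PT36H' to milliseconds."""
    match = _DURATION.match(text.upper())
    # Reject non-matching input and bare 'P' / 'PT' with no components.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = match.groupdict(default="0")
    total_seconds = (
        int(parts["days"]) * 86_400
        + int(parts["hours"]) * 3_600
        + int(parts["minutes"]) * 60
        + float(parts["seconds"])
    )
    return round(total_seconds * 1000)
```

For example, `duration_to_millis("P7D")` returns `604800000`, which is the value you would expect to see for a seven-day retention window expressed in milliseconds.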

### Configuration to Prune Orphan Data

For [orphan data](https://docs.inrupt.com/ess/pod-resources#modification-of-resource-content), Prune has the following configurations:

* [**`INRUPT_STORAGE_PRUNE_RETENTION_WINDOW`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window)
  * Optional.
  * No impact on pruning orphan data.
* [**`INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_prunable_batch_size)
  * Required.
  * Set to **`0`** when pruning orphan data.
* [**`INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_orphan_batch_size)
  * Required.
  * Determines the maximum number of data identifiers selected per iteration by Prune during [orphan data](https://docs.inrupt.com/ess/pod-resources#modification-of-resource-content) cleanup.
  * Set to an integer value.
* [**`INRUPT_STORAGE_PRUNE_MAX_ITERATIONS`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_max_iterations)
  * Optional.
  * Maximum number of batch iterations to execute per prune job run.
  * Default: `100`. Kubernetes deployment default: `1000`.
  * Set to an integer value.
  * Shared with soft-deleted resource pruning configuration.
* [**`INRUPT_STORAGE_PRUNE_ITERATION_DELAY_MS`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_iteration_delay_ms)
  * Optional.
  * Millisecond delay between iterations to reduce system load.
  * Default: `20`.
  * Set to `0` to disable the delay.
  * Shared with soft-deleted resource pruning configuration.
* [**`INRUPT_STORAGE_PRUNE_PRE_COMPLETION_DELAY_MS`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_pre_completion_delay_ms)
  * Optional.
  * Delay in milliseconds before process completion to allow metrics scraping.
  * Default: `10000` (10 seconds).
  * Shared with soft-deleted resource pruning configuration.
* [**`COM_INRUPT_STORAGE_METADATA_JDBC_CONNECTIONLIMITER_OPENCONNECTION_TIMEOUT_VALUE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#com_inrupt_storage_metadata_jdbc_connectionlimiter_openconnection_timeout_value)
  * Required.
  * Determines how long to keep the connection to the metadata database open.
  * Adjust the value to accommodate changes to [**`INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_orphan_batch_size).

To configure the Prune jobs, see [Modify Prune Configuration](https://docs.inrupt.com/ess/latest/installation/customize-configurations/customization-pod-maintenance/modify-prune).
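
Because each job sets the other job's batch size to `0`, a quick pre-deployment check can catch a mixed-up configuration. The Python sketch below is a hypothetical helper, not part of ESS. It assumes the environment is available as a plain mapping, treats "both batch sizes non-zero" as a mistake (an assumption about intent, not documented ESS behavior), and uses the non-Kubernetes default of `100` when the maximum iterations value is unset. The per-run figure is only a rough upper bound: batch size multiplied by maximum iterations.

```python
from typing import Mapping

PRUNABLE = "INRUPT_STORAGE_PRUNE_PRUNABLE_BATCH_SIZE"
ORPHAN = "INRUPT_STORAGE_PRUNE_ORPHAN_BATCH_SIZE"
MAX_ITERATIONS = "INRUPT_STORAGE_PRUNE_MAX_ITERATIONS"


def summarize_prune_job(env: Mapping[str, str]) -> dict:
    """Report which kind of pruning a configuration selects and its per-run bound."""
    prunable = int(env[PRUNABLE])  # required: KeyError if missing
    orphan = int(env[ORPHAN])      # required: KeyError if missing
    if prunable < 0 or orphan < 0:
        raise ValueError("batch sizes must not be negative")
    # Assumption: exactly one of the two batch sizes should be non-zero per job.
    if (prunable > 0) == (orphan > 0):
        raise ValueError("set exactly one of the two batch sizes to a non-zero value")
    max_iterations = int(env.get(MAX_ITERATIONS, "100"))
    batch_size = prunable or orphan
    return {
        "mode": "soft-deleted" if prunable else "orphan",
        "batch_size": batch_size,
        # Rough upper bound on items selected in one job run.
        "max_items_per_run": batch_size * max_iterations,
    }
```

Under these assumptions, an orphan-data job with a batch size of 200 and a maximum of 1000 iterations would select at most 200,000 data identifiers per run.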

## Observability

{% tabs %}
{% tab title="Default Logging" %}
{% hint style="warning" %}
Log messages for pruning jobs share a consistent pattern: the **`messageId`** has the prefix **`STORAGEPRUNE`**.
{% endhint %}

```json
{
  "timestamp":<value>,
  "sequence":<value>,
  "loggerClassName":<value>,
  "loggerName":<value>,
  "level":<value>,
  "message": "STORAGEPRUNE<number>: <description>",
  "threadName":<value>,
  "threadId":<value>,
  "hostName":<value>,
  "processName":<value>,
  "processId":<value>,
  "messageId": "STORAGEPRUNE<number>"
  // additional relevant fields, if any
}
```

For pruning jobs, the additional fields include:

* an **`mdc`** (managed diagnostic context) field that can be used for correlation;
* various pruning metrics.

The pruning metrics that appear in the log messages are listed below by **`messageId`**:

**INFO-Level Messages**

* **`STORAGEPRUNE000001`** (Pruning start process)

  | Field                 | Description                                                                                                                                                                                                                                                                                                                           |
  | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
  | **`retentionWindow`** | The value of the [**`configured retention window`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window) in milliseconds.                                                                                                                                          |
  | **`batchSize`**       | The value of the configured batch size (either [**`prunable`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_prunable_batch_size) or [**`orphan`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_orphan_batch_size)). |
* **`STORAGEPRUNE000002`** (Prune process completion)

  Logged when the entire pruning process completes.
* **`STORAGEPRUNE000003`** (Pruner completed)

  | Field        | Description                                                                                                                                          |
  | ------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
  | **`pruner`** | The name of the pruner that completed (e.g., StorageMetadataPruner, StorageOrphanPruner, JdbcViewDefinitionPersistence, JdbcViewBindingPersistence). |
  | **`count`**  | The total number of items pruned by this pruner.                                                                                                     |

**WARNING-Level Messages**

* **`STORAGEPRUNE000004`** (Maximum iterations reached)

  | Field               | Description                                                                                                                                                           |
  | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
  | **`pruner`**        | The name of the pruner that reached the iteration limit.                                                                                                              |
  | **`maxIterations`** | The configured [**`maximum iterations`**](https://docs.inrupt.com/ess/services/service-pod-management/service-pod-storage#inrupt_storage_prune_max_iterations) value. |

  This warning indicates that more items may remain to be pruned. They will be processed in subsequent scheduled job runs.
  {% endtab %}

{% tab title="Prometheus" %}
Prune emits Prometheus metrics with the following labeled names.

All of the following are prefixed with **`application_com_inrupt_storage_prune_`**.

<table data-header-hidden><thead><tr><th width="345.00390625"></th><th></th></tr></thead><tbody><tr><td><strong><code>{type="retentionWindow"}</code></strong></td><td>The value of the <a href="../../services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window"><strong><code>configured retention window</code></strong></a>.</td></tr><tr><td><strong><code>{type="prunableBatchSize"}</code></strong></td><td>The value of the <a href="../../services/service-pod-management/service-pod-storage#inrupt_storage_prune_prunable_batch_size"><strong><code>configured prunable batch size</code></strong></a>.</td></tr><tr><td><strong><code>{type="orphanBatchSize"}</code></strong></td><td>The value of the <a href="../../services/service-pod-management/service-pod-storage#inrupt_storage_prune_orphan_batch_size"><strong><code>configured orphan batch size</code></strong></a>.</td></tr><tr><td><strong><code>{type="resource", status="prunable"}</code></strong></td><td>The number of Solid resource metadata entries found to fall outside the retention window.</td></tr><tr><td><strong><code>{type="data", status="prunable"}</code></strong></td><td>The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window.</td></tr><tr><td><strong><code>{type="data", status="listed"}</code></strong></td><td>The number of Solid resource data entries listed from S3.</td></tr><tr><td><strong><code>{type="data", status="orphan"}</code></strong></td><td>The number of Solid resource data entries (out of the total listed) found to lack corresponding listed metadata entries.</td></tr><tr><td><strong><code>{type="data", status="deleted"}</code></strong></td><td>The number of Solid resource data entries deleted from S3.</td></tr><tr><td><strong><code>Pruner_findPrunable</code></strong></td><td>Time (in milliseconds) taken to find prunable resource metadata (i.e., metadata associated with soft-deleted 
resources).</td></tr><tr><td><strong><code>Pruner_listData</code></strong></td><td>Time (in milliseconds) taken to list resource data.</td></tr><tr><td><strong><code>Pruner_findOrphans</code></strong></td><td>Time (in milliseconds) taken to identify orphan resource data.</td></tr><tr><td><strong><code>Pruner_deleteData</code></strong></td><td>Time (in milliseconds) taken to delete resource data.</td></tr><tr><td><strong><code>Pruner_pruneMetadata</code></strong></td><td>Time (in milliseconds) taken to delete resource metadata.</td></tr></tbody></table>
{% endtab %}

{% tab title="OpenTelemetry" %}
When OpenTelemetry is configured, the application emits a single span named **`prune`** with the following attributes.

<table data-header-hidden><thead><tr><th width="258.303955078125"></th><th></th></tr></thead><tbody><tr><td><strong><code>retentionWindowMilliseconds</code></strong></td><td>The value of the <a href="../../services/service-pod-management/service-pod-storage#inrupt_storage_prune_retention_window"><strong><code>configured retention window</code></strong></a>.</td></tr><tr><td><strong><code>prunableBatchSize</code></strong></td><td>The value of the <a href="../../services/service-pod-management/service-pod-storage#inrupt_storage_prune_prunable_batch_size"><strong><code>configured prunable batch size</code></strong></a>.</td></tr><tr><td><strong><code>orphanBatchSize</code></strong></td><td>The value of the <a href="../../services/service-pod-management/service-pod-storage#inrupt_storage_prune_orphan_batch_size"><strong><code>configured orphan batch size</code></strong></a>.</td></tr><tr><td><strong><code>resourceCount</code></strong></td><td>The number of Solid resource metadata entries found to fall outside the retention window.</td></tr><tr><td><strong><code>dataCount</code></strong></td><td>The number of Solid resource data entries found to belong to metadata entries that fall outside the retention window.</td></tr></tbody></table>
{% endtab %}
{% endtabs %}
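
As an illustration of how the **`STORAGEPRUNE`** prefix can be used, the following Python sketch filters pruning messages out of a log stream and reports which pruners hit the iteration limit. It rests on assumptions not confirmed on this page: that each log record is a single JSON object on its own line, and that the additional fields such as `pruner` sit at the top level of the record. The function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional


def prune_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed log records whose messageId starts with STORAGEPRUNE."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(record, dict):
            continue
        if str(record.get("messageId", "")).startswith("STORAGEPRUNE"):
            yield record


def pruners_at_iteration_limit(lines: Iterable[str]) -> List[Optional[str]]:
    """Return the pruner names from STORAGEPRUNE000004 warnings."""
    return [
        record.get("pruner")  # assumed to be a top-level field
        for record in prune_events(lines)
        if record["messageId"] == "STORAGEPRUNE000004"
    ]
```

A non-empty result from `pruners_at_iteration_limit` suggests that a backlog may remain; raising the maximum iterations value or running the job more often are the levers described in the configuration sections above.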
