# Purger Application

{% hint style="success" %}
Added in version 2.3.0
{% endhint %}

The Purger CLI Application is introduced in ESS 2.3.0. It can be used as part of a workflow for deleting user data from ESS. This enables organizations using ESS to comply with legislative requirements, such as GDPR/CCPA and the right to have personal data deleted.

{% hint style="danger" %}
**Warning**

The Purger application permanently deletes a user’s data, so operators must take great care to restrict access to any workflow that uses it.
{% endhint %}

### Purging User Data

ESS’ Purger application allows an operator to delete all of a user’s data. This service receives input from files provided by the operator; it does not expose an HTTP API.

### Purging Process

The Purger application orchestrates the process of sending Purge Requests to each of the services configured as **`purgeable`**. The process starts by validating the request and only progresses if all services report that the request is valid. The application waits until all the services have completed the purge process before responding with the relevant exit code.
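
The validate-then-purge flow described above can be pictured with a small conceptual sketch. It is not the Purger's implementation: `FakeService`, `run_purge`, and the example WebID are hypothetical names introduced only for illustration, and phases, delays, and polling are left out.

```python
from dataclasses import dataclass, field


@dataclass
class FakeService:
    """Hypothetical stand-in for a purgeable service (not an ESS API)."""
    name: str
    accepts: bool = True  # outcome of this service's validation step
    purged: list = field(default_factory=list)

    def validate(self, request: dict) -> bool:
        return self.accepts

    def purge(self, request: dict) -> None:
        self.purged.append(request["webid"])


def run_purge(services: list, request: dict) -> int:
    """Validate with the services first; purge only if all of them accept."""
    if not all(service.validate(request) for service in services):
        return 1  # nothing is purged when any service rejects the request
    for service in services:
        service.purge(request)
    return 0


request = {"webid": "https://id.example.com/alice", "storages": []}
ok = [FakeService("webid"), FakeService("provision")]
bad = [FakeService("webid"), FakeService("provision", accepts=False)]

exit_ok = run_purge(ok, request)    # every service accepts, so data is purged
exit_bad = run_purge(bad, request)  # one service rejects, so nothing is purged
print(exit_ok, exit_bad)  # 0 1
```

The point of the sketch is the ordering: no service deletes anything until every service has accepted the request.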

The default list of services configured to be purged in a standard ESS deployment is shown below. For each service, there is a description of what would be purged and how it would validate the request.

| Service                                                                                                           | Purged Data                                                             | Validation                                                                   |
| ----------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| [Solid OIDC Broker Service](https://docs.inrupt.com/ess/2.5/services/service-oidc)                                | Data related to this WebID such as client credentials is deleted.       | The WebID must be issued by this service.                                    |
| [WebID Service](https://docs.inrupt.com/ess/2.5/services/service-webid)                                           | The WebID Profile Document is deleted.                                  | The WebID must be hosted on this service.                                    |
| [Pod Provisioning Service](https://docs.inrupt.com/ess/2.5/services/service-pod-management/service-pod-provision) | Metadata and resources within each storage are deleted.                 | All storages must be hosted on this service, and the WebID must be their data subject. |
| [Authorization Service](https://docs.inrupt.com/ess/2.5/services/service-authorization)                           | All access controls applying to each storage are deleted.               | Not required.                                                                |
| [Query Service](https://docs.inrupt.com/ess/2.5/services/service-query)                                           | All index entries associated with each storage are deleted.             | Not required.                                                                |
| [Access Grant Service](https://docs.inrupt.com/ess/2.5/services/service-access-grant)                             | All credentials where the WebID is the subject are revoked and deleted. | Not required.                                                                |

### Output

<table><thead><tr><th width="119">Output</th><th>Description</th></tr></thead><tbody><tr><td><strong>Exit code</strong></td><td><strong><code>0</code></strong><br>All purges listed in the input file completed successfully.<br><br><strong><code>1</code></strong><br>At least one of the listed purges could not be completed (for example, because of an invalid request or a runtime error).</td></tr><tr><td><strong>Logs</strong></td><td>All log entries are logged under the <strong><code>com.inrupt.purger</code></strong> package.</td></tr><tr><td><strong>Audit</strong></td><td>An audit event is fired when processing of each Purge Request starts and when it completes, either successfully or with an error.</td></tr></tbody></table>

Purgeable services have dedicated configuration entries to control some of their purge behaviors. Please refer to each purgeable service configuration documentation for details.

## Setting up and running a job

The Purger application can be run as a Kubernetes Job or CronJob.

{% hint style="warning" %}
**Important**

No data will be deleted until a job running the Purger application is part of an ESS cluster deployment. Making the Purge Request available to the cluster is a necessary but not sufficient requirement for the purge to take place.
{% endhint %}

Here is an example of what the Kubernetes Job definition file might look like:

```yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../../release/ess/deployment/kubernetes/bases/ess-purger-job/

patches:
  - target:
      kind: Job
      name: ess-purger-job
    patch: |-
      - op: replace
        path: /metadata/name
        # Next line should define a unique name for the job
        value: ess-purger-job-name

secretGenerator:
  - name: purge-requests
    behavior: replace
    files:
      - purge-requests.jsonl
```

This assumes a **`purge-requests.jsonl`** file is available to the job. See [Purge Request Format](#purge-request-format) for an example format.

In addition, the following must be set in the parent component:

```yaml
apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

components:
  - ../../release/ess/deployment/kubernetes/bases/ess-purger-job/replacements/

resources:
  - purge-data/
```

### Purge Request Format

A Purge Request is represented as a line in a [JSONL](https://jsonlines.org/) file. Each line should be formatted as follows:

```json
{ "webid": "<the WebID to purge>", "storages": ["<a storage URI>", "<another storage URI...>"] }
```

The operator is responsible for generating this file. This involves determining a user’s WebID and identifying the URIs of the storages associated with it. All storages must be included so that none are orphaned once the WebID is deleted.

Purge Request validation rules:

* The user identified by the WebID must be the data subject of every provided storage.
* Both the **`webid`** and the **`storages`** fields must be present.
* The **`storages`** list may be empty.
* The values in **`webid`** and the **`storages`** list must be valid, absolute URIs.

The file may contain multiple Purge Requests.

If any Purge Request in the input file is malformed or invalid, none of the purges listed in the file are attempted and the purger returns an exit code of 1.
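
Because a single malformed line prevents every purge in the file from being attempted, it can be worth checking the file before handing it to the job. The following is a minimal pre-flight sketch, not part of ESS: the function names and example URIs are hypothetical, blank lines are skipped by choice, and the URI check (scheme plus host) is stricter than "absolute URI" in general. It cannot check the data subject rule, which only the ESS services can verify.

```python
import json
from urllib.parse import urlparse


def is_absolute_uri(value) -> bool:
    """True when value is a string with both a scheme and a host part."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.netloc)


def check_purge_requests(text: str) -> list:
    """Return (line_number, problem) tuples; an empty list means no problem found."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skipping blank lines is a choice made for this sketch
        try:
            request = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((number, f"not valid JSON: {error.msg}"))
            continue
        if not isinstance(request, dict):
            problems.append((number, "not a JSON object"))
            continue
        if "webid" not in request or "storages" not in request:
            problems.append((number, "missing 'webid' or 'storages'"))
            continue
        if not is_absolute_uri(request["webid"]):
            problems.append((number, "'webid' is not an absolute URI"))
        storages = request["storages"]
        if not isinstance(storages, list):
            problems.append((number, "'storages' is not a list"))
        elif not all(is_absolute_uri(s) for s in storages):
            problems.append((number, "'storages' contains a non-absolute URI"))
    return problems


sample = "\n".join([
    '{"webid": "https://id.example.com/alice", "storages": ["https://storage.example.com/a/"]}',
    '{"webid": "https://id.example.com/bob", "storages": []}',
    '{"webid": "/relative", "storages": []}',
    '{"webid": "https://id.example.com/carol"}',
])
problems = check_purge_requests(sample)
for number, problem in problems:
    print(f"line {number}: {problem}")
```

With the sample above, lines 3 and 4 are reported: the first has a relative `webid`, the second has no `storages` field.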

### Backup Processing and Purging Data

The ESS Purger does not require any changes to an established backup process. The service is idempotent, allowing purge requests to be submitted repeatedly for the same set of WebID(s) and Storage(s), even if a partial restore of ESS data has been performed.

Operators must retain a history of all purge requests submitted since the last backup so they can be replayed in the event of a backup restore operation. Failure to replay purge requests submitted after the last backup will result in data being restored into the live system, nullifying prior purge requests.

**Recommendation**: Perform a backup of all ESS data prior to submitting a purge request.

**Recommendation**: During a restore operation, limit ingress so that only the Purger endpoints are accessible until the purge history has been successfully replayed and all **`https://purger.{ESS Domain}/purge/status/{id}`** return a status of **`COMPLETED`**.

## Configuration

As part of the [installation process](https://docs.inrupt.com/ess/2.5/installation), Inrupt provides base Kustomize overlays and associated files that require deployment-specific configuration inputs.

The following configuration options are available for the service and may be set as part of [updating the inputs for your deployment](https://docs.inrupt.com/ess/installation#step-1-initialize-the-installation-directory). The Inrupt-provided base Kustomize overlays may be using updated configuration values that differ from the default values.

{% hint style="info" %}
**Note** Whitespace is **preserved** when parsing comma-delimited lists (i.e., the parsed string values are not trimmed). For example, when parsed, **`"value1, value2,value3 "`** results in **`"value1"`**, **`" value2"`**, **`"value3 "`**.
{% endhint %}
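
The effect of the note above can be reproduced with a plain comma split. This is only an illustration of the described behavior in Python, not the parser ESS uses.

```python
raw = "value1, value2,value3 "

# Splitting on commas without trimming mirrors the behavior described in the note.
preserved = raw.split(",")

# Trimming each item shows what the parser does NOT do.
trimmed = [item.strip() for item in raw.split(",")]

print(preserved)  # ['value1', ' value2', 'value3 ']
print(trimmed)    # ['value1', 'value2', 'value3']
```

In practice this means a stray space after a comma becomes part of the configured value.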

### Required

#### INRUPT\_PURGER\_INPUT\_FILE\_PATH

Specifies the path of the input file where the Purge Requests are described.

#### INRUPT\_PURGER\_PHASES\_{phase}\_PRIORITY

Determines the order in which the purge phases run. Phases with lower numbers are purged before those with higher numbers. Each phase must have a unique priority; the service will not start if the configuration violates this rule.

#### INRUPT\_PURGER\_PHASES\_{phase}\_SERVICES\_{service}\_ENDPOINT

The URL of the internal purgeable service for a phase. Multiple services can be included in a phase.

Multiple purgeable services can be configured using indexed properties. For example:

```
INRUPT_PURGER_PHASES_PHASE1_SERVICES_OPENID_ENDPOINT=https://ess-openid
INRUPT_PURGER_PHASES_PHASE1_SERVICES_WEBID_ENDPOINT=https://ess-webid
INRUPT_PURGER_PHASES_PHASE1_PRIORITY=1
INRUPT_PURGER_PHASES_PHASE1_DELAY=PT5M
INRUPT_PURGER_PHASES_PHASE2_SERVICES_PROVISION_ENDPOINT=https://ess-pod-provision
INRUPT_PURGER_PHASES_PHASE2_SERVICES_AUTHORIZATION_ENDPOINT=https://ess-authorization-acp
INRUPT_PURGER_PHASES_PHASE2_PRIORITY=2

etc.
```
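
To see how the variable names above group into phases, the following sketch parses a set of variables, checks the unique-priority rule, and lists the phases in purge order. It is a hypothetical pre-flight helper, not how the Purger reads its configuration; `load_phases` and the character set accepted for `{phase}` and `{service}` are assumptions of this example.

```python
import re

# Pattern for the documented variable names; the characters allowed in
# {phase} and {service} are an assumption of this sketch.
NAME = re.compile(
    r"^INRUPT_PURGER_PHASES_(?P<phase>[A-Z0-9]+)_"
    r"(?:(?P<key>PRIORITY|DELAY)|SERVICES_(?P<service>[A-Z0-9]+)_ENDPOINT)$"
)


def load_phases(env: dict) -> list:
    """Group variables by phase and return the phases sorted by priority."""
    phases = {}
    for name, value in env.items():
        match = NAME.match(name)
        if match is None:
            continue  # unrelated variable
        phase = phases.setdefault(
            match["phase"], {"name": match["phase"], "services": {}}
        )
        if match["service"]:
            phase["services"][match["service"]] = value
        else:
            phase[match["key"].lower()] = value
    for phase in phases.values():
        if "priority" not in phase:
            raise ValueError(f"phase {phase['name']} has no priority")
    priorities = [int(phase["priority"]) for phase in phases.values()]
    if len(priorities) != len(set(priorities)):
        raise ValueError("each phase must have a unique priority")
    return sorted(phases.values(), key=lambda phase: int(phase["priority"]))


env = {
    "INRUPT_PURGER_PHASES_PHASE2_SERVICES_PROVISION_ENDPOINT": "https://ess-pod-provision",
    "INRUPT_PURGER_PHASES_PHASE2_PRIORITY": "2",
    "INRUPT_PURGER_PHASES_PHASE1_SERVICES_OPENID_ENDPOINT": "https://ess-openid",
    "INRUPT_PURGER_PHASES_PHASE1_SERVICES_WEBID_ENDPOINT": "https://ess-webid",
    "INRUPT_PURGER_PHASES_PHASE1_PRIORITY": "1",
    "INRUPT_PURGER_PHASES_PHASE1_DELAY": "PT5M",
}
ordered = load_phases(env)
print([phase["name"] for phase in ordered])  # ['PHASE1', 'PHASE2']
```

Running a check like this against a candidate environment file surfaces a duplicated or missing priority before the job is deployed.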

**Customizing the job purge sequence**

The Purger application ships with a default purging sequence applicable to a standard ESS deployment. Operators may need to override the default configuration to remove purgeable service configurations inapplicable to their particular deployments.

The default purging sequence is defined from environment variables passed to a kustomize **`configMapGenerator`** named **`purge-job-phases-config`**.

To override the default configuration, an operator needs to apply a new kustomization to their ESS deployment that replaces this **`configMapGenerator`**. The replacement should point to an environment file containing the variables that configure the purging sequence for the services actually present in their specific deployment.

```yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: purge-job-phases-config
    behavior: replace
    envs:
      - purge-phases.env
```

The following is an example list of environment variables not including the WebID and OpenID services, which would be applicable to an ESS deployment configured to use external services for WebID and user identity management.

```
INRUPT_PURGER_PHASES_PHASE1_SERVICES_PROVISION_ENDPOINT=https://ess-pod-provision
INRUPT_PURGER_PHASES_PHASE1_SERVICES_AUTHORIZATION_ENDPOINT=https://ess-authorization-acp
INRUPT_PURGER_PHASES_PHASE1_PRIORITY=1
INRUPT_PURGER_PHASES_PHASE2_SERVICES_NOTIFICATION_ENDPOINT=https://ess-notification
INRUPT_PURGER_PHASES_PHASE2_PRIORITY=2
INRUPT_PURGER_PHASES_PHASE3_SERVICES_QUERY_ENDPOINT=https://ess-fragments-query
INRUPT_PURGER_PHASES_PHASE3_SERVICES_ACCESSGRANTS_ENDPOINT=https://ess-verifiable-credentials
INRUPT_PURGER_PHASES_PHASE3_PRIORITY=3
```

It is also possible to override single values with a **`configMapGenerator`** directly including literals:

```yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: purge-job-phases-config
    behavior: merge
    # Inline key=value pairs belong under `literals`; `envs` expects file paths.
    literals:
      - INRUPT_PURGER_PHASES_PHASE1_DELAY=PT5S
```

### Optional Configuration

#### INRUPT\_PURGER\_MAX\_CONCURRENT\_REQUESTS

Default: **`30`**

Maximum number of Purge Requests processed at the same time. Purge Requests are batched by this size.

#### INRUPT\_PURGER\_NOTIFY\_EVERY

Default: **`PT30S`**

Internal config setting the rate at which the purger will notify listening processes of status updates.

#### INRUPT\_PURGER\_PHASES\_{phase}\_DELAY

This optional setting introduces a time delay after a purge phase. For example, to allow access tokens issued by the OpenID service to expire before moving to the next purge phase, set the delay to the access token lifespan.

#### INRUPT\_PURGER\_POLL\_EVERY

Default: **`PT5S`**

Rate at which the purger will check the ongoing purge statuses.

#### INRUPT\_PURGER\_TIMEOUT

Default: **`PT180M`**

Timeout for an individual purge task. Beyond this time, the purge will be considered failed.
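
The duration values on this page (**`PT5S`**, **`PT30S`**, **`PT5M`**, **`PT180M`**) appear to be ISO-8601 durations. The sketch below parses only the `PTnHnMnS` subset to show how the polling interval and timeout defaults relate. `parse_duration` is a hypothetical helper, and the poll count assumes a fixed polling interval for the full timeout.

```python
import re
from datetime import timedelta

# Covers only the time-based PTnHnMnS subset used by the defaults on this page.
DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(value: str) -> timedelta:
    """Convert a PTnHnMnS string into a timedelta."""
    match = DURATION.match(value)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


poll_every = parse_duration("PT5S")  # INRUPT_PURGER_POLL_EVERY default
timeout = parse_duration("PT180M")   # INRUPT_PURGER_TIMEOUT default

# Rough upper bound on status checks for one purge task before it times out.
max_polls = timeout // poll_every
print(timeout, max_polls)  # 3:00:00 2160
```

With the defaults, a purge task that never completes would be polled roughly 2160 times over three hours before being considered failed.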

#### QUARKUS\_LOG\_LEVEL

Default: **`INFO`**

Logging level.

### Kafka Configuration

#### INRUPT\_KAFKA\_AUDITV1EVENTSENCRYPTED\_CIPHER\_PASSWORD

The strong cipher key to use when running auditing with encrypted messages.

#### INRUPT\_KAFKA\_AUDITV1EVENTSPRODUCERENCRYPTED\_CIPHER\_PASSWORD

The strong cipher key to use when running auditing with encrypted messages over the **`auditv1eventsproducerencrypted`** topic.

#### KAFKA\_BOOTSTRAP\_SERVERS

Default: **`localhost:9092`**

Comma-delimited list of Kafka broker servers for use by ESS services, including this service.

Setting [**`KAFKA_BOOTSTRAP_SERVERS`**](#kafka_bootstrap_servers) configures ESS to use the same Kafka instance(s) for all its Kafka [message channels](https://quarkus.io/guides/kafka#kafka-configuration) (e.g., **`solidresource`** and **`auditv1out`** message channels). This service uses the **`auditv1out`** message channel.

{% hint style="info" %}
**Note** Inrupt-provided overlays default to using [**`KAFKA_BOOTSTRAP_SERVERS`**](#kafka_bootstrap_servers). To use a different Kafka instance for the **`auditv1out`** channel, use specific [message channel](https://quarkus.io/guides/kafka#kafka-configuration) configuration.
{% endhint %}

See also ESS’ Kafka Configuration.

### Service Configuration Logging

#### INRUPT\_LOGGING\_CONFIGURATION\_PREFIX\_ALLOW

Default: **`inrupt`**

A comma-separated list of configuration property prefixes (case-sensitive) that determine which configurations are logged:

* If the list is empty, **NO** configuration property is logged.
* If a configuration property starts with a listed prefix (**case-sensitive**), the configuration property and its value are logged **unless** the configuration also matches a prefix in [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny) (which acts as a filter on the [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow) list).

  As such, if the configuration matches a prefix in both [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow) and [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny), [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny) takes precedence and the configuration is not logged. For example, if **`inrupt.`** is an allow prefix, but **`inrupt.kafka.`** is a deny prefix, all configurations that start with **`inrupt.kafka.`** are excluded from the logs.

When specifying the prefixes, you can specify the prefixes using one of two formats:

* using dot notation (e.g., **`inrupt.foobar.`**), or
* using the [MicroProfile Config environmental variables conversion value](https://quarkus.io/guides/config-reference#environment-variables) (e.g., **`INRUPT_FOOBAR_`**).

{% hint style="danger" %}
**Warning**

Use the same format for **both** [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow) and [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny).

For example, if you change the format of [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow), change the format of [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny) as well.
{% endhint %}

{% hint style="info" %}
**Tip**

To avoid allowing more than desired configurations, specify as much of the prefix as possible. If the prefix specifies the complete prefix term, include the term delineator. For example:

* If using dot notation and you want to match configuration properties of the form **`foobar.<xxxx>...`**, specify **`foobar.`** (including the dot **`.`**) instead of, for example, **`foo`** or **`foobar`**.
* If using the converted form and you want to match configuration properties of the form **`FOOBAR_<XXXX>...`**, specify **`FOOBAR_`** (including the underscore **`_`**) instead of, for example, **`FOO`** or **`FOOBAR`**.

{% endhint %}

#### INRUPT\_LOGGING\_CONFIGURATION\_PREFIX\_DENY

Default: **`inrupt.kafka`**

A comma-separated list of configuration name prefixes (case-sensitive) that determines which configurations (that would otherwise match the [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow)) are not logged. That is, [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny) acts as a filter on [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow). For example:

* If **`foobar.`** is an allowed prefix, to suppress **`foobar.private.<anything>`**, you can specify **`foobar.private.`** to the deny list.
* If **`foobar.`** is **not** an allowed prefix, no property starting with **`foobar.`** is logged. As such, you do not need to specify **`foobar.private`** to the deny list.
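
The interaction between the allow and deny lists can be summarized as a small predicate. This is a sketch of the rules as described on this page, not the ESS implementation; `is_logged` and the property names in the example are hypothetical.

```python
def is_logged(name: str, allow: list, deny: list) -> bool:
    """Apply the allow list first, then use the deny list as a filter on it."""
    allowed = any(name.startswith(prefix) for prefix in allow)
    denied = any(name.startswith(prefix) for prefix in deny)
    return allowed and not denied


allow = ["inrupt."]
deny = ["inrupt.kafka."]

print(is_logged("inrupt.purger.timeout", allow, deny))       # True
print(is_logged("inrupt.kafka.some.password", allow, deny))  # False
print(is_logged("quarkus.log.level", allow, deny))           # False
print(is_logged("inrupt.purger.timeout", [], deny))          # False
```

The last call shows the empty allow list case: nothing is logged, regardless of the deny list.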

When specifying the prefixes, you can specify the prefixes using one of two formats:

* using dot notation (e.g., **`inrupt.foobar.`**), or
* using the [MicroProfile Config environmental variables conversion value](https://quarkus.io/guides/config-reference#environment-variables) (e.g., **`INRUPT_FOOBAR_`**).

{% hint style="danger" %}
**Warning**

Use the same format for **both** [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow) and [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny).

For example, if you change the format of [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW`**](#inrupt_logging_configuration_prefix_allow), change the format of [**`INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY`**](#inrupt_logging_configuration_prefix_deny) as well.
{% endhint %}

### Log Redaction

#### INRUPT\_LOGGING\_REDACTION\_NAME\_ACTION

Default: **`REPLACE`**

Type of the redaction to perform. Supported values are:

<table><thead><tr><th width="133.19744873046875">Action</th><th>Description</th></tr></thead><tbody><tr><td><strong><code>REPLACE</code></strong></td><td>Default. Replaces the matching text with a specified replacement.</td></tr><tr><td><strong><code>PLAIN</code></strong></td><td>Leaves the matching field unprocessed. Only available if the redaction target is a field (i.e., <strong><code>INRUPT_LOGGING_REDACTION_{NAME}_FIELD</code></strong>).</td></tr><tr><td><strong><code>DROP</code></strong></td><td>Suppresses the matching field. Only available if the redaction target is a field (i.e., <strong><code>INRUPT_LOGGING_REDACTION_{NAME}_FIELD</code></strong>).</td></tr><tr><td><strong><code>PRIORITIZE</code></strong></td><td>Changes the log level of the matching message.</td></tr><tr><td><strong><code>SHA256</code></strong></td><td>Replaces the matching text with its hash.</td></tr></tbody></table>

* If the action is **`REPLACE`** (*default*), see also **`INRUPT_LOGGING_REDACTION_{NAME}_REPLACEMENT`**.
* If the action is to **`PRIORITIZE`**, see also **`INRUPT_LOGGING_REDACTION_{NAME}_LEVEL`**.

For more information on log redaction, see [Logging Redaction](https://docs.inrupt.com/ess/2.5/administration/logging/logging-redaction).

#### INRUPT\_LOGGING\_REDACTION\_NAME\_ENABLED

Default: **`true`**

A boolean that determines whether the redaction configurations with the specified **`INRUPT_LOGGING_REDACTION_{NAME}_`** prefix are enabled.

For more information on log redaction, see [Logging Redaction](https://docs.inrupt.com/ess/2.5/administration/logging/logging-redaction).

#### INRUPT\_LOGGING\_REDACTION\_NAME\_EXCEPTION

Fully qualified name of the exception class to match in the log messages (inner exceptions are included). Configure this to target log messages that carry a specific exception class.

For more information on log redaction, see [Logging Redaction](https://docs.inrupt.com/ess/2.5/administration/logging/logging-redaction).

#### INRUPT\_LOGGING\_REDACTION\_NAME\_FIELD

Exact name of the field to match in the log messages. Configure to target a specific log message field for redaction.

For more information on log redaction, see [Logging Redaction](https://docs.inrupt.com/ess/2.5/administration/logging/logging-redaction).

#### INRUPT\_LOGGING\_REDACTION\_NAME\_LEVEL

A new log level to use for the log message if the **`INRUPT_LOGGING_REDACTION_{NAME}_ACTION`** is **`PRIORITIZE`**.

#### INRUPT\_LOGGING\_REDACTION\_NAME\_PATTERN

A regex (see [Java regex pattern](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/regex/Pattern.html#sum)) to match in the log messages. Configure to target log message text that matches a specified pattern.

For more information on log redaction, see [Logging Redaction](https://docs.inrupt.com/ess/2.5/administration/logging/logging-redaction).

#### INRUPT\_LOGGING\_REDACTION\_NAME\_REPLACEMENT

Replacement text to use if the **`INRUPT_LOGGING_REDACTION_{NAME}_ACTION`** is **`REPLACE`**.

If unspecified, defaults to **`[REDACTED]`**.

For more information on log redaction, see [Logging Redaction](https://docs.inrupt.com/ess/2.5/administration/logging/logging-redaction).
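
To make the **`REPLACE`** and **`SHA256`** actions more concrete, the following sketch applies them to a sample message. It only illustrates the idea: the `redact` function and the sample message are hypothetical, Python regex syntax differs in places from Java's, and the hex encoding of the hash is an assumption rather than documented ESS output.

```python
import hashlib
import re


def redact(message: str, pattern: str, action: str = "REPLACE",
           replacement: str = "[REDACTED]") -> str:
    """Apply a REPLACE- or SHA256-style redaction to text matching pattern."""
    regex = re.compile(pattern)
    if action == "REPLACE":
        # A function replacement keeps the replacement text literal.
        return regex.sub(lambda m: replacement, message)
    if action == "SHA256":
        # Hex encoding of the digest is an assumption of this sketch.
        return regex.sub(
            lambda m: hashlib.sha256(m.group(0).encode("utf-8")).hexdigest(),
            message,
        )
    raise ValueError(f"action not covered by this sketch: {action}")


line = "purge requested for https://id.example.com/alice"
webid_pattern = r"https://id\.example\.com/\w+"

replaced = redact(line, webid_pattern)
hashed = redact(line, webid_pattern, action="SHA256")
print(replaced)  # purge requested for [REDACTED]
print(hashed)
```

Hashing keeps log lines for the same value correlatable without exposing the value itself, whereas replacement removes it entirely.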

## Additional Information

See also [Quarkus Configuration Options](https://quarkus.io/guides/all-config).
