Purger Service

Added in version 2.4.0

The Purger Service was introduced in ESS 2.4.0. It can be used as part of a workflow for deleting user data and Pods from ESS. This helps organizations using ESS comply with legislative requirements such as GDPR/CCPA and the right to have personal data deleted.

Warning

The Purger Service permanently deletes a user’s data, so the operator must take great care to restrict access to it.

Purging User Data

The Purger Service allows an operator to delete all of a user’s data. This service exposes HTTP endpoints which can be called by trusted operator agents.

Purging Process

The Purger Service orchestrates the process of sending purge requests to each of the services configured as purgeable. The process starts by validating the request and only continues if all services report that the request is valid. The purge process starts asynchronously and is only complete when all the purgeable services have completed their purge successfully.

The default list of services configured to be purged in a standard ESS deployment is shown below. For each service, there is a description of what would be purged and how it would validate the request.

| Service | Purged Data | Validation |
| --- | --- | --- |
| Access Grant Service | All credentials where the WebID is the subject are revoked and deleted. | Not required. |
| Authorization Service | All access controls applying to each Storage are deleted. | Not required. |
| Pod Provisioning Service | Metadata and resources within each Storage are deleted. | All Storages are hosted on this service and their data subject is the WebID. |
| Query Service | All index entries associated with each Storage are deleted. | Not required. |
| Solid OIDC Broker Service | Data related to this WebID such as client credentials is deleted. | The WebID must be issued by this service. |
| WebID Service | The WebID Profile Document is deleted. | The WebID must be hosted on this service. |
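The validate-then-purge flow described above can be pictured with a small in-memory model. This is only a sketch of the described behaviour, not the Purger Service implementation: the `PurgeableService` type, the `run_purge` function, and the `REJECTED` label are hypothetical names introduced for illustration, and the real service runs the purge asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PurgeableService:
    """Hypothetical stand-in for one configured purgeable service."""
    name: str
    validate: Callable[[dict], bool]  # True if the request is valid for this service
    purge: Callable[[dict], bool]     # True if this service purged successfully


def run_purge(services: List[PurgeableService], request: dict) -> str:
    # Step 1: every service must report the request as valid before anything is deleted.
    if not all(service.validate(request) for service in services):
        return "REJECTED"  # hypothetical label; nothing has been purged at this point
    # Step 2: the purge is complete only when all services have purged successfully.
    results = [service.purge(request) for service in services]
    return "COMPLETED" if all(results) else "FAILED"
```

The point of the two-step shape is that a request that fails validation on any one service never reaches the deletion step on the others.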

Purger Service Endpoints

By default, the Purger Service runs from the following root URL:

https://purger.<ESS Domain>

The Purger Service consists of the following endpoints:

| Endpoint | Description |
| --- | --- |
| /purge | Start an async purge of a user’s data and return a Location header of the purge status. |
| /purge/status/{id} | Check status of a purge. The purge is complete when the status in the response is COMPLETED. |

Start Purge

The Purger Service provides an endpoint which starts an async process that will purge all user data associated with an agent from ESS.

Input

Endpoint: https://purger.{ESS Domain}/purge
Method: POST
Authorization: An access token for a trusted agent that is allowed to call the purge endpoints.
Content-Type: application/json
Payload: A JSON object containing the WebID of the agent to be purged and a list of all the Storages where the agent is the data subject. All Storages must be supplied.

The purge request payload is represented as JSON.

{
  "webid": "<the WebID to purge>",
  "storages": ["<a Storage URI>", "<another Storage URI...>"]
}

The operator is responsible for generating this payload. This involves determining a user’s WebID and identifying the URIs of the Storages associated with it. All Storages must be included so that none are orphaned once the WebID is deleted.

Purge request validation rules

The user identified by the WebID must be the data subject of every provided Storage.
Both the webid and the storages fields must be present.
The storages list may not be empty.
The values in webid and the storages list must be valid, absolute URIs.
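The structural rules above (fields present, non-empty list, absolute URIs) can be checked on the client before a request is submitted. The following is a minimal sketch, assuming WebIDs and Storages are URLs with a scheme and a host; `check_purge_payload` and `is_absolute_uri` are hypothetical helper names, and the data-subject rule is not covered because only ESS can verify it.

```python
from urllib.parse import urlparse


def is_absolute_uri(value):
    """Return True if value is a string with both a scheme and a host."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlparse(value)
    except ValueError:
        return False
    return bool(parts.scheme) and bool(parts.netloc)


def check_purge_payload(payload):
    """Return a list of problems; an empty list means the structural checks passed."""
    problems = [f"missing field: {field}"
                for field in ("webid", "storages") if field not in payload]
    if problems:
        return problems
    if not is_absolute_uri(payload["webid"]):
        problems.append("webid is not an absolute URI")
    storages = payload["storages"]
    if not isinstance(storages, list) or not storages:
        problems.append("storages must be a non-empty list")
        return problems
    for uri in storages:
        if not is_absolute_uri(uri):
            problems.append(f"storage is not an absolute URI: {uri!r}")
    return problems
```

A pre-check like this only catches malformed payloads early; the Purger Service remains the authority on whether a request is valid.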

Example request:

POST /purge HTTP/1.1
Host: purger.example.com
Authorization: Bearer xxxxxxxx
Content-Type: application/json

{
  "webid":"https://id.{ESS Domain}/alice",
  "storages":[
      "https://storage.{ESS Domain}/ead20d575gf8/",
      "https://storage.{ESS Domain}/b506cb130798/"
  ]
}

Output

If the purge request is valid, the service initiates the purge process asynchronously and the client receives a 201 response with a Location header containing the status URI for this purge.

Example response:

HTTP/1.1 201 Created
Content-Length: 0
Location: https://purger.{ESS Domain}/purge/status/b8ca941a-7b18-4458-85e1-5e14cb9dbb0f

This endpoint is idempotent, so a client can submit the exact same purge request multiple times and each submission behaves in the same way as the initial request.
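As an illustration of the request and response described above, the following sketch builds the POST request with Python's standard library and reads the Location header from the 201 response. The function names are hypothetical, and obtaining the access token is out of scope here.

```python
import json
import urllib.request


def build_purge_request(purger_root, access_token, webid, storages):
    """Build (but do not send) a POST request for the /purge endpoint."""
    body = json.dumps({"webid": webid, "storages": list(storages)}).encode("utf-8")
    return urllib.request.Request(
        url=purger_root.rstrip("/") + "/purge",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def start_purge(request):
    """Send the request and return the status URI from the Location header."""
    with urllib.request.urlopen(request) as response:
        if response.status != 201:
            raise RuntimeError(f"unexpected response status: {response.status}")
        return response.headers["Location"]
```

Keeping request construction separate from sending makes it easy to log or store the exact payload before it is submitted.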

Check Purge Status

The Purger Service provides an endpoint that allows a client to determine when the purge has been completed.

Input

Endpoint: https://purger.{ESS Domain}/purge/status/{id}
Method: GET
Authorization: An access token for a trusted agent that is allowed to call the purge endpoints.

Example request:

GET /purge/status/xyz HTTP/1.1
Host: purger.example.com
Authorization: Bearer xxxxxxxx

Output

The response from this endpoint will be the status of the purge. The status field can be IN_PROGRESS, COMPLETED or FAILED.

IN_PROGRESS indicates the client can continue to poll this endpoint until it changes to COMPLETED or FAILED.
COMPLETED indicates that the purge task has successfully concluded across all services.
FAILED indicates that one or more services were unable to complete the purge. ESS will create log and audit entries to indicate the nature of the problem (e.g. a timeout or other unexpected error) for further investigation.

Example response:

HTTP/1.1 200 OK
Content-Type: application/json

{
  "id":"b8ca941a-7b18-4458-85e1-5e14cb9dbb0f",
  "webid":"https://id.{ESS Domain}/alice",
  "storages":[
      "https://storage.{ESS Domain}/ead20d575gf8/",
      "https://storage.{ESS Domain}/b506cb130798/"
  ],
  "status":"COMPLETED",
  "modified":"2025-02-27T10:36:21.036125360Z"
}
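A client typically polls the status endpoint until the status leaves IN_PROGRESS. The sketch below shows one possible polling loop; `wait_for_purge` is a hypothetical helper, `fetch_status` is a caller-supplied function that performs the GET request and returns the parsed JSON body, and the polling interval and budget are arbitrary illustrative values.

```python
import time


def wait_for_purge(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until the purge reaches COMPLETED or FAILED and return that status."""
    for _ in range(max_polls):
        status = fetch_status()["status"]
        if status in ("COMPLETED", "FAILED"):
            return status
        if status != "IN_PROGRESS":
            raise ValueError(f"unexpected purge status: {status!r}")
        # Still in progress: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError("purge did not reach a final status within the polling budget")
```

Passing `sleep` as a parameter keeps the loop testable without real delays.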

Backup Processing and Purging Data

The Purger Service does not require any changes to an established backup process.

Operators must retain a history of all purge requests submitted since the last backup so they can be replayed in the event of a backup restore operation. Failure to replay purge requests submitted after the last backup will result in data being restored into the live system, nullifying prior purge requests.

Recommendations

Perform a backup of all ESS data prior to submitting a purge request.
During a restore operation, limit ingress to allow access only to the Purger endpoints until the purge history has been successfully replayed and every https://purger.{ESS Domain}/purge/status/{id} request returns a status of COMPLETED.
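One way an operator might keep the purge history described above is an append-only file with one JSON entry per submitted request. This is a sketch under stated assumptions: the JSON Lines format, the `submitted_at` field, and both function names are hypothetical choices, not something ESS provides.

```python
import json
from datetime import datetime, timezone


def record_purge_request(log_path, payload, submitted_at=None):
    """Append one purge request to a JSON Lines history file."""
    submitted_at = submitted_at or datetime.now(timezone.utc)
    entry = {"submitted_at": submitted_at.isoformat(), "payload": payload}
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")


def requests_to_replay(log_path, last_backup_at):
    """Return payloads recorded at or after the backup time, in recorded order."""
    replay = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            if not line.strip():
                continue
            entry = json.loads(line)
            # Include the boundary: the purge endpoint is idempotent, so
            # replaying one request too many is safer than missing one.
            if datetime.fromisoformat(entry["submitted_at"]) >= last_backup_at:
                replay.append(entry["payload"])
    return replay
```

After a restore, each returned payload would be resubmitted to the purge endpoint and its status polled until COMPLETED.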

Configuration

As part of the installation process, Inrupt provides base Kustomize overlays and associated files that require deployment-specific configuration inputs.

The following configuration options are available for the service and may be set as part of updating the inputs for your deployment. The Inrupt-provided base Kustomize overlays may be using updated configuration values that differ from the default values.

Required

INRUPT_PURGER_PHASES_{phase}_PRIORITY

Determines the order in which the purger phases run. Phases with lower numbers are purged before those with higher numbers. Each phase must have a unique priority, and the service will not start if the configuration does not conform to this rule.

INRUPT_PURGER_PHASES_{phase}_SERVICES_{service}_ENDPOINT

The URL of the internal purgeable service for a phase. Multiple services can be included in a phase.

Multiple purgeable services can be configured using indexed properties. For example:

INRUPT_PURGER_PHASES_PHASE1_SERVICES_OPENID_ENDPOINT=https://ess-openid
INRUPT_PURGER_PHASES_PHASE1_SERVICES_WEBID_ENDPOINT=https://ess-webid
INRUPT_PURGER_PHASES_PHASE1_PRIORITY=1
INRUPT_PURGER_PHASES_PHASE1_DELAY=PT5M
INRUPT_PURGER_PHASES_PHASE2_SERVICES_PROVISION_ENDPOINT=https://ess-pod-provision
INRUPT_PURGER_PHASES_PHASE2_SERVICES_AUTHORIZATION_ENDPOINT=https://ess-authorization-acp
INRUPT_PURGER_PHASES_PHASE2_PRIORITY=2

etc.
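Because the service will not start with duplicate priorities, it can be useful to check a set of phase variables before deploying them. The following sketch groups the variables by phase and orders them by priority; `ordered_phases` is a hypothetical helper, and treating a phase with services but no priority as an error is an assumption based on the priority being listed as required.

```python
import re

PRIORITY_KEY = re.compile(r"^INRUPT_PURGER_PHASES_(.+)_PRIORITY$")
ENDPOINT_KEY = re.compile(r"^INRUPT_PURGER_PHASES_(.+?)_SERVICES_(.+)_ENDPOINT$")


def ordered_phases(env):
    """Return [(phase, priority, {service: endpoint})] sorted by ascending priority."""
    priorities = {}
    services = {}
    for key, value in env.items():
        if match := PRIORITY_KEY.match(key):
            priorities[match.group(1)] = int(value)
        elif match := ENDPOINT_KEY.match(key):
            services.setdefault(match.group(1), {})[match.group(2)] = value
    # Each phase must have its own priority value.
    if len(set(priorities.values())) != len(priorities):
        raise ValueError("each phase must have a unique priority")
    missing = sorted(set(services) - set(priorities))
    if missing:
        raise ValueError(f"phases without a priority: {missing}")
    return [
        (phase, priority, services.get(phase, {}))
        for phase, priority in sorted(priorities.items(), key=lambda item: item[1])
    ]
```

Calling `ordered_phases(dict(os.environ))` or passing the parsed contents of an env file would list the phases in the order they are expected to run.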

Customizing the service purge sequence

The Purger Service ships with a default purging sequence applicable to a standard ESS deployment. Operators may need to override the default configuration to remove purgeable service configurations inapplicable to their particular deployments.

The default purging sequence is defined from environment variables passed to a Kustomize configMapGenerator named purge-service-phases-config.

To override the default configuration, an operator applies a new kustomization to their ESS deployment that replaces this configMapGenerator. The replacement should point to an environment file containing the variables that configure the purging sequence for the services actually present in their specific deployment.

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
  - name: purge-service-phases-config
    behavior: replace
    envs:
      - purge-phases.env

The following is an example list of environment variables not including the WebID and OpenID services, which would be applicable to an ESS deployment configured to use external services for WebID and user identity management.

INRUPT_PURGER_PHASES_PHASE1_SERVICES_PROVISION_ENDPOINT=https://ess-pod-provision
INRUPT_PURGER_PHASES_PHASE1_SERVICES_AUTHORIZATION_ENDPOINT=https://ess-authorization-acp
INRUPT_PURGER_PHASES_PHASE1_PRIORITY=1
INRUPT_PURGER_PHASES_PHASE2_SERVICES_NOTIFICATION_ENDPOINT=https://ess-notification
INRUPT_PURGER_PHASES_PHASE2_PRIORITY=2
INRUPT_PURGER_PHASES_PHASE3_SERVICES_QUERY_ENDPOINT=https://ess-fragments-query
INRUPT_PURGER_PHASES_PHASE3_SERVICES_ACCESSGRANTS_ENDPOINT=https://ess-verifiable-credentials
INRUPT_PURGER_PHASES_PHASE3_PRIORITY=3

It is also possible to override single values with a configMapGenerator directly including literals:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

configMapGenerator:
- name: purge-service-phases-config
  behavior: merge
  literals:
    - INRUPT_PURGER_PHASES_PHASE1_DELAY=PT5S

Optional Configuration

INRUPT_PURGER_BATCH_SIZE

Default: 100

This config is used internally for the batch size for cleaning up completed purge data and processing stale in-progress purges.

INRUPT_PURGER_CLEANUP_TASK_EVERY

Default: PT5H

This config is used internally for the scheduled task that cleans up completed purge data.

INRUPT_PURGER_IN_PROGRESS_TIMEOUT_SECONDS

Default: 120

This config is used internally for finding stale in-progress purges that have not been progressing and need to be processed.

INRUPT_PURGER_PROCESS_TASK_EVERY

Default: PT5M

This config is used internally for the scheduled task that processes stale in-progress purges.

INRUPT_PURGER_STATUS_RETENTION_WINDOW

Default: P2D

This config is used internally for the retention of completed purge data.

INRUPT_PURGER_NOTIFY_EVERY

Default: PT30S

Internal config setting the rate at which the purger will notify listening processes of status updates.

INRUPT_PURGER_PHASES_{phase}_DELAY

This optional config introduces a time delay after a purge phase. For example, to allow access tokens issued by the OpenID service to expire before moving to the next purge phase, set the delay to the access token life span.

INRUPT_PURGER_POLL_EVERY

Default: PT5S

Rate at which the purger will check the ongoing purge statuses.

INRUPT_PURGER_TIMEOUT

Default: PT180M

Timeout for an individual purge task. Beyond this time, the purge will be considered failed.

QUARKUS_LOG_LEVEL

Default: INFO

Logging level.
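Several of the options above take ISO 8601 durations (for example PT5S, PT5M, PT5H, P2D, and PT180M). When reviewing a configuration it can help to convert these to concrete time spans and compare them. The sketch below handles only the day and time components used in this section; `parse_duration` is a hypothetical helper and does not reflect how the service itself parses these values.

```python
import re
from datetime import timedelta

DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text):
    """Parse a day/time ISO 8601 duration such as PT5M or P2D into a timedelta."""
    match = DURATION.match(text)
    # Reject strings with no components at all, such as "P" or "PT".
    if not match or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: float(value) for name, value in match.groupdict().items() if value}
    return timedelta(**parts)


# The default poll interval is far shorter than the default purge timeout.
assert parse_duration("PT5S") < parse_duration("PT180M")
```

For instance, `parse_duration("PT180M")` equals `timedelta(hours=3)`, which makes the default timeout easier to compare with a phase delay such as PT5M.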

Authorization Configuration

Only trusted agents should be allowed to access the purge endpoints. Care must be taken by the operator when setting who has access to these endpoints.

The Purger Service communicates with the Authorization Service to determine who has access. The following static access configurations must be set on the ESS Authorization Service.

The following configuration properties require an operator-defined name to replace {policy} to group together related configuration properties. Any number of named sets of properties can be configured by an operator.

INRUPT_AUTHORIZATION_STATIC_{policy}_AGENT_ALLOW_LIST

A comma separated list of agents (WebIDs) that are allowed to access the protected resources for the named policy.

INRUPT_AUTHORIZATION_STATIC_{policy}_CLIENT_ALLOW_LIST

A comma separated list of clients an agent is allowed to use to access the protected resources for the named policy.

INRUPT_AUTHORIZATION_STATIC_{policy}_ISSUER_ALLOW_LIST

A comma separated list of issuers of access tokens that an agent is allowed to use to access the protected resources for the named policy.

INRUPT_AUTHORIZATION_STATIC_{policy}_MODES

A comma separated list of access modes (C,R,U,D) applied to the resources for the named policy.

This must be set to C,R, which gives create and read access to the purge endpoints.

INRUPT_AUTHORIZATION_STATIC_{policy}_RESOURCES

A comma separated list of resources protected by the named policy.

This must be set to the purge and status endpoints: https://purger.<ESS Domain>/purge,https://purger.<ESS Domain>/purge/status/.+

Note: The regex pattern match for the status endpoint is needed as its URI will end with a UUID.
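To see why the `.+` suffix is needed, the following sketch tests URIs against a comma-separated resources value. It assumes each entry is matched against the whole URI using Python's `re` semantics; the matching rules actually used by the Authorization Service are not specified here, and `matching_resources` is a hypothetical helper.

```python
import re


def matching_resources(resources_value, uri):
    """Return the configured patterns that match the whole of uri."""
    patterns = resources_value.split(",")
    return [pattern for pattern in patterns if re.fullmatch(pattern, uri)]


resources = (
    "https://purger.example.com/purge,"
    "https://purger.example.com/purge/status/.+"
)
status_uri = "https://purger.example.com/purge/status/b8ca941a-7b18-4458-85e1-5e14cb9dbb0f"

# The plain /purge entry does not cover status URIs; only the .+ entry does.
print(matching_resources(resources, status_uri))
```

Under these assumptions, a status URI ending in a UUID is matched only by the second entry, so omitting it would leave the status endpoint unprotected by the policy.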

Multiple authorizations with different combinations can be configured. For example:

INRUPT_AUTHORIZATION_STATIC_P1_RESOURCES=https://purger.<ESS Domain>/purge,https://purger.<ESS Domain>/purge/status/.+
INRUPT_AUTHORIZATION_STATIC_P1_MODES=C,R
INRUPT_AUTHORIZATION_STATIC_P1_AGENT_ALLOW_LIST=https://id.example.com/operator
INRUPT_AUTHORIZATION_STATIC_P1_CLIENT_ALLOW_LIST=4adb0d86-b5f0-40ab-8349-153626374fa2
INRUPT_AUTHORIZATION_STATIC_P1_ISSUER_ALLOW_LIST=https://openid.example.com

INRUPT_AUTHORIZATION_STATIC_P2_RESOURCES=https://purger.<ESS Domain>/purge,https://purger.<ESS Domain>/purge/status/.+
INRUPT_AUTHORIZATION_STATIC_P2_MODES=C,R
INRUPT_AUTHORIZATION_STATIC_P2_AGENT_ALLOW_LIST=https://id.example.com/admin1,https://id.example.com/admin2
INRUPT_AUTHORIZATION_STATIC_P2_CLIENT_ALLOW_LIST=https://app.example.com/workflow
INRUPT_AUTHORIZATION_STATIC_P2_ISSUER_ALLOW_LIST=https://openid.example.com

Kafka Configuration

INRUPT_KAFKA_AUDITV1EVENTSENCRYPTED_CIPHER_PASSWORD

The strong cipher key to use when running auditing with encrypted messages.

INRUPT_KAFKA_AUDITV1EVENTSPRODUCERENCRYPTED_CIPHER_PASSWORD

The strong cipher key to use when running auditing with encrypted messages over the auditv1eventsproducerencrypted topic.

KAFKA_BOOTSTRAP_SERVERS

Default: localhost:9092

Comma-delimited list of Kafka broker servers for use by ESS services, including this service.

Setting KAFKA_BOOTSTRAP_SERVERS configures ESS to use the same Kafka instance(s) for all its Kafka message channels (e.g., solidresource and auditv1out message channels). This service uses the auditv1out message channel.

Service Configuration Logging

Note: Whitespace is preserved when parsing comma-delimited lists (i.e., the parsed string values are not trimmed). For example, when parsed, "value1, value2,value3 " results in "value1", " value2", "value3 ".
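The effect of untrimmed parsing can be reproduced with a plain string split. This is only an illustration of the behaviour described in the note, not the parser ESS uses; both function names are hypothetical.

```python
def parse_list(raw):
    """Split a comma-delimited value without trimming the entries."""
    return raw.split(",")


def untrimmed_entries(raw):
    """Return the entries that carry leading or trailing whitespace."""
    return [entry for entry in parse_list(raw) if entry != entry.strip()]


print(parse_list("value1, value2,value3 "))         # ['value1', ' value2', 'value3 ']
print(untrimmed_entries("value1, value2,value3 "))  # [' value2', 'value3 ']
```

A check like `untrimmed_entries` can flag values that would silently fail to match because of a stray space.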

INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW

Default: inrupt

A comma-separated list of configuration property prefixes (case-sensitive) that determine which configurations are logged:

If the list is empty, NO configuration property is logged.
If a configuration property starts with a listed prefix (case-sensitive), the configuration property and its value are logged unless the configuration also matches a prefix in INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY (which acts as a filter on INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW list).
As such, if the configuration matches a prefix in both INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW and INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY, INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY takes precedence and the configuration is not logged. For example, if inrupt. is an allow prefix, but inrupt.kafka. is a deny prefix, all configurations that start with inrupt.kafka. are excluded from the logs.

When specifying the prefixes, you can specify the prefixes using one of two formats:

using dot notation (e.g., inrupt.foobar.), or
using the MicroProfile Config environmental variables conversion value (e.g., INRUPT_FOOBAR_).

Use the same format for both INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW and INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY.

For example, if you change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW, change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY as well.

Tip

To avoid allowing more than desired configurations, specify as much of the prefix as possible. If the prefix specifies the complete prefix term, include the term delineator. For example:

If using dot-notation, if you want to match configuration properties of the form foobar.<xxxx>..., specify foobar. (including the dot .) instead of, for example, foo or foobar.
If using converted form, if you want to match configuration properties of the form FOOBAR_<XXXX>..., specify FOOBAR_ (including the underscore _) instead of, for example, FOO or FOOBAR.

INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY

Default: inrupt.kafka

A comma-separated list of configuration name prefixes (case-sensitive) that determines which configurations (that would otherwise match the INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW) are not logged. That is, INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY acts as a filter on INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW. For example:

If foobar. is an allowed prefix, to suppress foobar.private.<anything>, you can specify foobar.private. to the deny list.
If foobar. is not an allowed prefix, no property starting with foobar. is logged. As such, you do not need to specify foobar.private to the deny list.

When specifying the prefixes, you can specify the prefixes using one of two formats:

using dot notation (e.g., inrupt.foobar.), or
using the MicroProfile Config environmental variables conversion value (e.g., INRUPT_FOOBAR_).

Use the same format for both INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW and INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY.

For example, if you change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW, change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY as well.
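The interaction between the allow and deny lists can be summarized as a simple prefix filter. The sketch below models the rules as described in this section, using dot notation for both lists; `is_logged` is a hypothetical function and the property names are illustrative only.

```python
def is_logged(name, allow, deny):
    """Return True if name matches an allow prefix and no deny prefix."""
    # Matching is case-sensitive, so no normalization is applied.
    allowed = any(name.startswith(prefix) for prefix in allow)
    denied = any(name.startswith(prefix) for prefix in deny)
    # An empty allow list logs nothing; deny takes precedence over allow.
    return allowed and not denied


allow = ["inrupt."]
deny = ["inrupt.kafka."]
print(is_logged("inrupt.purger.timeout", allow, deny))         # True
print(is_logged("inrupt.kafka.cipher.password", allow, deny))  # False
```

Because the check is a plain prefix match, including the trailing delimiter in each prefix avoids accidentally matching unrelated properties that merely share leading characters.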

Log Redaction

INRUPT_LOGGING_REDACTION_{NAME}_ACTION

Default: REPLACE

Type of the redaction to perform. Supported values are:

| Action | Description |
| --- | --- |
| REPLACE | Default. Replaces the matching text with a specified replacement. |
| PLAIN | Leaves the matching field unprocessed. Only available if the redaction target is a field (i.e., INRUPT_LOGGING_REDACTION_{NAME}_FIELD). |
| DROP | Suppresses the matching field. Only available if the redaction target is a field (i.e., INRUPT_LOGGING_REDACTION_{NAME}_FIELD). |
| PRIORITIZE | Changes the log level of the matching message. |
| SHA256 | Replaces the matching text with its hash. |

If the action is REPLACE (default), see also INRUPT_LOGGING_REDACTION_{NAME}_REPLACEMENT.
If the action is PRIORITIZE, see also INRUPT_LOGGING_REDACTION_{NAME}_LEVEL.