Purger Application#

Added in version 2.3.0.

The Purger CLI Application is introduced in ESS 2.3.0. It can be used as part of a workflow for deleting user data from ESS. This enables organizations using ESS to comply with legislative requirements, such as GDPR/CCPA and the right to have personal data deleted.

Warning

The Purger application permanently deletes a user’s data, so operators must take great care to restrict access to any workflow that uses it.

Purging User Data#

ESS’ Purger application allows an operator to delete all or part of a user’s data. This service receives input from files provided by the operator; it does not expose an HTTP API.

Purging Process#

The Purger application orchestrates the process of sending Purge Requests to each of the services configured as purgeable. The process starts by validating the request and only progresses if all services report that the request is valid. The application waits until all the services have completed the purge process before responding with the relevant exit code.
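The validate-then-purge flow described above can be pictured with a minimal sketch. This is an illustration only, not the Purger's implementation: the `PurgeableService` class, its `validate` and `purge` callables, and the `run_purge` function are hypothetical names introduced here, and real services are reached over the network rather than called in-process.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PurgeableService:
    """Hypothetical stand-in for a purgeable service (not an ESS API)."""
    name: str
    validate: Callable[[dict], bool]  # True if the service accepts the request
    purge: Callable[[dict], bool]     # True if the purge completed


def run_purge(request: dict, services: list[PurgeableService]) -> int:
    """Return an exit code: 0 on success, 1 if validation or any purge fails."""
    # Phase 1: every service must report the request as valid
    # before any data is deleted.
    validations = [service.validate(request) for service in services]
    if not all(validations):
        return 1

    # Phase 2: purge every service and wait for all results.
    results = [service.purge(request) for service in services]
    return 0 if all(results) else 1
```

The point of the two phases is that a request rejected by any one service leaves every service untouched.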

The default list of services configured to be purged in a standard ESS deployment is shown below. For each service, there is a description of what would be purged and how it would validate the request.

| Service | Purged Data | Validation |
|---|---|---|
| Solid OIDC Broker Service | Data related to this WebID such as client credentials is deleted. | The WebID must be issued by this service. |
| WebID Service | The WebID Profile Document is deleted. | The WebID must be hosted on this service. |
| Pod Provisioning Service | Metadata and resources within each storage are deleted. | All storages are hosted on this service and their data subject is the WebID. |
| Authorization Service | All access controls applying to each storage are deleted. | Not required. |
| Query Service | All index entries associated with each storage are deleted. | Not required. |
| Access Grant Service | All credentials where the WebID is the subject are revoked and deleted. | Not required. |

Output#

The Purger runs as a job in the ESS cluster. It has multiple output streams:

| Output | Description |
|---|---|
| Exit code | 0: All the purges listed in the input file completed successfully. 1: At least one of the listed purges could not be completed (for example, because of an invalid request or a runtime error). |
| Logs | All log messages are emitted by loggers in the com.inrupt.purger package. |
| Audit | An audit event is fired when each Purge Request starts being processed and when its processing completes, either successfully or with an error. |

Purgeable services have dedicated configuration entries to control some of their purge behaviors. Please refer to each purgeable service configuration documentation for details.

Setting up and running a job#

The Purger application can be run as a Kubernetes Job or CronJob.

Important

No data will be deleted until a job running the Purger application is part of an ESS cluster deployment. Making the Purge Request available to the cluster is a necessary but not sufficient condition for the purge to take place.

Here is an example of what the Kubernetes Job definition file might look like:

---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../../release/ess/deployment/kubernetes/bases/ess-purger/

patches:
  - target:
      kind: Job
      name: ess-purger-job
    patch: |-
      - op: replace
        path: /metadata/name
        # Next line should define a unique name for the job
        value: ess-purger-job-name

secretGenerator:
  - name: purge-requests
    behavior: replace
    files:
      - purge-requests.jsonl

This assumes a purge-requests.jsonl file is available to the job. See Purge Request Format for an example format.

In addition, the following must be set in the parent component:

apiVersion: kustomize.config.k8s.io/v1alpha1
kind: Component

components:
  - ../../release/ess/deployment/kubernetes/bases/ess-purger/replacements/

resources:
  - purge-data/

Purge Request Format#

A Purge Request is represented as a line in a JSONL file. Each line should be formatted as follows:

{ "webid": "<the WebID to purge>", "storages": ["<a storage URI>", "<another storage URI...>"] }

The operator is responsible for generating this file. This involves determining a user’s WebID and identifying the URIs of the storages associated with it. All storages must be included so that none are orphaned once the WebID is deleted.
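One way to produce such a file is to serialize each request with a JSON library, which guarantees one well-formed object per line. The following is a minimal sketch under assumptions: the function names are hypothetical, and how the WebID and storage URIs are looked up is left to the operator.

```python
import json


def purge_request_line(webid: str, storages: list[str]) -> str:
    """Serialize one Purge Request as a single JSONL line (no trailing newline)."""
    # json.dumps without `indent` never emits raw newlines, so the
    # result always fits on one line.
    return json.dumps({"webid": webid, "storages": storages})


def write_purge_requests(path: str, requests: list[tuple[str, list[str]]]) -> None:
    """Write one line per (webid, storages) pair to the file at `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        for webid, storages in requests:
            handle.write(purge_request_line(webid, storages) + "\n")
```

For example, `write_purge_requests("purge-requests.jsonl", [("https://id.example/alice", ["https://storage.example/alice/"])])` writes a one-line file; the `example` URIs are placeholders.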

Purge Request validation rules:

  • The user identified by the WebID must be the data subject of every provided storage.

  • Both the webid and the storages fields must be present.

  • The storages list may be empty.

  • The values in webid and the storages list must be valid, absolute URIs.

The file may contain multiple Purge Requests.

If any Purge Request in the input file is malformed or invalid, none of the purges listed in the file are attempted and the Purger exits with code 1.
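Because a single bad line causes the whole file to be rejected, it can be useful to check the file before handing it to the job. The sketch below covers only the structural rules listed above; it cannot check the data-subject rule, which only the services can verify. The function names are hypothetical, and the URI check is a simplification that requires both a scheme and an authority, which may be stricter than the Purger's own check.

```python
import json
from urllib.parse import urlparse


def is_absolute_uri(value: object) -> bool:
    """Simplified check: a string with both a scheme and an authority."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value)
    except ValueError:
        return False
    return bool(parsed.scheme and parsed.netloc)


def validate_line(line: str) -> list[str]:
    """Return the structural problems found in one JSONL line."""
    try:
        request = json.loads(line)
    except json.JSONDecodeError:
        return ["line is not valid JSON"]
    if not isinstance(request, dict):
        return ["line is not a JSON object"]

    errors = []
    if "webid" not in request:
        errors.append("missing field: webid")
    elif not is_absolute_uri(request["webid"]):
        errors.append("webid is not an absolute URI")

    if "storages" not in request:
        errors.append("missing field: storages")
    elif not isinstance(request["storages"], list):
        errors.append("storages is not a list")
    else:
        # An empty list is allowed; each entry present must be an absolute URI.
        errors.extend(
            f"storage is not an absolute URI: {storage!r}"
            for storage in request["storages"]
            if not is_absolute_uri(storage)
        )
    return errors


def file_problems(text: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to problems; an empty dict means no problems."""
    problems = {}
    for number, line in enumerate(text.splitlines(), start=1):
        errors = validate_line(line)
        if errors:
            problems[number] = errors
    return problems
```

Note that this sketch treats blank lines as invalid, which is a deliberately strict assumption.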

Configuration#

As part of the installation process, Inrupt provides base Kustomize overlays and associated files that require deployment-specific configuration inputs.

The following configuration options are available for the service and may be set as part of updating the inputs for your deployment. The Inrupt-provided base Kustomize overlays may be using updated configuration values that differ from the default values.

Required#

INRUPT_PURGER_INPUT_FILE_PATH#

Specifies the path of the input file where the Purge Requests are described.

Multiple purgeable services can be configured using indexed properties. For example:

INRUPT_PURGER_PURGEABLE_OPENID_ENDPOINT=https://ess-openid/purge
INRUPT_PURGER_PURGEABLE_OPENID_PRIORITY=1

INRUPT_PURGER_PURGEABLE_WEBID_ENDPOINT=https://ess-webid/purge
INRUPT_PURGER_PURGEABLE_WEBID_PRIORITY=1

INRUPT_PURGER_PURGEABLE_STORAGE_ENDPOINT=https://ess-pod-provision/purge
INRUPT_PURGER_PURGEABLE_STORAGE_PRIORITY=2

etc.
INRUPT_PURGER_PURGEABLE_{index}_ENDPOINT#

The URL of the purge endpoint for a service. The index is an alphanumeric label which ties together the endpoint and priority config items.

INRUPT_PURGER_PURGEABLE_{index}_PRIORITY#

Determines the order in which the Purger submits purge requests. Services with lower numbers are purged before those with higher numbers; the order among services with the same priority is not defined. The index is an alphanumeric label which ties together the endpoint and priority config items.

The Purger application ships with a default purging sequence applicable to a default ESS deployment. Operators may need to override the default configuration to remove service purge configurations inapplicable to their particular deployments.
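To see how the indexed properties pair up and how priorities translate into an order, the following sketch groups endpoints by priority from a dictionary of environment-style settings. It is an illustration, not the Purger's own parsing logic: the `purge_order` function is a hypothetical name, and endpoints within a priority level are sorted here only to make the output deterministic, whereas the Purger does not guarantee any order within a level.

```python
import re
from collections import defaultdict

# Matches INRUPT_PURGER_PURGEABLE_{index}_ENDPOINT and ..._PRIORITY,
# where {index} is an alphanumeric label.
_PURGEABLE = re.compile(
    r"^INRUPT_PURGER_PURGEABLE_([A-Za-z0-9]+)_(ENDPOINT|PRIORITY)$"
)


def purge_order(settings: dict[str, str]) -> list[tuple[int, list[str]]]:
    """Return (priority, endpoints) pairs, lowest priority number first."""
    services: dict[str, dict[str, str]] = defaultdict(dict)
    for key, value in settings.items():
        match = _PURGEABLE.match(key)
        if match:
            index, field = match.groups()
            services[index][field] = value

    groups: dict[int, list[str]] = defaultdict(list)
    for fields in services.values():
        # Only indexes that define both items form a usable entry.
        if "ENDPOINT" in fields and "PRIORITY" in fields:
            groups[int(fields["PRIORITY"])].append(fields["ENDPOINT"])

    return [(priority, sorted(groups[priority])) for priority in sorted(groups)]
```

With the example values shown earlier in this section, the OPENID and WEBID endpoints fall into priority level 1 and the STORAGE endpoint into level 2, so the storage purge is submitted after the other two.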

Kafka Configuration#

INRUPT_KAFKA_AUDITV1EVENTSENCRYPTED_CIPHER_PASSWORD#

The strong cipher key to use when running auditing with encrypted messages.

Added in version 2.1.5.

INRUPT_KAFKA_AUDITV1EVENTSPRODUCERENCRYPTED_CIPHER_PASSWORD#

The strong cipher key to use when running auditing with encrypted messages over the auditv1eventsproducerencrypted topic.

Added in version 2.2.0.

KAFKA_BOOTSTRAP_SERVERS#

Default: localhost:9092

Comma-delimited list of Kafka broker servers for use by ESS services, including this service.

Setting KAFKA_BOOTSTRAP_SERVERS configures ESS to use the same Kafka instance(s) for all its Kafka message channels (e.g., solidresource and auditv1out message channels). This service uses the auditv1out message channel.

Note

Inrupt-provided overlays default to using KAFKA_BOOTSTRAP_SERVERS.

To use a different Kafka instance for the auditv1out channel, use specific message channel configuration.

See also ESS’ Kafka Configuration.

Optional Configuration#

INRUPT_PURGER_INTER_PRIORITY_LEVEL_DELAY#

Default: PT5M

Set a delay to be applied after running a purge request against one group of services at the same priority level, before running a group at the next priority level.

INRUPT_PURGER_MAX_CONCURRENT_REQUESTS#

Default: 30

Maximum number of Purge Requests processed at the same time. Purge Requests are processed in batches of this size.
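The batching described for INRUPT_PURGER_MAX_CONCURRENT_REQUESTS can be pictured as splitting the list of requests into consecutive chunks. This is a sketch of the idea only; the `batches` function is a hypothetical name and the Purger's internal scheduling may differ.

```python
from typing import Iterator


def batches(requests: list[dict], size: int = 30) -> Iterator[list[dict]]:
    """Yield consecutive chunks of at most `size` requests."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(requests), size):
        yield requests[start:start + size]
```

Under this model, an input file with 75 Purge Requests and the default value of 30 would be handled as three batches of 30, 30, and 15 requests.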

INRUPT_PURGER_POLL_EVERY#

Default: PT5S

Interval at which the Purger checks the status of ongoing purges.

INRUPT_PURGER_TIMEOUT#

Default: PT180M

Timeout for an individual purge task. Beyond this time, the purge will be considered failed.
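The default values above (PT5M, PT5S, PT180M) are written as ISO 8601 durations. To reason about how they interact, the sketch below parses the time-only subset of that notation and works out how many status checks fit within the timeout. The `parse_duration` helper is hypothetical and handles only the `PT…H…M…S` forms with whole numbers; it is not a full ISO 8601 parser.

```python
import re
from datetime import timedelta

# Time-only ISO 8601 durations such as PT5S, PT5M, PT180M, PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def parse_duration(text: str) -> timedelta:
    """Parse a time-only ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(group or 0) for group in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


timeout = parse_duration("PT180M")    # default INRUPT_PURGER_TIMEOUT
poll_every = parse_duration("PT5S")   # default INRUPT_PURGER_POLL_EVERY

# Upper bound on status checks for one purge task before it times out.
max_polls = timeout // poll_every
```

With the defaults, `timeout` is 3 hours and `max_polls` is 2160.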

QUARKUS_LOG_LEVEL#

Default: INFO

Logging level.

Service Configuration Logging#

INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW#

Default: inrupt

A comma-separated list of configuration property prefixes (case-sensitive) that determines which configurations are logged.

When specifying the prefixes, you can use one of two formats: dot-notation (for example, foobar.) or converted form (for example, FOOBAR_).

Warning

Use the same format for both INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW and INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY.

For example, if you change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW, change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY as well.

Tip

To avoid logging more configurations than intended, specify as much of the prefix as possible. If the prefix specifies a complete prefix term, include the term delimiter. For example:

  • If using dot-notation, if you want to match configuration properties of the form foobar.<xxxx>..., specify foobar. (including the dot .) instead of, for example, foo or foobar.

  • If using converted form, if you want to match configuration properties of the form FOOBAR_<XXXX>..., specify FOOBAR_ (including the underscore _) instead of, for example, FOO or FOOBAR.

Added in version 2.2.0.

INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY#

Default: inrupt.kafka

A comma-separated list of configuration name prefixes (case-sensitive) that determines which configurations (that would otherwise match the INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW) are not logged. That is, INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY acts as a filter on INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW. For example:

  • If foobar. is an allowed prefix, to suppress foobar.private.<anything>, you can add foobar.private. to the deny list.

  • If foobar. is not an allowed prefix, no property starting with foobar. is logged. As such, you do not need to add foobar.private. to the deny list.
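The interaction between the allow and deny lists can be sketched as a simple prefix filter. This is an illustration of the rule described above, not ESS code: the function names are hypothetical, and the property names used in the example are made up.

```python
def parse_prefixes(value: str) -> list[str]:
    """Split a comma-separated prefix list, ignoring empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_logged(name: str, allow: list[str], deny: list[str]) -> bool:
    """A property is logged if it matches an allowed prefix and no denied prefix.

    Matching is case-sensitive, as stated for both settings.
    """
    allowed = any(name.startswith(prefix) for prefix in allow)
    denied = any(name.startswith(prefix) for prefix in deny)
    return allowed and not denied


allow = parse_prefixes("foobar.")
deny = parse_prefixes("foobar.private.")
```

Here `is_logged("foobar.public.url", allow, deny)` is true, `is_logged("foobar.private.key", allow, deny)` is false, and a property outside the allow list such as `other.setting` is never logged regardless of the deny list.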

When specifying the prefixes, you can use one of two formats: dot-notation (for example, foobar.) or converted form (for example, FOOBAR_).

Warning

Use the same format for both INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW and INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY.

For example, if you change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_ALLOW, change the format of INRUPT_LOGGING_CONFIGURATION_PREFIX_DENY as well.

Added in version 2.2.0.

Log Redaction#

INRUPT_LOGGING_REDACTION_{NAME}_ACTION#

Default: REPLACE

Type of the redaction to perform. Supported values are:

| Action | Description |
|---|---|
| REPLACE | Default. Replaces the matching text with a specified replacement. |
| PLAIN | Leaves the matching field unprocessed. Only available if the redaction target is a field (i.e., INRUPT_LOGGING_REDACTION_{NAME}_FIELD). |
| DROP | Suppresses the matching field. Only available if the redaction target is a field (i.e., INRUPT_LOGGING_REDACTION_{NAME}_FIELD). |
| PRIORITIZE | Changes the log level of the matching message. |
| SHA256 | Replaces the matching text with its hash. |

  • If the action is REPLACE (default), see also INRUPT_LOGGING_REDACTION_{NAME}_REPLACEMENT.

  • If the action is to PRIORITIZE, see also INRUPT_LOGGING_REDACTION_{NAME}_LEVEL.
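As a rough illustration of the REPLACE and SHA256 actions applied to a pattern match, consider the sketch below. It is not the ESS implementation and rests on several assumptions: the `redact` function is hypothetical, Python's `re` syntax stands in for the Java regex syntax that ESS uses, and rendering the SHA-256 hash as a hexadecimal string is a guess at the output format. The other actions (PLAIN, DROP, PRIORITIZE) are not modeled.

```python
import hashlib
import re


def redact(message: str, pattern: str, action: str = "REPLACE",
           replacement: str = "[REDACTED]") -> str:
    """Apply a REPLACE or SHA256 style redaction to every match of `pattern`."""
    regex = re.compile(pattern)
    if action == "REPLACE":
        # A function is used so the replacement text is inserted literally.
        return regex.sub(lambda match: replacement, message)
    if action == "SHA256":
        # Assumption: the hash is rendered as a hex digest of the UTF-8 bytes.
        return regex.sub(
            lambda match: hashlib.sha256(match.group(0).encode("utf-8")).hexdigest(),
            message,
        )
    raise ValueError(f"action not modeled in this sketch: {action}")
```

For example, `redact("token=abc123", r"abc\d+")` returns `token=[REDACTED]`, using the default replacement text described for INRUPT_LOGGING_REDACTION_{NAME}_REPLACEMENT.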

For more information on log redaction, see Logging Redaction.

Added in version 2.2.0.

INRUPT_LOGGING_REDACTION_{NAME}_ENABLED#

Default: true

A boolean that determines whether the redaction configuration with the specified INRUPT_LOGGING_REDACTION_{NAME}_ prefix is enabled.

For more information on log redaction, see Logging Redaction.

Added in version 2.2.0.

INRUPT_LOGGING_REDACTION_{NAME}_EXCEPTION#

Fully qualified name of the exception class to match in the log messages (includes inner exception). Configure to target an exception message class.

For more information on log redaction, see Logging Redaction.

Added in version 2.2.0.

INRUPT_LOGGING_REDACTION_{NAME}_FIELD#

Exact name of the field to match in the log messages. Configure to target a specific log message field for redaction.

For more information on log redaction, see Logging Redaction.

Added in version 2.2.0.

INRUPT_LOGGING_REDACTION_{NAME}_LEVEL#

A new log level to use for the log message if the INRUPT_LOGGING_REDACTION_{NAME}_ACTION is PRIORITIZE.

Added in version 2.2.0.

INRUPT_LOGGING_REDACTION_{NAME}_PATTERN#

A regex (see Java regex pattern) to match in the log messages. Configure to target log message text that matches a specified pattern.

For more information on log redaction, see Logging Redaction.

Added in version 2.2.0.

INRUPT_LOGGING_REDACTION_{NAME}_REPLACEMENT#

Replacement text to use if the INRUPT_LOGGING_REDACTION_{NAME}_ACTION is REPLACE.

If unspecified, defaults to [REDACTED].

For more information on log redaction, see Logging Redaction.

Added in version 2.2.0.

Additional Information#

See also Quarkus Configuration Options.