Backup and restore in Amazon EKS
In Pentaho Data Catalog deployments running on Amazon Elastic Kubernetes Service (EKS), administrators can configure and manage backups to protect critical system data and metadata. The backup and restore framework helps ensure business continuity by enabling recovery of Data Catalog components, such as PostgreSQL, MongoDB, OpenSearch, FE-Workers, and Kubernetes objects.
Data Catalog supports multiple storage options for storing backup data:
Amazon Simple Storage Service (S3) for scalable, cloud-based backups.
Amazon Elastic Block Store (EBS) and Amazon Elastic File System (EFS) for persistent storage within the Amazon EKS cluster.
This section includes detailed procedures to:
Configure a backup in Amazon EKS
Run a backup in Amazon EKS
Verify backups in Amazon EKS
Verify retention in Amazon EKS
Restore data from backup in Amazon EKS
Configure a backup in Amazon EKS
In Data Catalog deployments running on Amazon EKS, administrators can configure automated or manual backups for key Data Catalog components. The configuration specifies which services to back up, how often backups run, and where backup data is stored. You can store backups in Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), or Amazon Elastic File System (Amazon EFS).
Data Catalog supports multiple storage configurations that let you choose how backups are created and managed. Depending on your environment, you can either use an existing PersistentVolumeClaim (PVC) or let Helm automatically create and manage the PVC during deployment. After setup, backups run automatically through a CronJob in Amazon EKS or can be triggered manually when needed. Retention policies, backup frequency, and storage locations are defined in the Helm configuration.
Configure a backup using Amazon S3 with the existing PVC
In Data Catalog deployments running on Amazon EKS, administrators can store backup data in Amazon S3 using a pre-existing PersistentVolumeClaim (PVC). This configuration allows you to use an existing PVC that is already linked to an S3 bucket through the Amazon S3 Container Storage Interface (CSI) driver. By referencing this PVC in the backup configuration, Data Catalog writes backup data directly to the configured S3 bucket.
When using an existing PVC for S3 storage, ensure that the PVC and its associated StorageClass are correctly configured with the AWS S3 CSI driver and the target S3 bucket.
Perform the following steps to configure a backup using Amazon S3 with the existing PVC:
Before you begin
Verify that the Amazon S3 CSI driver is installed in your Amazon EKS cluster.
Ensure that an S3 bucket is available for storing backup data.
Confirm that the PersistentVolumeClaim (PVC) for S3 is pre-created and bound to the S3 StorageClass.
Verify that the PDC namespace and Helm deployment are accessible.
Ensure that worker nodes have the required IAM permissions to access the S3 bucket.
Locate the custom-values.yaml file used for your PDC Helm deployment.
Procedure
Open the custom-values.yaml file for your PDC deployment in a text editor.
Add or update the following backup configuration block:
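A minimal sketch of the values block, assuming the chart exposes these settings under the pdc-backup section described later in this guide; the schedule default, persistence size, and retention keys come from this document, while the exact existingClaim key name is an assumption:

```yaml
pdc-backup:
  backup:
    enabled: true
    schedule: "0 0 * * *"              # default: daily at midnight
    retention:
      days: 7                          # example value
    persistence:
      existingClaim: s3-pdc-backup-pvc # pre-created PVC bound to the S3 CSI StorageClass (assumed key name)
      size: 50Gi                       # must match the PV and PVC size
```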
Save the configuration file.
Apply the configuration to the Amazon EKS cluster.
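A sketch of the apply step; the release and chart names are placeholders for your deployment:

```sh
# Upgrade the existing Helm release with the updated values
helm upgrade <RELEASE_NAME> <PDC_CHART> -f custom-values.yaml -n <PDC_NAMESPACE>

# or, if the deployment is managed with Helmfile
helmfile apply
```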
Verify that the backup CronJob is created in the EKS cluster.
Review the CronJob details to confirm the schedule, storage configuration, and PVC reference.
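For example, assuming the default CronJob name pdc-backup:

```sh
kubectl get cronjob -n <PDC_NAMESPACE>
kubectl describe cronjob pdc-backup -n <PDC_NAMESPACE>
```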
The CronJob specification should reference the PVC name s3-pdc-backup-pvc.
Example: S3 StorageClass and PersistentVolume
The underlying PV must include the S3 specifications, such as the bucket name and AWS Region, and the PV and PVC sizes must match the backup.persistence.size value.
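A sketch of such a PV, using the static-provisioning format of the Mountpoint for Amazon S3 CSI driver; names, sizes, and the Region are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pdc-backup-pv
spec:
  capacity:
    storage: 50Gi                      # must match backup.persistence.size
  accessModes:
    - ReadWriteMany
  mountOptions:
    - region <AWS_REGION>
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-pdc-backup-volume # any unique value
    volumeAttributes:
      bucketName: <BUCKET_NAME>
```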
Result
Data Catalog is configured to store backups in Amazon S3 using the existing PVC. The backup CronJob runs automatically according to the configured schedule and writes backup files directly to the S3 bucket linked with the PVC.
Configure a backup using Amazon S3 with the Helm-managed PVC
In Data Catalog deployments running on Amazon EKS, administrators can configure backups to use Amazon S3 through a Helm-managed PersistentVolumeClaim (PVC). In this configuration, the Data Catalog Helm chart automatically creates the PVC and connects it to the S3 bucket using the Amazon S3 Container Storage Interface (CSI) driver. This method simplifies setup because the PVC does not need to be created manually before deployment.
The Amazon S3 CSI driver must be installed in the EKS cluster, and the specified StorageClass must be compatible with the S3 driver.
Perform the following steps to configure a backup using Amazon S3 with the Helm-managed PVC:
Before you begin
Verify that the Amazon S3 CSI driver is installed in the Amazon EKS cluster.
Ensure that an S3 bucket is available and accessible to the EKS worker nodes.
Confirm that Helm 3.0 or later and kubectl are installed.
Verify that the PDC namespace is accessible.
Identify or create a StorageClass compatible with S3.
Confirm that the custom-values.yaml file for your Helm deployment is available for editing.
Procedure
Open the custom-values.yaml file used for your PDC Helm deployment.
Add or update the following backup configuration block:
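A minimal sketch, assuming the same pdc-backup layout as in the previous procedure; the storageClass and volumeName keys follow the description below, and the values shown are placeholders:

```yaml
pdc-backup:
  backup:
    enabled: true
    schedule: "0 0 * * *"
    persistence:
      storageClass: s3-backup-sc       # must reference an existing StorageClass
      volumeName: s3-pdc-backup-pv     # pre-existing PV bound to the S3 bucket
      size: 50Gi
```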
In this case, if you want Helm to create the PVC, the storageClass and volumeName must already exist and be specified in the configuration, as shown above.
Save the configuration file.
Apply the configuration to the Amazon EKS cluster by upgrading the Helm release or running helmfile apply.
Verify that the backup CronJob is created successfully.
Review the CronJob details to confirm that the schedule and the storageClass reference match your configuration.
Verify that the Helm deployment automatically created the backup PVC.
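For example:

```sh
kubectl get cronjob pdc-backup -n <PDC_NAMESPACE>
kubectl get pvc -n <PDC_NAMESPACE>
```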
Example: S3 StorageClass and PersistentVolume
The underlying PV must include the S3 specifications, such as the bucket name and AWS Region, and the PV and PVC sizes must match the backup.persistence.size value.
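A sketch of the StorageClass and PV pair, assuming the Mountpoint for Amazon S3 CSI driver; because that driver uses statically provisioned volumes, the StorageClass acts mainly as a named reference:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: s3-backup-sc
provisioner: s3.csi.aws.com
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pdc-backup-pv
spec:
  storageClassName: s3-backup-sc
  capacity:
    storage: 50Gi                      # must match backup.persistence.size
  accessModes:
    - ReadWriteMany
  mountOptions:
    - region <AWS_REGION>
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-pdc-backup-volume
    volumeAttributes:
      bucketName: <BUCKET_NAME>
```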
Result
The PDC backup configuration is updated to use Amazon S3 with a Helm-managed PVC. When the backup CronJob runs, it automatically mounts the PVC and stores all backup files directly in the configured S3 bucket.
Configure a backup using Amazon EBS or EFS with the existing PVC
In Data Catalog deployments running on Amazon EKS, administrators can configure backups to use Amazon EBS or Amazon EFS through an existing PersistentVolumeClaim (PVC). This configuration allows you to use a pre-created PVC that points to an EBS or EFS volume already available in your Amazon EKS cluster. The PDC backup process writes all backup data to this PVC, which is mounted as persistent storage within the cluster.
Perform the following steps to configure a backup using Amazon EBS or EFS with the existing PVC:
Before you begin
Verify that the EBS or EFS StorageClass is configured in your Amazon EKS cluster.
Ensure that a PersistentVolumeClaim (PVC) is pre-created and bound to the desired EBS or EFS volume.
Confirm that the PDC namespace and Helm deployment are accessible.
Ensure that you have Helm 3.0 or later and kubectl installed.
Locate the custom-values.yaml file used for your PDC Helm deployment.
Procedure
Open the custom-values.yaml file for your PDC deployment in a text editor.
Add or update the following backup configuration block:
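A minimal sketch referencing the existing PVC pdc-backup-pvc; the existingClaim key name is an assumption based on common chart conventions:

```yaml
pdc-backup:
  backup:
    enabled: true
    schedule: "0 0 * * *"
    persistence:
      existingClaim: pdc-backup-pvc    # pre-created PVC bound to the EBS or EFS volume
      size: 50Gi
```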
In this case, if you use your own PVC, its name must be specified in the configuration, as shown above.
Save the configuration file.
Apply the configuration to the Amazon EKS cluster by upgrading the Helm release or running helmfile apply.
Verify that the backup CronJob is created in the EKS cluster.
Review the CronJob details to confirm the schedule, PVC reference, and component backup targets.
The CronJob should reference the existing PVC pdc-backup-pvc.
Verify that the PVC is correctly mounted and available in the cluster.
Example: EBS or EFS PersistentVolume and PVC
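A sketch of an EFS-backed PV and PVC pair; for EBS, swap the driver and volume handle as noted in the comments, and note that EBS volumes support ReadWriteOnce only:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pdc-backup-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany                    # use ReadWriteOnce for EBS
  persistentVolumeReclaimPolicy: Retain
  storageClassName: efs-sc
  csi:
    driver: efs.csi.aws.com            # for EBS, use ebs.csi.aws.com
    volumeHandle: <EFS_FILESYSTEM_ID>  # for EBS, use the EBS volume ID
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pdc-backup-pvc
  namespace: <PDC_NAMESPACE>
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  volumeName: pdc-backup-pv
  resources:
    requests:
      storage: 50Gi
```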
Result
The PDC backup configuration is updated to use Amazon EBS or Amazon EFS storage through the specified existing PVC. When the backup CronJob runs, it stores all backup files on the mounted persistent volume, enabling quick recovery from local cluster storage.
Configure a backup using Amazon EBS or EFS with Helm-managed PVC
In Data Catalog deployments running on Amazon EKS, administrators can configure backups to use Amazon EBS or Amazon EFS through a Helm-managed PersistentVolumeClaim (PVC). In this configuration, the Helm deployment automatically creates and manages the PVC based on the provided StorageClass configuration. This approach is recommended when administrators prefer automated storage management and do not want to manually create PVCs before deployment.
Perform the following steps to configure a backup using Amazon EBS or EFS with Helm-managed PVC:
Before you begin
Verify that the EBS or EFS StorageClass is configured in your Amazon EKS cluster.
Confirm that Helm 3.0 or later and kubectl are installed.
Ensure that the PDC namespace and Helm deployment are accessible.
Verify that the custom-values.yaml file used for the PDC Helm deployment is available.
Ensure that the EBS volume or EFS mount target is accessible from the cluster nodes.
Procedure
Open the custom-values.yaml file used for your PDC Helm deployment.
Add or update the following backup configuration block:
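A minimal sketch, assuming the pdc-backup layout used throughout this guide; the StorageClass name is a placeholder:

```yaml
pdc-backup:
  backup:
    enabled: true
    schedule: "0 0 * * *"
    persistence:
      storageClass: gp3-backup-sc      # existing EBS or EFS StorageClass
      volumeName: ""                   # optional; leave empty to let Helm assign one
      size: 50Gi
```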
In this case, if you want Helm to create the PVC, the storageClass must reference an existing StorageClass and be specified in the configuration, as shown above. The volumeName field is optional and can be left empty if you want Helm to automatically assign one.
Save the configuration file.
Apply the configuration to the Amazon EKS cluster by upgrading the Helm release or running helmfile apply.
Verify that the backup CronJob is created successfully.
Review the CronJob details to confirm that the schedule, StorageClass, and volume configuration are correctly referenced.
Verify that the Helm deployment automatically created the backup PVC.
Example: EBS or EFS StorageClass and PersistentVolume
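A sketch of an EBS gp3 StorageClass; the EFS variant is noted in the comments:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-backup-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
# For EFS, use provisioner efs.csi.aws.com with parameters
# provisioningMode: efs-ap and fileSystemId: <EFS_FILESYSTEM_ID>.
```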
Result
The PDC backup configuration is updated to use Amazon EBS or Amazon EFS with a Helm-managed PVC. When the backup CronJob runs, it automatically mounts the newly created PVC and stores backup data on the corresponding EBS or EFS volume.
Configure backup targets
In Data Catalog deployments running on Amazon EKS, administrators can control which PDC components are included in each backup. Backup targets represent the core services and configuration objects that store catalog metadata, application settings, and operational data.
Each backup target corresponds to a specific PDC service or metadata store. You can include or exclude services as needed and optionally define individual Kubernetes objects.
PostgreSQL
Stores configuration and metadata for user management, settings, and workflows.
MongoDB
Stores data asset, profiling, and relationship metadata collected from source systems.
OpenSearch
Stores indexed metadata used for catalog search, glossary, and lineage visualization.
FE-Workers
Stores dictionaries, patterns, and system-defined data used for data profiling and discovery.
Objects
Stores Kubernetes objects such as Secrets and ConfigMaps used by PDC services. You can define these objects by specifying the kind (for example, secret, configmap) and name (for example, cat-key).
You can define these backup targets in the Helm configuration to enable or disable backups for specific components at deployment time. This flexibility allows administrators to back up only the required services, exclude external databases, or include custom Kubernetes objects that need to be preserved during recovery.
Perform the following steps to configure backup targets:
Before you begin
Verify that you have access to the PDC Helm deployment and the custom-values.yaml file.
Confirm that Helm 3.0 or later and kubectl are installed on the administrator workstation.
Ensure that the backup configuration for your selected storage type (Amazon S3, EBS, or EFS) is already defined.
Identify which components and objects you want to include in the backup.
Procedure
Open the custom-values.yaml file for your PDC deployment in a text editor.
Locate the backup configuration block under the pdc-backup section.
Define the backup targets by setting the enabled parameter to true or false for each service:
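A sketch of the targets block, using the services and object examples from this section; the exact nesting under pdc-backup is an assumption:

```yaml
pdc-backup:
  backup:
    targets:
      postgres:
        enabled: true        # set to false for external databases such as Aurora PostgreSQL
      mongodb:
        enabled: true
      opensearch:
        enabled: true
      fe-workers:
        enabled: false
      objects:
        enabled: true
        object:
          - kind: secret
            name: cat-key
          - kind: configmap
            name: pdc-settings
```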
Note:
You can list multiple Kubernetes objects under the object section. Common examples include:
kind: secret, name: cat-key
kind: configmap, name: pdc-settings
kind: secret, name: pdc-license
kind: configmap, name: jobserver-config
Enable FE-Workers and Objects backup only if these components or resources are part of your recovery plan. For external databases such as Amazon Aurora PostgreSQL, set postgres.enabled to false and manage backups externally.
Save the configuration file.
Apply the configuration to the Amazon EKS cluster by upgrading the Helm release or running helmfile apply.
Verify that the backup CronJob includes the selected targets.
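For example:

```sh
kubectl describe cronjob pdc-backup -n <PDC_NAMESPACE>
```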
The job definition lists only the enabled components and specified objects as backup targets.
Result
Backup targets are configured successfully. When the backup CronJob runs, it includes only the enabled components and any defined Kubernetes objects, and stores their backups in the configured storage location.
Run a backup in Amazon EKS
In Data Catalog deployments running on Amazon EKS, administrators can perform both automated and manual backups of key Data Catalog components. Each backup captures data and configuration from PostgreSQL, MongoDB, OpenSearch, FE-Workers, and related Kubernetes objects.
After you apply the backup configuration, a CronJob is automatically created in the Amazon EKS cluster. The CronJob runs daily at midnight by default. You can also trigger a manual backup at any time, for example, before performing an upgrade or configuration change.
Perform the following steps to run a backup in Amazon EKS:
Before you begin
Before you run a backup, make sure the following requirements are met:
The Data Catalog backup CronJob is configured in the Amazon EKS cluster.
kubectl and Helm are installed and configured to access the cluster.
You have administrator access to the PDC namespace.
The configured storage backend (Amazon S3, Amazon EBS volumes, or Amazon EFS file systems) is accessible from the cluster.
Procedure
Verify that the backup CronJob exists in the PDC namespace.
The CronJob named pdc-backup should be listed.
Check the CronJob schedule.
The default schedule is 0 0 * * *, which runs daily at midnight.
Trigger a manual backup when needed.
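A sketch of these checks and the manual trigger; kubectl create job --from reuses the CronJob's pod template for a one-off run, and the job name is a placeholder that must be unique per run:

```sh
# Verify the CronJob and review its schedule
kubectl get cronjob pdc-backup -n <PDC_NAMESPACE>

# Trigger a manual backup from the CronJob definition
kubectl create job --from=cronjob/pdc-backup pdc-backup-manual -n <PDC_NAMESPACE>
```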
View all backup jobs in the PDC namespace.
View backup logs for each component.
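For example; the per-component container names (such as postgres-backup) are assumptions, so check the pod spec for the actual names:

```sh
kubectl get jobs -n <PDC_NAMESPACE>
kubectl logs job/<BACKUP_JOB_NAME> -c postgres-backup -n <PDC_NAMESPACE>
```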
Each log confirms whether the backup completed successfully for that component.
Verify backup files in Amazon S3 storage.
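For example, assuming backups are written under a backups/ prefix in the bucket:

```sh
aws s3 ls s3://<BUCKET_NAME>/backups/ --recursive
```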
The command lists all backup folders organized by service and timestamp.
Create a temporary pod to verify backup files in Amazon EBS volumes or Amazon EFS file systems. Save the following YAML as backup-checker.yaml:
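A minimal sketch of the checker pod, assuming the backup PVC name pdc-backup-pvc and any small image that provides a shell:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backup-checker
  namespace: <PDC_NAMESPACE>
spec:
  containers:
    - name: backup-checker
      image: $<customer-artifactory>/busybox:latest  # any minimal image with a shell
      command: ["sleep", "3600"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc                    # the PVC used for backups
  restartPolicy: Never
```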
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the pod specification.
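For example:

```sh
kubectl apply -f backup-checker.yaml
```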
List backup files inside the pod.
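For example:

```sh
kubectl exec -it backup-checker -n <PDC_NAMESPACE> -- ls -R /backups
```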
The command lists all backup folders by component and timestamp.
Delete the temporary pod after verification.
Result
The backup job completes successfully and stores the data in the configured Amazon S3 bucket or Amazon EBS or Amazon EFS persistent volume. The CronJob continues to run automatically according to the defined schedule. Container logs confirm that all components were backed up successfully.
Verify backups in Amazon EKS
In Data Catalog deployments running on Amazon EKS, administrators can verify that backup jobs are running successfully and that backup files are stored correctly in the configured storage backend. Verifying backups ensures that the scheduled or manual backup operations complete without errors and that data for all Data Catalog components is available for recovery when needed.
Data Catalog supports multiple storage options for backup data. The verification steps differ depending on the storage backend used in your deployment:
Amazon S3 storage: Backups are written to an S3 bucket, and verification is performed by inspecting the bucket contents and checking job logs. For more information, see Verify backups in Amazon S3 storage.
Amazon EBS volumes or Amazon EFS file systems: Backups are written directly to a persistent volume claim (PVC) mounted in the EKS cluster, and verification involves inspecting files stored inside the PVC. For more information, see Verify backups in Amazon EBS volumes or Amazon EFS file systems.
Verify backups in Amazon S3 storage
In Data Catalog deployments running on Amazon EKS with Amazon S3 as the backup storage, administrators can verify that backups are successfully created and stored in the configured S3 bucket. Verification ensures that the pdc-backup CronJob is running correctly, that each backup job completes successfully, and that the backup data for all Data Catalog components is available in S3.
Perform the following steps to verify the backups in Amazon S3 storage:
Before you begin
Make sure the following requirements are met:
Data Catalog backups are configured to use Amazon S3 in the Helm configuration file.
kubectl and AWS CLI are installed and configured.
The AWS credentials or IAM role attached to the Amazon EKS worker nodes provide access to the Amazon S3 bucket.
You have the Amazon S3 bucket name used for storing Data Catalog backups.
You have administrator access to the PDC namespace in the Amazon EKS cluster.
Procedure
Check that the backup CronJob exists in the PDC namespace.
The pdc-backup CronJob should appear in the list.
Verify that the most recent backup job completed successfully.
The Completed status indicates that the backup job finished without errors.
Check the logs of each backup container to confirm completion.
Each container log should display a “Backup completed successfully” message for its corresponding component.
Verify that new backup folders are created in the S3 bucket.
The command lists backup folders grouped by component and timestamp. Confirm that the latest timestamp corresponds to the last backup job run.
Drill down into a component folder to verify detailed backup files.
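For example, using the PostgreSQL component folder; the prefix layout is illustrative:

```sh
aws s3 ls s3://<BUCKET_NAME>/backups/postgres/<TIMESTAMP>/
```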
Each directory should contain files such as .pgdump, .tar.gz, or .yaml representing backed-up data.
Verify that backup timestamps in S3 align with the CronJob schedule. For example, if the schedule is set to midnight (0 0 * * *), confirm that new backup folders appear daily at approximately that time.
Optionally, download and inspect one backup file to confirm data integrity.
The file size and timestamp confirm that the dump file was generated during the latest backup run.
Result
The backups are verified successfully in Amazon S3 storage. Each Data Catalog component’s data is available in the S3 bucket, and the folder structure reflects the latest backup job timestamp. The CronJob and job logs confirm that all backup operations completed without errors.
Verify backups in Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can verify backups stored on Amazon EBS or Amazon EFS volumes. These backups are written directly to a persistent volume claim (PVC) mounted in the EKS cluster. Verification ensures that backup jobs run successfully, that backup files are created in the /backups directory of the PVC, and that each Data Catalog component is included in the backup.
Perform the following steps to verify backups in Amazon EBS volumes or Amazon EFS file systems:
Before you begin
Make sure the following requirements are met:
Backups are configured to use Amazon EBS or Amazon EFS in the Helm configuration file.
The Data Catalog backup CronJob is running in the Amazon EKS cluster.
kubectl is installed and configured to access the Amazon EKS cluster.
You have administrator access to the PDC namespace.
You have the PersistentVolumeClaim (PVC) name used for storing backups.
Procedure
Check that the pdc-backup CronJob exists in the PDC namespace.
The CronJob named pdc-backup should appear in the list.
Verify that the most recent backup job completed successfully.
The Completed status confirms that the backup job ran without errors.
Review the logs for each backup container to confirm successful completion.
Each log should confirm that the backup completed successfully for that component.
Create a temporary verification pod to inspect backup files in the PVC. Save the following YAML file as backup-verifier.yaml.
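A minimal sketch of the verifier pod, assuming the backup PVC name pdc-backup-pvc and any small image with a shell:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backup-verifier
  namespace: <PDC_NAMESPACE>
spec:
  containers:
    - name: backup-verifier
      image: $<customer-artifactory>/busybox:latest  # any minimal image with a shell
      command: ["sleep", "3600"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc                    # the PVC used for backups
  restartPolicy: Never
```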
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the pod specification to the EKS cluster.
Connect to the verification pod.
List the backup folders stored in the mounted PVC.
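For example; the second command runs inside the pod session:

```sh
kubectl exec -it backup-verifier -n <PDC_NAMESPACE> -- sh

# Inside the pod:
ls -R /backups
```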
Backup directories should be organized by timestamp and contain subfolders for PostgreSQL, MongoDB, OpenSearch, FE-Workers, and Kubernetes objects.
Verify that backup folders are updated according to the CronJob schedule. Confirm that a new folder exists for each backup cycle (for example, daily if the schedule is 0 0 * * *).
Exit the pod session after verification.
Delete the temporary verification pod.
Result
The backup files are verified successfully in the Amazon EBS or Amazon EFS persistent volume. Backup folders for each Data Catalog component are available under the /backups directory, organized by timestamp. The job status and logs confirm that the backup CronJob is running successfully in the EKS cluster.
Verify retention in Amazon EKS
In Data Catalog deployments running on Amazon EKS, administrators can verify that backup retention policies are working correctly. Retention ensures that older backups are automatically deleted or archived based on the configured duration, preventing unnecessary storage consumption and maintaining compliance with data governance requirements.
Retention behavior depends on the type of storage used for backups:
Amazon EBS volumes or Amazon EFS file systems: Retention is managed through the Data Catalog configuration parameters defined in the custom-values.yaml file. The backup.retention.days setting specifies how long backups are retained before being automatically deleted.
Amazon S3: Retention is managed externally through AWS S3 lifecycle policies, which automatically delete or transition older backups according to the lifecycle rules defined in the bucket.
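For EBS or EFS retention, a minimal sketch of the setting; the nesting under pdc-backup is assumed to match the chart layout used in this guide:

```yaml
pdc-backup:
  backup:
    retention:
      days: 7    # delete backups older than seven days (example value)
```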
Restore data from backup in Amazon EKS
In Data Catalog deployments running on Amazon EKS, administrators can restore data and configurations from previously created backups. Restoring data helps recover Data Catalog components after system failures, data corruption, or configuration issues. PDC supports restoration from two storage types:
Amazon S3, where backups are stored in S3 buckets.
Amazon EBS or Amazon EFS, where backups are stored in persistent volume claims (PVCs) inside the EKS cluster.
Each Data Catalog component (PostgreSQL, MongoDB, OpenSearch, FE-Workers, and Kubernetes objects) has its own restore procedure. Administrators can restore individual services or the complete Data Catalog environment, depending on the recovery requirement.
Before performing any restore procedure, stop all active Data Catalog processes that connect to the target databases or services to prevent conflicts.
Restore from Amazon S3 storage
When backups are stored in Amazon S3, each Data Catalog component must be restored separately from the data in the Amazon S3 bucket. The following guides describe how to download backup files, connect to service pods, and restore data for each component.
Restore PostgreSQL data from Amazon S3 Learn how to drop existing PostgreSQL databases, restore data using .pgdump files, and verify database creation.
Restore MongoDB data from Amazon S3 Learn how to unpack MongoDB backup files, run mongorestore, and confirm successful restoration.
Restore OpenSearch data from Amazon S3 Learn how to use the OpenSearch restore script to restore indexes and restart services.
Restore FE-Workers data from Amazon S3 Learn how to extract and copy FE-Worker backup files, including dictionaries and patterns, to the appropriate directories.
Restore Kubernetes objects from Amazon S3 Learn how to restore Kubernetes secrets and configuration files using YAML manifests stored in the S3 bucket.
Restore PostgreSQL data from Amazon S3
In Data Catalog deployments running on Amazon EKS, administrators can restore PostgreSQL data from backups stored in Amazon S3. PostgreSQL stores configuration and metadata for Data Catalog, so restoring it is a critical step in recovering the environment after data loss or system failure.
Before restoring PostgreSQL data, stop all Data Catalog services that connect to the database to avoid conflicts during restoration.
Perform the following steps to restore PostgreSQL data from Amazon S3 storage:
Before you begin
Make sure the following requirements are met:
The PostgreSQL backup is available in the Amazon S3 bucket.
AWS CLI and kubectl are installed and configured to access the Amazon EKS cluster.
You have the following information:
The Amazon S3 bucket name and the timestamp of the backup you want to restore.
The PostgreSQL pod name and PDC namespace.
The PostgreSQL username and password.
The PostgreSQL pod is in a Running state.
Procedure
Download the PostgreSQL backup files from the S3 bucket.
Drop existing databases in PostgreSQL.
Restore the PostgreSQL database from the downloaded dump file.
Verify the restore by listing all databases.
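A combined sketch of the download, drop, restore, and verify steps above; the bucket, pod, user, and database names are placeholders, and the dump file name follows the postgres_full_<TIMESTAMP>.pgdump pattern used elsewhere in this guide:

```sh
# Download the backup from S3 (paths are illustrative)
aws s3 cp s3://<BUCKET_NAME>/backups/postgres/<TIMESTAMP>/ ./postgres-backup/ --recursive

# Copy the dump file into the PostgreSQL pod
kubectl cp ./postgres-backup/postgres_full_<TIMESTAMP>.pgdump \
  <PDC_NAMESPACE>/<POSTGRES_POD>:/tmp/

# Drop the existing database, then restore it from the dump
kubectl exec -it <POSTGRES_POD> -n <PDC_NAMESPACE> -- \
  psql -U <POSTGRES_USER> -c "DROP DATABASE IF EXISTS <DB_NAME>;"
kubectl exec -it <POSTGRES_POD> -n <PDC_NAMESPACE> -- \
  pg_restore -U <POSTGRES_USER> -d postgres --create /tmp/postgres_full_<TIMESTAMP>.pgdump

# Verify by listing all databases
kubectl exec -it <POSTGRES_POD> -n <PDC_NAMESPACE> -- psql -U <POSTGRES_USER> -l
```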
Result
The PostgreSQL data is restored successfully from the backup stored in Amazon S3 storage. After the PostgreSQL service restarts, all related Data Catalog databases are available and ready for use.
Restore MongoDB data from Amazon S3
In Data Catalog deployments running on Amazon EKS, administrators can restore MongoDB data from backups stored in Amazon S3. MongoDB stores operational and user metadata for Data Catalog, so restoring it is an essential step in recovering a functional catalog environment.
Before restoring MongoDB data, stop all Data Catalog services that connect to the database to avoid conflicts during restoration.
Perform the following steps to restore MongoDB data from Amazon S3 storage:
Before you begin
Make sure the following requirements are met:
The MongoDB backup files are available in the Amazon S3 bucket.
AWS CLI and kubectl are installed and configured to access the Amazon EKS cluster.
You have the following information:
The Amazon S3 bucket name and timestamp of the backup you want to restore.
The MongoDB pod name and PDC namespace.
The MongoDB username and password.
The MongoDB pod is in the Running state.
kubectl get pods -n <PDC_NAMESPACE> | grep mongo
Procedure
Download the MongoDB backup files from the S3 bucket.
Restore the MongoDB data to the cluster.
Verify the restore by listing databases.
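A combined sketch of the download, restore, and verify steps above; the bucket, pod, and credential names are placeholders:

```sh
# Download the backup from S3 (paths are illustrative)
aws s3 cp s3://<BUCKET_NAME>/backups/mongodb/<TIMESTAMP>/ ./mongo-backup/ --recursive

# Copy the backup into the MongoDB pod and restore it, dropping existing collections
kubectl cp ./mongo-backup/ <PDC_NAMESPACE>/<MONGODB_POD>:/tmp/mongo-backup/
kubectl exec -it <MONGODB_POD> -n <PDC_NAMESPACE> -- \
  mongorestore --username <MONGODB_USER> --password <MONGODB_PASSWORD> \
  --authenticationDatabase admin --drop /tmp/mongo-backup/

# Verify by listing databases
kubectl exec -it <MONGODB_POD> -n <PDC_NAMESPACE> -- \
  mongosh -u <MONGODB_USER> -p <MONGODB_PASSWORD> --authenticationDatabase admin \
  --eval "db.getMongo().getDBNames()"
```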
After restoring from the backup, restart the licensing-api deployment for the data to take effect.
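For example:

```sh
kubectl rollout restart deployment/licensing-api -n <PDC_NAMESPACE>
```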
Result
The MongoDB data is restored successfully from the backup stored in Amazon S3. All operational and user metadata for PDC is available once the MongoDB service restarts and reconnects to the application.
Restore OpenSearch data from Amazon S3
In Data Catalog deployments running on Amazon EKS, administrators can restore OpenSearch data from backups stored in Amazon S3. OpenSearch stores indexed metadata used for search and discovery in PDC. Restoring OpenSearch ensures that catalog search results, entity references, and metadata associations are available after a recovery or redeployment.
Before performing the restore, stop any PDC services that query OpenSearch to prevent indexing conflicts.
Perform the following steps to import the data from Amazon S3 storage into the OpenSearch service running in the Amazon EKS cluster:
Before you restore, make sure curl and jq are installed.
Procedure
Download the OpenSearch backup files from the S3 bucket.
Create an opensearch_restore.sh file with the content shown below, replacing the <LOCAL_PATH>/<TIMESTAMP> and <PDC_NAMESPACE> variables.
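A minimal sketch of such a script; it assumes each index was exported as <index>_mapping.json (settings and mappings) and <index>_data.ndjson (bulk-format documents), and reaches OpenSearch through a kubectl port-forward to a service named opensearch. Adjust the file layout and endpoint to match your backup format:

```sh
#!/bin/bash
# Restore OpenSearch indexes from a local backup directory (sketch).
BACKUP_DIR="<LOCAL_PATH>/<TIMESTAMP>/opensearch"
NAMESPACE="<PDC_NAMESPACE>"
OS_HOST="http://localhost:9200"

# Reach the in-cluster OpenSearch service from the local machine
kubectl port-forward svc/opensearch 9200:9200 -n "$NAMESPACE" &
PF_PID=$!
sleep 5

for mapping_file in "$BACKUP_DIR"/*_mapping.json; do
  index="$(basename "$mapping_file" _mapping.json)"
  echo "Restoring index: $index"

  # Remove any existing index, then re-create it with the saved mappings
  curl -s -X DELETE "$OS_HOST/$index" > /dev/null
  curl -s -X PUT "$OS_HOST/$index" \
    -H 'Content-Type: application/json' \
    -d @"$mapping_file" | jq -r '.acknowledged'

  # Bulk-load the documents for this index
  data_file="$BACKUP_DIR/${index}_data.ndjson"
  if [ -f "$data_file" ]; then
    curl -s -X POST "$OS_HOST/$index/_bulk" \
      -H 'Content-Type: application/x-ndjson' \
      --data-binary @"$data_file" | jq -r '.errors'
  fi
done

kill "$PF_PID"
```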
Give executable permission to the opensearch_restore.sh file.
Execute the script.
Verify that all indexes are restored.
Restart the OpenSearch deployment to apply the restored data.
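A sketch of the verification and restart; the endpoint assumes an active port-forward, and the deployment name is an assumption:

```sh
# List restored indexes
curl -s "http://localhost:9200/_cat/indices?v"

# Restart the OpenSearch deployment
kubectl rollout restart deployment/opensearch -n <PDC_NAMESPACE>
```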
Result
OpenSearch data is restored successfully from the backup stored in Amazon S3. All indexed metadata used for search and discovery in Data Catalog is available once the OpenSearch service restarts and completes indexing.
Restore FE-Workers data from Amazon S3
In Data Catalog deployments running on Amazon EKS, administrators can restore FE-Workers data from backups stored in Amazon S3 storage. The FE-Workers component stores system-defined data patterns, dictionaries, and processed datasets that are essential for profiling and data analysis within Data Catalog. Restoring FE-Workers ensures that these reference files are recovered and available for downstream data discovery and governance tasks.
Stop any active Data Catalog jobs or services that access FE-Workers data before performing the restore to prevent file-level conflicts.
Perform the following steps to restore FE-Workers data from Amazon S3 storage:
Before you begin
Make sure the following requirements are met:
The FE-Workers backup files are available in the Amazon S3 bucket.
AWS CLI and kubectl are installed and configured to access your Amazon EKS cluster.
You have the Amazon S3 bucket name and the timestamp of the backup information.
Procedure
Download the FE-Workers backup files from the S3 bucket.
Restore the FE-Workers data to the target pod.
Verify that files are extracted successfully.
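A combined sketch of the download, restore, and verification steps above; the pod name, data directory, and archive name pattern are placeholders:

```sh
# Download the FE-Workers archive from S3 (paths are illustrative)
aws s3 cp s3://<BUCKET_NAME>/backups/fe-workers/<TIMESTAMP>/ ./fe-workers-backup/ --recursive

# Copy the archive into the FE-Workers pod and extract it into the data directory
kubectl cp ./fe-workers-backup/fe-worker-backup-<TIMESTAMP>.tar.gz \
  <PDC_NAMESPACE>/<FE_WORKER_POD>:/tmp/
kubectl exec -it <FE_WORKER_POD> -n <PDC_NAMESPACE> -- \
  tar -xzf /tmp/fe-worker-backup-<TIMESTAMP>.tar.gz -C <FE_WORKER_DATA_DIR>

# Verify the extracted files
kubectl exec -it <FE_WORKER_POD> -n <PDC_NAMESPACE> -- ls <FE_WORKER_DATA_DIR>
```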
Result
FE-Workers data is restored successfully from the backup stored in Amazon S3. All dictionaries, system-defined patterns, and processed datasets are available in the FE-Workers container and ready for use by the PDC application.
Restore Kubernetes objects from Amazon S3
In Data Catalog deployments running on Amazon EKS, administrators can restore Kubernetes objects such as Secrets and ConfigMaps from backups stored in Amazon S3. These objects contain configuration data and credentials required for Data Catalog components to operate correctly. Restoring Kubernetes objects ensures that secure keys, connection information, and application configuration are recovered after a cluster rebuild or configuration loss.
Perform the following steps to restore Kubernetes objects from Amazon S3:
Before you begin
Make sure the following requirements are met:
The Kubernetes object backup files are available in the Amazon S3 bucket.
AWS CLI and kubectl are installed and configured to access your Amazon EKS cluster.
You have the following information:
The Amazon S3 bucket name and timestamp of the backup.
The PDC namespace where the secrets must be restored.
You have cluster administrator privileges in the Amazon EKS cluster.
Procedure
Download the object backup files from the Amazon S3 bucket.
Restore the Kubernetes objects from the downloaded YAML files.
Verify the restored Kubernetes secrets.
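A combined sketch of the download, apply, and verification steps above; the bucket and prefix are placeholders:

```sh
# Download the object manifests from S3 (paths are illustrative)
aws s3 cp s3://<BUCKET_NAME>/backups/objects/<TIMESTAMP>/ ./objects-backup/ --recursive

# Re-apply the backed-up Secrets and ConfigMaps
kubectl apply -f ./objects-backup/ -n <PDC_NAMESPACE>

# Verify the restored secrets
kubectl get secrets -n <PDC_NAMESPACE>
```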
Result
Kubernetes objects are restored successfully from the backup stored in Amazon S3. All restored objects are re-applied to the specified PDC namespace, ensuring that the required credentials and configuration settings are available for Data Catalog services.
Restore from Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can restore backup data stored in Amazon EBS or Amazon EFS volumes. When Data Catalog backups are configured to use persistent storage, all backup files are stored in a PersistentVolumeClaim (PVC) that remains available within the EKS cluster.
Restoration from EBS or EFS storage allows administrators to recover component data such as PostgreSQL databases, MongoDB collections, OpenSearch indexes, FE-Workers data, and Kubernetes objects directly from the cluster without downloading backup files externally.
Each Data Catalog component has its own restore procedure that runs from within the EKS cluster. Select the appropriate guide based on the component you want to restore.
Restore PostgreSQL data from Amazon EBS volumes or Amazon EFS file systems Restore PostgreSQL databases using the psql command from backup files available in the mounted PVC.
Restore MongoDB data from Amazon EBS volumes or Amazon EFS file systems Restore MongoDB collections using the mongorestore utility from the backup data stored in the PVC.
Restore OpenSearch data from Amazon EBS volumes or Amazon EFS file systems Restore OpenSearch indexes, mappings, and aliases using the provided restore script executed within a temporary restore pod.
Restore FE-Workers data from Amazon EBS volumes or Amazon EFS file systems Restore FE-Workers dictionaries, patterns, and system-defined data by extracting archived backups into the FE-Workers PVC.
Restore Kubernetes objects from Amazon EBS volumes or Amazon EFS file systems Restore Kubernetes Secrets and ConfigMaps by applying YAML manifests backed up to the PVC.
Restore PostgreSQL data from Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can restore PostgreSQL data from backups stored in Amazon EBS or Amazon EFS. When Data Catalog backups are configured to use persistent storage, backup data is written directly to a PersistentVolumeClaim (PVC) in the EKS cluster. You can restore PostgreSQL data by creating a temporary restore pod that mounts the same PVC and running PostgreSQL commands to import data from the backup files.
Perform the following steps to restore data from PostgreSQL:
Before you begin
Make sure the following requirements are met:
The backup data exists in the /backups/postgres/ directory of the PVC used for Data Catalog backups.
kubectl is installed and configured to access the Amazon EKS cluster.
The PostgreSQL service is running in the same PDC namespace.
You have identified the PVC name, PDC namespace, and PostgreSQL credentials.
All active PDC services that connect to PostgreSQL are stopped before the restore process begins.
Procedure
Save the following pod configuration as pg-restore.yaml.
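A minimal sketch of the restore pod, assuming the backup PVC name pdc-backup-pvc and an image that provides psql and pg_restore:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pg-restore
  namespace: <PDC_NAMESPACE>
spec:
  containers:
    - name: pg-restore
      image: $<customer-artifactory>/postgres:latest  # any image with psql and pg_restore
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc
  restartPolicy: Never
```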
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the pod specification in the EKS cluster.
Verify that the restore pod is running in the specified namespace.
Access the restore pod.
List the available backup files in the mounted directory.
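For example:

```sh
kubectl get pod pg-restore -n <PDC_NAMESPACE>
kubectl exec -it pg-restore -n <PDC_NAMESPACE> -- bash

# Inside the pod:
ls /backups/postgres/
```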
The directory should contain a file such as postgres_full_<TIMESTAMP>.pgdump.
Set the PostgreSQL password as an environment variable.
Drop existing PostgreSQL databases to avoid conflicts during restoration.
Restore the PostgreSQL database from the backup file.
Verify that the databases are restored successfully.
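A combined sketch of these steps, run inside the restore pod; the PostgreSQL service, user, and database names are placeholders:

```sh
export PGPASSWORD=<POSTGRES_PASSWORD>
psql -h <POSTGRES_SERVICE> -U <POSTGRES_USER> -c "DROP DATABASE IF EXISTS <DB_NAME>;"
pg_restore -h <POSTGRES_SERVICE> -U <POSTGRES_USER> -d postgres --create \
  /backups/postgres/postgres_full_<TIMESTAMP>.pgdump
psql -h <POSTGRES_SERVICE> -U <POSTGRES_USER> -l
```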
The restored databases should appear in the list.
Exit the restore pod.
Delete the temporary restore pod after the restore process is complete.
Result
PostgreSQL data is restored successfully from the Amazon EBS or Amazon EFS storage used for Data Catalog backups. The restored databases are available and accessible once the PostgreSQL service restarts and reconnects with the PDC application.
Restore MongoDB data from Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can restore MongoDB data from backups stored in Amazon EBS or Amazon EFS. When Data Catalog backups are configured to use persistent storage, backup data is stored in a PersistentVolumeClaim (PVC) in the EKS cluster. You can restore MongoDB data by creating a temporary restore pod that mounts the same PVC and importing the data using the mongorestore utility.
Perform the following steps to restore the MongoDB data from backups:
Before you begin
Make sure the following requirements are met:
The backup files exist in the /backups/mongodb/ directory of the PersistentVolumeClaim (PVC) used for Data Catalog backups.
kubectl is installed and configured to access your Amazon EKS cluster.
The MongoDB service is running in the same PDC namespace.
You have identified the PVC name, PDC namespace, and MongoDB credentials.
All active PDC services that connect to MongoDB are stopped before restoring data.
Procedure
Save the following pod configuration as mongo-restore.yaml.
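A minimal sketch of the restore pod, assuming the backup PVC name pdc-backup-pvc and an image that provides mongorestore and mongosh:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mongo-restore
  namespace: <PDC_NAMESPACE>
spec:
  containers:
    - name: mongo-restore
      image: $<customer-artifactory>/mongo:latest  # any image with mongorestore and mongosh
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc
  restartPolicy: Never
```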
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the pod specification to the EKS cluster.
Verify that the restore pod is running.
Access the restore pod.
List the available backup files in the mounted directory.
The directory should contain MongoDB backup folders or BSON files representing each database.
Restore the MongoDB data from the backup.
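For example, run inside the restore pod; the MongoDB service name and credentials are placeholders:

```sh
mongorestore --host <MONGODB_SERVICE> --username <MONGODB_USER> \
  --password <MONGODB_PASSWORD> --authenticationDatabase admin \
  --drop /backups/mongodb/<TIMESTAMP>/
```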
This command drops existing collections and restores data from the specified backup directory.
Verify that the data has been restored successfully.
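For example:

```sh
mongosh --host <MONGODB_SERVICE> -u <MONGODB_USER> -p <MONGODB_PASSWORD> \
  --authenticationDatabase admin --eval "db.getMongo().getDBNames()"
```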
The restored databases should appear in the list.
Exit the restore pod.
Delete the temporary restore pod.
Restart the licensing-api deployment to apply the restored data.
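For example:

```sh
kubectl rollout restart deployment/licensing-api -n <PDC_NAMESPACE>
```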
Result
MongoDB data is restored successfully from the Amazon EBS or Amazon EFS storage used for Data Catalog backups. All MongoDB collections are recovered, and the licensing-api deployment is refreshed to reflect the restored data.
Restore FE-Workers data from Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can restore FE-Workers data from backups stored in Amazon EBS or Amazon EFS. When Data Catalog backups are configured to use persistent storage, FE-Workers data, including patterns, dictionaries, and temporary profiling results, is stored in a PersistentVolumeClaim (PVC). You can restore this data by creating a temporary restore pod that mounts both the backup PVC and the FE-Workers data PVC, then extracting the backup files into the target directory.
Perform the following steps to restore FE-Workers data from backups stored in Amazon EBS or Amazon EFS.
Before you begin
Make sure the following requirements are met:
The backup files exist in the /backups/fe-workers/ directory of the backup PersistentVolumeClaim (PVC).
kubectl is installed and configured to access the Amazon EKS cluster.
You have identified the PVC name used for the backup and the PVC name used for FE-Workers data.
The PDC namespace is correct.
All active FE-Worker jobs or services are stopped before the restore is performed.
Procedure
Save the following pod configuration as fe-worker-restore.yaml.
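A minimal sketch of the restore pod mounting both PVCs; the backup PVC name pdc-backup-pvc and the FE-Workers data PVC name are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fe-worker-restore
  namespace: <PDC_NAMESPACE>
spec:
  containers:
    - name: fe-worker-restore
      image: $<customer-artifactory>/busybox:latest   # any image with tar
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
        - name: fe-worker-data
          mountPath: /fe-workers-data
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc        # backup PVC
    - name: fe-worker-data
      persistentVolumeClaim:
        claimName: <FE_WORKER_DATA_PVC>  # FE-Workers data PVC
  restartPolicy: Never
```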
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the restore pod specification to the EKS cluster.
Verify that the restore pod is running.
Access the restore pod.
List the available FE-Workers backup files.
The directory should contain an archive file such as fe-worker-backup-<TIMESTAMP>.tar.gz.
Extract the FE-Workers backup files into the target directory.
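For example, run inside the restore pod; the archive name follows the pattern noted above:

```sh
tar -xzf /backups/fe-workers/fe-worker-backup-<TIMESTAMP>.tar.gz -C /fe-workers-data/
```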
Verify that the files have been extracted successfully.
The directory should include data folders such as patterns-systemdefined, dictionaries-en, and data.
Exit the restore pod.
Delete the temporary restore pod.
Result
FE-Workers data is restored successfully from the Amazon EBS or Amazon EFS storage used for Data Catalog backups. The restored dictionaries, patterns, and data files are available in the FE-Workers data directory and ready for use by the PDC application.
Restore Kubernetes objects from Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can restore Kubernetes objects such as Secrets and ConfigMaps from backups stored in Amazon EBS or Amazon EFS. When Data Catalog backups are configured to use persistent storage, these objects are saved in a PersistentVolumeClaim (PVC) in the EKS cluster. You can restore Kubernetes objects by creating a temporary restore pod that mounts the same PVC and applies the backed-up manifests.
Perform the following steps to restore Kubernetes objects, such as Secrets and ConfigMaps, from backups stored in Amazon EBS or Amazon EFS.
Before you begin
Make sure the following requirements are met:
The object backup files exist in the /backups/objects/ directory of the backup PersistentVolumeClaim (PVC).
kubectl is installed and configured to access the Amazon EKS cluster.
The pdc-backup-sa service account is configured with permissions to create and update Kubernetes objects.
You have identified the PDC namespace and the PVC name used for storing the backup.
You have cluster administrator access to apply Secrets and ConfigMaps.
Procedure
Save the following pod configuration as object-restore.yaml.
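A minimal sketch of the restore pod, using the pdc-backup-sa service account described above and assuming the backup PVC name pdc-backup-pvc:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: object-restore
  namespace: <PDC_NAMESPACE>
spec:
  serviceAccountName: pdc-backup-sa     # needs permission to create and update objects
  containers:
    - name: object-restore
      image: $<customer-artifactory>/kubectl:latest  # any image bundling kubectl
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc
  restartPolicy: Never
```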
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the pod specification to the EKS cluster.
Verify that the restore pod is running.
Access the restore pod.
List the available Kubernetes object backup files.
The directory should contain YAML manifest files for Secrets or ConfigMaps, such as secret_cat-key_<TIMESTAMP>.yaml.
Apply the backed-up object manifests to restore them in the cluster.
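For example, run inside the restore pod:

```sh
kubectl apply -f /backups/objects/<TIMESTAMP>/ -n <PDC_NAMESPACE>
```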
Verify that the objects have been restored.
The restored secret (for example, cat-key) should appear in the list.
Exit the restore pod.
Delete the temporary restore pod.
Result
Kubernetes Secrets and ConfigMaps are restored successfully from the Amazon EBS or Amazon EFS storage used for Data Catalog backups. The restored objects are available in the PDC namespace, allowing Data Catalog components to access their required configuration and credentials.
Restore OpenSearch data from Amazon EBS volumes or Amazon EFS file systems
In Data Catalog deployments running on Amazon EKS, administrators can restore backup data stored in Amazon EBS volumes or Amazon EFS file systems. When Data Catalog backups use persistent storage, all backup files are stored in a PersistentVolumeClaim (PVC) that remains available in the Amazon EKS cluster. Each Data Catalog component can be restored individually by creating a temporary restore pod that mounts the same PVC used during the backup process.
Restoring from Amazon EBS or Amazon EFS allows administrators to recover component data, such as PostgreSQL databases, MongoDB collections, OpenSearch indexes, FE-Workers data, and Kubernetes objects, directly within the cluster, without downloading backup files externally.
Use the same PVC that was used for backups. Restoring from an incorrect PVC can lead to missing or outdated search indexes. The restore process requires the jq utility in the container to process JSON data.
Before you begin
Confirm that backup files exist in the /backups/opensearch/ directory of the backup PVC.
Verify that kubectl is installed and configured to access the Amazon EKS cluster.
Ensure that the OpenSearch service is running in the same namespace.
Identify the PVC name used for the backup and the PDC namespace.
Confirm that the jq package is available in the container image (PDC_TOOLBOX:debian-12).
Perform the following steps to restore OpenSearch data from Amazon EBS volumes or Amazon EFS file systems:
Save the following pod configuration as opensearch-restore.yaml.
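A minimal sketch of the restore pod, using the PDC_TOOLBOX:debian-12 image noted above and assuming the backup PVC name pdc-backup-pvc:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: opensearch-restore
  namespace: <PDC_NAMESPACE>
spec:
  containers:
    - name: opensearch-restore
      image: $<customer-artifactory>/PDC_TOOLBOX:debian-12  # must include curl and jq
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: backup-storage
          mountPath: /backups
  volumes:
    - name: backup-storage
      persistentVolumeClaim:
        claimName: pdc-backup-pvc
  restartPolicy: Never
```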
Replace $<customer-artifactory> with the actual artifactory path, like ECR or any private artifactory.
Apply the restore pod specification to the EKS cluster.
Verify that the restore pod is running.
Create the OpenSearch restore script locally and save it as opensearch_restore.sh. This script automates restoring all OpenSearch indexes from the PVC backup directory.
Assign executable permissions to the restore script.
Copy the restore script to the OpenSearch restore pod.
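For example:

```sh
chmod +x opensearch_restore.sh
kubectl cp opensearch_restore.sh <PDC_NAMESPACE>/opensearch-restore:/tmp/
```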
Access the restore pod.
Navigate to the script directory.
Run the OpenSearch restore script.
The script restores OpenSearch indexes, mappings, and data from the backup directory and automatically re-creates aliases. It processes indexes in chunks, using parallel ingestion for large data sets to speed up restoration.
Confirm that the indexes are restored successfully.
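For example, run inside the restore pod; the OpenSearch service name and port are assumptions:

```sh
curl -s "http://opensearch:9200/_cat/indices?v"
```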
The list should display all PDC-related indexes, such as pdc_entity, pdc_policy, and pdc_glossary.
Exit the restore pod.
Delete the temporary restore pod after completing the process.
Result
OpenSearch data is restored successfully from the Amazon EBS or Amazon EFS storage used for Data Catalog backups. All indexes, mappings, and aliases are re-created, and search functionality is available in Data Catalog once the OpenSearch service completes synchronization.