Using automated data tiering
As a Hadoop admin or user, you can use Data Optimizer to perform data tiering with the Hadoop Storage Policies feature. Data Optimizer automatically moves tagged files from active Hadoop nodes to cloud-backed archive locations, for example, S3-backed archive locations, saving space and potentially reducing cost.
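For context, the Hadoop Storage Policies feature that Data Optimizer builds on is also exposed directly by the stock HDFS command line. The path below is only an example:

```
# Assign the COLD storage policy to an archive directory (example path).
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD

# Confirm which policy is now in effect for that path.
hdfs storagepolicies -getStoragePolicy -path /data/archive
```

Data Optimizer drives this tagging for you based on the policy you define, so you normally do not need to run these commands by hand.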
Workflow
1. Your Hadoop admin or user creates the policy in a JSON configuration file.
2. Using the command line, run Data Optimizer's Policy Manager command to parse the policy configuration file, which:
   - Scans the files in the path specified by the policy and filters them based on the designated retention time.
   - Fetches files that have not been accessed within the designated number of days.
   - Tags the filtered files with a storage policy (COLD, WARM, or HOT).
   - Moves the targeted files from the Hadoop nodes into Pentaho Data Storage Optimizer (PDSO) volumes.
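The scan-filter-tag steps above can be sketched in Python. This is not the Policy Manager's actual implementation; the function names and the JSON schema (`path`, `max_idle_days`, `storage_policy` keys) are hypothetical, chosen only to illustrate the access-time filtering the workflow describes:

```python
import json
import time
from pathlib import Path

def load_policy(policy_file):
    """Load a tiering policy from a JSON file (hypothetical schema)."""
    with open(policy_file) as f:
        return json.load(f)

def find_cold_files(path, max_idle_days):
    """Return files under `path` whose last access time is older
    than `max_idle_days` days (the policy's retention window)."""
    cutoff = time.time() - max_idle_days * 86400
    return [p for p in Path(path).rglob("*")
            if p.is_file() and p.stat().st_atime < cutoff]

def tag_files(files, policy_name="COLD"):
    """Pair each filtered file with the storage policy it should
    receive; the real tool would then apply the policy and move
    the file into an archive volume."""
    return [(str(p), policy_name) for p in files]
```

For example, `tag_files(find_cold_files("/data/landing", 30))` would list every file untouched for 30 days alongside the `COLD` tag. Note that on filesystems mounted with `noatime`, access times are not updated, so a real tiering tool cannot rely on `st_atime` alone.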