Configure HDFS to use the Data Optimizer volume

Before HDFS datanodes can begin tiering blocks to the Data Optimizer volume, you must configure the HDFS datanodes to see the Data Optimizer volume and to recognize Data Optimizer as an ARCHIVE volume type. If you deployed Data Optimizer to some but not all datanodes, then you will need to create a new configuration group for the datanodes running Data Optimizer volumes.
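As a rough illustration of what "recognizing a volume type" means, the sketch below parses a Datanode directories (dfs.datanode.data.dir) value into storage-type and path pairs. It assumes the standard HDFS convention that each entry may carry a bracketed storage-type tag such as [ARCHIVE] and that untagged entries are treated as DISK. The function name parse_data_dirs and the path /hadoop/hdfs/data are hypothetical and used only for illustration.

```python
import re

# Matches an optional leading storage-type tag, for example "[ARCHIVE]/mnt/pdso/data".
_TAGGED_ENTRY = re.compile(r"^\[(?P<type>[A-Za-z_]+)\](?P<path>.+)$")


def parse_data_dirs(value):
    """Split a dfs.datanode.data.dir value into (storage_type, path) pairs.

    Assumption: entries without a tag are treated as DISK.
    """
    entries = []
    for raw in value.split(","):
        item = raw.strip()
        if not item:
            continue  # ignore empty segments such as a trailing comma
        match = _TAGGED_ENTRY.match(item)
        if match:
            entries.append((match.group("type").upper(), match.group("path")))
        else:
            entries.append(("DISK", item))
    return entries


# Hypothetical existing directory plus a Data Optimizer volume mounted at /mnt/pdso.
for storage_type, path in parse_data_dirs("/hadoop/hdfs/data,[ARCHIVE]/mnt/pdso/data"):
    print(storage_type, path)
```

A check like this can help confirm that a saved value contains exactly one ARCHIVE entry before the datanodes are refreshed.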

From the Cloudera dashboard, navigate to HDFS > Configuration.
If you configured only a subset of datanodes with Data Optimizer, create a new HDFS configuration group with the following steps:
1. At the top of the HDFS Configs window, open the Config Group drop-down menu and click Manage Config Groups.
  The Manage HDFS Configuration Groups dialog box opens.
2. From the list of existing configuration groups, select the HDFS configuration group you want to copy.
  Although this selection is typically the Default group, your selected group may differ if you are already using configuration groups to manage your datanodes.
3. Open the drop-down menu on the form and select Duplicate to create a copy of the selected configuration group.
4. Fill in the Create New Configuration Group form as follows:
  Value        Entry
  Name         Datanode PDSO Volume Group
  Description  Configuration Group for Datanodes with Data Optimizer volumes
5. Click OK.
6. In the Manage HDFS Configuration Groups dialog box, select the Datanode PDSO Volume Group from the list of configuration groups in the left pane.
7. On the right side of the dialog box, click the + (plus) icon to add hosts to the selected configuration group.
  The Select Configuration Group Hosts dialog box opens.
8. In the Select Configuration Group Hosts dialog box, select the checkbox next to each of the datanodes with a Data Optimizer Volume.
9. Click OK.
  The Manage HDFS Configuration Groups dialog box reappears.
10. Click Save.
  The HDFS Configs page appears.
Locate the Datanode directories property (dfs.datanode.data.dir).
(Optional) If you created a new HDFS configuration group because Data Optimizer is deployed to a subset of datanodes, perform the following steps. If you did not create a configuration group, proceed to the next step.
1. Place your cursor over the Datanode directories field.
  The + (override) icon appears to the right of the field.
2. Click the + icon to override the current entry.
3. When prompted, choose the HDFS Configuration Group and select the Datanode PDSO Volume Group you created previously in this task.
  The new override value is prepopulated with the value from the current configuration.
Place your cursor after the end of the current text in the Datanode directories text box and add the [ARCHIVE]<pdso_mount_point>/data value.
The <pdso_mount_point> value is associated with the Pentaho Data Optimizer Mount Point property in the Data Optimizer Configuration.
For example, if the PDSO mount point is /mnt/pdso, then the value of the new entry would be [ARCHIVE]/mnt/pdso/data.
The property requires a comma-delimited list, so be sure to separate the new entry from the existing entries with a comma.
Note: As a best practice, create a subdirectory under the Data Optimizer mount point for the HDFS Datanode directory and assign it a name. In this example, the subdirectory name is data, but the name can be whatever you choose.
Save your work.
Refresh or restart your datanodes. See Refresh HDFS Datanodes after adding Data Optimizer volumes.

Verify Data Optimizer is working properly. See Tiering HDFS Blocks to Data Optimizer.
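The edit to the Datanode directories value described in this task can be sketched as a small helper. This is only an illustration of the string that ends up in the field, not a replacement for the configuration UI. The function name add_archive_dir and the existing directory /hadoop/hdfs/data are hypothetical. The mount point /mnt/pdso and the subdirectory name data follow the example in this task.

```python
def add_archive_dir(current_value, pdso_mount_point, subdir="data"):
    """Return current_value with an [ARCHIVE] entry appended.

    current_value is the existing comma-delimited dfs.datanode.data.dir value.
    The new entry has the form [ARCHIVE]<pdso_mount_point>/<subdir>.
    """
    new_entry = "[ARCHIVE]" + pdso_mount_point.rstrip("/") + "/" + subdir
    # Normalize the existing list so that stray spaces or empty segments are dropped.
    entries = [e.strip() for e in current_value.split(",") if e.strip()]
    if new_entry not in entries:
        entries.append(new_entry)  # do not add the same entry twice
    return ",".join(entries)


# Hypothetical existing value with a single DISK directory.
print(add_archive_dir("/hadoop/hdfs/data", "/mnt/pdso"))
# prints: /hadoop/hdfs/data,[ARCHIVE]/mnt/pdso/data
```

Calling the helper a second time on its own output leaves the value unchanged, which mirrors the intent of adding the Data Optimizer volume only once per datanode.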
