Step 7: Configure HDFS to use the Pentaho Data Optimizer volume

Before HDFS can tier blocks to the Data Optimizer volume, you must configure the HDFS datanodes to see the volume and to recognize Data Optimizer as an ARCHIVE storage type volume. If you deployed Data Optimizer to some, but not all, datanodes, you must create a new configuration group for the datanodes running Data Optimizer volumes.

  1. In the Ambari dashboard, navigate to HDFS > Configs.

  2. If you configured a subset of datanodes with Data Optimizer, create a new HDFS configuration group with the following steps:

    1. In the top of the HDFS Configs window, open the Config Group menu and click Manage Config Groups.

      The Manage HDFS Configuration Groups dialog box opens.

    2. Select the HDFS configuration group you want to copy from the list of existing configuration groups.

      Although this selection is typically the Default group, your selected group may differ if you are already using configuration groups to manage your data nodes.

    3. Open the menu on the form and select Duplicate to create a copy of the selected configuration group.

    4. Fill in the Create New Configuration Group form as follows:

      Field         Entry
      Name          Datanode PDSO Volume Group
      Description   Configuration Group for Datanodes with Data Optimizer volumes

    5. Click OK.

    6. In the Manage HDFS Configuration Groups dialog box, select Datanode PDSO Volume Group from the list of configuration groups in the left pane.

    7. Click the + (plus) icon on the right side of the dialog box to add hosts to the selected configuration group.

      The Select Configuration Group Hosts dialog box opens.

    8. Select the checkbox next to each of the datanodes with a Data Optimizer volume.

    9. Click OK.

      The Manage HDFS Configuration Groups dialog box appears.

    10. Click Save.

      The HDFS Configs page appears.

  3. On the HDFS Configs page, locate the Datanode directories property (dfs.datanode.data.dir).

  4. If you did not create a configuration group, proceed to the next step. If you created a new HDFS configuration group because Data Optimizer is deployed to only a subset of datanodes, complete the following steps:

    1. Hover over the Datanode directories field.

      The + (override) icon appears to the right of the field.

    2. Click the + icon to override the current entry.

    3. When prompted to choose an HDFS configuration group, select the Datanode PDSO Volume Group you created previously in this task.

      The new override value prepopulates with the value from the current configuration.

  5. Place your cursor at the end of the current text in the Datanode directories field and add the [ARCHIVE]<pdso_mount_point>/data value, where <pdso_mount_point> is the value associated with the Pentaho Data Optimizer Mount Point property in the Data Optimizer configuration.

    For example, if the mount point is /mnt/pdso, then the value of the new entry would be [ARCHIVE]/mnt/pdso/data.

    The property requires a comma-delimited list, so be sure to separate the new entry from the existing entries with a comma.

    As a best practice, create a dedicated subdirectory under the Pentaho Data Optimizer Mount Point to serve as the HDFS datanode directory. In this example, the subdirectory is named data, but you can choose any name.
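
    For reference, the merged property as it might appear in hdfs-site.xml on an affected datanode after Ambari saves the configuration is sketched below. The existing directory /hadoop/hdfs/data and the mount point /mnt/pdso are example paths; your existing entries will differ and may carry an explicit [DISK] tag.

      <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/hdfs/data,[ARCHIVE]/mnt/pdso/data</value>
      </property>

    Entries without a storage type tag are treated as DISK storage, while the [ARCHIVE] tag marks the Data Optimizer directory as archive storage for HDFS storage policies.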

  6. Save your work.

  7. Refresh or restart your datanodes. See Step 8: Restart HDFS datanodes after adding volumes.

  8. Verify Pentaho Data Optimizer is working properly. See Tiering HDFS Blocks to Data Optimizer.
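
    As a quick sanity check (a sketch only, using standard HDFS commands and an example test path of /tmp/pdso-test), you can confirm that blocks governed by an ARCHIVE-based storage policy land on the Data Optimizer volume:

      # List the available storage policies (COLD keeps all replicas on ARCHIVE storage).
      hdfs storagepolicies -listPolicies

      # Apply the COLD policy to a test directory and move its existing blocks.
      hdfs storagepolicies -setStoragePolicy -path /tmp/pdso-test -policy COLD
      hdfs mover -p /tmp/pdso-test

      # Block locations reported for the path should now include ARCHIVE storage.
      hdfs fsck /tmp/pdso-test -files -blocks -locations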
