Running and stopping Data Optimizer in Cloudera

Use the following best practices when starting or stopping Data Optimizer, its volumes, and other services in Cloudera.

Start and stop Data Optimizer

When you start or stop services on a single host or on all hosts in the cluster, always start the Data Optimizer volume component before the HDFS DataNode component, and stop it after the HDFS DataNode component. Stopping the Data Optimizer volume while HDFS is running can cause data availability issues or transient HDFS volume failures. Starting HDFS while Data Optimizer is not running can also disrupt operations, because the DataNode depends on the Data Optimizer volume being available.
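The ordering rule can be checked mechanically. The following Python sketch validates a planned sequence of start and stop steps for one host. The component labels and the `check_order` helper are hypothetical illustrations, not part of Cloudera Manager.

```python
from typing import Iterable, List, Tuple

# Hypothetical labels for the two components on one host; substitute the
# role names your cluster actually uses.
DO_VOLUME = "data_optimizer_volume"
DATANODE = "hdfs_datanode"

Step = Tuple[str, str]  # (action, component); action is "start" or "stop"


def check_order(steps: Iterable[Step],
                initially_running: Iterable[str] = ()) -> List[str]:
    """Report steps that break the rule "the Data Optimizer volume must be
    up whenever the DataNode is up" (start the volume first, stop it last)."""
    running = set(initially_running)
    problems = []
    for i, (action, component) in enumerate(steps):
        if action == "start":
            if component == DATANODE and DO_VOLUME not in running:
                problems.append(
                    f"step {i}: DataNode started before the Data Optimizer volume")
            running.add(component)
        elif action == "stop":
            if component == DO_VOLUME and DATANODE in running:
                problems.append(
                    f"step {i}: Data Optimizer volume stopped while the DataNode is running")
            running.discard(component)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return problems


# A host restart in the recommended order passes the check.
good = [("stop", DATANODE), ("stop", DO_VOLUME),
        ("start", DO_VOLUME), ("start", DATANODE)]
print(check_order(good, initially_running=[DO_VOLUME, DATANODE]))  # → []
```

Tracking the set of running components, rather than comparing list positions, lets the same check cover start-only, stop-only, and restart plans.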

If you stop a DataNode without first putting it into maintenance mode, HDFS creates additional block replicas to restore the required replication factor. As a best practice, put the DataNode into the maintenance state described in HDFS-7877 before performing routine DataNode maintenance. This avoids unnecessarily re-protecting a large number of blocks stored on a Data Optimizer volume.
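For reference, HDFS-7877 expresses the maintenance state in a JSON hosts file that the NameNode reads when `dfs.namenode.hosts.provider.classname` is set to `org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager` and `dfs.hosts` points at that file; changes take effect after `hdfs dfsadmin -refreshNodes`. The sketch below only generates such a file from hypothetical host names, using the `adminState` and `maintenanceExpireTimeInMS` fields (the latter assumed here to be an absolute time in milliseconds since the epoch). In a Cloudera Manager-managed cluster, prefer Cloudera Manager's own DataNode maintenance actions over editing NameNode host files by hand; treat this as an illustration of the format only.

```python
import json
import time
from typing import Iterable, Optional


def combined_hosts_json(all_hosts: Iterable[str],
                        maintenance_hosts: Iterable[str],
                        duration_s: int,
                        now_ms: Optional[int] = None) -> str:
    """Build a JSON hosts list in the HDFS-7877 format, marking
    `maintenance_hosts` as IN_MAINTENANCE until now + duration_s."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    expire_ms = now_ms + duration_s * 1000  # absolute epoch time, in ms
    in_maintenance = set(maintenance_hosts)
    entries = []
    for host in all_hosts:
        entry = {"hostName": host}
        if host in in_maintenance:
            entry["adminState"] = "IN_MAINTENANCE"
            entry["maintenanceExpireTimeInMS"] = expire_ms
        entries.append(entry)
    return json.dumps(entries, indent=2)


# Hypothetical hosts: dn2 enters maintenance for one hour.
print(combined_hosts_json(["dn1.example.com", "dn2.example.com"],
                          ["dn2.example.com"], duration_s=3600))
```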

Start and stop Data Optimizer volumes

As a best practice, always start Data Optimizer volumes before the HDFS DataNodes, and stop them only after the HDFS DataNodes have stopped. This order prevents data availability issues. Like most services and roles, the Data Optimizer service and its Volume role provide start and stop commands that you can run in several ways through Cloudera Manager:

  • Service-wide actions

    Use the Start and Stop actions on the Data Optimizer service to start and stop all volume instances associated with the service simultaneously.

  • Individual volume actions

    Start and stop individual volume instances through the Hosts tab. Drill down into an individual host and select Start, Stop, or Restart from the action menu for the specific volume instance.
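Both kinds of action are also exposed by the Cloudera Manager REST API: service-wide commands as `POST /clusters/{clusterName}/services/{serviceName}/commands/start` (or `/stop`, `/restart`), and per-role commands as `POST /clusters/{clusterName}/services/{serviceName}/roleCommands/start` (or `/stop`, `/restart`) with a JSON body listing role names under `items`. The following is a minimal sketch under stated assumptions: the host name, API version in the base URL, and cluster, service, and role names are hypothetical, and the code only builds the requests, leaving authentication and sending to the caller.

```python
import json
import urllib.request
from typing import Iterable
from urllib.parse import quote

# Assumptions (not from this page): the Cloudera Manager address, the API
# version, and the cluster, service, and role names used below.
CM_API = "https://cm.example.com:7183/api/v41"


def service_command(cluster: str, service: str,
                    command: str) -> urllib.request.Request:
    """Service-wide action: start/stop every volume of the service at once."""
    url = (f"{CM_API}/clusters/{quote(cluster, safe='')}"
           f"/services/{quote(service, safe='')}/commands/{command}")
    return urllib.request.Request(url, data=b"", method="POST")


def role_command(cluster: str, service: str, command: str,
                 role_names: Iterable[str]) -> urllib.request.Request:
    """Per-role action: start/stop/restart selected volume instances."""
    url = (f"{CM_API}/clusters/{quote(cluster, safe='')}"
           f"/services/{quote(service, safe='')}/roleCommands/{command}")
    body = json.dumps({"items": list(role_names)}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


# Build (but do not send) a request; sending it requires authentication.
req = role_command("Cluster 1", "data-optimizer", "restart",
                   ["data-optimizer-VOLUME-1234"])
print(req.get_method(), req.full_url)
```

Names are URL-quoted because Cloudera Manager cluster names often contain spaces, such as `Cluster 1`.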

Start and stop all services

Cloudera Manager does not provide a way to define dependencies between services or to control the order of operations when stopping or starting all services or when performing rolling restarts. It is unaware that HDFS depends on Data Optimizer and instead assumes that Data Optimizer, like most services, depends on HDFS. As a result, the cluster-level and host-level Start, Stop, and Restart commands run in the opposite of the recommended order: the Stop command stops Data Optimizer volumes before it stops HDFS DataNodes, and the Start command starts HDFS DataNodes before it starts Data Optimizer volumes.

For this reason, always start the Data Optimizer service before using the cluster Start command.
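The recommended start order can be scripted against the Cloudera Manager REST API using its `POST /clusters/{clusterName}/services/{serviceName}/commands/start`, `GET /commands/{commandId}`, and `POST /clusters/{clusterName}/commands/start` endpoints. This is a minimal sketch, not a supported tool: the function name, the service name, and the injected `send(method, path)` callable (which is assumed to handle the base URL, TLS, and authentication and to return the decoded JSON command object) are hypothetical.

```python
import time
from typing import Callable
from urllib.parse import quote

# send(method, path) -> decoded JSON response. It is injected so this sketch
# leaves the base URL, API version, TLS, and authentication to the caller.
Send = Callable[[str, str], dict]


def start_cluster_after_data_optimizer(
        send: Send, cluster: str, do_service: str, poll_s: float = 5.0,
        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Start the Data Optimizer service, wait for that command to finish,
    then run the cluster-wide Start command."""
    c, s = quote(cluster, safe=""), quote(do_service, safe="")
    cmd = send("POST", f"/clusters/{c}/services/{s}/commands/start")
    while cmd.get("active"):  # Cloudera Manager commands run asynchronously
        sleep(poll_s)
        cmd = send("GET", f"/commands/{cmd['id']}")
    if not cmd.get("success"):
        raise RuntimeError(
            f"Data Optimizer start failed: {cmd.get('resultMessage')}")
    # Data Optimizer volumes are now up, so the cluster Start command can
    # safely bring up the HDFS DataNodes and the remaining services.
    return send("POST", f"/clusters/{c}/commands/start")
```

Waiting for the first command to finish matters because Cloudera Manager returns immediately with a command object rather than blocking. If the Data Optimizer service is already running, Cloudera Manager may reject its start command, so a production script would check the service state first.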
