Data Optimizer configuration parameters

The Data Optimizer management interface distributes the configuration information to the Data Optimizer volume nodes for use by the Data Optimizer volume service.


Do not include leading or trailing spaces if you copy and paste parameter values. Ambari and Cloudera Manager do not validate input.

ENDPOINT

Required

Endpoint address for Hitachi Content Platform. If the ENDPOINT_TYPE is HCP, use the form tenant.hcp_dns_name.

ENDPOINT_TYPE

Optional

Default endpoint type. Acceptable values are case-sensitive.

  • If connecting to Hitachi Content Platform, use HCP.

  • If connecting to Virtual Storage Platform One Object, use HCPCS.

  • If connecting to Amazon S3, use AWS.

BUCKET

Required

Content Platform bucket name or a wildcard value of instance_id. You can use the unique ID generated by Content Platform (instance_id) as a wildcard to avoid name conflicts and to simplify configuration of the instances. Multiple instances can share a common configuration if you use the instance_id wildcard and all other values are identical. You cannot append or prepend the instance_id wildcard value to any other value. For example, bucket_instance_id is an invalid value. If Content Platform is properly configured, Data Optimizer creates its own bucket if the bucket does not already exist.
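As a sketch, the wildcard rule described above looks like this in a configuration file (the explicit bucket name is illustrative):

```shell
# The instance_id wildcard must stand alone as the BUCKET value.
BUCKET=instance_id                # valid: resolves to the unique ID generated by Content Platform
#BUCKET=bucket_instance_id       # invalid: the wildcard cannot be combined with other text
#BUCKET=analytics-archive        # also valid: an explicit bucket name
```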

ACCESS_KEY

Required

S3 Access Key ID used to authenticate S3 requests to Content Platform.

SECRET_KEY

Required

S3 Secret Key used to authenticate S3 requests.

PROTOCOL

Optional

Protocol used for communication between Data Optimizer and Content Platform. The default value is https. Acceptable, case-sensitive values are https and http. If set to https, Data Optimizer uses TLS to encrypt the communication.

VERIFY_SSL_CERTIFICATE

Optional

Value used to specify whether to verify certificates within Data Optimizer. Acceptable, case-sensitive values are true and false. The default value is true. If the VERIFY_SSL_CERTIFICATE parameter is set to false, certificate verification is disabled within Data Optimizer. Set this parameter to false when Content Platform presents a self-signed certificate and you still want to use TLS to encrypt transmissions between Data Optimizer and Content Platform.

MOUNT_POINT

Required

HDFS DataNode local directory where Data Optimizer is mounted. The directory must exist, and the HDFS user using Data Optimizer must have write permission for the directory. The directory must allow rwx permissions for the owner and the owner's group. For example:

```
mkdir <mount point>
chown user:group <mount point>
chmod 770 <mount point>
```

MD_STORE_DIR

Optional

Local directory used to store files associated with persisting the Data Optimizer local metadata store. The MD_STORE_DIR parameter value must be a fully-qualified directory path starting at the system root (/). If an MD_STORE_DIR value is not specified, the CACHE_DIR directory is used. Specify a value for MD_STORE_DIR when the CACHE_DIR directory is located on volatile storage or when a more durable location is available for long-term file persistence. Do not choose a volatile storage medium for this directory, because it is intended to persist for the life of the Data Optimizer volume. For example, if you use transient storage such as RAM_DISK for the CACHE_DIR directory, specify a more durable location for the MD_STORE_DIR directory. In addition, if you have a more durable location, such as a RAID partition, with room for the metadata store files (up to 2.5 GB), specify an MD_STORE_DIR directory on that partition. If the files associated with metadata store persistence are lost or corrupted, you can recover them as explained in Recovering from local metadata store failure or corruption.
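The RAM_DISK scenario described above might be configured as follows; both paths are illustrative, not defaults:

```shell
# Hypothetical pairing: volatile cache on a RAM disk, durable metadata store on a RAID partition
CACHE_DIR=/mnt/ramdisk/ldo-cache
MD_STORE_DIR=/data/raid1/ldo-mdstore
```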

RECOVERY_MODE

Optional

Value used to specify whether recovery mode is enabled. Do not set the RECOVERY_MODE parameter unless you have read and understood the section Recovering from local metadata store failure or corruption. The default value is false. Acceptable, case sensitive values are true and false.

LOG_LEVEL

Optional

Value used to specify how verbose the logging is for Data Optimizer. The default value is INFO. Acceptable, case-sensitive values are ALERT, ERR, WARNING, INFO, and DEBUG. See Data Optimizer logging for more details about logging and log levels.

METRICS_FILE

Optional

Local file that Data Optimizer writes metrics to when prompted by the ldoctl metrics collect command. The METRICS_FILE value must be a fully-qualified file path starting at the system root (/). If a METRICS_FILE value is not defined, Data Optimizer writes metrics to the system journal. The parent directory must exist, and the HDFS user using Data Optimizer must have write permission for the directory. See Monitor Data Optimizer for more information.

LOG_SDK

Optional

Local directory where detailed AWS S3 logs are saved. If the LOG_SDK parameter is specified and if LOG_LEVEL is set to DEBUG, Data Optimizer volumes log details about the S3 communication between the Data Optimizer instance and Content Platform. The directory must exist, must be a fully-qualified directory path starting at the system root (/), and the HDFS user using Data Optimizer must have write permission for the directory. See AWS S3 SDK logging for more information.
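Putting the parameters above together, a minimal configuration might look like the following sketch. Every value (tenant name, keys, and paths) is illustrative, not a real credential, endpoint, or default:

```shell
# Hypothetical Data Optimizer configuration sketch for an HCP endpoint
ENDPOINT=mytenant.hcp.example.com        # tenant.hcp_dns_name form for ENDPOINT_TYPE=HCP
ENDPOINT_TYPE=HCP
BUCKET=instance_id                       # wildcard: each instance resolves its own bucket
ACCESS_KEY=EXAMPLEACCESSKEY              # placeholder credential
SECRET_KEY=EXAMPLESECRETKEY              # placeholder credential
PROTOCOL=https
VERIFY_SSL_CERTIFICATE=false             # HCP presenting a self-signed certificate
MOUNT_POINT=/hadoop/ldo                  # illustrative mount point
MD_STORE_DIR=/data/raid1/ldo-mdstore     # illustrative durable location
LOG_LEVEL=INFO
```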


The configuration file is located in the `/etc/ldo` directory on each HDFS DataNode where Data Optimizer is installed and the ARCHIVE volumes are configured.

General Data Optimizer Configuration for Ambari


Do not include leading or trailing spaces if you copy and paste parameter values. Ambari and Cloudera Manager do not validate input.


ENDPOINT_TYPE

The type of S3 endpoint you are using. Acceptable, case-sensitive values are HCP, HCPCS, and AWS. The default value is HCP.

  • If connecting to Hitachi Content Platform, use HCP.

  • If connecting to Virtual Storage Platform One Object, use HCPCS.

  • If connecting to Amazon S3, use AWS.

AWS_REGION

The AWS region that Ambari connects to. The AWS_REGION value is required if S3 Endpoint Type is AWS.

ENDPOINT

The S3 endpoint URL for the object storage service.

  • If the ENDPOINT_TYPE is HCP, use the form tenant.hcp_dns_name.

  • If the ENDPOINT_TYPE is HCPCS, use the form hcpcs_dns_name.

  • If the ENDPOINT_TYPE is AWS, you can leave the field blank or populate it with a region-specific S3 endpoint.

BUCKET

S3 bucket used on the object store for all the backend storage of the Data Optimizer instances.

ACCESS_KEY

S3 Access Key ID used to authenticate S3 requests to the object store.

SECRET_KEY

S3 Secret Key used to authenticate S3 requests to the object store.

ENDPOINT_SCHEME

S3 Connection Scheme or Endpoint Scheme. Acceptable, case sensitive values are https and http. The default value is https. If set to https, Data Optimizer uses TLS to encrypt all communication with object storage.

VERIFY_SSL_CERTIFICATE

Value used to specify whether to verify certificates within the Data Optimizer volume. Acceptable, case-sensitive values are Enabled and Disabled. The default value is Enabled.

| If the ENDPOINT_SCHEME parameter is: | Then set the VERIFY_SSL_CERTIFICATE parameter to: |
|--------------------------------------|---------------------------------------------------|
| https | Enabled |
| https and the object store certificate is self-signed | Disabled |

By default, Content Platform uses a self-signed certificate that is not in the trust store on the HDFS DataNode. Disabling verification allows TLS negotiation to occur, despite the untrusted certificate. Disabling verification does not reduce the strength of TLS encryption, but it does disable endpoint authentication. It is a best practice to replace the Content Platform self-signed certificate with one signed by a trusted certificate authority. See the Hitachi Content Platform documentation for details.

MOUNT_POINT

HDFS DataNode local directory where the Data Optimizer instance is mounted. HDFS writes block replicas to the local directory you specify. The MOUNT_POINT parameter value must be a fully-qualified directory path starting at the system root (/).

VOLUME_STORAGE_LIMIT_GB

The storage capacity in GB of each Data Optimizer volume instance. If the combined usage of Data Optimizer volumes exceeds the quota allocated to their shared bucket on Content Platform, writes to those volumes fail. The VOLUME_STORAGE_LIMIT_GB value, multiplied by the number of Data Optimizer instances, must not exceed the Content Platform quota. The Content Platform quota should also include additional capacity for deleted versions and for asynchronous garbage collection services. HDFS writes to each Data Optimizer volume only up to the amount specified in the HCP Bucket Storage Limit parameter, minus the reserved space (dfs.datanode.du.reserved).
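The sizing rule above can be checked with simple arithmetic; the instance count and per-volume limit below are hypothetical:

```shell
# Hypothetical cluster: 10 Data Optimizer instances, VOLUME_STORAGE_LIMIT_GB=500 each
instances=10
volume_limit_gb=500

# Minimum bucket quota, before headroom for deleted versions and
# asynchronous garbage collection
required_gb=$((instances * volume_limit_gb))
echo "Minimum Content Platform bucket quota: ${required_gb} GB"
```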

CACHE_DIR

A local directory on the HDFS DataNode that Data Optimizer uses to store temporary files associated with open file handles. The CACHE_DIR parameter value must be a fully-qualified directory path starting at the system root (/).

MD_STORE_DIR

Local directory on each node used to store files associated with persisting the Data Optimizer local metadata store. The MD_STORE_DIR parameter value must be a fully-qualified directory path starting at the system root (/). Specify a value for MD_STORE_DIR when the CACHE_DIR directory is located on volatile storage or when a more durable location is available for long-term file persistence. Do not choose a volatile storage medium for this directory, because it is intended to persist for the life of the Data Optimizer volume. If the files associated with metadata store persistence are lost or corrupted, you can recover them as explained in Recovering from local metadata store failure or corruption.

LOG_LEVEL

Value used to specify how verbose the logging is for Data Optimizer. The default value is WARNING. Acceptable, case sensitive values are ALERT, ERR, WARNING, INFO, and DEBUG. See Data Optimizer logging for details about logging and log levels.

LOG_SDK

Optional. Local directory where detailed AWS S3 logs are saved. If the LOG_SDK parameter is specified and LOG_LEVEL is set to DEBUG, Data Optimizer volumes log details about the S3 communication between the Data Optimizer volume instance and Content Platform. The directory must exist, the LOG_SDK value must be a fully-qualified directory path starting at the system root (/), and the HDFS user using Data Optimizer must have write permission for the directory. See AWS S3 SDK logging for further details.


The configuration file is located in the `/etc/ldo` directory on each HDFS DataNode where Data Optimizer is installed and the ARCHIVE volumes are configured.

Settings for HTTP/S Proxy Connections

In some cases, Data Optimizer is installed on a host that does not have direct access to the object storage service and must connect through a proxy. This is more likely when using a cloud storage provider such as Amazon Web Services. Using the settings in this section, you can configure Data Optimizer to use an http or https proxy. If a proxy is not required, leave these settings at their defaults.


PROXY

The IP address or domain name of the http or https proxy server, if required.

PROXY_PORT

The port that the proxy server listens on.

PROXY_SCHEME

The scheme is either http or https, depending on what the proxy server supports.

PROXY_USER

The user for the proxy server, if authentication is required.

PROXY_PASSWORD

The password for the proxy server, if authentication is required.
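Taken together, the proxy parameters above might be set as in the following sketch; the host, port, and credentials are placeholders:

```shell
# Hypothetical proxy configuration; all values are illustrative
PROXY=proxy.example.com      # IP address or domain name of the proxy server
PROXY_PORT=3128
PROXY_SCHEME=http            # http or https, depending on what the proxy supports
PROXY_USER=svc_ldo           # only if the proxy requires authentication
PROXY_PASSWORD=changeme      # placeholder; never commit real credentials
```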

Recovery Specific Configuration for Ambari

Use the following parameter to configure the recovery mode for Ambari.


RECOVERY_MODE

Value used to determine whether recovery mode is enabled. The RECOVERY_MODE parameter controls the Data Optimizer authoritative versus non-authoritative behavior. Acceptable values are Enabled and Disabled. The default value is Disabled.

Volume Monitor Configuration for Cloudera Manager only

Use the following parameter to configure the Volume Monitor interval for Cloudera Manager.


MONITOR_INTERVAL

Value used to specify how frequently, in minutes, the Volume Monitor checks the health of the Data Optimizer volume. As a best practice, set the interval to five minutes.
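For example, the recommended interval from the description above is set as:

```shell
# Best-practice value: check Data Optimizer volume health every five minutes
MONITOR_INTERVAL=5
```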
