Data Optimizer configuration parameters
The Data Optimizer management interface distributes the configuration information to the Data Optimizer volume nodes for use by the Data Optimizer volume service.
Never modify the BUCKET and MOUNT_POINT parameters in the Data Optimizer configuration file after the initial installation. Changing these values after installation breaks the instance because the Data Optimizer instance ID is calculated based on the values provided in these parameters.
ENDPOINT
Required
Endpoint address for Hitachi Content Platform. If the ENDPOINT_TYPE is HCP, use the form tenant.hcp_dns_name.
ENDPOINT_TYPE
Optional
Default endpoint type. Acceptable values are case sensitive.
- If connecting to Hitachi Content Platform, use HCP.
- If connecting to Virtual Storage Platform One Object, use HCPCS.
- If connecting to Amazon S3, use AWS.
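For example, a hypothetical Content Platform entry, assuming a simple KEY=VALUE configuration file format (the tenant and domain names are placeholders):
```
ENDPOINT_TYPE=HCP
ENDPOINT=finance.hcp.example.com
```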
BUCKET
Required
Content Platform bucket name or a wildcard value of instance_id. You can use the unique ID generated by Content Platform (instance_id) as a wildcard to avoid name conflicts and to simplify configuration of the instances. Multiple instances can share a common configuration if you use the instance_id wildcard and all other values are identical. You cannot append or prepend the instance_id wildcard value to any other value. For example, bucket_instance_id is an invalid value. If Content Platform is properly configured, Data Optimizer creates its own bucket if the bucket does not already exist.
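For example, assuming the same KEY=VALUE format (the explicit bucket name is a placeholder, and the trailing comments are illustrative only):
```
BUCKET=instance_id           # valid: resolves to the unique instance ID
BUCKET=hadoop-tiering        # valid: explicit bucket name
BUCKET=bucket_instance_id    # invalid: the wildcard cannot be combined with other text
```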
ACCESS_KEY
Required
S3 Access Key ID used to authenticate S3 requests to Content Platform.
SECRET_KEY
Required
S3 Secret Key used to authenticate S3 requests.
PROTOCOL
Optional
Protocol used for communication between Data Optimizer and Content Platform; https encrypts the communication using TLS. The default value is https. Acceptable, case sensitive values are https and http.
VERIFY_SSL_CERTIFICATE
Optional
Value used to specify whether to verify certificates within Data Optimizer. Acceptable, case sensitive values are true and false. The default value is true. If the VERIFY_SSL_CERTIFICATE parameter is set to false, certificate verification is disabled within Data Optimizer. Set this parameter to false when Content Platform presents a self-signed certificate and you still want to use TLS to encrypt transmissions between Data Optimizer and Content Platform.
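For example, to keep TLS enabled while Content Platform presents a self-signed certificate (a hypothetical excerpt in the same KEY=VALUE format):
```
PROTOCOL=https
VERIFY_SSL_CERTIFICATE=false
```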
MOUNT_POINT
Required
HDFS DataNode local directory where Data Optimizer is mounted. The directory must exist and the HDFS user using Data Optimizer must have write permission for the directory. The directory must allow rwx permissions for the owner and the owner's group. For example:
```
mkdir <mount point>
chown user:group <mount point>
chmod 770 <mount point>
```
MD_STORE_DIR
Optional
Local directory used to store files associated with persisting the Data Optimizer local metadata store. The MD_STORE_DIR parameter value must be a fully-qualified directory path starting at the system root (/). If an MD_STORE_DIR value is not specified, the CACHE_DIR directory is used. Specify a value for MD_STORE_DIR when the CACHE_DIR directory is located on volatile storage or when a more durable location is available for long-term file persistence. Do not choose a volatile storage medium for this directory because it is intended to persist for the life of the Data Optimizer volume. For example, if you use transient storage for the CACHE_DIR directory, such as RAM_DISK, specify a more durable location for the MD_STORE_DIR directory. In addition, if you have a more durable location, such as a RAID partition, with room for the metadata store files (up to 2.5 GB), specify an MD_STORE_DIR directory on that partition. If the files associated with metadata store persistence are lost or corrupted, you can recover them as explained in Recovering from local metadata store failure or corruption.
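For example, if the CACHE_DIR directory is on transient RAM_DISK storage, you might place the metadata store on a durable RAID partition (the paths below are placeholders):
```
CACHE_DIR=/mnt/ramdisk/ldo-cache
MD_STORE_DIR=/data/raid1/ldo-metadata
```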
RECOVERY_MODE
Optional
Value used to specify whether recovery mode is enabled. Do not set the RECOVERY_MODE parameter unless you have read and understood the section Recovering from local metadata store failure or corruption. The default value is false. Acceptable, case sensitive values are true and false.
LOG_LEVEL
Optional
Value used to specify how verbose the logging is for Data Optimizer. The default value is INFO. Acceptable, case-sensitive values are ALERT, ERR, WARNING, INFO, and DEBUG. See Data Optimizer logging for more details about logging and log levels.
METRICS_FILE
Optional
Local file that Data Optimizer writes metrics to when prompted by the ldoctl metrics collect command. The METRICS_FILE value must be a fully-qualified file path starting at the system root (/). If a METRICS_FILE value is not defined, Data Optimizer writes metrics to the system journal. The parent directory must exist and the HDFS user using Data Optimizer must have write permission for the directory. See Monitor Data Optimizer for more information.
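For example (a hypothetical path; the parent directory must already exist and be writable by the HDFS user):
```
METRICS_FILE=/var/log/ldo/metrics.log
```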
LOG_SDK
Optional
Local directory where detailed AWS S3 logs are saved. If the LOG_SDK parameter is specified and if LOG_LEVEL is set to DEBUG, Data Optimizer volumes log details about the S3 communication between the Data Optimizer instance and Content Platform. The directory must exist, must be a fully-qualified directory path starting at the system root (/), and the HDFS user using Data Optimizer must have write permission for the directory. See AWS S3 SDK logging for more information.
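For example, to capture detailed S3 SDK logs (a hypothetical path; output is produced only when LOG_LEVEL is DEBUG):
```
LOG_LEVEL=DEBUG
LOG_SDK=/var/log/ldo/sdk
```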
General Data Optimizer Configuration for Ambari
Never modify the BUCKET and MOUNT_POINT parameters in the Data Optimizer configuration file after the initial installation. Changing these values after installation breaks the instance because the Data Optimizer instance ID is calculated based on the values provided in these parameters.
ENDPOINT_TYPE
The type of S3 endpoint you are using. Acceptable, case sensitive values are HCP, HCPCS, and AWS. The default value is HCP.
- If connecting to Hitachi Content Platform, use HCP.
- If connecting to Virtual Storage Platform One Object, use HCPCS.
- If connecting to Amazon S3, use AWS.
AWS_REGION
The AWS region that Data Optimizer connects to. The AWS_REGION value is required if the ENDPOINT_TYPE is AWS.
ENDPOINT
The S3 endpoint URL for the object storage service.
- If the ENDPOINT_TYPE is HCP, use the form tenant.hcp_dns_name.
- If the ENDPOINT_TYPE is HCPCS, use the form hcpcs_dns_name.
- If the ENDPOINT_TYPE is AWS, you can leave the field blank or populate it with a region-specific S3 endpoint.
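The following illustrative values show one hypothetical ENDPOINT per endpoint type (all host names are placeholders):
```
# ENDPOINT_TYPE=HCP
ENDPOINT=finance.hcp.example.com

# ENDPOINT_TYPE=HCPCS
ENDPOINT=hcpcs.example.com

# ENDPOINT_TYPE=AWS (a region-specific S3 endpoint)
ENDPOINT=s3.us-east-1.amazonaws.com
```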
BUCKET
S3 bucket used on the object store for all the backend storage of the Data Optimizer instances.
ACCESS_KEY
S3 Access Key ID used to authenticate S3 requests to the object store.
SECRET_KEY
S3 Secret Key used to authenticate S3 requests to the object store.
ENDPOINT_SCHEME
S3 Connection Scheme or Endpoint Scheme. Acceptable, case sensitive values are https and http. The default value is https. If set to https, Data Optimizer uses TLS to encrypt all communication with object storage.
VERIFY_SSL_CERTIFICATE
Value used to specify whether to verify certificates within the Data Optimizer volume. Acceptable, case sensitive values are Enabled and Disabled. The default value is Enabled.

| If the ENDPOINT_SCHEME parameter is: | Then set the VERIFY_SSL_CERTIFICATE parameter to: |
|--------------------------------------|---------------------------------------------------|
| https | Enabled |
| https and the object store certificate is self-signed | Disabled |
By default, Content Platform uses a self-signed certificate that is not in the trust store on the HDFS DataNode. Disabling verification allows TLS negotiation to occur, despite the untrusted certificate. Disabling verification does not reduce the strength of TLS encryption, but it does disable endpoint authentication. It is a best practice to replace the Content Platform self-signed certificate with one signed by a trusted certificate authority. See the Hitachi Content Platform documentation for details.
MOUNT_POINT
HDFS DataNode local directory where the Data Optimizer instance is mounted. HDFS writes block replicas to the local directory you specify. The MOUNT_POINT parameter value must be a fully-qualified directory path starting at the system root (/).
VOLUME_STORAGE_LIMIT_GB
The storage capacity in GB of each Data Optimizer volume instance. If the combined usage of Data Optimizer volumes exceeds the quota allocated to their shared bucket on Content Platform, writes to those Data Optimizer volumes fail. The VOLUME_STORAGE_LIMIT_GB parameter value, multiplied by the number of Data Optimizer instances, should not exceed the Content Platform quota. In fact, the Content Platform quota should include additional capacity for deleted versions and to account for asynchronous garbage collection services. HDFS writes only an amount of data to each Data Optimizer volume that is equal to or less than the amount specified in the HCP Bucket Storage Limit parameter, minus the reserved space (dfs.datanode.du.reserved).
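For example, with hypothetical numbers: 10 Data Optimizer instances, each with VOLUME_STORAGE_LIMIT_GB set to 500, can write up to 10 × 500 = 5,000 GB in total, so the shared bucket quota on Content Platform should be set comfortably above 5,000 GB to leave headroom for deleted versions and asynchronous garbage collection.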
CACHE_DIR
A local directory on the HDFS DataNode that Data Optimizer uses to store temporary files associated with open file handles. The CACHE_DIR parameter value must be a fully-qualified directory path starting at the system root (/).
MD_STORE_DIR
Local directory on each node used to store files associated with persisting the Data Optimizer local metadata store. The MD_STORE_DIR parameter value must be a fully-qualified directory path starting at the system root (/). Specify a value for MD_STORE_DIR when the CACHE_DIR directory is located on volatile storage or when a more durable location is available for long-term file persistence. Do not choose a volatile storage medium for this directory because it is intended to persist for the life of the Data Optimizer volume. If the files associated with metadata store persistence are lost or corrupted, you can recover them as explained in Recovering from local metadata store failure or corruption.
LOG_LEVEL
Value used to specify how verbose the logging is for Data Optimizer. The default value is WARNING. Acceptable, case sensitive values are ALERT, ERR, WARNING, INFO, and DEBUG. See Data Optimizer logging for details about logging and log levels.
LOG_SDK
Optional. Local directory where detailed AWS S3 logs are saved. If the LOG_SDK parameter is specified and LOG_LEVEL is set to DEBUG, Data Optimizer volumes log details about the S3 communication between the Data Optimizer volume instance and Content Platform. The directory must exist, the LOG_SDK parameter value must be a fully-qualified directory path starting at the system root (/), and the HDFS user using Data Optimizer must have write permission for the directory. See AWS S3 SDK logging for further details.
Settings for HTTP/S Proxy Connections
In some cases, Data Optimizer is installed on a host that does not have direct access to the object storage service and must connect through a proxy. This is more likely to be the case when using a cloud storage provider such as Amazon Web Services. Using the settings in this section, you can configure Data Optimizer to use an HTTP or HTTPS proxy, as shown in the example after the parameter descriptions below. If a proxy is not required, leave these settings at their defaults.
PROXY
The IP address or domain name of the http or https proxy server, if required.
PROXY_PORT
The port that the proxy server listens on.
PROXY_SCHEME
The scheme is either http or https, depending on what the proxy server supports.
PROXY_USER
The user for the proxy server, if authentication is required.
PROXY_PASSWORD
The password for the proxy server, if authentication is required.
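For example, a hypothetical proxy configuration (host name, port, and credentials are placeholders):
```
PROXY=proxy.example.com
PROXY_PORT=3128
PROXY_SCHEME=http
PROXY_USER=ldo-proxy-user
PROXY_PASSWORD=example-password
```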
Recovery Specific Configuration for Ambari
Use the following parameter to configure the recovery mode for Ambari.
Do not enable this parameter unless you have familiarized yourself with the Maintain Data Optimizer metadata section and understand the implications.
RECOVERY_MODE
Value used to determine whether recovery mode is enabled. The RECOVERY_MODE parameter controls the Data Optimizer authoritative versus non-authoritative behavior. Acceptable values are Enabled and Disabled. The default value is Disabled.
Volume Monitor Configuration for Cloudera Manager only
Use the following parameter to configure the Volume Monitor interval for Cloudera Manager.
MONITOR_INTERVAL
Value used to specify how frequently, in minutes, the Volume Monitor checks the health of the Data Optimizer volume. As a best practice, set the interval to five minutes.
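For example, to follow the recommended five-minute interval (an illustrative value; enter it through whatever mechanism your Cloudera Manager deployment uses for this setting):
```
MONITOR_INTERVAL=5
```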