# Data Optimizer configuration parameters

The Data Optimizer management interface distributes the configuration information to the Data Optimizer volume nodes for use by the Data Optimizer volume service.

{% hint style="warning" %}
Never modify the **BUCKET** and **MOUNT\_POINT** parameters in the Data Optimizer configuration file after the initial installation. Changing these values after installation breaks the instance because the Data Optimizer instance ID is calculated based on the values provided in these parameters.
{% endhint %}

{% hint style="info" %}
Do not include leading or trailing spaces if you copy and paste parameter values. Ambari and Cloudera Manager do not validate input.
{% endhint %}
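
Because the management interfaces do not validate input, it can help to check pasted values before you save them. The following is a minimal sketch, not part of Data Optimizer; the function name `find_padded_values` and the sample values are hypothetical.

```python
def find_padded_values(params):
    """Return the names of parameters whose values have leading or trailing whitespace."""
    return sorted(name for name, value in params.items() if value != value.strip())


# Hypothetical values, as they might be pasted into the management interface.
params = {
    "ENDPOINT": "tenant.hcp.example.com ",  # trailing space
    "BUCKET": "instance_id",
    "ACCESS_KEY": " my-access-key",  # leading space
}

for name in find_padded_values(params):
    print(f"{name}: remove leading or trailing whitespace")
```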

<table data-header-hidden><thead><tr><th width="165">Parameter</th><th width="141">Requirement</th><th>Description</th></tr></thead><tbody><tr><td><strong>ENDPOINT</strong></td><td>Required</td><td>Endpoint address for Hitachi Content Platform. If the <strong>ENDPOINT_TYPE</strong> is <em>HCP</em>, use the form <code>tenant.hcp_dns_name</code>.</td></tr><tr><td><strong>ENDPOINT_TYPE</strong></td><td>Optional</td><td><p>Default endpoint type. Acceptable values are case sensitive.</p><ul><li>If connecting to Hitachi Content Platform, use <em>HCP</em>.</li><li>If connecting to Virtual Storage Platform One Object, use <em>HCPCS</em>.</li><li>If connecting to Amazon S3, use <em>AWS</em>.</li></ul></td></tr><tr><td><strong>BUCKET</strong></td><td>Required</td><td>Content Platform bucket name or a wildcard value of <em>instance_id</em>. You can use the unique ID generated by Content Platform (<em>instance_id</em>) as a wildcard to avoid name conflicts and to simplify configuration of the instances. Multiple instances can share a common configuration if you use the <em>instance_id</em> wildcard and all other values are identical. You cannot append or prepend the <em>instance_id</em> wildcard value to any other value. For example, <em>bucket_instance_id</em> is an invalid value. If Content Platform is properly configured, Data Optimizer creates its own bucket if the bucket does not already exist.</td></tr><tr><td><strong>ACCESS_KEY</strong></td><td>Required</td><td>S3 Access Key ID used to authenticate S3 requests to Content Platform.</td></tr><tr><td><strong>SECRET_KEY</strong></td><td>Required</td><td>S3 Secret Key used to authenticate S3 requests.</td></tr><tr><td><strong>PROTOCOL</strong></td><td>Optional</td><td>Protocol used for communication between Data Optimizer and Content Platform. If set to <em>https</em>, the communication is encrypted using TLS. The default value is https. 
Acceptable, case-sensitive values are <em>https</em> and <em>http</em>.</td></tr><tr><td><strong>VERIFY_SSL_CERTIFICATE</strong></td><td>Optional</td><td>Value used to specify whether to verify certificates within Data Optimizer. Acceptable, case-sensitive values are <em>true</em> and <em>false</em>. The default value is true. If the <strong>VERIFY_SSL_CERTIFICATE</strong> parameter is set to <em>false</em>, certificate verification is disabled within Data Optimizer. Set this parameter to <em>false</em> when Content Platform is presenting a self-signed certificate, and you still want to use TLS to encrypt transmissions between Data Optimizer and Content Platform.</td></tr><tr><td><strong>MOUNT_POINT</strong></td><td>Required</td><td><p>HDFS DataNode local directory where Data Optimizer is mounted. The directory must exist and the HDFS user using Data Optimizer must have write permission for the directory. The directory must allow <code>rwx</code> permissions for the owner and owner’s group. For example:</p><pre><code>mkdir &#x3C;mount point>
chown user:group &#x3C;mount point>
chmod 770 &#x3C;mount point></code></pre></td></tr><tr><td><strong>BUCKET_STORAGE_LIMIT_GB</strong></td><td>Required</td><td><p>Size in GB to report as the total capacity of the volume.</p><p><strong>CAUTION:</strong> If the usage exceeds the quota, or upper limit, on the volume’s Content Platform bucket, writes to the volume fail. Data Optimizer does not prevent writing to the volume if the usage exceeds the capacity.</p><p>As a best practice, specify a value that is less than the bucket quota, so that HDFS stops choosing the volume for writes before the volume exceeds its quota on Content Platform.</p></td></tr><tr><td><strong>CACHE_DIR</strong></td><td>Required</td><td><p>Directory that Data Optimizer uses to store temporary files associated with open file handles. If <strong>MD_STORE_DIR</strong> is not specified, Data Optimizer also uses this directory to store files associated with persisting the local metadata store. The directory must exist and the HDFS user using Data Optimizer must have write permission for the directory. The directory must allow <code>rwx</code> permissions for the owner and owner’s group. The <strong>CACHE_DIR</strong> parameter value must be a fully-qualified directory path starting at the system root (<code>/</code>). For example:</p><pre><code>mkdir &#x3C;cache dir>
chown user:group &#x3C;cache dir>
chmod 770 &#x3C;cache dir></code></pre></td></tr><tr><td><strong>MD_STORE_DIR</strong></td><td>Optional</td><td>Local directory used to store files associated with persisting the Data Optimizer local metadata store. The <strong>MD_STORE_DIR</strong> parameter value must be a fully-qualified directory path starting at the system root (<code>/</code>). If an <strong>MD_STORE_DIR</strong> value is not specified, the <strong>CACHE_DIR</strong> directory is used. Specify a value for <strong>MD_STORE_DIR</strong> when the <strong>CACHE_DIR</strong> directory is located on volatile storage or if there is a more durable location for long-term file persistence. Do not choose a volatile storage medium for this directory, because it is intended to persist for the life of the Data Optimizer volume. For example, if you use transient storage for the <strong>CACHE_DIR</strong> directory such as <code>RAM_DISK</code>, you should specify a more durable location for the <strong>MD_STORE_DIR</strong> directory. In addition, if you have a more durable location, such as a RAID partition, and there is room for the metadata store files (up to 2.5 GB), you should specify an <strong>MD_STORE_DIR</strong> directory on that partition. If the files associated with metadata store persistence are lost or corrupted, you can recover them as explained in <a href="/pages/8egas9P3oLp2Z3XgMzOv">Recovering from local metadata store failure or corruption</a>.</td></tr><tr><td><strong>RECOVERY_MODE</strong></td><td>Optional</td><td>Value used to specify whether recovery mode is enabled. Do not set the <strong>RECOVERY_MODE</strong> parameter unless you have read and understood the section <a href="/pages/8egas9P3oLp2Z3XgMzOv">Recovering from local metadata store failure or corruption</a>. The default value is false. 
Acceptable, case-sensitive values are <em>true</em> and <em>false</em>.</td></tr><tr><td><strong>LOG_LEVEL</strong></td><td>Optional</td><td>Value used to specify how verbose the logging is for Data Optimizer. The default value is INFO. Acceptable, case-sensitive values are <em>ALERT</em>, <em>ERR</em>, <em>WARNING</em>, <em>INFO</em>, and <em>DEBUG</em>. See <a href="/pages/8CN2VYLXJRcCvT5807W5">Data Optimizer logging</a> for more details about logging and log levels.</td></tr><tr><td><strong>METRICS_FILE</strong></td><td>Optional</td><td>Local file that Data Optimizer writes metrics to when prompted by the <code>ldoctl metrics collect</code> command. The <strong>METRICS_FILE</strong> value must be a fully-qualified file path starting at the system root (<code>/</code>). If a <strong>METRICS_FILE</strong> value is not defined, Data Optimizer writes metrics to the system journal. The parent directory must exist and the HDFS user using Data Optimizer must have write permission for the directory. See <a href="/pages/Lt5qSfgaHdEol4iOSjAW">Monitor Data Optimizer</a> for more information.</td></tr><tr><td><strong>LOG_SDK</strong></td><td>Optional</td><td>Local directory where detailed AWS S3 logs are saved. If the <strong>LOG_SDK</strong> parameter is specified and if <strong>LOG_LEVEL</strong> is set to <em>DEBUG</em>, Data Optimizer volumes log details about the S3 communication between the Data Optimizer instance and Content Platform. The directory must exist, must be a fully-qualified directory path starting at the system root (<code>/</code>), and the HDFS user using Data Optimizer must have write permission for the directory. See <a href="/pages/WL9u1xHHgPM7AHvZclz6">AWS S3 SDK logging</a> for more information.</td></tr></tbody></table>
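
The rules in the table above (required parameters, case-sensitive value lists, absolute paths, and the stand-alone *instance_id* wildcard) can be checked before a configuration is distributed. The following is a minimal sketch under stated assumptions: the `validate` function and the dictionary layout are hypothetical and are not part of Data Optimizer, and the wildcard check is a simple substring test that would also flag an unrelated bucket name that happens to contain `instance_id`.

```python
import posixpath

# Allowed values copied from the parameter table; matching is case sensitive.
ALLOWED_VALUES = {
    "ENDPOINT_TYPE": {"HCP", "HCPCS", "AWS"},
    "PROTOCOL": {"https", "http"},
    "VERIFY_SSL_CERTIFICATE": {"true", "false"},
    "RECOVERY_MODE": {"true", "false"},
    "LOG_LEVEL": {"ALERT", "ERR", "WARNING", "INFO", "DEBUG"},
}
REQUIRED = [
    "ENDPOINT", "BUCKET", "ACCESS_KEY", "SECRET_KEY",
    "MOUNT_POINT", "BUCKET_STORAGE_LIMIT_GB", "CACHE_DIR",
]
# Parameters the table describes as fully-qualified paths starting at "/".
ABSOLUTE_PATHS = ["CACHE_DIR", "MD_STORE_DIR", "METRICS_FILE", "LOG_SDK"]


def validate(params):
    """Return a list of human-readable problems; an empty list means no problem was found."""
    errors = []
    for name in REQUIRED:
        if not params.get(name):
            errors.append(f"{name} is required")
    for name, allowed in ALLOWED_VALUES.items():
        value = params.get(name)
        if value is not None and value not in allowed:
            errors.append(f"{name} must be one of {sorted(allowed)}")
    for name in ABSOLUTE_PATHS:
        value = params.get(name)
        if value is not None and not posixpath.isabs(value):
            errors.append(f"{name} must be a path starting at /")
    bucket = params.get("BUCKET", "")
    if "instance_id" in bucket and bucket != "instance_id":
        errors.append("BUCKET cannot combine instance_id with other text")
    return errors


# Hypothetical configuration values for illustration only.
example = {
    "ENDPOINT": "tenant.hcp.example.com",
    "BUCKET": "instance_id",
    "ACCESS_KEY": "my-access-key",
    "SECRET_KEY": "my-secret-key",
    "MOUNT_POINT": "/mnt/ldo",
    "BUCKET_STORAGE_LIMIT_GB": "500",
    "CACHE_DIR": "/var/cache/ldo",
    "LOG_LEVEL": "INFO",
}
print(validate(example) or "no problems found")
```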

{% hint style="info" %}
The configuration file is located in the `/etc/ldo` directory on each HDFS DataNode on which Data Optimizer is installed and the **ARCHIVE** volumes are configured.
{% endhint %}
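
On each DataNode, the directory requirements stated for **MOUNT\_POINT** and **CACHE\_DIR** (the directory exists and allows `rwx` for the owner and the owner's group) can be checked with a short script. This is a sketch only: the function name `directory_problems` is hypothetical, and the check inspects the mode bits without confirming which user or group owns the directory.

```python
import os
import stat


def directory_problems(path):
    """Return a list of problems found with a Data Optimizer directory."""
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    mode = stat.S_IMODE(os.stat(path).st_mode)
    problems = []
    # S_IRWXU and S_IRWXG are the rwx masks for the owner and the group.
    if (mode & stat.S_IRWXU) != stat.S_IRWXU:
        problems.append(f"{path} does not allow rwx for the owner")
    if (mode & stat.S_IRWXG) != stat.S_IRWXG:
        problems.append(f"{path} does not allow rwx for the group")
    return problems


# Hypothetical directory paths; substitute your MOUNT_POINT and CACHE_DIR values.
for directory in ("/mnt/ldo", "/var/cache/ldo"):
    for problem in directory_problems(directory):
        print(problem)
```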

## General Data Optimizer Configuration for Ambari

{% hint style="warning" %}
Never modify the **BUCKET** and **MOUNT\_POINT** parameters in the Data Optimizer configuration file after the initial installation. Changing these values after installation breaks the instance because the Data Optimizer instance ID is calculated based on the values provided in these parameters.
{% endhint %}

{% hint style="info" %}
Do not include leading or trailing spaces if you copy and paste parameter values. Ambari and Cloudera Manager do not validate input.
{% endhint %}

<table><thead><tr><th width="174">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>ENDPOINT_TYPE</strong></td><td><p>The type of S3 endpoint you are using. Acceptable, case-sensitive values are <em>HCP</em>, <em>HCPCS</em>, and <em>AWS</em>. The default value is HCP.</p><ul><li>If connecting to Hitachi Content Platform, use <em>HCP</em>.</li><li>If connecting to Virtual Storage Platform One Object, use <em>HCPCS</em>.</li><li>If connecting to Amazon S3, use <em>AWS</em>.</li></ul></td></tr><tr><td><strong>AWS_REGION</strong></td><td>The AWS region that Ambari connects to. The <strong>AWS_REGION</strong> value is required if <em>S3 Endpoint Type</em> is <em>AWS</em>.</td></tr><tr><td><strong>ENDPOINT</strong></td><td><p>The S3 endpoint URL for the object storage service.</p><ul><li>If the <strong>ENDPOINT_TYPE</strong> is <em>HCP</em>, use the form <code>tenant.hcp_dns_name</code>.</li><li>If the <strong>ENDPOINT_TYPE</strong> is <em>HCPCS</em>, use the form <code>hcpcs_dns_name</code>.</li><li>If the <strong>ENDPOINT_TYPE</strong> is <em>AWS</em>, you can leave the field blank or populate it with a region-specific S3 endpoint.</li></ul></td></tr><tr><td><strong>BUCKET</strong></td><td>S3 bucket used on the object store for all the backend storage of the Data Optimizer instances.</td></tr><tr><td><strong>ACCESS_KEY</strong></td><td>S3 Access Key ID used to authenticate S3 requests to the object store.</td></tr><tr><td><strong>SECRET_KEY</strong></td><td>S3 Secret Key used to authenticate S3 requests to the object store.</td></tr><tr><td><strong>ENDPOINT_SCHEME</strong></td><td>S3 Connection Scheme or Endpoint Scheme. Acceptable, case-sensitive values are <em>https</em> and <em>http</em>. The default value is https. If set to <em>https</em>, Data Optimizer uses TLS to encrypt all communication with object storage.</td></tr><tr><td><strong>VERIFY_SSL_CERTIFICATE</strong></td><td><p>Value used to specify whether to verify certificates within the Data Optimizer volume. 
Acceptable, case-sensitive values are <em>Enabled</em> and <em>Disabled</em>. The default value is Enabled.</p><ul><li>If the <strong>ENDPOINT_SCHEME</strong> parameter is <em>https</em>, set the <strong>VERIFY_SSL_CERTIFICATE</strong> parameter to <em>Enabled</em>.</li><li>If the <strong>ENDPOINT_SCHEME</strong> parameter is <em>https</em> and the object store certificate is self-signed, set the <strong>VERIFY_SSL_CERTIFICATE</strong> parameter to <em>Disabled</em>.</li></ul><p>By default, Content Platform uses a self-signed certificate that is not in the trust store on the HDFS DataNode. Disabling verification allows TLS negotiation to occur, despite the untrusted certificate. Disabling verification does not reduce the strength of TLS encryption, but it does disable endpoint authentication. It is a best practice to replace the Content Platform self-signed certificate with one signed by a trusted certificate authority. See the <strong>Hitachi Content Platform</strong> documentation for details.</p></td></tr><tr><td><strong>MOUNT_POINT</strong></td><td>HDFS DataNode local directory where the Data Optimizer instance is mounted. HDFS writes block replicas to the local directory you specify. The <strong>MOUNT_POINT</strong> parameter value must be a fully-qualified directory path starting at the system root (<code>/</code>).</td></tr><tr><td><strong>VOLUME_STORAGE_LIMIT_GB</strong></td><td>The storage capacity in GB of each Data Optimizer volume instance. If the combined usage of Data Optimizer volumes exceeds the quota allocated to their shared bucket on Content Platform, writes to those Data Optimizer volumes fail. The <strong>VOLUME_STORAGE_LIMIT_GB</strong> parameter value multiplied by the number of Data Optimizer instances should not exceed the Content Platform quota. In fact, the Content Platform quota should include additional capacity for deleted versions and to account for asynchronous garbage collection services. 
HDFS writes to each Data Optimizer volume only up to the amount specified in the <strong>HCP Bucket Storage Limit</strong> parameter, minus the reserved space (<code>dfs.datanode.du.reserved</code>).</td></tr><tr><td><strong>CACHE_DIR</strong></td><td>A local directory on the HDFS DataNode that Data Optimizer uses to store temporary files associated with open file handles. The <strong>CACHE_DIR</strong> parameter value must be a fully-qualified directory path starting at the system root (<code>/</code>).</td></tr><tr><td><strong>MD_STORE_DIR</strong></td><td>Local directory on each node used to store files associated with persisting the Data Optimizer local metadata store. The <strong>MD_STORE_DIR</strong> parameter value must be a fully-qualified directory path starting at the system root (<code>/</code>). Specify a value for <strong>MD_STORE_DIR</strong> when the <strong>CACHE_DIR</strong> directory is located on volatile storage or if there is a more durable location for long-term file persistence. Do not choose a volatile storage medium for this directory, because it is intended to persist for the life of the Data Optimizer volume. If the files associated with metadata store persistence are lost or corrupted, you can recover them as explained in <a href="/pages/8egas9P3oLp2Z3XgMzOv">Recovering from local metadata store failure or corruption</a>.</td></tr><tr><td><strong>LOG_LEVEL</strong></td><td>Value used to specify how verbose the logging is for Data Optimizer. The default value is WARNING. Acceptable, case-sensitive values are <em>ALERT</em>, <em>ERR</em>, <em>WARNING</em>, <em>INFO</em>, and <em>DEBUG</em>. See <a href="/pages/8CN2VYLXJRcCvT5807W5">Data Optimizer logging</a> for details about logging and log levels.</td></tr><tr><td><strong>LOG_SDK</strong></td><td>Optional. Local directory where detailed AWS S3 logs are saved. 
If the <strong>LOG_SDK</strong> parameter is specified and <strong>LOG_LEVEL</strong> is set to <em>DEBUG</em>, Data Optimizer volumes log details about the S3 communication between the Data Optimizer volume instance and Content Platform. The <strong>LOG_SDK</strong> directory must exist, must be specified as a fully-qualified directory path starting at the system root (<code>/</code>), and must be writable by the HDFS user using Data Optimizer. See <a href="/pages/WL9u1xHHgPM7AHvZclz6">AWS S3 SDK logging</a> for further details.</td></tr></tbody></table>
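
The sizing guidance for **VOLUME\_STORAGE\_LIMIT\_GB** reduces to simple arithmetic, sketched below. The function names and the sample numbers are hypothetical. The sketch also assumes that 1 GB means 1024³ bytes when converting `dfs.datanode.du.reserved` (which HDFS expresses in bytes per volume); confirm the unit that applies to your environment.

```python
BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def total_reported_capacity_gb(volume_storage_limit_gb, instance_count):
    """Combined capacity that all Data Optimizer volumes report to HDFS."""
    return volume_storage_limit_gb * instance_count


def fits_quota(volume_storage_limit_gb, instance_count, quota_gb):
    """True if the combined capacity does not exceed the shared bucket quota."""
    return total_reported_capacity_gb(volume_storage_limit_gb, instance_count) <= quota_gb


def hdfs_usable_gb(volume_storage_limit_gb, du_reserved_bytes):
    """Capacity HDFS writes to one volume: the limit minus the reserved space."""
    return volume_storage_limit_gb - du_reserved_bytes / BYTES_PER_GB


# Hypothetical example: 8 instances, 500 GB each, against a 5000 GB bucket quota.
print(total_reported_capacity_gb(500, 8))       # 4000
print(fits_quota(500, 8, 5000))                 # True
print(hdfs_usable_gb(500, 10 * BYTES_PER_GB))   # 490.0
```

In this example the quota leaves 1000 GB of headroom beyond the combined volume capacity, in line with the advice to allow extra capacity for deleted versions and asynchronous garbage collection.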

{% hint style="info" %}
The configuration file is located in the `/etc/ldo` directory on each HDFS DataNode on which Data Optimizer is installed and the **ARCHIVE** volumes are configured.
{% endhint %}

## Settings for HTTP/S Proxy Connections

In some cases, Data Optimizer is installed on a host that does not have direct access to the object storage service and must connect through a proxy. This is more likely when you use a cloud storage provider such as Amazon Web Services. Using the settings in this section, you can configure Data Optimizer to use an HTTP or HTTPS proxy. If a proxy is not required, leave these settings at their defaults.

| Parameter           | Description                                                                          |
| ------------------- | ------------------------------------------------------------------------------------ |
| **PROXY**           | The IP address or domain name of the http or https proxy server, if required.        |
| **PROXY\_PORT**     | The port that the proxy server listens on.                                           |
| **PROXY\_SCHEME**   | The scheme is either *http* or *https*, depending on what the proxy server supports. |
| **PROXY\_USER**     | The user for the proxy server, if authentication is required.                        |
| **PROXY\_PASSWORD** | The password for the proxy server, if authentication is required.                    |
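
To see how the five proxy settings relate to one another, the sketch below combines them into a single proxy URL of the kind many HTTP clients accept, which can be useful when you test the proxy independently of Data Optimizer. This is an illustration only: the function `build_proxy_url` and the sample values are hypothetical, and the sketch does not describe how Data Optimizer itself consumes these settings.

```python
from urllib.parse import quote


def build_proxy_url(host, port, scheme, user=None, password=None):
    """Combine PROXY, PROXY_PORT, PROXY_SCHEME, PROXY_USER, and PROXY_PASSWORD into a URL."""
    if scheme not in ("http", "https"):
        raise ValueError("scheme must be 'http' or 'https'")
    credentials = ""
    if user:
        # Percent-encode credentials so characters such as '@' and ':' stay unambiguous.
        credentials = quote(user, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


# Hypothetical proxy settings.
print(build_proxy_url("proxy.example.com", 3128, "http"))
print(build_proxy_url("proxy.example.com", 3128, "https", "ldo", "p@ss:word"))
```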

## Recovery Specific Configuration for Ambari

Use the following parameter to configure the recovery mode for Ambari.

{% hint style="warning" %}
Do not enable this parameter unless you have familiarized yourself with the [Maintain Data Optimizer metadata](/pdc-10.2-data-optimizer/pdso-install-landing-page/pdso-install-in-hadoop-cluster/pdso-maintain-landing-page/pdso-maintain-data-optimizer-metadata-cp.md) section and understand the implications.
{% endhint %}

<table><thead><tr><th width="204">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>RECOVERY_MODE</strong></td><td>Value used to determine whether recovery mode is enabled. The <strong>RECOVERY_MODE</strong> parameter controls the Data Optimizer authoritative versus non-authoritative behavior. Acceptable values are <em>Enabled</em> and <em>Disabled</em>. The default value is Disabled.</td></tr></tbody></table>

## Volume Monitor Configuration for Cloudera Manager only

Use the following parameter to configure the Volume Monitor interval for Cloudera Manager.

<table><thead><tr><th width="227">Parameter</th><th>Description</th></tr></thead><tbody><tr><td><strong>MONITOR_INTERVAL</strong></td><td>Value used to specify how frequently, in minutes, the Volume Monitor checks the health of the Data Optimizer volume. As a best practice, set the interval to five minutes.</td></tr></tbody></table>

