Maintain Pentaho Data Optimizer
Use the following tools to maintain Data Optimizer:
Maintain Data Optimizer metadata
Each Pentaho Data Optimizer volume maintains an authoritative local data store that contains the metadata of all files and directories stored by the instance. The term “authoritative” means that the metadata in the store is complete, current, and trusted. The authoritative local metadata store gives significant performance benefits, allowing the Data Optimizer volume to handle metadata inquiries internally without making external API calls to the S3 storage device.
If the local metadata store for a Data Optimizer volume becomes lost or corrupted, you need to recover the metadata store. For example, corruption might occur if the metadata is stored on a failed disk drive. Recovering the store involves relocating the store, putting the Data Optimizer volume in recovery mode, and running the recovery procedure.
Data Optimizer recovery mode
In recovery mode, the Pentaho Data Optimizer volume operates with its metadata store in non-authoritative mode while recovering file system metadata. The term “non-authoritative” means that the contents of the local metadata store might be incomplete. As a result, certain metadata inquiries can only be serviced by making external API calls to the S3 storage device. The metadata retrieved from the S3 storage device is cached to the local metadata store, reducing the need for future external API calls. Data Optimizer volumes are fully functional while in recovery mode, though performance might be affected.
Authoritative vs non-authoritative
When authoritative (RECOVERY_MODE=false), the store is considered complete, that means if a file or folder is not in the store, it does not exist. In authoritative mode, all metadata requests are serviced from the local metadata store.
When non-authoritative (RECOVERY_MODE=true), the store is considered incomplete. The absence of a file or folder in the store can indicate that the entity does not exist, or it mean that the file or directory has not been accessed since recovery began. In non-authoritative mode, directory listings and metadata requests for records not previously cached in the local metadata store must be serviced by querying the S3 bucket.
Recovering from local metadata store failure or corruption
If the local metadata store fails, perform the following steps on the affected DataNode:
- Put the DataNode in maintenance mode. 
- Stop the HDFS DataNode software. 
- Shut down Data Optimizer. 
- Choose one of the following actions: - If the metadata store is corrupted, delete all metadata store files ( - metadata.db3*) stored in the folder identified by the- CACHE_DIRor- MD_STORE_DIR(if present) configuration parameter.
- If the drive hosting the folder identified by the - CACHE_DIRor- MD_STORE_DIR(if present) configuration parameter has failed, then change the parameter to point to a folder on a healthy drive.
 
- Edit the Data Optimizer configuration file for the DataNode and add or edit the property - RECOVERY_MODE, to set the value to- true.
- Restart Data Optimizer. 
- Restart the HDFS DataNode software. 
- Take the HDFS DataNode out of maintenance mode. 
After completing the these steps, the Data Optimizer software runs in a passive, opportunistic recovery mode and is fully functional. As file system operations are performed, Data Optimizer rebuilds its local metadata store opportunistically.
Note: Passive recovery does not fully restore the local metadata store to an authoritative state; active recovery is required for that purpose.
Restoring the metadata store to its authoritative state
To force an active recovery, perform the following steps:
- SSH to the DataNode in question. 
- Turn logging for the Data Optimizer instance up to DEBUG. See Adjusting log level at run time in Log levels. 
- Put the DataNode in maintenance mode. - Note: To ensure that all metadata has been recovered, run the - ducommand at least twice while no other processes are modifying the contents of the mount_point folder. After you have had two concurrent successful- dupasses, with the number of records in the metadata store (- md_cache_size) being the same after the last two successful passes, you can be confident that all metadata has been recovered.
- Run the following commands: - `ldoctl metrics collect sudo -u *user* du -s *mount\_point*; [ $? == 0 ] && (echo "Success") || (echo "Failure") ` `ldoctl metrics collect sudo -u *user* du -s *mount\_point*; [ $? == 0 ] && (echo "Success") || (echo "Failure") ` `ldoctl metrics collect journalctl -et ldo -g '"md_cache_size"' `- user is the username of the user matching the - UIDparameter in the configuration file, that is the user with access to the file system.
- mount_point is the folder matching the - MOUNT_POINTparameter in the configuration file, that is where Data Optimizer is mounted.
- The - ducommand attempts a recursive folder listing and metadata poll of the entire file system.
- The - ldoctlcommand emits metrics to the systemd journal. Run this command before and after each- ducommand.
- The - journalctlcommand grabs the metrics you are interested in from the journal. If the- ducommand finishes successfully it prints "Success" on the window upon completion; otherwise, it prints "Failure."
 - The following example shows the command with two successful results: - $ `ldoctl metrics collect` $ `sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")` 321577600 /mnt/ldo/data/ Success $ `ldoctl metrics collect` $ `sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")` 321577600 /mnt/ldo/data/ Success $ `ldoctl metrics collect` $ `journalctl -et ldo -g '"md_cache_size"'` … [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 1}, … [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325}, … [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
- If the - ducommand completes successfully twice in a row, compare the last two totals for md_cache_size metrics.- If the numbers match, all metadata from the S3 bucket has been recovered locally in the metadata store. 
- Perform the following steps based on the success or failure of the previous command and the comparisons of md_cache_size metrics: - If the result of the previous command is "Failure," or the last two md_cache_size totals do not match: - Repeat the process, running the - du,- ldoctl, and- journalctcommands until you have two concurrent successful results with matching totals.- `sudo -u *user* du -s *mount\_point*; [ $? == 0 ] && (echo "Success") || (echo "Failure") ldoctl metrics collect journalctl -et ldo -g '"md_cache_size"' `
- If the - ducommand fails, it reports a list of the failed files or folders. If the list of failed resources is shrinking or changing, or the totals reported by the metrics are increasing, then the active recovery is making progress.
- If over several passes the - ducommand reports “Failure,” and the list of failing resources does not change, that might indicate the active recovery is not making progress and additional troubleshooting is required. See the Pentaho Customer Portal.
 
- If the result of the previous command is "Success" and the last two md_cache_size totals match: - Edit the Data Optimizer configuration file for the DataNode and find the property - RECOVERY_MODE. Remove the property or set the value to- false.
- Use the - ldoctlcommand line tool to reload the configuration file and take the software out of recovery mode as follows:- ldoctl config reload
- Take the DataNode out of maintenance mode. 
 
 
Data Optimizer alerts
Pentaho Data Optimizer gives the alerts for Ambari. In addition to standard service alerts that Ambari gives for all services, the Data Optimizer installation delivers an alert called PDO metadata store verification. This alert monitors the volume’s metadata database and alerts users if there is evidence of corruption. In the event of corruption see: Maintain Data Optimizer metadata for possible solutions.
The ldoctl command line utility
You can access the following ldoctl command line utility functions to help maintain and monitor Data Optimizer:
- Reloading the configuration file at run time to apply new configurations without restarting the software. 
- Collecting and marking logs to identify specific events and tracking the performance or issues. 
- Collecting performance metrics. 
You can find the ldoctl utility documention in man pages and inline help. You can also find examples of how to use this utility throughout the guide.
Troubleshoot Data Optimizer
Use the following best practices when troubleshooting:
Access the Data Optimizer volumes directly
When necessary, you can access the Data Optimizer volume directly for troubleshooting. This might include performing directory listings, checking file status, or running the df command to report file system usage. In all these cases, access the volume as either the root user or as the System User specified in the Data Optimizer configuration. The following examples show how to directly access the volumes:
- Attempt to access as - svcusruser (Permission denied): Attempting to access the Data Optimizer mount as the- svcusruser might result in a "permission denied" error.- [svccusr]# ls -l /ldo/mnt ls: cannot access /ldo/mnt: Permission denied d????????? ? ? ? ? ? mnt
- Successful access as - rootand- hdfsusers:- Using the - rootor- hdfsusers for the- lsoperation succeeds.
- The mount is owned by the - hdfsuser and- hdfsgroup, with the- hdfsuser having full- rwx(read, write, execute) permissions, and other users in the- hdfsgroup having- rx(read, execute) permissions.
 - [svccusr]# sudo ls -l /ldo/mnt drwxr-x--- 1 hdfs hdfs 0 Aug 8 1995 dn [svccusr]# sudo -u hdfs ls -l /ldo/mnt drwxr-x--- 1 hdfs hdfs 0 Aug 8 1995 dn
- Failed access using - sudoto the- hdfsgroup:- Sudoing to the - hdfsgroup might still result in a "permission denied" error when accessing the LDO mount.
- Ensure you use either the root user or the owner to access Data Storage Optimizer mounted volumes. 
 - [svccusr]# sudo -g hdfs ls -l /ldo/mnt ls: cannot access /ldo/mnt: Permission denied
Troubleshooting using direct access
Perform the following steps to use sudo for basic file operations in the Data Optimizer volume to troubleshoot through direct access:
- Create and work in your own folder. - Do not write to or modify folders created by HDFS. 
- Create your own folder in the Data Optimizer volume. 
- Write or copy files into your folder. 
 
- Verify with an S3 client: - Verify that the folders and files are in the bucket as expected using an S3 client. 
- Read the files back and compute checksums to verify the content. 
 
- Refer to logs for issues: - Refer to the Data Optimizer logs if you encounter any issues. 
- Increase the log level to DEBUG if necessary. 
 
- Enable AWS SDK logging for network issues: - If the error message indicates the issue might be between the Data Optimizer instance and the S3 bucket (S3 SDK, Network, HCP, and so on.), enable AWS SDK logging to gain further insights. 
 
Tiering HDFS Blocks to Data Optimizer
Once you have verified that the Data Storage Optimizer is functioning correctly with the direct access method, the next step is to ensure that the datanode can tier block replicas to Data Optimizer. To test HDFS tiering to Data Optimizer, you need to set storage policies in HDFS and write to COLD folders or tier COLD files using the mover service.
Perform the following steps to verify block storage:
- Use the - hdfs fsckcommand with the- --blocks --files --locations- arguments to confirm that the blocks for all files set to the COLD storage policy are on ARCHIVE storage. 
- Perform recursive directory listings of the Data Optimizer mount points on the datanodes using the - findor- treecommand.
- Compare these listings with similar listings from the Data Optimizer bucket using the AWS CLI or other S3 clients. Ensure that everything HDFS wrote to the Data Optimizer mount was also written to the Data Optimizer bucket. 
To get Instance ID:
- Use the command - journalctl -t ldo | grep "instance id"to retrieve the ID for the Data Optimizer volume on a node and match it to the top-level folder in the S3 bucket.
- You can also find the instance ID in the - .lock.chillfsfile in the Data Optimizer instance’s cache directory.
Once you have validated that the Data Optimizer instance is functioning properly with the direct access method, and HDFS is configured to use Data Optimizer volumes as ARCHIVE storage, you can verify that HDFS files are properly tiering to Data Optimizer.
You can cause HDFS blocks to be written to Data Optimizer volumes in two ways. The first is to create a folder in HDFS, set the storage policy on the folder to COLD/WARM, and then to write files to that folder. The second is to set the storage policy for existing HDFS files and folders to COLD or WARM and then run the mover service, targeting those files and folders with the required storage policy.
To write directly to a COLD directory, create the directory in HDFS, set the policy on the directory to cold, and write files to the cold directory:

If using the mover service, you should first identify the locations of the blocks you will be tiering to Data Optimizer. To do this use the hdfs fsck command with the -blocks -files -locations arguments. This will provide a list of the current block replicas and their datanode location.
For more information on HDFS storage types, storage policies, and the mover service, see HDFS Heterogeneous Storage documentation for your Hadoop distribution.
Data Optimizer logging
You can acquire Data Optimizer logs through Cloudera's and Ambari's log collectors or by directly accessing the log files.
Viewing logs on Cloudera
To view logs through Cloudera, go to Diagnostics > Logs. All logs collected by Cloudera will be present here. To filter out logs specific to Data Optimizer, enter Data Storage Optimizer in the Services filter. You can view and download specific log files through Cloudera’s log viewer.
Viewing logs on Ambari
Depending on the version of Ambari you are using, the Log Search service, and other services it depends on, you might first need to install and configure it. If the Log Search service was installed before installing Data Optimizer, restart the Log Search service to begin indexing the logs in Data Optimizer. Without this service, Data Optimizer still logs; however, logs will not be integrated into Ambari, and you should access them natively.
To view logs through Ambari, use the Log Search UI link in the Quick Links drop-down menu in the Log Search service.

To isolate only Data Optimizer logs, in the Troubleshooting and Service Logs tabs, include the ldo_volume as part of the component. This is the default component for Data Optimizer, but can be changed in Advanced ldo-logsearch-conf configurations.
Viewing logs natively
You can access logs, located at /var/log/ldo, with the latest logs located at /var/log/ldo/ldo.log with logs rotating when it reaches 128 MB.
Note: Configurations for Data Optimizer cannot be changed.
Marking logs
Inserting entries into the logs can help mark the beginning or end of a test or series of tests. The ldoctl command line utility gives a command to easily insert entries into the log. Use the following command to insert a custom message into the log:
ldoctl logs mark "Message to put in the log."Log format
The Data Optimizer log format is as follows:
<date> <LOG_LEVEL> ldo.<component>:<line>: [p:<pid>][t:<thread>] <message>
Note: The date format is YYYY-mm-dd HH:MM:ss,ms.
Log levels
By default, the Data Optimizer logs events at the INFO level. This level logs rare events such as startup and shutdown, and events that should not occur, such as errors or warnings. You can adjust the logging level as necessary based on the following levels:
- ALERT: Internal API Violated, Possibly a Bug.
- ERROR: An Error Occurred, Immediate Action Required.
- WARNING: A Significant Event, No Immediate Action Required.
- INFO: Notable Events in Normal Product Flow, No Action Required.
- DEBUG: Verbose Logging, Useful for Troubleshooting.
Each log level includes itself and all the levels above it in the list. For example, WARNING includes ALERT, ERROR, and WARNING messages. DEBUG includes all levels.
The desired logging level is specified by setting the LOG_LEVEL parameter in the configuration file. When Data Optimizer starts up it reads this parameter and sets its logging level accordingly. Typically, you would not want to set this level to anything more than WARNING or INFO, as that might result in unnecessarily verbose logging to your systemd journal.
Adjusting log level at run time
In order to troubleshoot an issue, you might want to increase the logging level without restarting the software, as a restart might resolve the issue before you have an opportunity to determine the cause. In this case, the ldoctl can be used to get and set the log level of a live running instance:
ldoctl -m <mount_pont> get
ldoctl -m <mount_point> set <log_level>Note: Setting the logl level through ldoctl will not persist if ldo is restarted.
AWS S3 SDK logging
In some cases, particularly when troubleshooting issues between the Data Optimizer instance and the Hitachi Content Platform,you may want to see the detailed AWS SDK logs. These logs include many client-side details as to the content of the requests being sent to the HCP and the detailed content of the responses received. To enable this logging the LOG_SDK configuration parameter must be properly set, and the LOG_LEVEL parameter set to DEBUG. This logging is enabled only at start time and cannot be enabled by reloading the configuration file. When enabled, details about the communication between the Data Optimizer instance and the HCP are logged to a file called ldo_aws_sdk_*datestamp*.log in the specified folder, where datestamp represents the day and hour logging began.
As a best practice, it is recommended to keep SDK logging disabled unless actively troubleshooting. The logs can get quite large, and Data Optimizer does not manage or rotate these logs. It is up to the user to delete these files when done with troubleshooting exercises.
Data Optimizer configuration troubleshooting
To troubleshoot Data Optimizer configuration issues, verify the following:
- All required configuration parameters are defined, and the values are correct. - All parameter names and most values are case sensitive. 
- Make sure not to use Windows style new lines if you edited on Windows. 
- Verify the log for when the configuration file is loaded and confirm that there are no errors and that all loaded parameters are correct. Look for the string “Found config key=.” 
 
- Paths in the configuration file are correct and case matches the path on the local file system. 
- Ownership and permissions are correct for all required files and directories. - All files and directories are owned by the user matching UID in the configuration file. 
- Permissions = 770 for directories CACHE_DIR, MD_STORE_DIR, MOUNT_POINT, and LOG_SDK (if specified) 
- Permissions = 770 for parent folder of METRICS_FILE (if specified) 
- Permissions = 640 for the configuration file. 
 
Hitachi Content Platform (HCP) configuration troubleshooting
To troubleshoot Hitachi Content Platform (HCP) configuration issues, verify the following:
- HCP is reachable over the network at ports 80/443 from the DataNode. 
- HCP is properly configured according to the instructions in this guide and the HCP documentation. - Your user credentials are correct, and your user has Namespace Management permission selected. 
 
- HCP Hard Quotas and Namespace Quotas have not been exceeded. 
Data Optimizer does not start
If Data Optimizer does not start, verify the following:
- Verify the log for an indication of failure.If not, try starting in DEBUG logging mode. 
- Make sure that the mount folder is empty.You will get an error starting the software if the mount point is not empty. 
- Verify the file and folder ownership permissions. 
In some cases, if a mount point is started manually, and particularly if it is started as the root user, certain files and directories might end up being inaccessible to the HDFS user who is supposed to own these assets. Verify that the following files and directories are owned by the correct user (Hortonworks - hdfs:hadoop, Cloudera - hdfs:hdfs). Including hidden files:
/tmp/ldo_*
/tmp/ldo_*/*
All directories specified in the Data Optimizer configuration, including the Data Optimizer Mount Point and the Data Optimizer Cache Directory.
Monitor Data Optimizer
Data Optimizer maintains operation and performance metrics that are useful for ongoing monitoring of the software. To instruct Data Optimizer to emit these metrics use the ldoctl command line utility as follows:
ldoctl metrics collectThe command causes Data Optimizer to emit a JSON string containing the metrics to the systemd journal log, or to the file specified in the METRICS_FILE parameter in the configuration file, if specified and permissions are properly set on the target folder.
ldoctl metrics collect
The command causes Data Optimizer to emit a JSON string containing the metrics to the systemd journal log, or to the file specified in the METRICS_FILE parameter in the configuration file, if specified and permissions are properly set on the target folder.
The format of the JSON string is as follows:
{
    "date": "2019 10 25 13:48:00 +0000",
    "metrics": [
        {
            "event": "*event\_name*",
            "max": *max\_value*,
            "mean": *mean\_value*,
            "min": *min\_value*,
            "stddev": *stddev*,
            "total": *event\_count*,
            "type": "timer"
        },
        {
            "event": "*event\_name*",
            "total": *event\_count*,
            "type": "counter"
        }
        
    ]
}
Types of metrics events
There are two types of metrics events: timer and counter. Both timer and counter metrics events have an event name in the event field.
- Timer events - Report the maximum ( - max), mean (- mean), minimum (- min), and standard deviation (- stddev) of the amount of time it takes to complete certain measured internal operations. The- totalfield for timer events is always a count of the number of times the event has occurred. Following is a list of the- timerevents:- get_attr: Filesystem request for file attributes, that is,- stat
- readdir: Directory listing, includes S3 listing if applicable
- md_cache_add_entry: Adding an entry to the local metadata store
- md_cache_update_entry: Updating an entry in the local metadata store
- md_cache_evict_entry: Removing an entry in the local metadata store
- md_cache_get_entry: Getting an entry from the local metadata store
- md_cache_copy_entry_new_path: Copying an entry in the local metadata store
- md_cache_get_size: Counting the entries in the local metadata store
- md_cache_readdir_root: Listing the root directory from the local metadata store
- md_cache_readdir: Listing a directory from the local metadata store
- *_prepare_stmt: Preparing a DB operation for the local metadata store
- *_exec_stmt: Executing a DB operation on the local metadata store
 
- Counter Events - The - totalfield can be a count of occurrences or some other count. Following is a list of the- counterevents:- md_cache_size: The number of entries in the local metadata store
- md_cache_size: The number of entries in the local metadata store
- open_file_size: The number of currently open file handles in the Data Optimizer volume
- s3_op_put: The count of S3 PUT requests to the HCP
- s3_op_get: The count of S3 GET requests to the HCP
- s3_op_list: The count of bucket listing requests to the HCP
- s3_op_delete: The count of S3 DELETE requests to the HCP
- s3_op_copy: The count of S3 put-copy requests to the HCP
- s3_op_error: The count of S3 error responses from the HCP
- warnings: The count of logged warnings
- errors: The count of logged errors
 
Note: All values reported in the metrics are cumulative and reset only when Data Optimizer restarts.
Last updated
Was this helpful?

