Restoring the metadata store to its authoritative state
To force an active recovery, perform the following steps:
SSH to the DataNode in question.
Set the log level for the Data Optimizer instance to DEBUG. See Adjusting log level at runtime.
Put the DataNode in maintenance mode.
Note: To ensure that all metadata has been recovered, run the `du` command at least twice while no other processes are modifying the contents of the *mount_point* folder. After you have two consecutive successful `du` passes, with the number of records in the metadata store (md_cache_size) being the same after each of those passes, you can be confident that all metadata has been recovered.

Run the following commands:

`ldoctl metrics collect`
`sudo -u *user* du -s *mount\_point*; [ $? == 0 ] && (echo "Success") || (echo "Failure")`
`ldoctl metrics collect`
`sudo -u *user* du -s *mount\_point*; [ $? == 0 ] && (echo "Success") || (echo "Failure")`
`ldoctl metrics collect`
`journalctl -et ldo -g '"md_cache_size"'`

*user* is the username of the user matching the `UID` parameter in the configuration file, that is, the user with access to the file system.

*mount_point* is the folder matching the `MOUNT_POINT` parameter in the configuration file, that is, where Data Optimizer is mounted.

The `du` command attempts a recursive folder listing and metadata poll of the entire file system. If the `du` command finishes successfully, it prints "Success" upon completion; otherwise, it prints "Failure."

The `ldoctl` command emits metrics to the systemd journal. Run this command before and after each `du` command.

The `journalctl` command retrieves the metrics you are interested in from the journal.
Here is an example showing the commands with two successful results:

```
$ ldoctl metrics collect
$ sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
321577600	/mnt/ldo/data/
Success
$ ldoctl metrics collect
$ sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
321577600	/mnt/ldo/data/
Success
$ ldoctl metrics collect
$ journalctl -et ldo -g '"md_cache_size"'
… [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 1},
… [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
… [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
```

If the `du` command completes successfully twice in a row, compare the last two totals for the md_cache_size metric. If the numbers match, all metadata from the S3 bucket has been recovered locally in the metadata store.
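The comparison of the last two totals can also be scripted. The following is a minimal sketch, assuming the journal lines have the JSON shape shown in the example above; the `check_md_totals` helper name and the `sed` field handling are illustrative, not part of the product:

```shell
# Compare the last two md_cache_size totals read from stdin.
# Usage: journalctl -et ldo -g '"md_cache_size"' | check_md_totals
check_md_totals() {
    # Pull the numeric value after "total": from each md_cache_size line,
    # keeping only the last two values seen.
    totals=$(sed -n 's/.*"md_cache_size".*"total": *\([0-9]*\).*/\1/p' | tail -n 2)
    count=$(printf '%s\n' "$totals" | grep -c '^[0-9]')
    prev=$(printf '%s\n' "$totals" | head -n 1)
    last=$(printf '%s\n' "$totals" | tail -n 1)

    # Stable only when two totals exist and they match.
    if [ "$count" -eq 2 ] && [ "$prev" = "$last" ]; then
        echo "stable: md_cache_size = $last"
    else
        echo "not stable: md_cache_size $prev -> $last"
    fi
}
```

With the example journal output above, the helper reports the store stable at 2512325 records.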
Perform the following steps based on the success or failure of the previous command and the comparison of the md_cache_size totals:

If the result of the previous command is "Failure," or the last two md_cache_size totals do not match:

Repeat the process, running the `du`, `ldoctl`, and `journalctl` commands until you have two consecutive successful results with matching totals:

`sudo -u *user* du -s *mount\_point*; [ $? == 0 ] && (echo "Success") || (echo "Failure")`
`ldoctl metrics collect`
`journalctl -et ldo -g '"md_cache_size"'`

If the `du` command fails, it reports a list of the failed files or folders. If the list of failed resources is shrinking or changing, or the totals reported by the metrics are increasing, the active recovery is making progress.

If over several passes the `du` command reports "Failure" and the list of failing resources does not change, the active recovery might not be making progress and additional troubleshooting is required. Contact the Pentaho Customer Portal.
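These repeated passes can be wrapped in a loop. The following is a minimal sketch, not part of the product: the `recovery_passes` helper name and the `max_passes` safety limit are illustrative, and *user* and *mount_point* are the same placeholders as above.

```shell
# Repeat metric collection and du passes until two consecutive
# successful passes report the same md_cache_size total.
# Usage: recovery_passes <user> <mount_point> <max_passes>
recovery_passes() {
    user=$1; mount_point=$2; max_passes=$3
    prev_total=""
    pass=0
    while [ "$pass" -lt "$max_passes" ]; do
        pass=$((pass + 1))
        ldoctl metrics collect
        if ! sudo -u "$user" du -s "$mount_point"; then
            echo "pass $pass: Failure"
            prev_total=""    # a failed pass breaks the streak
            continue
        fi
        echo "pass $pass: Success"
        ldoctl metrics collect
        # Latest md_cache_size total from the journal.
        total=$(journalctl -et ldo -g '"md_cache_size"' |
            sed -n 's/.*"total": *\([0-9]*\).*/\1/p' | tail -n 1)
        if [ -n "$prev_total" ] && [ "$total" = "$prev_total" ]; then
            echo "stable: md_cache_size = $total"
            return 0
        fi
        prev_total=$total
    done
    echo "not stable after $max_passes passes; further troubleshooting needed"
    return 1
}
```

A pass that fails resets the streak, matching the guidance above that two consecutive successful passes are required.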
If the result of the previous command is "Success" and the last two md_cache_size totals match:

Edit the Data Optimizer configuration file for the DataNode and find the `RECOVERY_MODE` property. Remove the property or set its value to `false`.

Use the `ldoctl` command-line tool to reload the configuration file and take the software out of recovery mode as follows:

`ldoctl config reload`

Take the DataNode out of maintenance mode.
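Turning off recovery mode can also be scripted. The following is a minimal sketch, assuming a `KEY=value` style configuration file; the file format and the `disable_recovery_mode` helper name are assumptions, and you would still run `ldoctl config reload` afterwards as described above:

```shell
# Set RECOVERY_MODE=false in a KEY=value style configuration file.
# (File format and location are assumptions; adjust for your install.)
disable_recovery_mode() {
    config_file=$1
    if grep -q '^RECOVERY_MODE=' "$config_file"; then
        # Rewrite the existing property in place.
        sed -i 's/^RECOVERY_MODE=.*/RECOVERY_MODE=false/' "$config_file"
    fi
    # After editing, reload the configuration:
    #   ldoctl config reload
}
```

Removing the property line entirely instead of setting it to `false` is equally valid, per the step above.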