Restoring the metadata store to its authoritative state
To force an active recovery, perform the following steps:
SSH to the DataNode in question.
Turn logging for the Data Optimizer instance up to DEBUG. See Adjusting log level at runtime.
Put the DataNode in maintenance mode.
Note: To ensure that all metadata has been recovered, run the `du` command at least twice while no other processes are modifying the contents of the *mount_point* folder. After two consecutive successful `du` passes in which the number of records in the metadata store (`md_cache_size`) is the same, you can be confident that all metadata has been recovered.

Run the following commands:

```shell
ldoctl metrics collect
sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
ldoctl metrics collect
sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
ldoctl metrics collect
journalctl -et ldo -g '"md_cache_size"'
```
*user* is the username of the user matching the `UID` parameter in the configuration file; that is, the user with access to the file system.

*mount_point* is the folder matching the `MOUNT_POINT` parameter in the configuration file; that is, where Data Optimizer is mounted.

The `du` command attempts a recursive folder listing and metadata poll of the entire file system. If the `du` command finishes successfully, "Success" is printed upon completion; otherwise, "Failure" is printed.

The `ldoctl` command emits metrics to the systemd journal. Run this command before and after each `du` command.

The `journalctl` command retrieves the metrics you are interested in from the journal.
Here is an example showing the commands with two successful results:

```shell
$ ldoctl metrics collect
$ sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
321577600	/mnt/ldo/data/
Success
$ ldoctl metrics collect
$ sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
321577600	/mnt/ldo/data/
Success
$ ldoctl metrics collect
$ journalctl -et ldo -g '"md_cache_size"'
… [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 1},
… [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
… [t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
```
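Wrapped as a small helper, one pass of the sequence above might look like the following sketch. `LDO_USER` and `LDO_MOUNT` are hypothetical variable names standing in for the `UID` and `MOUNT_POINT` values from your configuration file:

```shell
# One recovery pass: emit metrics to the journal, then walk the
# entire file system with du and report Success or Failure.
# LDO_USER and LDO_MOUNT are assumed to hold the UID user and the
# MOUNT_POINT folder from the Data Optimizer configuration file.
recovery_pass() {
  ldoctl metrics collect
  if sudo -u "$LDO_USER" du -s "$LDO_MOUNT"; then
    echo "Success"
  else
    echo "Failure"
  fi
}

# Example usage (hypothetical values):
#   LDO_USER=hdfs LDO_MOUNT=/mnt/ldo/data recovery_pass
```

Run the helper at least twice, then run `ldoctl metrics collect` one final time before querying the journal, so that metrics are emitted both before and after each `du` pass.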
If the `du` command completes successfully twice in a row, compare the last two totals of the `md_cache_size` metrics. If the numbers match, all metadata from the S3 bucket has been recovered locally into the metadata store.
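The comparison of the last two totals can be scripted. A minimal sketch using `awk`, here fed two sample journal lines in place of live `journalctl` output:

```shell
# Compare the last two md_cache_size totals found in journal lines.
# Prints "match: <total>" when the final two totals agree,
# otherwise "no match". Pipe real journalctl output into it, e.g.:
#   journalctl -et ldo -g '"md_cache_size"' | compare_md_cache_totals
compare_md_cache_totals() {
  awk -F'"total": ' '/md_cache_size/ {
         gsub(/[^0-9].*/, "", $2); totals[++n] = $2
       }
       END {
         if (n >= 2 && totals[n] == totals[n-1]) print "match: " totals[n]
         else print "no match"
       }'
}

# Sample lines taken from the example output above:
compare_md_cache_totals <<'EOF'
[t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
[t:3892][metrics.c:231] {"type": "counter", "event": "md_cache_size", "total": 2512325},
EOF
# → match: 2512325
```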
Perform the following steps based on the success or failure of the previous command and the comparison of the `md_cache_size` totals:

If the result of the previous command is "Failure," or the last two `md_cache_size` totals do not match:
Repeat the process, running the `du`, `ldoctl`, and `journalctl` commands until you have two consecutive successful results with matching totals:

```shell
sudo -u <user> du -s <mount_point>; [ $? == 0 ] && (echo "Success") || (echo "Failure")
ldoctl metrics collect
journalctl -et ldo -g '"md_cache_size"'
```
If the `du` command fails, it reports a list of the failed files or folders. If the list of failed resources is shrinking or changing, or the totals reported by the metrics are increasing, then the active recovery is making progress. If the `du` command reports "Failure" over several passes and the list of failing resources does not change, the active recovery might not be making progress and additional troubleshooting is required. Contact support through the Pentaho Customer Portal.
If the result of the previous command is "Success" and the last two `md_cache_size` totals match:
Edit the Data Optimizer configuration file for the DataNode and find the `RECOVERY_MODE` property. Remove the property or set its value to `false`. Then use the `ldoctl` command-line tool to reload the configuration file and take the software out of recovery mode:

```shell
ldoctl config reload
```
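Disabling recovery mode amounts to a one-line edit of the configuration file followed by the reload. A sketch, assuming the file uses `KEY=value` syntax and a hypothetical path of `/etc/ldo/ldo.conf` (substitute the actual Data Optimizer configuration file for your DataNode):

```shell
# Set RECOVERY_MODE to false in the given configuration file.
# Both the KEY=value syntax and the example path below are
# assumptions; adjust them to match your deployment.
disable_recovery_mode() {
  conf="$1"
  sed -i 's/^RECOVERY_MODE=.*/RECOVERY_MODE=false/' "$conf"
}

# Example usage (run as a user with write access to the file):
#   disable_recovery_mode /etc/ldo/ldo.conf
#   ldoctl config reload
```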
Take the DataNode out of maintenance mode.