I have noticed that sometimes after the vStorage APIs for Data Protection (VADP) backup the virtual machine (VM) snapshot is not deleted even when the backup is successfully completed. This can cause a chain reaction that would save several snapshot vmdk-files on the datastore and eventually the datastore could run out of space. After the first failed snapshot removal VADP backups continue working normally except that the snapshot vmdk-file amount starts growing. In some cases the failed snapshot removal leaves an error message on the vCenter events but this is not always the case.
How to identify the problem
Like I already mentioned the issue can be spotted from the growing number of snapshot vmdk-files on the datastore. If you are monitoring VM snapshots then you should be able to notice the situation before the datastore runs out of space.
Another thing is to check if the VM has any shapshots. When VADP backup is running there should be “Consolidate Helper” snapshot active and after the VADP backup is done this should be deleted. If the backup is not running and this snapshot exists this confirms that there is an issue with the snapshots.
There could also be “Unable to access file <unspecified filename> since it is locked” error shown on the VM’s task details
I’ve also seen that even when the VADP initiated snapshot removal is successful the “Consolidate Helper” snapshot and snapshot vmdk-files still exist.
At this point I would suggest reading Ivo Beerens’ blog post about a similar issue with the snapshots. He is describing a solution when getting the “Unable to access file <unspecified filename> since it is locked” error. It didn’t work on my case so I had to find another way to solve this issue.
After the orphaned “Consolidate Helper” snapshot is manually removed vCenter is not showing any shapshots for the VM and also checking from ESX console confirms that there are no snapshots, however, all the snapshot vmdk-files are still present.
How to fix the problem
The first thing is to schedule downtime for the VM because it needs to be shut down to complete these steps. Because the snapshot files keep increasing there should be enough free space on the datastore to accommodate the snapshots until this fix can be performed.
The next thing would be to make sure that the VADP backup is disabled while the following operations are performed. Running VADP backup while working on the virtual disks can really mess up the snapshots.
After the previous steps are covered and the VM is shut down make a copy of the VM folder. This is the first thing I do if I have to work with vmdk-files. Just in case if something goes wrong.
The fix is to clone the vmdk-file with snapshots to a new vmdk using vmkfstools-command (the VM that I was working on was on ESX 4.1 so vmkfstools was still available) to consolidate the snapshots and then remove the current virtual disk(s) from the VM and add the new cloned disk(s) to it. Although there are some considerations before cloning the vmdks:
Don’t rely on the fact that the vmdk-file with the highest number (i.e. [servername]-000010.vmdk) is the latest snapshot. Always check from VM properties or from vmx-file if using command line.
[servername].vmx from command line:
If you plan to work with the copied vmdk-files keep in mind that the “parentFileNameHint=” row on the vmdk-file points to the original location of the parent. So before you clone the vmdk-file you should change the path to point to the path of the copy.
Now that the latest snapshot vmdk-file is recognized the clone can be done with the vmkfstools –i command from command line:
vmkfstools –i [servername]-0000[nn].vmdk [newname].vmdk
After the clone is done the virtual disk can be removed from VM (I used the “remove from virtual machine” option, not the delete option) and the new one can be added. If the VM has more than one virtual disk then this procedure hast to be done to all of them. After confirming that the VM starts normally and that all the data is intact the unused vmdks can be removed. In my case I had VM with two virtual disks and both had serveral snapshot vmdks so I used storage vMotion to move the VM to another datastore and then deleted the folder that was left to the old datastore.