
VNXe 3100 performance


Earlier this year I installed a VNXe 3100 and have now done some testing with it. I have already covered the VNXe 3300 performance in a couple of my previous posts: Hands-on with VNXe 3300 Part 6: Performance and VNXe 3300 performance follow up (EFDs and RR settings). The 3100 has fewer disks than the 3300, less memory, and only two I/O ports. So I wanted to see how the 3100 would perform compared to the 3300. I ran the same Iometer tests that I ran on the 3300, and in this post I compare those results to the ones presented in the previous posts. The environment is a bit different, so I will quickly describe it before presenting the results.

Test environment

  • EMC VNXe 3100 (21 x 600GB SAS drives)
  • Dell PE 2900 server
  • HP ProCurve 2510G switch
  • Two 1Gb iSCSI NICs
  • ESXi 4.1U1 / ESXi 5.0
  • Virtual Win 2008 R2 (1 vCPU and 4GB memory)

Test results

I ran the tests on both ESXi 4.1 and ESXi 5.0, but the iSCSI results were very similar so I used the average of both. The NFS results had some differences, so I will present those separately for ESXi 4 and ESXi 5. I also ran the tests with and without LAG, and with the default RR settings changed. The VNXe was configured with one 20-disk pool and a 100GB datastore provisioned to the ESXi servers. The tests were run on a 20GB virtual disk on the 100GB datastore.
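For reference, the RR change I'm referring to is the IO operation (IOPS) limit of the Round Robin path selection policy, set per device from the ESXi command line. Below is a minimal sketch; the device identifier naa.xxxxxxxx is just a placeholder for the actual VNXe LUN, and the syntax differs between ESXi 4.1 and 5.0:

# ESXi 4.1: make sure the device uses Round Robin, then drop the IO operation limit from 1000 to 1
esxcli nmp device setpolicy --device naa.xxxxxxxx --psp VMW_PSP_RR
esxcli nmp roundrobin setconfig --device naa.xxxxxxxx --type iops --iops 1

# ESXi 5.0: the same change with the reorganized esxcli namespaces
esxcli storage nmp device set --device naa.xxxxxxxx --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxx --type iops --iops 1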

[update] My main focus in these tests has been on iSCSI because that is what we are planning to use. I only ran quick tests with the generic NFS datastore and not with the one that is configured under Storage – VMware. After Paul’s comment I ran a couple of tests on the “VMware NFS” and added “ESXi 4 VMware NFS” to the test results:

Conclusions

With default settings the performance of the 3300 and the 3100 is fairly similar. The 3300 gives better throughput when the IO operation limit is changed from the default 1000 to 1. The differences in the physical configurations might also have an effect on this. With a random workload the performance is similar even when the default settings are changed. Of course the real difference would be seen when both were under heavy load; during the tests only the test server was running on the VNXes.

On NFS I didn’t have comparable results from the 3300. I ran different tests on the 3300 and those results weren’t good either. The odd thing is that ESXi 4 and ESXi 5 gave quite different results when running the tests on NFS.

Looking at these and the previous results, I would still stick with iSCSI on the VNXe. As for the performance of the 3100, it is surprisingly close to its bigger sibling, the 3300.

[update] Looking at the new test results, NFS performs as well as iSCSI. With the modified RR settings iSCSI gets better maximum throughput, but with random workloads NFS seems to perform better. So the type of NFS storage provisioned to the ESX hosts makes a difference. Now comes the question: NFS or iSCSI? Performance-wise either one is a good choice, but which one suits your environment better?

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


VADP backup fails to remove snapshot


I have noticed that sometimes after a vStorage APIs for Data Protection (VADP) backup the virtual machine (VM) snapshot is not deleted, even when the backup completes successfully. This can cause a chain reaction that leaves several snapshot vmdk-files on the datastore, and eventually the datastore could run out of space. After the first failed snapshot removal, VADP backups continue working normally except that the number of snapshot vmdk-files keeps growing. In some cases the failed snapshot removal leaves an error message in the vCenter events, but this is not always the case.

How to identify the problem

As I already mentioned, the issue can be spotted from the growing number of snapshot vmdk-files on the datastore. If you are monitoring VM snapshots, you should be able to notice the situation before the datastore runs out of space.
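One quick way to keep an eye on this is to list the delta vmdk-files straight from the datastore. A minimal check, assuming hypothetical datastore1 and servername names:

# list the snapshot (delta) descriptors in the VM folder and count them;
# a number that keeps growing between backups is a warning sign
ls -lh /vmfs/volumes/datastore1/servername/servername-0000*.vmdk
ls /vmfs/volumes/datastore1/servername/servername-0000*.vmdk | wc -l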

Another thing to check is whether the VM has any snapshots. While a VADP backup is running there should be a “Consolidate Helper” snapshot active, and after the backup is done it should be deleted. If the backup is not running and this snapshot still exists, that confirms there is an issue with the snapshots.
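The same check can be done from the ESX(i) console with vim-cmd. This is just a sketch; the <vmid> is whatever id the first command reports for your VM:

# find the VM id, then list its snapshots;
# a lingering "Consolidate Helper" entry with no backup running points to this issue
vim-cmd vmsvc/getallvms
vim-cmd vmsvc/snapshot.get <vmid>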

There could also be an “Unable to access file <unspecified filename> since it is locked” error shown in the VM’s task details.

I’ve also seen cases where, even when the VADP-initiated snapshot removal is successful, the “Consolidate Helper” snapshot and the snapshot vmdk-files still exist.

At this point I would suggest reading Ivo Beerens’ blog post about a similar issue with snapshots. He describes a solution for the “Unable to access file <unspecified filename> since it is locked” error. It didn’t work in my case, so I had to find another way to solve the issue.

After the orphaned “Consolidate Helper” snapshot is manually removed, vCenter does not show any snapshots for the VM, and checking from the ESX console also confirms that there are no snapshots; however, all the snapshot vmdk-files are still present.

How to fix the problem

The first thing is to schedule downtime for the VM, because it needs to be shut down to complete these steps. Because the snapshot files keep growing, there should be enough free space on the datastore to accommodate the snapshots until the fix can be performed.

The next thing is to make sure that the VADP backup is disabled while the following operations are performed. Running a VADP backup while working on the virtual disks can really mess up the snapshots.

After the previous steps are covered and the VM is shut down, make a copy of the VM folder. This is the first thing I do whenever I have to work with vmdk-files, just in case something goes wrong.
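On the console the copy can be as simple as the line below (hypothetical datastore and folder names; make sure there is enough free space for the flat files on the target):

# safety copy of the whole VM folder before touching any vmdk-files
cp -r /vmfs/volumes/datastore1/servername /vmfs/volumes/datastore1/servername_copy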

The fix is to clone the vmdk-file with its snapshots to a new vmdk using the vmkfstools command (the VM I was working on was on ESX 4.1, so vmkfstools was still available) to consolidate the snapshots, then remove the current virtual disk(s) from the VM and add the new cloned disk(s) to it. There are, however, some things to consider before cloning the vmdks:

Don’t rely on the vmdk-file with the highest number (e.g. [servername]-000010.vmdk) being the latest snapshot. Always check from the VM properties, or from the vmx-file if using the command line.

VM properties:

[servername].vmx from command line:

If you plan to work with the copied vmdk-files, keep in mind that the “parentFileNameHint=” row in the vmdk descriptor points to the original location of the parent. So before you clone the copied vmdk-file, change that path to point to the location of the copy.
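From the command line both checks come down to grepping the descriptor files. A sketch with hypothetical paths and file names:

# which vmdk the VM is actually running on (the newest snapshot in the chain)
grep fileName /vmfs/volumes/datastore1/servername/servername.vmx

# where a copied snapshot descriptor expects its parent to be;
# edit this path in the copy before cloning from it
grep parentFileNameHint /vmfs/volumes/datastore1/servername_copy/servername-000010.vmdk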

Now that the latest snapshot vmdk-file has been identified, the clone can be done with the vmkfstools -i command from the command line:

vmkfstools -i [servername]-0000[nn].vmdk [newname].vmdk
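As a concrete, hypothetical example: if the VM is running on servername-000010.vmdk, the consolidated clone could be created like this (vmkfstools follows the parent chain and writes a single new disk):

cd /vmfs/volumes/datastore1/servername
# clone the active snapshot vmdk into one consolidated virtual disk
vmkfstools -i servername-000010.vmdk servername_consolidated.vmdk

Adding -d thin to the end of the vmkfstools line should keep the clone thin provisioned if datastore space is tight.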

After the clone is done, the original virtual disk can be removed from the VM (I used the “remove from virtual machine” option, not the delete option) and the new one can be added. If the VM has more than one virtual disk, this procedure has to be done for all of them. After confirming that the VM starts normally and that all the data is intact, the unused vmdks can be removed. In my case the VM had two virtual disks, and both had several snapshot vmdks, so I used Storage vMotion to move the VM to another datastore and then deleted the folder that was left on the old datastore.


Unintentionally deleting a virtual disk


Have you ever lost a file that you certainly didn’t delete? Well, that might happen with vmdk-files if a datastore is not unpresented and a virtual disk deleted in the right order.

Let’s say you have a virtual machine with two virtual disks on two different datastores, vmfs_a and vmfs_b. You then unpresent from the ESX host the LUN containing vmfs_b and the vmdk-file of hard disk 2. After the LUN is unpresented, the second virtual disk (which is still visible in the VM’s settings) points to the first virtual disk’s vmdk. So if you now remove the second virtual disk with the option ‘remove from virtual machine and delete files from disk’, it actually deletes the vmdk of the first virtual disk.

VM with two virtual disks:

 
After unpresenting the datastore vmfs_b from ESX the second virtual disk’s disk file is pointing to the first virtual disk’s vmdk:
 
 
Removing ‘hard disk 2’ from the VM with the option ‘Remove from virtual machine and delete from disk’:
 
 
After removing the second virtual disk the first virtual disk still points to its original location:
 
 
When browsing to the vmfs_a datastore the test-vm.vmdk file is no longer there:
 
 
It’s that easy to unintentionally delete the wrong vmdk-file. So before unpresenting a LUN from ESX, make sure that the datastore is empty to avoid this kind of situation. I would also suggest masking the LUN before unpresenting it, because an all-paths-down condition might occur if the LUN is unpresented without masking.
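Masking on ESX/ESXi 4.x is done with MASK_PATH claim rules. This is roughly the sequence from the VMware KB; the rule number, adapter, channel, target, LUN and device id below are just example values:

# list the current claim rules and pick a free rule number
esxcli corestorage claimrule list
# mask LUN 5 behind vmhba2 with the MASK_PATH plugin
esxcli corestorage claimrule add --rule 192 --type location --adapter vmhba2 --channel 0 --target 0 --lun 5 --plugin MASK_PATH
# load the new rule and reclaim the paths of the masked device
esxcli corestorage claimrule load
esxcli corestorage claiming reclaim -d naa.xxxxxxxx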
 
VMware KB about the all-paths-down condition:
 
VMware KB about unpresenting a LUN from ESX:
 

ESX 4.x losing statistics


I just noticed some weird behaviour in ESXi 4.1 U1 (build 348481) statistics while rescanning datastores, and wanted to share my findings and maybe also find an explanation or a fix for the problem.

I started a “rescan for datastores” on a cluster with two ESXi hosts and noticed that while the scan was running, the realtime statistics weren’t showing any graph. The same issue appeared on both hosts, both when using vCenter and when connected directly to the ESXi host. There was also no graph for the VMs during the rescan. I did some tests with ESX/ESXi 4.1 (build 260247) using different storage arrays and reproduced the problem on those too.

Gaps on the graph while rescanning:

I also noticed that all of the ESXi host's CPU was consumed during the rescan.

During the scan:

After the scan:

One host was showing zero CPU usage while rescanning:
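To cross-check the vCenter graphs, the host CPU can also be watched directly with esxtop (or resxtop against an ESXi host) while a rescan is running; this is only a way to verify the observation, not a fix:

# run locally on the host, or remotely with: resxtop --server <hostname>
# press 'c' for the CPU panel and watch the load while the rescan runs
esxtop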

I don’t know if this is a bug or something else, but I hope I can find a fix for it soon.

Has anyone seen this happen before?

