Tag Archives: VMware

Changing round robin IO operation limit on ESXi 5


After I published the post VNXe 3300 performance follow up (EFDs and RR settings) I started seeing visitors landing on my blog through search engine queries like “IO operation limit ESXi 5”. In the previous post I only described how the IO operation limit can be changed on ESX 4 using PowerCLI. The commands on ESXi 5 are a bit different, so this post describes how it can be done on ESXi 5 using both ESXi Shell and PowerCLI.

Round Robin settings

The first thing to do is to change the datastore path selection policy to RR (in the vSphere Client: select the host – Configuration – Storage Adapters – iSCSI sw adapter – right-click the device and select Manage Paths – for Path Selection choose Round Robin (VMware) and click Change).

Changing IO operation limit using PowerCLI

1. Open PowerCLI and connect to the server

Connect-VIServer -Server [servername]

2. Retrieve esxcli instance

$esxcli = Get-EsxCli

3. Change the device IO Operation Limit to 1 and set the Limit Type to Iops. [deviceidentifier] can be found in the vSphere Client’s iSCSI sw adapter view and is in the format naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

$esxcli.storage.nmp.psp.roundrobin.deviceconfig.set($null,"[deviceidentifier]",1,"iops",$null)

4. Check that the changes were completed.

$esxcli.storage.nmp.psp.roundrobin.deviceconfig.get("[deviceidentifier]")

Changing IO operation limit using ESXi Shell

1. Log in to ESXi using SSH

2. Change the device IO Operation Limit to 1 and set the Limit Type to Iops. [deviceidentifier] can be found in the vSphere Client’s iSCSI sw adapter view and is in the format naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=[deviceidentifier]

3. Check that the changes were completed.

esxcli storage nmp psp roundrobin deviceconfig get --device=[deviceidentifier]
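If the limit needs to be changed on several devices, the same commands can be wrapped in a small loop in ESXi Shell. This is only a sketch: it assumes every naa.* device listed by esxcli is already set to Round Robin, so check the device list before running it.

# Apply the IOPS=1 limit to every naa.* device and show the resulting config.
# The set command will fail for any device that is not using Round Robin (VMW_PSP_RR).
for DEV in $(esxcli storage nmp device list | grep '^naa.'); do
   esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$DEV
   esxcli storage nmp psp roundrobin deviceconfig get --device=$DEV
done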


Nested ESXi with swap to host cache on VMware Player


Just after vSphere 5 was released I wrote a post about running ESXi 5 on VMware Player 3. It was an easy way to get to know ESXi 5 and create a small home lab on your laptop. The issue with running multiple ESXi instances on my laptop is the lack of memory: I have 8GB of memory, so that sets some limitations.

After VMware Player 4 was released on January 24 I upgraded my Player and started to play around with it. I found out that it was really easy to run nested ESXis with the new Player version. This alone wouldn't help much because I still had only 8GB of memory on my laptop. But I also had an SSD on my laptop, and I knew that ESXi 5 has a feature called “swap to host cache” which allows an SSD to be used as swap space for the host. So I started testing whether it would be possible to run ESXi on the Player, configure swap to host cache to make use of my SSD drive, and then run nested ESXis on that first ESXi. And yes, it is possible. Here is how to do it.

Installing the first ESXi

ESXi installation follows the steps that I described in my previous post. The only addition to those steps is that the “Virtualize Intel VT-x/EPT or AMD-V/RVI” option should be selected for the processor so that nested ESXis can be run. I also added a 25GB disk for the host cache and a 100GB drive for nested VMs.

Configuring the swap to host cache on the first ESXi

The first step before installing any nested VMs is to configure the swap to host cache on the ESXi that is running on the VMware Player. Duncan Epping has a really elaborate post (Swap to host cache aka swap to SSD?) that describes how the cache works and how it can be enabled. Duncan’s post has a link to William Lam’s post (How to Trick ESXi in seeing an SSD Datastore) that I followed to get the ESXi to actually show the virtual disk as an SSD datastore. I then followed Duncan’s instructions to enable the cache. So I now have the ESXi 5 running on VMware Player on my laptop with 23GB of SSD host cache.
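For reference, William Lam's trick boils down to adding a PSA claim rule that tags the virtual disk with the enable_ssd option and then reclaiming the device. Below is only a rough sketch of that approach from ESXi Shell; the device identifier is a placeholder (find yours with esxcli storage core device list) and the exact steps are in his post.

# Tag the virtual disk with the enable_ssd option so ESXi reports it as an SSD.
esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device=mpx.vmhba1:C0:T1:L0 --option="enable_ssd"
# Reclaim the device so the new claim rule takes effect (a reboot also works).
esxcli storage core claiming reclaim --device=mpx.vmhba1:C0:T1:L0
# Verify: the device should now show "Is SSD: true".
esxcli storage core device list --device=mpx.vmhba1:C0:T1:L0 | grep -i ssd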

Installing nested VMs

When creating a nested VM to run ESXi, the default guest operating system selection can be used.

After the VM is created, the guest operating system type needs to be changed to Other/VMware ESXi 5.x:

Host cache at work

To test it out I created three 8GB VMs for the ESXis and also deployed the vCenter appliance, which also has 8GB of memory configured. I then started installing the ESXis and could see that the host cache was being utilized.


VNXe 3100 performance


Earlier this year I installed a VNXe 3100 and have now done some testing with it. I have already covered the VNXe 3300 performance in a couple of my previous posts: Hands-on with VNXe 3300 Part 6: Performance and VNXe 3300 performance follow up (EFDs and RR settings). The 3100 has fewer disks than the 3300, less memory and only two I/O ports, so I wanted to see how the 3100 would perform compared to the 3300. I ran the same Iometer tests that I ran on the 3300. In this post I will compare those results to the ones presented in the previous posts. The environment is a bit different, so I will quickly describe it before presenting the results.

Test environment

  • EMC VNXe 3100 (21 x 600GB SAS drives)
  • Dell PE 2900 server
  • HP ProCurve 2510G
  • Two 1Gb iSCSI NICs
  • ESXi 4.1U1 / ESXi 5.0
  • Virtual Win 2008 R2 (1vCPU and 4GB memory)

Test results

I ran the tests on both ESXi 4.1 and ESXi 5.0, but the iSCSI results were very similar so I used the average of both. The NFS results had some differences, so I will present the results for 4 and 5 separately. I also ran the tests with and without LAG, and with the default RR settings changed. The VNXe was configured with one 20-disk pool and a 100GB datastore provisioned to the ESXi servers. The tests were run on a 20GB virtual disk on the 100GB datastore.

[update] My main focus in these tests has been on iSCSI because that is what we are planning to use. I only ran quick tests with the generic NFS and not with the one that is configured under Storage – VMware. After Paul’s comment I ran a couple of tests on the “VMware NFS” and then added “ESXi 4 VMware NFS” to the test results:

Conclusions

With default settings the performance of the 3300 and the 3100 is fairly similar. The 3300 gives better throughput when the IO operation limit is changed from the default 1000 to 1. The differences in the physical configurations might also have an effect on this. With a random workload the performance is similar even when the default settings are changed. Of course the real difference would be seen when both were under heavy load; during the tests only the test server was running on the VNXes.

For NFS I didn’t have comparable results from the 3300. I ran different tests on the 3300 and those results weren’t good either. The odd thing is that ESXi 4 and ESXi 5 gave quite different results when running the tests on NFS.

Looking at these and the previous results I would still stick with iSCSI on the VNXe. As for the performance of the 3100, it is surprisingly close to its bigger sibling, the 3300.

[update] Looking at the new test results NFS is performing as well as iSCSI. With the modified RR settings iSCSI gets better maximum throughput, but then again with random workloads NFS seems to perform better. So the type of NFS storage provisioned to the ESX hosts makes a difference. Now comes the question: NFS or iSCSI? Performance-wise either one is a good choice. But which one suits your environment better?

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


Ask The Expert wrap up


It has now been almost two weeks since the EMC Ask the Expert: VNXe front-end Networks with VMware event ended. We had a couple of meetings beforehand where we discussed and planned the event, but we really didn’t know what to expect from it. Matt and I were committed to answering the questions during the two weeks, so it was a bit different from a normal community thread. Now, looking at the number of views the discussion got, we know that it was a success. During the two weeks that the event was active the page had more than 2300 views, and several people asked us questions and for our opinions. As a summary Matt and I wrote a document that covers the main concerns discussed during the event. In this document we look into the VNXe HA configurations and link aggregation, and also do a quick overview of the ESX side configurations:

Ask the Expert Wrap ups – for the community, by the community

I was really excited when I was asked to participate in a great event like this. Thank you Mark, Matt and Sean, it was great working with you guys!


EMC Ask The Expert


You may have already visited the EMC Support Community Ask The Expert Forum page or read posts about it by Matthew Brender, Mark Browne or Sean Thulin. The EMC Ask The Expert Series is basically an engagement between customers, partners, EMC employees and whoever else wants to participate. The series consists of several topics and there are several ways to take part (e.g. online webinars, forum conversations).

As Matt, Mark and Sean have already mentioned in their posts, the first Ask The Expert event started on January 16 and runs until January 26. The first event is about VNXe network configurations and troubleshooting. Matthew and I have already been answering questions for a bit over a week and will continue until the end of this week. Just as I was writing this post we passed 1500 views on the topic.

How is this different from any other EMC support forum topic?

Both Matt and I are committed to monitoring and answering this Ask The Expert topic for a period of two weeks. We will both get email alerts whenever someone posts on the topic, and we will try to answer the questions within the same day. Matt will be answering as an EMC employee and I will be answering as a customer.

The topic is about VNXe networking, but that doesn’t mean you can’t ask questions about other subjects concerning the VNXe. The topic is scoped this way to keep the thread fairly short. If questions other than networking are raised we will start a new topic on the forum and continue the conversation in that thread.

There are still four full days to take advantage of my and Matt’s knowledge about the VNXe. The event ends on Friday, but that doesn’t mean we will stop answering VNXe related questions in the forums. It just means that after Friday you might not get your questions answered as quickly as during the event, while both of us are committed to interacting with this topic.

I would encourage anyone to ask questions or raise concerns about any VNXe topic on the EMC support forums. If you don’t have an ECN (EMC Community Network) account I would recommend creating one and interacting if you are working with EMC products. If you are an EMC customer and have a Powerlink account you can log in to ECN using that account:

If you have a question about VNXe and for some reason don’t want to post it on the ECN forum, just leave a comment on this post and I will address the question in the Ask The Expert thread. We are also monitoring the #EMCAskTheExpert tag on Twitter and will pick up questions from there too.


Changing MTU on VNXe disconnects datastores from ESXi


While testing the VNXe 3100 (OE 2.1.3.16008) I found a problem when changing the MTU settings for a link aggregate. With a specific combination of configurations, changing the MTU causes the ESXi host (4.1.0, 502767) to lose all iSCSI datastores, and even after changing the settings back the datastores are still not visible on the ESXi. The VNXe also can’t provision new datastores to the ESXi while this problem is occurring. There are a couple of workarounds for this, but no official fix is available to avoid this kind of situation.

How did I find it?

After the initial configuration I created a link aggregate from two ports, set the MTU to 9000 and created one iSCSI server on SP A with two IP addresses. I then configured the ESXi to use MTU 9000 as well. Datastore creation went through successfully on the VNXe side, but on the ESXi side I could see an error that the VMFS volume couldn’t be created.

I could see the LUN under the iSCSI adapter, but manually creating a VMFS datastore also failed. I then realized that I hadn’t configured jumbo frames on the switch and decided to change the ESXi and VNXe MTUs back to 1500. After I changed the VNXe MTU the LUN disappeared from the ESXi. Manually changing the datastore access settings from the VNXe didn’t help either; I just couldn’t get the ESXi to see the LUN anymore. I then tried to provision a new datastore to the ESX but got this error:

Ok, so I deleted the datastore and the iSCSI server and then recreated the iSCSI server and provisioned a new datastore for the ESXi without any problems. I had a suspicion that the MTU change caused the problem and tried it again. I changed the link aggregation MTU on the VNXe from 1500 to 9000, and after that was done the datastore disappeared from the ESXi. Changing the MTU back to 1500 didn’t help; the datastore and LUN were not visible on the ESX. Creating a new datastore also gave the same error as before: the datastore was created on the VNXe but was not accessible from the ESXi. Deleting and recreating the datastores and iSCSI servers resolved the issue again.

What is the cause of this problem?

So it seemed that the MTU change was causing the problem. I started testing with different scenarios and found out that the problem was the combination of the MTU change and also the iSCSI server having two IP addresses. Here are some scenarios that I tested (sorry about the rough grammar, tried to keep the descriptions short):

Link aggregation MTU 1500 and iSCSI server with two IP addresses. Provisioned storage works on ESXi. Changing VNXe link aggregation MTU to 9000 and ESXi loses connection to the datastore. Changing VNXe MTU back to 1500 and ESXi still can’t see the datastore. Trying to provision a new datastore to ESXi results in an error. Removing the other IP address doesn’t resolve the problem.

Link aggregation MTU 1500 and iSCSI server with two IP addresses. Provisioned storage works on ESXi. Removing the other IP from the iSCSI server and changing MTU to 9000. Datastore is still visible and accessible from the ESXi side. Changing MTU back to 1500 and the datastore is still visible and accessible from ESXi. Datastore provisioning to ESXi is successful. After adding another IP address to the iSCSI server ESX loses the connection to the datastore. Provisioning a new datastore to ESXi results in an error. Removing the other IP address doesn’t resolve the problem.

Link aggregation MTU 1500 and iSCSI server with one IP address. Provisioned storage works on ESX. Changing MTU to 9000. Datastore is still visible and accessible from the ESXi side. Changing MTU back to 1500 and the datastore is still visible and accessible from ESXi. Datastore provisioning to ESXi is successful. After adding another IP address to the iSCSI server ESX loses the connection to the datastore. Provisioning a new datastore to ESXi results in an error. Removing the other IP doesn’t resolve the problem.

Link aggregation MTU 1500 and two iSCSI servers on one SP both configured with one IP. One datastore on both iSCSI servers (there is also an issue getting the datastore on the other iSCSI server provisioned, see my previous post). Adding a second IP for the first iSCSI server and both datastores are still accessible from ESXi. When changing MTU to 9000 ESX loses connection to both datastores. Changing MTU back to 1500 and both datastores are still not visible on ESXi. Also getting the same error as previously when trying to provision new storage.

I also tested different combinations with iSCSI servers on different SPs: if the SP A iSCSI server has two IP addresses, the SP B iSCSI server has only one IP and the MTU is changed, then the datastores on the SP B iSCSI server are not affected.

How to fix this?

Currently there is no official fix for this. I have reported the problem to EMC support, demonstrated the issue to an EMC support technician and uploaded all the logs, so they are working on finding the root cause.

Apparently when an iSCSI server has two IP addresses and the MTU is changed, the iSCSI server goes into some kind of “lockdown” mode and doesn’t allow any connections to be initiated. As I already described, the VNXe can be returned to an operational state by removing all datastores and iSCSI servers and recreating them. Of course this is not an option when there is production data on the datastores.

An EMC support technician showed me a quicker and less radical workaround to get the array back to an operational state: restarting the iSCSI service on the VNXe. CAUTION: Restarting the iSCSI service will disconnect all provisioned datastores from the hosts. The connections to the datastores are re-established after the iSCSI service has restarted, but the outage will cause all running VMs to crash.

The easiest way to restart the iSCSI service is to enable the iSNS server in the iSCSI server settings, give it an IP address and apply the changes. After the changes are applied the iSNS server can be disabled again. This triggers the iSCSI service to restart, and all datastores that were disconnected are again visible and usable on the ESXi.

Conclusions

After this finding I would suggest not configuring iSCSI servers with two IP addresses. If an MTU change can do this much damage, what about other changes?

If you have iSCSI servers with two IP addresses I would advise against changing the MTU, even during a planned service break. If for some reason the change is mandatory, contact EMC support before doing it. If you have arrays affected by this issue I would encourage you to contact EMC support before trying to restart the iSCSI service.

Once again I have to give credit to EMC support. They have some great people working there.


VADP backup fails to remove snapshot


I have noticed that sometimes after a vStorage APIs for Data Protection (VADP) backup the virtual machine (VM) snapshot is not deleted, even when the backup completed successfully. This can start a chain reaction that leaves several snapshot vmdk-files on the datastore, and eventually the datastore could run out of space. After the first failed snapshot removal, VADP backups continue working normally except that the number of snapshot vmdk-files keeps growing. In some cases the failed snapshot removal leaves an error message in the vCenter events, but this is not always the case.

How to identify the problem

As I already mentioned, the issue can be spotted from the growing number of snapshot vmdk-files on the datastore. If you are monitoring VM snapshots you should be able to notice the situation before the datastore runs out of space.
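A quick way to check is to list the snapshot delta files for the VM from the ESXi console; the datastore and VM names below are just placeholders:

# Snapshot delta disks show up as [servername]-000001.vmdk, [servername]-000002.vmdk and so on.
ls -lh /vmfs/volumes/[datastorename]/[servername]/*-0000*.vmdk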

Another thing to check is whether the VM has any snapshots. While a VADP backup is running there should be a “Consolidate Helper” snapshot active, and after the VADP backup is done this snapshot should be deleted. If the backup is not running and this snapshot still exists, that confirms there is an issue with the snapshots.

There could also be an “Unable to access file <unspecified filename> since it is locked” error shown in the VM’s task details.

I’ve also seen cases where, even though the VADP-initiated snapshot removal is successful, the “Consolidate Helper” snapshot and the snapshot vmdk-files still exist.

At this point I would suggest reading Ivo Beerens’ blog post about a similar issue with snapshots. He describes a solution for the “Unable to access file <unspecified filename> since it is locked” error. It didn’t work in my case, so I had to find another way to solve the issue.

After the orphaned “Consolidate Helper” snapshot is manually removed, vCenter does not show any snapshots for the VM, and checking from the ESX console confirms that there are no snapshots; however, all the snapshot vmdk-files are still present.

How to fix the problem

The first thing is to schedule downtime for the VM, because it needs to be shut down to complete these steps. Because the snapshot files keep growing, there should be enough free space on the datastore to accommodate them until the fix can be performed.

The next thing would be to make sure that the VADP backup is disabled while the following operations are performed. Running VADP backup while working on the virtual disks can really mess up the snapshots.

After the previous steps are covered and the VM is shut down, make a copy of the VM folder. This is the first thing I do when I have to work with vmdk-files, just in case something goes wrong.

The fix is to clone the vmdk-file with its snapshots to a new vmdk using the vmkfstools command (the VM that I was working on was on ESX 4.1, so vmkfstools was still available) to consolidate the snapshots, and then remove the current virtual disk(s) from the VM and add the new cloned disk(s) to it. There are some considerations before cloning the vmdks, though:

Don’t rely on the vmdk-file with the highest number (e.g. [servername]-000010.vmdk) being the latest snapshot. Always check from the VM properties, or from the vmx-file if using the command line.

VM properties:

[servername].vmx from command line:
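For example (the datastore path and filenames are placeholders), the disk entries can be checked straight from the vmx file:

# Each scsiX:Y.fileName entry shows the vmdk that the virtual disk is currently using.
grep -i fileName /vmfs/volumes/[datastorename]/[servername]/[servername].vmx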

If you plan to work with the copied vmdk-files, keep in mind that the “parentFileNameHint=” row in the vmdk descriptor points to the original location of the parent. So before you clone from the copy, you should change the path to point to the copied parent.

Now that the latest snapshot vmdk-file is identified, the clone can be done with the vmkfstools -i command from the command line:

vmkfstools -i [servername]-0000[nn].vmdk [newname].vmdk

After the clone is done, the virtual disk can be removed from the VM (I used the “remove from virtual machine” option, not the delete option) and the new cloned one can be added. If the VM has more than one virtual disk, this procedure has to be done for all of them. After confirming that the VM starts normally and that all the data is intact, the unused vmdks can be removed. In my case the VM had two virtual disks, both with several snapshot vmdks, so I used Storage vMotion to move the VM to another datastore and then deleted the folder left on the old datastore.

