Changing MTU on VNXe disconnects datastores from ESXi


While testing the VNXe 3100 (OE 2.1.3.16008) I found a problem when changing the MTU settings for link aggregate. With specific combination of configurations, changing the MTU causes the ESXi (4.1.0, 502767) to loose all iSCSI datastores and even changing the settings back the datastores are still not visible on ESXi. VNXe also can’t provision new datastores to ESXi while this problem is occurring. There are a couple of workarounds for this but no official fix is available to avoid this kind of a situation.

How did I find it?

After the initial configuration I created link aggregate from two ports, set the MTU to 9000 and also created one iSCSI server on SP A with two IP addresses. I then configured ESXi also to use MTU 9000. Datastore creation on VNXe side went through successfully but on the ESXi side I could see an error that the VMFS volume couldn’t be created.

I could see the LUN under iSCSI adapter but manually creating a VMFS datastore also failed. I then realized that I hadn’t configured jumbo frames on the switch and decided to change the ESXi and VNXe MTUs back to 1500. After I changed the VNXe MTU the LUN disappeared from ESXi. Manually changing the datastore access settings from VNXe didn’t help either. I just couldn’t get the ESXi see the LUN anymore. I then tried to provision a new datastore to ESX but got this error:

Ok, so I deleted the datastore and the iSCSI server and then recreated the iSCSI server and provisioned a new datastore for the ESXi without any problems. I had a suspicion that the MTU change caused the problem and tried it again. I changed the link aggregation on VNXe from 1500 to 9000 and after that was done the datastore disappeared from ESXi. Changing MTU back to 1500 didn’t help, the datastore and LUN were not visible on ESX. Also creating a new datastore gave the same error as before. Datastore was created on VNXe but was not accessible from ESXi. Deleting and recreating datastores and iSCSI servers resolved the issue again.

What is the cause of this problem?

So it seemed that the MTU change was causing the problem. I started testing with different scenarios and found out that the problem was the combination of the MTU change and also the iSCSI server having two IP addresses. Here are some scenarios that I tested (sorry about the rough grammar, tried to keep the descriptions short):

Link aggregation MTU 1500 and iSCSI server with two IP addresses. Provisioned storage works on ESXi. Changing VNXe link aggregation MTU to 9000 and ESXi lose connection to datastore. Change VNXe MTU back to 1500 and ESXi still can’t see the datastore. Trying to provision new datastore to ESXi results an error. Removing the other IP address doesn’t resolve the problem.

Ling aggregation MTU 1500 and iSCSI server with two IP addresses. Provisioned storage works on ESXi. Removing the other IP from iSCSI server and changing MTU to 9000. Datastore is still visible and accessible from ESXi side. Changing MTU back to 1500 and datastore is still visible and accessible from ESXi. Datastore provisioning to ESXi is successful. After adding another IP address to iSCSI server ESX loses the connection to datastore. Provisioning new datastore to ESXi results an error. Removing the other IP address also doesn’t resolve the problem.

Ling aggregation MTU 1500 and iSCSI server with one IP address. Provisioned storage works on ESX. Change MTU to 9000. Datastore is still visible and accessible from ESXi side. Changing MTU back to 1500 and datastore is still visible and accessible from ESXi. Datastore provisioning to ESXi is successful. After adding another IP address to iSCSI server ESX loses the connection to datastore. Provisioning new datastore to ESXi results an error. Removing the other IP doesn’t resolve the problem.

Link aggregation MTU 1500 and two iSCSI servers on one SP both configured with one IP. One datastore on both iSCSI servers (there is also an issue getting the datastore on the other iSCSI server provisioned, see my previous post). Adding a second IP for the first iSCSI server and both datastores are still accessible from ESXi. When changing MTU to 9000 ESX loses connection to both datastores. Changing MTU back to 1500 and both datastores are still not visible on ESXi. Also getting the same error as previously when trying to provision new storage.

I also tested different combinations with iSCSI servers on different SPs and if SPA iSCSI server has two IP addresses and SPB iSCSI server has only one IP and the MTU is changed then the datastores on SPB iSCSI server are not affected.

How to fix this?

Currently there is no official fix for this. I have reported the problem to EMC support and demonstrated the issue to EMC support technician and uploaded all the logs, so they are working on trying to find the root cause of this.

Apparently when an iSCSI server has two IP addresses and the MTU is changed the iSCSI server goes to some kind of “lockdown” mode and doesn’t allow any connections to be initiated. Like I already described the VNXe can be returned to operational state by removing all datastores and iSCSI servers and recreating them. Of course this is not an option when there is production data on the datastores.

EMC support technician showed me a quicker and a less radical workaround to get the array back to operational state: Restarting the iSCSI service on the VNXe. CAUTION: Restarting iSCSI service will disconnect all provisioned datastores from hosts. Connection to datastores will be established after the iSCSI service is restarted. But this will cause all running VMs to crash.

The easiest way to restart iSCSI service is enabling the iSNS server from iSCSI server settings, giving it an IP address and applying changes. After the changes are applied iSNS server can be disabled. This will trigger the iSCSI service to restart and all datastores that were disconnected are again visible and usable on ESXi.

Conclusions

After this finding I would suggest not to configure iSCSI serves with two IP addresses. If MTU change can do this much damage what about other changes?

If you have two iSCSI servers with two IP addresses I would advise not to change MTU even if it would be done during a planned service break. If for some reason it is mandatory to do the change, contact EMC support before doing it. If you have arrays affected by this issue I would encourage to contact EMC support before trying to restart the iSCSI service.

Once again I have to give credit to EMC support. They have some great people working there.

Advertisements

3 responses to “Changing MTU on VNXe disconnects datastores from ESXi

  • Kent

    Hi Henri, just experienced the same exact same issue. Can i ask if i set the MTU to 9000 before creating the iSCSI servers (with 2 IPs) and then create datastores is that okay? I am going to try this anyway but dont want to have issues later?

    • henriwithani

      Yes you can change the MTU and then create the iSCSI server with two ip addresses. If you are not using link aggregate, remember to set the addresses on different network cards.

      You shouldn’t have issues later if you don’t change anything. But if you later create/break link aggregation you have to restart the iSCSI server.

  • Jo

    Had the same issue. Luckily I wasn’t production yet. Getting very frustrated with this array. Can’t understand why they couldn’t just use VNX OE code in a scaled down box and just disabled some feature sets to make it entry level instead of building new celerra based code from the ground up. Too many kinks to work out.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: