Tag Archives: ESXi

Changing MTU on VNXe disconnects datastores from ESXi


While testing the VNXe 3100 (OE 2.1.3.16008) I found a problem when changing the MTU setting for a link aggregate. With a specific combination of configurations, changing the MTU causes ESXi (4.1.0, 502767) to lose all iSCSI datastores, and even after changing the setting back the datastores are still not visible on ESXi. The VNXe also can't provision new datastores to ESXi while this problem is occurring. There are a couple of workarounds for this, but no official fix is available to avoid this kind of situation.

How did I find it?

After the initial configuration I created a link aggregate from two ports, set the MTU to 9000 and also created one iSCSI server on SP A with two IP addresses. I then configured ESXi to use MTU 9000 as well. Datastore creation went through successfully on the VNXe side, but on the ESXi side I could see an error that the VMFS volume couldn't be created.

I could see the LUN under the iSCSI adapter, but manually creating a VMFS datastore also failed. I then realized that I hadn't configured jumbo frames on the switch and decided to change the ESXi and VNXe MTUs back to 1500. After I changed the VNXe MTU the LUN disappeared from ESXi. Manually changing the datastore access settings on the VNXe didn't help either; I just couldn't get ESXi to see the LUN anymore. I then tried to provision a new datastore to ESXi but got this error:

Ok, so I deleted the datastore and the iSCSI server, then recreated the iSCSI server and provisioned a new datastore for the ESXi without any problems. I had a suspicion that the MTU change had caused the problem and tried it again. I changed the link aggregate MTU on the VNXe from 1500 to 9000, and after that was done the datastore disappeared from ESXi. Changing the MTU back to 1500 didn't help; the datastore and LUN were not visible on ESXi. Creating a new datastore also gave the same error as before: the datastore was created on the VNXe but was not accessible from ESXi. Deleting and recreating the datastores and iSCSI servers resolved the issue again.

What is the cause of this problem?

So it seemed that the MTU change was causing the problem. I started testing different scenarios and found out that the problem was the combination of the MTU change and the iSCSI server having two IP addresses. Here are some of the scenarios I tested (descriptions kept deliberately short):

Link aggregation MTU 1500 and iSCSI server with two IP addresses. Provisioned storage works on ESXi. Changing the VNXe link aggregation MTU to 9000 makes ESXi lose the connection to the datastore. Changing the VNXe MTU back to 1500 doesn't help; ESXi still can't see the datastore. Trying to provision a new datastore to ESXi results in an error. Removing the second IP address doesn't resolve the problem.

Link aggregation MTU 1500 and iSCSI server with two IP addresses. Provisioned storage works on ESXi. After removing the second IP from the iSCSI server and changing the MTU to 9000, the datastore is still visible and accessible from the ESXi side. Changing the MTU back to 1500 leaves the datastore visible and accessible, and datastore provisioning to ESXi is successful. After adding a second IP address to the iSCSI server, ESXi loses the connection to the datastore. Provisioning a new datastore to ESXi results in an error. Removing the second IP address again doesn't resolve the problem.

Link aggregation MTU 1500 and iSCSI server with one IP address. Provisioned storage works on ESXi. After changing the MTU to 9000 the datastore is still visible and accessible from the ESXi side. Changing the MTU back to 1500 leaves the datastore visible and accessible, and datastore provisioning to ESXi is successful. After adding a second IP address to the iSCSI server, ESXi loses the connection to the datastore. Provisioning a new datastore to ESXi results in an error. Removing the second IP doesn't resolve the problem.

Link aggregation MTU 1500 and two iSCSI servers on one SP, both configured with one IP. One datastore on each iSCSI server (there is also an issue getting the datastore on the second iSCSI server provisioned, see my previous post). After adding a second IP to the first iSCSI server both datastores are still accessible from ESXi. When the MTU is changed to 9000, ESXi loses the connection to both datastores. Changing the MTU back to 1500 doesn't bring them back, and provisioning new storage gives the same error as before.

I also tested different combinations with iSCSI servers on different SPs: if the SP A iSCSI server has two IP addresses, the SP B iSCSI server has only one IP and the MTU is changed, the datastores on the SP B iSCSI server are not affected.

How to fix this?

Currently there is no official fix for this. I have reported the problem to EMC support, demonstrated the issue to an EMC support technician and uploaded all the logs, so they are working on finding the root cause.

Apparently when an iSCSI server has two IP addresses and the MTU is changed, the iSCSI server goes into some kind of “lockdown” mode and doesn't allow any connections to be initiated. As I already described, the VNXe can be returned to an operational state by removing all datastores and iSCSI servers and recreating them. Of course this is not an option when there is production data on the datastores.

An EMC support technician showed me a quicker and less radical workaround to get the array back to an operational state: restarting the iSCSI service on the VNXe. CAUTION: Restarting the iSCSI service will disconnect all provisioned datastores from hosts. The connection to the datastores will be re-established after the iSCSI service has restarted, but this will cause all running VMs to crash.

The easiest way to restart the iSCSI service is to enable the iSNS server in the iSCSI server settings, give it an IP address and apply the changes. After the changes are applied the iSNS server can be disabled again. This triggers the iSCSI service to restart, and all datastores that were disconnected become visible and usable on ESXi again.

Conclusions

After this finding I would suggest not configuring iSCSI servers with two IP addresses. If an MTU change can do this much damage, what about other changes?

If you have iSCSI servers with two IP addresses, I would advise against changing the MTU even during a planned service break. If for some reason the change is mandatory, contact EMC support before doing it. If you have arrays affected by this issue, I would encourage you to contact EMC support before trying to restart the iSCSI service.

Once again I have to give credit to EMC support. They have some great people working there.


MS iSCSI vs. RDM vs. VMFS


Have you ever wondered whether there is a real performance difference when a LUN is connected using the Microsoft (MS) iSCSI initiator, using raw device mapping (RDM), or when the virtual disk is on VMFS? You might also have wondered whether multipathing (MP) really makes a difference. While investigating other iSCSI performance issues I ended up running some Iometer tests with different disk configurations. In this post I will share some of the test results.

Test environment [list edited 9/7/2011]

  • EMC CX4-240
  • Dell PE M710HD Blade server
  • Two Dell PowerConnect switches
  • Two 10G iSCSI NICs per ESXi, total of four iSCSI paths.
  • Jumbo frames enabled
  • ESXi 4.1U1
  • Virtual Win 2008 (1vCPU and 4GB memory)

Disk Configurations

  • 4x300GB 15k FC Disk RAID10
    • 100GB LUN for VMFS partition (8MB block size)
      • 20GB virtual disk (vmdk) for VMFS and VMFS MP tests
    • 20GB LUN for MS iSCSI, MS iSCSI MP, RDM physical and RDM virtual tests

The MS iSCSI initiator used the virtual machine's VMXNET 3 adapter (one or two depending on the configuration), which was connected to the dedicated iSCSI network through the ESXi host's 10G NIC. MS iSCSI initiator multipathing was configured using the Microsoft TechNet Installing and Configuring MPIO guide. Multipathing for RDM and VMFS disks was configured by enabling the round robin path selection policy. When multipathing was enabled there were two active paths to storage.

Iometer configuration

When I was trying to figure out the best way to test the different disk configurations I found the post “Open unofficial storage performance thread” on VMware Communities. The thread contains an Iometer configuration that tests maximum throughput and also simulates a real life scenario, and other community users have posted their results there. I decided to use the Iometer configuration from the thread so that I could compare my results with the others.

Max Throughput-100%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 100% Read distribution
  • 5 minute run time

Max Throughput-50%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 50% read/write distribution
  • 5 minute run time

RealLife-60%Rand-65%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 40% sequential / 60% random distribution
  • 65% read / 35% write distribution
  • 5 minute run time

Random-8k-70%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 100% random distribution
  • 70% read / 30% write distribution
  • 5 minute run time

Test results

Each Iometer test was run twice and the results are the average of those two runs. If the results were not similar enough (i.e. a difference of several hundred IOPS), a third test was run and the results are the average of those three runs.
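To make the procedure explicit, here is a minimal Python sketch of that averaging rule. The run_iometer_test callable and the 300 IOPS threshold are placeholders of mine, standing in for "run the profile once" and "several hundred IOPS"; they are not part of the actual test setup.

```python
# Minimal sketch of the averaging rule used for the results above.
# run_iometer_test is a hypothetical callable that runs the Iometer profile
# once and returns the measured IOPS; the 300 IOPS threshold is my own
# placeholder for "several hundred IOPS".

def average_iops(run_iometer_test, threshold=300):
    runs = [run_iometer_test(), run_iometer_test()]
    # If the first two runs disagree by more than the threshold, add a third.
    if abs(runs[0] - runs[1]) > threshold:
        runs.append(run_iometer_test())
    return sum(runs) / len(runs)
```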

Conclusions

Looking at these results, VMFS performed very well with both single and multipath. Both RDM disks with multipathing are really close to the performance of VMFS. And then there is the MS iSCSI initiator, which gave somewhat conflicting results: you would think that multipathing would give better results than a single path, but that was the case only in the max throughput test. Keep in mind that these tests were run on a virtual machine running on ESXi and that the MS iSCSI initiator was configured to use virtual NICs. I would guess that Windows Server 2008 running on a physical server with the MS iSCSI initiator would give much better results.

Overall, VMFS would be the best choice to put the virtual disk on, but it's not always that simple. Some clustering software doesn't support virtual disks on VMFS, and then the options are RDM or MS iSCSI. There can also be limitations on physical or virtual RDM disk usage.

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


EFD vs. FC Pools


Our CX4 with FLARE 30 has been in production for about six months now and we decided to add some more FAST Cache to it. It currently has two mirrored 100GB EFDs configured as FAST Cache and we just got two new 100GB disks to be added to the cache. We've also been pondering whether we should add EFDs to the current pools for databases. Before adding the two new disks to the cache I wanted to run some performance tests on the EFDs. I also wanted to compare the EFD performance with the performance of the current pools that we have in production.

The focus of these tests was to see whether the EFDs would have the desired performance advantage over the current pools that we already have in use. As I mentioned, we already have 100GB of FAST Cache in use, and it is also enabled on the pools that I used to run these tests.

I used Iometer to generate the load and to gather the results. In the past I've done Iometer tests on storage arrays that were not in any other use. In those cases I used the Iometer setup described in VMware's Recommendations for Aligning VMFS Partitions document. Using those settings here would have been time consuming and would also have generated a huge load on the production CX. Since I was only focusing on comparing a simulated database load on different disk configurations, I decided to run the tests with only one transfer request size.

While I was creating the disks for the test I decided to add a couple more disks and run some additional tests. I was curious to see how a properly aligned disk would really perform compared to an unaligned one, and also what kind of performance difference there was between VMFS and raw disks. Yes, I know that the VMware document mentioned above already shows that an aligned disk performs better than an unaligned one; I just wanted to see how that played out in our environment.

Test environment [list edited 9/7/2011]

  • CX4-240 with 91GB FAST Cache
  • Dell PE M710HD Blade server
  • 2x Dell PowerConnect switches
  • Two 10G iSCSI NICs with a total of four paths between storage and ESXi. Round robin path selection policy enabled for each LUN, with two active I/O paths.
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)

Disk Configurations

  • 15 FC Disk RAID5 Pool with FAST Cache enabled
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (POOL_1_vmfs_u)
      • 20GB aligned virtual disk (POOL_1_vmfs_a)
    • 20GB LUN for unaligned RAW disk (POOL_1_raw_u)
    • 20GB LUN for aligned RAW disk (POOL_1_raw_a)
  • 25 FC Disk RAID5 Pool with FAST Cache enabled
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (POOL_2_vmfs_u)
      • 20GB aligned virtual disk (POOL_2_vmfs_a)
    • 20GB LUN for unaligned RAW disk (POOL_2_raw_u)
    • 20GB LUN for aligned RAW disk (POOL_2_raw_a)
  • 2 EFD Disk RAID1 RAID Group
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (EFD_vmfs_u)
      • 20GB aligned virtual disk (EFD_vmfs_a)
    • 20GB LUN for unaligned RAW disk (EFD_raw_u)
    • 20GB LUN for aligned RAW disk (EFD_raw_a)

Raw disks were configured to use physical compatibility mode on ESXi.

Unaligned disks were configured using Windows Disk Management and formatted using default values.

Partitions on the aligned disks were created using the diskpart command 'create partition primary align=1024' and were formatted with a 32KB allocation unit size.
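As a sanity check, alignment can be verified from inside the Windows guest by looking at each partition's StartingOffset (this is essentially what the Microsoft KB mentioned later in this post has you do). Below is a rough Python sketch of that check; it assumes wmic is available in the guest and treats a multiple of 64KB as aligned, which the 1024KB offset created above satisfies; adjust the boundary if your array calls for something else.

```python
# Rough sketch: check Windows partition alignment from inside the guest.
# Assumes the 'wmic' tool is available (it is on Windows Server 2003/2008).
# A partition is treated as aligned here if StartingOffset is a multiple of
# 64KB; that boundary is my assumption, not an official recommendation.
import subprocess

BOUNDARY = 64 * 1024  # 64KB

def check_alignment():
    raw = subprocess.check_output("wmic partition get Name,StartingOffset",
                                  shell=True)
    # wmic may emit UTF-16 output when redirected; strip NUL bytes to cope.
    text = raw.decode("ascii", errors="ignore").replace("\x00", "")
    for line in text.splitlines():
        parts = line.rstrip().rsplit(None, 1)
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip the header and blank lines
        name, offset = parts[0], int(parts[1])
        state = "aligned" if offset % BOUNDARY == 0 else "NOT aligned"
        print(f"{name}: StartingOffset {offset} -> {state}")

if __name__ == "__main__":
    check_alignment()
```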

Iometer configuration

  • 1 Worker
  • 8KB transfer request size
  • Read/write ratio of 66/34 and 100% random distribution
  • 8 outstanding I/Os per target
  • 4 minute run time
  • 60 sec ramp-up time

Results

Each Iometer test against a specific disk was repeated three times, and the results are the average of those three runs. Keep in mind that the array was running over 100 production VMs during the tests, so these results are not absolute.

Conclusions

Comparing the results of the unaligned and aligned disks, there are no huge differences, although the POOL_1_raw_u and POOL_2_vmfs_u results do stand out from the charts. I did three more test runs for those disks and still got the same results. This might have something to do with the production load we have on the CX.

The performance differences between raw disks and disks on VMFS were also not major, but still noticeable; for example, the difference in IOPS between POOL_2_vmfs_a and POOL_2_raw_a is over 200, and EFD raw also gives about 200 more IOPS than EFD VMFS.

Let’s get to the point. The whole purpose of these tests was to compare FC pool and EFD performance, and if you haven't noticed from the graphs, the difference is HUGE. Do I even have to say more? I think the graphs have spoken. Those 5000+ IOPS were achieved with only two EFDs; think about having a whole array full of those.

After these tests my suggestion is to use VMFS datastores instead of raw disks. There are still some cases where you might need to use raw disks with virtual machines, e.g. when running a physical/virtual cluster.

Aligning Windows Server disks is not a big issue anymore because Windows Server 2008 does it automatically. If you have some old Windows Server 2003 installations, I would suggest checking whether the disks are aligned. There is a Microsoft KB article that describes how to check disk alignment. If the server disks are not aligned, you might want to start planning a move of your data to aligned disks.

As for the EFDs, the performance gain is self-evident. EFDs are still a bit expensive, but think about the price of the arrays and the disks needed to deliver the same IOPS that the EFDs can provide. In some cases you need to think more about the price per IO than the price per GB, as the rough comparison below illustrates.
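Here is that back-of-the-envelope comparison as a small Python sketch. The prices and per-disk IOPS figures are assumptions of mine for illustration only (rule-of-thumb numbers, not quotes or measurements from our array; the EFD figure simply splits the 5000+ IOPS seen from the two-disk RAID1 group per disk). The point is only how the price-per-IOPS math flips the comparison.

```python
# Back-of-the-envelope price-per-GB vs price-per-IOPS comparison.
# All prices and per-disk IOPS figures are illustrative assumptions,
# not measured values or actual list prices.

def cost_metrics(name, disks, price_per_disk, gb_per_disk, iops_per_disk):
    total_price = disks * price_per_disk
    total_gb = disks * gb_per_disk
    total_iops = disks * iops_per_disk
    print(f"{name}: {total_price / total_gb:.2f} $/GB, "
          f"{total_price / total_iops:.2f} $/IOPS")

# Assumed figures: a 15k FC disk at ~180 IOPS, an EFD at ~2500 IOPS.
cost_metrics("15x 300GB 15k FC", disks=15, price_per_disk=500,
             gb_per_disk=300, iops_per_disk=180)
cost_metrics("2x 100GB EFD", disks=2, price_per_disk=3000,
             gb_per_disk=100, iops_per_disk=2500)
```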

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


ESX 4.x losing statistics


I just noticed some weird behaviour in ESXi 4.1 U1 (build 348481) statistics while rescanning datastores. I wanted to share my findings and maybe also find an explanation or a fix for this problem.

I started a “rescan for datastores” on a cluster that has two ESXi hosts in it and noticed that while the scan was running the realtime statistics weren't showing any graph. The same issue occurred on both hosts, both when using vCenter and when connected directly to the ESXi host. There was also no graph for the VMs during the rescan. I did some tests with ESX/ESXi 4.1 (build 260247) using different storage and reproduced the problem there too.

Gaps on the graph while rescanning:

I also noticed that all of the ESXi host's CPU was consumed during the rescan.

During the scan:

After the scan:

One host was showing zero CPU usage while rescanning:

I don't know if this is a bug or what, but I hope I can find a fix for this soon.

Has anyone seen this happen before?

