
MS iSCSI vs. RDM vs. VMFS


Have you ever wondered if there is a real performance difference when a LUN is connected using a Microsoft (MS) iSCSI initiator, using raw device mapping (RDM), or when the virtual disk is on VMFS? You might also have wondered if multipathing (MP) really makes a difference. While investigating other iSCSI performance issues I ended up doing some Iometer tests with different disk configurations. In this post I will share some of the test results.

Test environment [list edited 9/7/2011]

  • EMC CX4-240
  • Dell PE M710HD Blade server
  • Two Dell PowerConnect switches
  • Two 10G iSCSI NICs per ESXi, total of four iSCSI paths.
  • Jumbo frames enabled
  • ESXi 4.1U1
  • Virtual Win 2008 (1vCPU and 4GB memory)

Disk Configurations

  • 4x300GB 15k FC Disk RAID10
    • 100GB LUN for VMFS partition (8MB block size)
      • 20GB virtual disk (vmdk) for VMFS and VMFS MP tests
    • 20GB LUN for MS iSCSI, MS iSCSI MP, RDM physical and RDM virtual tests

The MS iSCSI initiator used the virtual machine’s VMXNET 3 adapter (one or two depending on the configuration), which was connected to the dedicated iSCSI network through the ESXi host’s 10Gb NIC. MS iSCSI initiator multipathing was configured using the Microsoft TechNet Installing and Configuring MPIO guide. Multipathing for RDM and VMFS disks was configured by enabling the round robin path selection policy. When multipathing was enabled there were two active paths to storage.

Iometer configuration

When I was trying to figure out the best way to test the different disk configurations I found the post “Open unofficial storage performance thread” on VMware Communities. The thread includes an Iometer configuration that tests maximum throughput and also simulates a real-life scenario. Other community users have posted their results there as well. I decided to use the Iometer configuration from the thread so that I could compare my results with theirs.

Max Throughput-100%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 100% Read distribution
  • 5 minute run time

Max Throughput-50%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 50% read/write distribution
  • 5 minute run time

RealLife-60%Rand-65%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 40% sequential / 60% random distribution
  • 65% read / 35% write distribution
  • 5 minute run time

Random-8k-70%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 100% random distribution
  • 70% read / 30% write distribution
  • 5 minute run time
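
To put these settings in perspective, here is a minimal Python sketch that works out how large the tested disk area is and how an IOps figure converts to throughput for each access specification. It assumes Iometer counts 512-byte sectors, and the read percentages follow the test names. The 1000 IOps input is purely illustrative and is not a measured result.

```python
SECTOR_BYTES = 512  # assumption: Iometer sectors are 512 bytes

# name: (transfer size in KB, % read, % random); read % follows the test names
ACCESS_SPECS = {
    "Max Throughput-100%Read": (32, 100, 0),
    "Max Throughput-50%Read": (32, 50, 0),
    "RealLife-60%Rand-65%Read": (8, 65, 60),
    "Random-8k-70%Read": (8, 70, 100),
}


def test_area_gib(max_sectors: int = 8_000_000) -> float:
    """Size of the disk area Iometer touches, in GiB."""
    return max_sectors * SECTOR_BYTES / 2**30


def throughput_mb_s(iops: float, transfer_kb: int) -> float:
    """Convert IOps at a fixed transfer size to MB/s (1 MB = 1024 KB)."""
    return iops * transfer_kb / 1024


if __name__ == "__main__":
    print(f"Test area: {test_area_gib():.2f} GiB")
    for name, (transfer_kb, read_pct, random_pct) in ACCESS_SPECS.items():
        # 1000 IOps is an illustrative input, not a result from these tests
        mb_s = throughput_mb_s(1000, transfer_kb)
        print(f"{name}: 1000 IOps at {transfer_kb}KB is {mb_s:.1f} MB/s")
```

Under that sector-size assumption the 8000000-sector limit covers a little under 4 GiB, so every test stays well inside the 20GB disks.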

Test results

Each Iometer test was run twice and the results are the average of those two runs. If the results were not similar enough (i.e. they differed by several hundred IOps), a third test was run and the results are the average of all three runs.
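
The averaging rule can be sketched in a few lines of Python. The function name and the 300 IOps threshold are assumptions made for illustration; the post only says "several hundred" and gives no exact cutoff.

```python
from statistics import mean
from typing import Callable


def averaged_iops(run1: float, run2: float,
                  rerun: Callable[[], float],
                  max_gap: float = 300.0) -> float:
    """Average two runs; if they differ too much, add a third run.

    max_gap is a hypothetical threshold standing in for
    "several hundred IOps".
    """
    if abs(run1 - run2) <= max_gap:
        return mean([run1, run2])
    # Runs disagree: take one more measurement and average all three
    return mean([run1, run2, rerun()])


if __name__ == "__main__":
    # Illustrative numbers only, not results from the tests
    print(averaged_iops(1000, 1100, rerun=lambda: 0))
    print(averaged_iops(1000, 1600, rerun=lambda: 1300))
```

Passing the third run as a callable means the extra test is only executed when the first two runs actually disagree.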

Conclusions

Looking at these results, VMFS performed very well with both single path and multipath. Both RDM disks with multipathing came really close to the performance of VMFS. And then there is the MS iSCSI initiator, which gave somewhat conflicting results. You would think that multipathing would give better results than a single path, but that was the case only in the max throughput test. Keep in mind that these tests were run on a virtual machine running on ESXi and that the MS iSCSI initiator was configured to use virtual NICs. I would guess that Windows Server 2008 running on a physical server with the MS iSCSI initiator would give much better results.

Overall VMFS would be the best choice for placing the virtual disk, but it’s not always that simple. Some clustering software doesn’t support virtual disks on VMFS, and then the options are RDM or MS iSCSI. There could also be limitations on physical or virtual RDM disk usage.

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


EFD vs. FC Pools


Our CX4 with Flare30 has been in production for about six months now and we decided to add some more FAST Cache to it. It currently has two mirrored 100GB EFDs configured as FAST Cache and we just got two new 100GB disks to be added to the cache. We’ve also been pondering whether we should add EFDs to the current pools for databases. Before adding the two new disks to the cache I wanted to run some performance tests on the EFDs. I also wanted to compare the EFD performance with the performance of the current pools that we have in production.

The focus of these tests was to see if the EFDs would have the desired performance advantage over the pools that we already have in use. As I mentioned, we already have 100GB of FAST Cache in use, and it is also enabled on the pools that I used to run these tests.

I used Iometer to generate the load and to gather the results. In the past I’ve done Iometer tests with storage arrays that were not in any other use. In those cases I’ve used the Iometer setup described in VMware’s Recommendations for Aligning VMFS Partitions document. Using those settings here would have been time consuming and would also have generated a huge load on the production CX. Since I was only interested in comparing a simulated database load on different disk configurations, I decided to run the test with only one transfer request size.

While I was creating the disks for the test I decided to add a couple more disks and run some additional tests. I was curious to see how a properly aligned disk would really perform compared to an unaligned one, and also what kind of performance difference there was between VMFS and raw disks. Yes, I know that the VMware document I mentioned above already shows that an aligned disk performs better than an unaligned one. I just wanted to know what the case was in our environment.

Test environment [list edited 9/7/2011]

  • CX4-240 with 91GB FAST Cache
  • Dell PE M710HD Blade server
  • 2x Dell PowerConnect switches
  • Two 10G iSCSI NICs with a total of four paths between storage and ESXi. Round robin path selection policy enabled for each LUN with two active I/O paths.
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)

Disk Configurations

  • 15 FC Disk RAID5 Pool with FAST Cache enabled
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (POOL_1_vmfs_u)
      • 20GB aligned virtual disk (POOL_1_vmfs_a)
    • 20GB LUN for unaligned RAW disk (POOL_1_raw_u)
    • 20GB LUN for aligned RAW disk (POOL_1_raw_a)
  • 25 FC Disk RAID5 Pool with FAST Cache enabled
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (POOL_2_vmfs_u)
      • 20GB aligned virtual disk (POOL_2_vmfs_a)
    • 20GB LUN for unaligned RAW disk (POOL_2_raw_u)
    • 20GB LUN for aligned RAW disk (POOL_2_raw_a)
  • 2 EFD Disk RAID1 RAID Group
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (EFD_vmfs_u)
      • 20GB aligned virtual disk (EFD_vmfs_a)
    • 20GB LUN for unaligned RAW disk (EFD_raw_u)
    • 20GB LUN for aligned RAW disk (EFD_raw_a)

Raw disks were configured to use physical compatibility mode on ESXi.

Unaligned disks were configured using Windows Disk Management and formatted using default values.

Partitions on the aligned disks were created using the diskpart command ‘create partition primary align=1024’ and the partitions were formatted with a 32K allocation unit size.
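
The alignment difference comes down to simple arithmetic on the partition's starting offset, which the Python sketch below illustrates. It assumes that the Windows Server 2003 default places the first partition at sector 63 with 512-byte sectors, and that the array uses a 64KB stripe element size; neither value is stated in the post, so treat both as assumptions. The diskpart align value is given in KB, so align=1024 corresponds to a 1MB offset.

```python
KB = 1024

# Assumption: legacy Windows default of 63 hidden sectors, 512 bytes each
LEGACY_OFFSET_BYTES = 63 * 512
# diskpart 'align=1024' is expressed in KB, i.e. a 1 MB starting offset
ALIGNED_OFFSET_BYTES = 1024 * KB
# Assumption: 64 KB stripe element size on the array
STRIPE_ELEMENT_BYTES = 64 * KB


def is_aligned(offset_bytes: int,
               boundary_bytes: int = STRIPE_ELEMENT_BYTES) -> bool:
    """True when the partition starts exactly on a boundary multiple."""
    return offset_bytes % boundary_bytes == 0


if __name__ == "__main__":
    for label, offset in (("default", LEGACY_OFFSET_BYTES),
                          ("align=1024", ALIGNED_OFFSET_BYTES)):
        print(f"{label}: offset {offset} bytes, aligned={is_aligned(offset)}")
```

With an offset that is not a multiple of the boundary, some I/Os straddle two stripe elements and cost the array an extra back-end operation, which is the effect the aligned and unaligned test disks are meant to expose.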

Iometer configuration

  • 1 Worker
  • 8KB transfer request size
  • Read/write ratio of 66/34 and 100% random distribution
  • 8 outstanding I/Os per target
  • 4 minute run time
  • 60 sec ramp-up time
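
Because the number of outstanding I/Os is fixed at 8, an IOps result also implies an average response time through Little's Law (outstanding I/Os = IOps x latency). The Python sketch below shows the conversion. It assumes Iometer keeps the queue full for the whole run, and the IOps inputs are round illustrative numbers rather than measured results.

```python
def avg_latency_ms(iops: float, outstanding_ios: int = 8) -> float:
    """Average response time implied by Little's Law, in milliseconds.

    Assumes the queue of outstanding I/Os stays full during the run.
    """
    if iops <= 0:
        raise ValueError("iops must be positive")
    return outstanding_ios / iops * 1000.0


if __name__ == "__main__":
    # Illustrative IOps values only, not results from the tests
    for iops in (1000, 2500, 5000):
        print(f"{iops} IOps -> {avg_latency_ms(iops):.2f} ms average latency")
```

Iometer reports average response time directly, so this is mainly useful as a sanity check on the numbers it gives.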

Results

Each Iometer test against a specific disk was repeated three times. Results are the average of these three runs. Keep in mind that the array was running over 100 production VMs during the tests, so these results are not absolute.

Conclusions

Comparing the results for unaligned and aligned disks, there are no huge differences, although the POOL_1_raw_u and POOL_2_vmfs_u results do stand out in the charts. I did three more test runs for those disks and still got the same results. This might have something to do with the production load that we have on the CX.

The performance differences between raw disks and disks on VMFS were not major either, but they were still noticeable. For example, the difference in IOps between POOL_2_vmfs_a and POOL_2_raw_a is over 200. EFD raw is also giving about 200 more IOps than EFD VMFS.

Let’s get to the point. The whole purpose of these tests was to compare FC pool and EFD performance. If you haven’t noticed from the graphs, the difference is HUGE! Do I even have to say more? I think the graphs have spoken. Those 5000+ IOps were achieved with only two EFDs. Think about having a whole array full of those.

After these tests my suggestion is to use VMFS datastores instead of raw disks. There are still some cases where you might need to use raw disks with virtual machines, e.g. in a physical/virtual cluster. Aligning Windows Server disks is not a big issue anymore because Windows Server 2008 does it automatically. If you have some old Windows Server 2003 installations I would suggest checking whether the disks are aligned. There is a Microsoft KB that describes how to check disk alignment. If the server disks are not aligned you might want to start planning to move your data to aligned disks. As for the EFDs, the performance gain is self-evident. EFDs are still a bit expensive, but think about the price of the arrays and disks needed to reach the same IOps that the EFDs can provide. In some cases you need to think more about the price per IO than the price per GB.
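
The price per IO versus price per GB comparison is easy to run for your own environment. The Python sketch below shows the arithmetic; every price, capacity and IOps value in it is a hypothetical placeholder and should be replaced with your own quotes and measurements.

```python
def cost_per_gb(price: float, capacity_gb: float) -> float:
    """Price divided by usable capacity."""
    return price / capacity_gb


def cost_per_iops(price: float, iops: float) -> float:
    """Price divided by the IOps the configuration can sustain."""
    return price / iops


if __name__ == "__main__":
    # All figures below are made-up placeholders, not real prices or results
    configs = {
        "hypothetical FC pool": {"price": 20000.0, "gb": 4000.0, "iops": 2000.0},
        "hypothetical EFD pair": {"price": 10000.0, "gb": 100.0, "iops": 5000.0},
    }
    for name, c in configs.items():
        print(f"{name}: {cost_per_gb(c['price'], c['gb']):.2f} per GB, "
              f"{cost_per_iops(c['price'], c['iops']):.2f} per IOps")
```

A configuration can look expensive on one metric and cheap on the other, which is why the workload (capacity bound or IO bound) should decide which number matters.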

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.

