
Hands-on with VNXe 3300 Part 6: Performance


Now that the VNXe is installed and configured, and some storage has been provisioned to the ESXi hosts, it is time to look at performance. As I mentioned in the first post, I had already gathered some test results from a CX4-240 using Iometer, and I wanted to run similar tests on the VNXe so that the results would be comparable.

Hands-on with VNXe 3300 series:

  1. Initial setup and configuration
  2. iSCSI and NFS servers
  3. Software update
  4. Storage pools
  5. Storage provisioning
  6. VNXe performance
  7. Wrap up

Test environment CX4-240

  • EMC CX4-240
  • Dell PE M710HD Blade server
  • Two 10G iSCSI NICs with a total of four paths between storage and ESXi. Round robin path selection policy enabled for each LUN with two active I/O paths
  • Jumbo Frames enabled
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)

Test environment VNXe 3300

  • EMC VNXe 3300
  • Dell PE M710HD Blade server
  • Two 10Gb iSCSI NICs with a total of two paths between storage and ESXi. Round robin path selection policy enabled for each LUN with two active I/O paths (see Trunk restrictions and Load balancing)
  • Jumbo Frames enabled (see the command sketch after this list)
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)
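For reference, on ESXi 4.1 jumbo frames have to be enabled from the CLI. A minimal sketch for either test environment could look like this; the vSwitch name, port group name and IP address below are assumptions, not taken from the actual setup:

    # set a 9000 byte MTU on the iSCSI vSwitch (name assumed)
    esxcfg-vswitch -m 9000 vSwitch1
    # recreate the iSCSI VMkernel port with a 9000 byte MTU (port group and IP assumed)
    esxcfg-vmknic -a -i 10.10.10.11 -n 255.255.255.0 -m 9000 iSCSI-1

The switch ports and the storage ports naturally need jumbo frames enabled as well.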

Iometer Configuration

I used the Iometer setup described in VMware’s Recommendations for Aligning VMFS Partitions document (page 7).

Disk configuration

I had to shorten the labels on the charts, so here are the definitions:

  • CX4 FC 15D
    • 15 15k FC Disk RAID5 Pool on CX4-240 connected with iSCSI
  • CX4 SATA 25D
    • 25 7.2k SATA Disk RAID5 Pool on CX4-240 connected with iSCSI
  • VNXe 21D 2.0.3
    • 21 15k SAS Disk RAID 5 (3×6+1) Pool on VNXe 3300 connected with iSCSI. VNXe Software version 2.0.3
  • VNXe 28D 2.0.3
    • 28 15k SAS Disk RAID 5 (4×6+1) Pool on VNXe 3300 connected with iSCSI. VNXe Software version 2.0.3
  • VNXe 28D 2.1.0
    • 28 15k SAS Disk RAID 5 (4×6+1) Pool on VNXe 3300 connected with iSCSI. VNXe Software version 2.1.0
  • VNXe RG 2.1.0
    • 7 15k SAS RAID5 (6+1) RG on VNXe connected with iSCSI. VNXe Software version 2.1.0
  • VNXe 28D NFS
    • 28 15k SAS RAID 5 (4×6+1) Pool on VNXe 3300 connected with NFS. VNXe Software version 2.1.0

A 100GB thick LUN was created in each pool and RG, and a 20GB virtual disk was placed on it. This 20GB virtual disk was presented to the virtual Windows server that was used to run the tests. The partition on this disk was created using the diskpart command ‘create partition primary align=1024’ and formatted with a 32K allocation unit size.
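Here is a sketch of the commands used, assuming the 20GB disk shows up as disk 1 inside the guest and gets drive letter E: (the disk number and drive letter are assumptions):

    diskpart
    DISKPART> select disk 1
    DISKPART> create partition primary align=1024
    DISKPART> assign letter=E
    DISKPART> exit
    format E: /FS:NTFS /A:32K /Q

The align=1024 parameter starts the partition on a 1MB boundary, and /A:32K sets the 32K allocation unit size mentioned above.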

Trunk restrictions

Before I go through the results I want to address a limitation of the trunk between the VNXe and the 10Gb switch it is connected to. Even though there is a 4Gb (4x1Gb) trunk between the storage and the switch, the maximum throughput is still only the throughput of one physical port.

While I was running the tests I had an SSH connection open to the VNXe and ran the netstat -i -c command to see what was going on with the trunk and the individual ports. The first screen capture was taken while the 8k sequential read test was running. You can see that all the traffic is going through one port:

The second screen capture was taken while the VNXe was in production and several virtual machines were accessing their disks. In this case the load is spread more randomly across the physical ports:
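For anyone who wants to repeat this, a minimal sketch of the monitoring over SSH (the interface names depend on the system, so treat them as assumptions):

    # print interface packet counters repeatedly; compare the RX-OK/TX-OK
    # columns of the trunk member ports to see which port carries the traffic
    netstat -i -c
    # one-off snapshot of the same counters
    netstat -i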

Load balancing

The VNXe 3300 is an active/active array, but it doesn’t support ALUA. This means that a LUN can only be accessed through one SP. An iSCSI/NFS server can only have one IP address, and that IP can only be tied to one port or trunk. A LUN can also only be served by one iSCSI/NFS server, so there is only one path from the switch to the VNXe. The round robin path selection policy can be enabled on the ESX side, but this only helps to balance the load between the ESX NICs. Even without the trunk, round robin can’t be used to balance the load across the four VNXe ports.
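For reference, this is roughly how round robin can be set per device from the ESXi 4.1 CLI. The naa identifier below is a placeholder, and the IOPS tuning line is only an assumption, not something that was necessarily used in these tests:

    # show the devices and their current path selection policy
    esxcli nmp device list
    # enable round robin for one LUN (device id is a placeholder)
    esxcli nmp device setpolicy --device naa.60060160a0b12c004d000000000000de --psp VMW_PSP_RR
    # optional: switch paths after every I/O instead of the default 1000 I/Os
    esxcli nmp roundrobin setconfig --device naa.60060160a0b12c004d000000000000de --type "iops" --iops 1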

Test results

Each Iometer test was run twice and the results are the average of those two test runs. If the results were not similar enough (i.e. a difference of several hundred IOPS), a third test was run and the results are the average of those three runs.

The same results as above but without NFS:

Average wait time

Conclusions

The first thing that caught my eye was the difference between the VNXe 28-disk pool and the 7-disk RG in the random write test. A quote from my last post about the pool structure:

When a LUN is created it will be placed on the RAID group (RG) that is the least utilized from a capacity point of view. If the created LUN is larger than the free space on an individual RG, the LUN will be extended across multiple RGs, but there is no striping involved.

The tests were run on a 100GB LUN, so it should have fit in one RG if the information I had gotten was correct. Comparing the random write results between the pool and the single RG, it seems that even smaller LUNs are spread across multiple RGs.

Another interesting detail is the difference between software versions 2.0.3 and 2.1.0. Looking at these results, it is obvious that the software version has a big effect on performance.

NFS storage performance with random writes was really bad, but with 1k sequential reads it surprised me by delivering 30,000 IOPS. Based on these tests I would stick with iSCSI and maybe look at NFS again after the next software version.

Overall the VNXe performs very well compared to the CX. With this configuration the VNXe is hitting the limit of a single physical port. This could be fixed by adding 10Gb I/O modules; it would be nice to run the same tests with them.

We are coming to the end of my hands-on series, and I’ll wrap it up in the next post.

[Update 2/17/2012] Updated NFS performance results with VNXe 3100 and OE 2.1.3.16008: VNXe 3100 performance

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


MS iSCSI vs. RDM vs. VMFS


Have you ever wondered if there is a real performance difference when a LUN is connected using the Microsoft (MS) iSCSI initiator, using raw device mapping (RDM), or when the virtual disk is on VMFS? You might also have wondered if multipathing (MP) really makes a difference. While investigating other iSCSI performance issues I ended up doing some Iometer tests with different disk configurations. In this post I will share some of the test results.

Test environment [list edited 9/7/2011]

  • EMC CX4-240
  • Dell PE M710HD Blade server
  • Two Dell PowerConnect switches
  • Two 10G iSCSI NICs per ESXi host, for a total of four iSCSI paths
  • Jumbo frames enabled
  • ESXi 4.1U1
  • Virtual Win 2008 (1vCPU and 4GB memory)

Disk Configurations

  • 4x300GB 15k FC Disk RAID10
    • 100GB LUN for VMFS partition (8MB block size)
      • 20GB virtual disk (vmdk) for VMFS and VMFS MP tests
    • 20GB LUN for MS iSCSI, MS iSCSI MP, RDM physical and RDM virtual tests

The MS iSCSI initiator used the virtual machine’s VMXNET 3 adapter (one or two depending on the configuration), which was connected to the dedicated iSCSI network through the ESXi host’s 10Gb NIC. MS iSCSI initiator multipathing was configured using the Microsoft TechNet Installing and Configuring MPIO guide. Multipathing for the RDM and VMFS disks was configured by enabling the round robin path selection policy. When multipathing was enabled there were two active paths to the storage.
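As a rough sketch, the in-guest connection can also be made with the Windows built-in iscsicli tool (the portal address and target IQN below are placeholders); the MPIO part itself was configured as described in the TechNet guide:

    rem register the storage array's iSCSI portal (address is a placeholder)
    iscsicli AddTargetPortal 10.10.10.20 3260
    rem list the targets the portal exposes
    iscsicli ListTargets
    rem log in to the target (IQN is a placeholder)
    iscsicli QLoginTarget iqn.1992-04.com.emc:cx.apm00000000001.a2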

Iometer configuration

When I was trying to figure out the best way to test the different disk configurations I found the post “Open unofficial storage performance thread” on VMware Communities. The thread contains an Iometer configuration that tests maximum throughput and also simulates a real-life scenario, and other community users have posted their results there. I decided to use the Iometer configuration posted in the thread so that I could compare my results with the others.

Max Throughput-100%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 100% Read distribution
  • 5 minute run time

Max Throughput-50%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 50% read/write distribution
  • 5 minute run time

RealLife-60%Rand-65%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 40% sequential / 60% random distribution
  • 65% read / 35% write distribution
  • 5 minute run time

Random-8k-70%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 100% random distribution
  • 70% read / 30% write distribution
  • 5 minute run time

Test results

Each Iometer test was run twice and the results are the average of those two test runs. If the results were not similar enough (i.e. a difference of several hundred IOPS), a third test was run and the results are the average of those three runs.

Conclusions

Looking at these results, VMFS performed very well with both a single path and multipathing. Both RDM disks with multipathing come really close to the VMFS performance. And then there is the MS iSCSI initiator, which gave somewhat conflicting results: you would think that multipathing would give better results than a single path, but that was only the case in the max throughput test. Keep in mind that these tests were run on a virtual machine running on ESXi and that the MS iSCSI initiator was configured to use virtual NICs. I would guess that Windows Server 2008 running on a physical server with the MS iSCSI initiator would give much better results.

Overall VMFS would be the best choice to put a virtual disk on, but it’s not always that simple. Some clustering software doesn’t support virtual disks on VMFS, and then the options are RDM or MS iSCSI. There can also be limitations on physical or virtual RDM disk usage.

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


EFD vs. FC Pools


Our CX4 with FLARE 30 has been in production for about six months now and we decided to add some more FAST Cache to it. It currently has two mirrored 100GB EFDs configured as FAST Cache and we just got two new 100GB disks to add to the cache. We’ve also been pondering whether we should add EFDs to the current pools for databases. Before adding the two new disks to the cache I wanted to run some performance tests on the EFDs. I also wanted to compare the EFD performance with the performance of the current pools that we have in production.

The focus of these tests was to see if the EFDs would have the desired performance advantage over the current pools that we already have in use. As I mentioned, we already have 100GB of FAST Cache in use, and it is also enabled on the pools that I used to run these tests.

I used Iometer to generate the load and gather the results. In the past I’ve done Iometer tests on storage arrays that were not in any other use. In those cases I used the Iometer setup described in VMware’s Recommendations for Aligning VMFS Partitions document. Using those settings here would have been time consuming and would also have generated a huge load on the production CX. Since I was only focusing on comparing a simulated database load on different disk configurations, I decided to run the tests with only one transfer request size.

While I was creating the disks for the test I decided to add a couple more disks and run some additional tests. I was curious to see how a properly aligned disk would really perform compared to an unaligned one, and also what kind of performance difference there was between VMFS and raw disks. Yes, I know that the VMware document I mentioned above already shows that an aligned disk performs better than an unaligned one; I just wanted to know how it played out in our environment.

Test environment [list edited 9/7/2011]

  • CX4-240 with 91GB FAST Cache
  • Dell PE M710HD Blade server
  • 2x Dell PowerConnect switches
  • Two 10G iSCSI NICs with a total of four paths between storage and ESXi. Round robin path selection policy enabled for each LUN with two active I/O paths.
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)

Disk Configurations

  • 15 FC Disk RAID5 Pool with FAST Cache enabled
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (POOL_1_vmfs_u)
      • 20GB aligned virtual disk (POOL_1_vmfs_a)
    • 20GB LUN for unaligned RAW disk (POOL_1_raw_u)
    • 20GB LUN for aligned RAW disk (POOL_1_raw_a)
  • 25 FC Disk RAID5 Pool with FAST Cache enabled
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (POOL_2_vmfs_u)
      • 20GB aligned virtual disk (POOL_2_vmfs_a)
    • 20GB LUN for unaligned RAW disk (POOL_2_raw_u)
    • 20GB LUN for aligned RAW disk (POOL_2_raw_a)
  • 2 EFD Disk RAID1 RAID Group
    • 50GB LUN for VMFS partition
      • 20GB unaligned virtual disk (EFD_vmfs_u)
      • 20GB aligned virtual disk (EFD_vmfs_a)
    • 20GB LUN for unaligned RAW disk (EFD_raw_u)
    • 20GB LUN for aligned RAW disk (EFD_raw_a)

Raw disks were configured to use physical compatibility mode on ESXi.
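For reference, a physical compatibility RDM mapping file can be created from the ESXi console roughly like this (the device ID, datastore and file names below are placeholders); using -r instead of -z would create a virtual compatibility mapping:

    # create a pass-through (physical) RDM mapping file for the 20GB LUN
    vmkfstools -z /vmfs/devices/disks/naa.60060160a0b12c004d000000000000aa /vmfs/volumes/datastore1/testvm/testvm_rdmp.vmdk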

Unaligned disks were configured using Windows Disk Management and formatted using default values.

Partitions on the aligned disks were created using the diskpart command ‘create partition primary align=1024’ and formatted with a 32K allocation unit size.

Iometer configuration

  • 1 Worker
  • 8KB transfer request size
  • Read/write ratio of 66/34 and 100% random distribution
  • 8 outstanding I/Os per target
  • 4 minute run time
  • 60 sec ramp-up time

Results

Each Iometer test against a specific disk was repeated three times, and the results are the average of those three runs. Keep in mind that the array was serving over 100 production VMs during the tests, so these results are not absolute.

Conclusions

When comparing the results of unaligned and aligned disks there are no huge differences, although the POOL_1_raw_u and POOL_2_vmfs_u results do jump out from the charts. I did three more test runs for those disks and still got the same results. This might have something to do with the production load that we have on the CX.

The performance differences between raw disks and disks on VMFS were also not major, but still noticeable; for example, the difference in IOPS between POOL_2_vmfs_a and POOL_2_raw_a is over 200, and the EFD raw disk also gives about 200 more IOPS than the EFD VMFS disk.

Let’s get to the point. The whole purpose of these tests was to compare FC pool and EFD performance, and if you haven’t noticed from the graphs, the difference is HUGE. Do I even have to say more? I think the graphs have spoken. Those 5000+ IOPS were achieved with only two EFDs; think about having a whole array full of them.

After these tests my suggestion is to use VMFS datastores instead of raw disks. There are still cases where you might need to use raw disks with virtual machines, e.g. when building a physical/virtual cluster. Aligning Windows Server disks is not a big issue anymore because Windows Server 2008 does it automatically, but if you have old Windows Server 2003 installations I would suggest checking whether the disks are aligned. There is a Microsoft KB that describes how to check disk alignment; if the server disks are not aligned you might want to start planning a move of your data to aligned disks.

As for the EFDs, the performance gained by using them is self-evident. EFDs are still a bit expensive, but think about the price of the arrays and disks needed to reach the same IOPS that the EFDs can provide. In some cases you need to think more about the price per IO than the price per GB.
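The check itself boils down to looking at the partition starting offsets, for example (a sketch based on the Microsoft guidance):

    rem list the starting offset of each partition
    wmic partition get BlockSize, StartingOffset, Name, Index
    rem 32256 bytes (63 sectors) is the old unaligned default;
    rem offsets divisible by 65536 (64KB) or 1048576 (1MB) indicate aligned partitions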

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


Unisphere showing LUN in two pools after migration


I just wanted to do a quick post about Unisphere showing a LUN in two pools after migrating the LUN from one pool to another. There is also a quick fix for it: close your browser window and log back in to Unisphere. This small glitch raised my blood pressure for a while, and I had already started thinking through all kinds of data corruption issues it might cause.

Here is what happened: I have two pools, Pool 0 and Pool 1. Using the Unisphere GUI I migrated a LUN from Pool 0 to Pool 1 and the migration went through just fine. After the migration I noticed that the migrated LUN was showing up in both pools. I tried refreshing both the ‘Pools’ and ‘Detailed’ views, and I also went to different tabs and back to the ‘Pools/RAID Groups’ view and refreshed the views again. The LUN was still in both pools, and the properties for that specific LUN were identical in both. Luckily, closing the browser window and logging back in to Unisphere solved the issue.

I’ve been using Unisphere since FLARE 30 was released and haven’t seen this kind of behaviour before. I’ve seen Unisphere hang (just showing a white screen) and crash several times, but nothing like this has ever happened.

