My first year as a blogger


First of all I have to say that 2011 was a great year. One big thing for me last year was the birth of my blog. I actually created the blog in November 2010 but made my first “shy” post in February 2011. I still wasn’t sure if I wanted to blog or not. In that first month I had 51 visits. I did a couple more posts in the following months but monthly visits stayed below 50. I was on Twitter but had fewer than 100 followers at that time. I wanted to blog but wasn’t sure if it was really worth it.

In May I attended EMC World and met lots of great people there. My blog visits went up to 200 that month and I also gained a few more Twitter followers. After May I didn’t post as much as I should have and the visits dropped again. I was still active on Twitter and kept interacting with the people I had met at EMC World.

In August I was implementing a few VNXe 3300s and decided to blog about my hands-on experience. I was planning to do only four posts, but the series eventually grew to seven posts plus a couple of follow-up posts. I was amazed to see almost 1000 visits on my blog in August; apparently this was a hot topic. In August I also attended VMworld, where I again met lots of great people and had many great conversations about the VNXe posts I had written, and about VNXe, storage and virtualization in general. Some EMC folks had also noticed my blog and contacted me. I started to feel good about the blog and the writing.

One of the great conversations I had at VMworld was with Scott Lowe about VNXe. I’ve been reading Scott’s blog for years and have also read a couple of his books, but this was the first time I met him in person. After VMworld Scott mentioned me and my VNXe hands-on series on his blog. The day Scott published that post was the busiest day on my blog with 450 views. Thanks to Scott, September was also the busiest month of my blog with 4300 views.

VNXe seemed to be a really interesting topic last year. After September I still got over 3000 visitors per month through the end of the year, and by the end of the year I had reached 400 followers on Twitter. So thank you(!) to all my followers, subscribers and random blog readers.

Blog Stats

– 16,374 total visits in 2011 (15,900 between August and December)

– 4300 visits in the busiest month (September)

– 450 visits on the busiest day (September 12)

– 20 Posts

Top 5 posts:

– Hands-on with VNXe 3300 Part 1: Initial setup (2305 views)

– Hands-on with VNXe 3300 Part 6: Performance (1916 views)

– Hands-on with VNXe 3300 Part 2: iSCSI and NFS servers (1794 views)

– Running ESXi 5 on VMware Player (1300 views)

– Hands-on with VNXe 3300 Part 3: Software update (1271 views)

In my opinion these are quite good numbers for someone who has just started blogging. Like I have said before: you have to start somewhere.

What’s up for 2012?

Well, 2012 is going to be even better. I have several interesting projects coming up and I’m really looking forward to them. One will hopefully take place in January and involves VNXe (surprise surprise). Most importantly, I’ll keep blogging and try to stay active on Twitter as well. I have several posts lined up and new ideas evolve in my head almost daily. I guess that’s what blogging is all about. Writing helps me relax, but with busy days at work, a wife and two small kids at home, and hobbies, it is sometimes hard to find the time. I also have to thank my amazing wife for being so understanding.

This year will also bring some changes for me and my family, but I will tell more about that later. I’m excited but still have some mixed feelings.

Thanks again y’all and have a great year 2012!

-Henri


VADP backup fails to remove snapshot


I have noticed that sometimes after a vStorage APIs for Data Protection (VADP) backup the virtual machine (VM) snapshot is not deleted, even though the backup completed successfully. This can start a chain reaction that leaves several snapshot vmdk files on the datastore, and eventually the datastore could run out of space. After the first failed snapshot removal VADP backups continue working normally, except that the number of snapshot vmdk files keeps growing. In some cases the failed snapshot removal leaves an error message in the vCenter events, but not always.

How to identify the problem

As I already mentioned, the issue can be spotted by the growing number of snapshot vmdk files on the datastore. If you are monitoring VM snapshots you should be able to notice the situation before the datastore runs out of space.
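A quick way to spot the buildup is to list the delta files straight from the ESX console; a minimal sketch, with made-up datastore and VM folder names:

# list leftover snapshot/delta vmdks for a VM (paths are just examples)
ls -lh /vmfs/volumes/datastore1/servername/ | grep -- '-0000'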

Another thing is to check whether the VM has any snapshots. While a VADP backup is running there should be an active “Consolidate Helper” snapshot, and after the backup is done it should be deleted. If the backup is not running and this snapshot still exists, there is an issue with the snapshots.
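The snapshot state can also be checked from the classic ESX service console; a hedged example (the vmx path is made up):

# should return 1 if the VM currently has a snapshot, 0 if not
vmware-cmd /vmfs/volumes/datastore1/servername/servername.vmx hassnapshot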

There could also be an “Unable to access file <unspecified filename> since it is locked” error shown in the VM’s task details.

I’ve also seen cases where the VADP-initiated snapshot removal is reported as successful but the “Consolidate Helper” snapshot and the snapshot vmdk files still exist.

At this point I would suggest reading Ivo Beerens’ blog post about a similar issue with snapshots. He describes a solution for the “Unable to access file <unspecified filename> since it is locked” error. It didn’t work in my case, so I had to find another way to solve the issue.

After the orphaned “Consolidate Helper” snapshot is manually removed, vCenter doesn’t show any snapshots for the VM and checking from the ESX console also confirms that there are no snapshots; however, all the snapshot vmdk files are still present.

How to fix the problem

The first thing is to schedule downtime for the VM, because it needs to be shut down to complete these steps. Because the snapshot files keep growing, there should be enough free space on the datastore to accommodate the snapshots until the fix can be performed.

The next thing is to make sure that VADP backups are disabled while the following operations are performed. Running a VADP backup while working on the virtual disks can really mess up the snapshots.

After the previous steps are covered and the VM is shut down, make a copy of the VM folder. This is the first thing I do whenever I have to work with vmdk files, just in case something goes wrong.

The fix is to clone the vmdk file with its snapshots to a new vmdk using the vmkfstools command (the VM I was working on was on ESX 4.1, so vmkfstools was still available) to consolidate the snapshots, then remove the current virtual disk(s) from the VM and add the new cloned disk(s) to it. There are, however, some considerations before cloning the vmdks:

Don’t rely on the vmdk file with the highest number (e.g. [servername]-000010.vmdk) being the latest snapshot. Always check from the VM properties, or from the vmx file if using the command line.

VM properties:

[servername].vmx from command line:
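For example (a sketch; the datastore and VM names are made up):

# the scsiX:Y.fileName entries show which vmdk descriptor the VM currently points at
grep -i 'fileName' /vmfs/volumes/datastore1/servername/servername.vmx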

If you plan to work with the copied vmdk files, keep in mind that the “parentFileNameHint=” row in the snapshot vmdk descriptor points to the original location of the parent. So before you clone the copied vmdk file you should change that path to point to the location of the copy.
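A small sketch of that check on the copy (the folder and file names are made up):

# show where the copied descriptor expects its parent to be
grep parentFileNameHint /vmfs/volumes/datastore1/servername-copy/servername-000010.vmdk
# then edit the row with vi so it points inside the copied folder, for example:
# parentFileNameHint="/vmfs/volumes/datastore1/servername-copy/servername-000009.vmdk"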

Now that the latest snapshot vmdk file has been identified, the clone can be done with the vmkfstools -i command from the command line:

vmkfstools -i [servername]-0000[nn].vmdk [newname].vmdk

After the clone is done, the old virtual disk can be removed from the VM (I used the “remove from virtual machine” option, not the delete option) and the new one can be added. If the VM has more than one virtual disk, this procedure has to be done for all of them. After confirming that the VM starts normally and that all the data is intact, the unused vmdks can be removed. In my case I had a VM with two virtual disks and both had several snapshot vmdks, so I used Storage vMotion to move the VM to another datastore and then deleted the folder that was left on the old datastore.


Hidden VNXe performance statistics


The latest Operating Environment upgrades have already brought some improvements to the statistics shown through the Unisphere GUI. The first VNXe OE that I worked with showed only CPU statistics. Then, along with update 2.1.0, Network Activity and Volume Activity statistics became available. I was still hoping for more; IOps and latency graphs would have been nice additions. So I did some digging and found out that there are actually lots of statistics parameters that the VNXe gathers, but they are just stored in a database, maybe for support purposes.

Where is the data stored?

When logging in to the VNXe via SSH using the service account and listing the contents of the folder /EMC/backend/perf_stats, you will see several db files in that folder.

Opening one of the files with Notepad makes it quite clear what kind of databases they are:
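The same check can be done directly over the SSH session without copying anything to a Windows machine; a small sketch, assuming the usual Linux tools are available on the service shell:

# list the statistics databases
ls -l /EMC/backend/perf_stats
# the first bytes of the file reveal the type: "SQLite format 3"
head -c 16 /EMC/backend/perf_stats/stats_basic_summary.db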

How to read the data?

Now that we know the data is stored in SQLite databases, the next step is to export the data to a readable format. To do this the SQLite shell is needed. SQLite is really simple to use: just download the shell and run a couple of commands.

Opening the database, selecting the output file and exporting all the data can be done with just three commands.
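Roughly like this, assuming the db file has been copied to the machine where the sqlite3 shell is installed (.output selects the output file and .dump exports everything):

sqlite3 stats_basic_summary.db
.output stats_basic_summary.txt
.dump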

Now all the content of the database has been exported to stats_basic_summary.txt. The data can then be imported into a spreadsheet or another database.

What data is stored in the databases?

There are actually a lot of parameters and data in those databases. Here are just a few of the parameters.

DART parameters in stats_basic_default.db:

SysClockUnixms
NetBasicBytesIn
NetBasicBytesOut
NetInPackets
NetOutPackets
TCPInPackets
TCPOutPackets
UDPInPackets
UDPOutPackets
StoreReadBytes
StoreWriteBytes
StoreReadRequests
StoreWriteRequests

DART parameters in stats_basic_summary.db:

NetBasicBytesIn
NetBasicBytesOut
NetInPackets
NetOutPackets
TCPInPackets
TCPOutPackets
UDPInPackets
UDPOutPackets
StoreWriteBytes
StoreReadBytes
StoreReadRequests
StoreWriteRequests
KernelBufCacheHits
kernelBufCacheLookups
CifsActiveConnections
CifsTotalConnections
CifsBasicReadBytes
CifsBasicReadOpCount
CifsBasicWriteBytes
CifsBasicWriteOpCount
FsDnlcHits
FsDnlctotal
FsOfCachehits
FsOfCachetotal
NfsActiveConnections
NfsBasicReadBytes
NfsBasicReadOpCount
NfsBasicWriteBytes
NfsBasicWriteOpCount
iSCSIBasicReads
iSCSIReadBytes
iSCSIBasicWrites
iSCSIWriteBytes

FLARE_SP parameters in stats_basic_summary.db:

HardErrorCount
HighWaterMarkFlushOff
IdleFlushOn
LowWaterMarkFlushOff
writeCacheFlushes
writeCacheBlocksFlushed
ReadHitRatio
SPTimestamp
SumOfQueueLengths
arrivalsToNonzeroQueue
SumOfLUNBlkRead
SumOfLUNBlkWrite
SumOfLUNDiskRead
SumOfLUNDIskWrite
SumOfLUNDiskBlkRead
SumOfLUNDiskBlkWrite
SumOfFRUBlkRead
SumOfFRUBlkWrite
SumOfFRUReadCount
SumOfFRUWriteCount

How can that data be used?

I’ll take the StoreReadRequests parameter from stats_basic_default.db as an example. Some of the parameters have descriptions and this is one of them:

Total number of read requests on all DART volumes

Here is the format the data is in after it has been imported into a spreadsheet:

There is a time stamp and a value for StoreReadRequests. It seems that the number of read requests recorded during each five-minute period is added to the old value and then inserted as a new entry in the database. So by subtracting the earlier value from the new one we get the total number of read requests on all DART volumes for that specific five-minute period:

4267021177 – 4266973002 = 48175

Now if we divide that result by 300 (seconds) we get the average number of read requests on all DART volumes per second during that five-minute period:

48175 / 300 = 160.58

With some spreadsheet magic it is easy to create a nice “requests per second” graph from the data:
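If spreadsheets are not your thing, the same per-second rate can also be calculated with a one-liner. A sketch that assumes the export has been trimmed down to two whitespace-separated columns, timestamp and StoreReadRequests:

# requests/s for each five-minute sample: (current value - previous value) / 300
awk 'NR > 1 { printf "%s %.2f\n", $1, ($2 - prev) / 300 } { prev = $2 }' storereadrequests.txt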

How can I be sure that my theory is correct?

Well, the NetBasicBytesIn and NetBasicBytesOut parameter values in stats_basic_default.db also grow with every time stamp. These are also defined in the database: Total Bytes DART received/sent from all NICs. So I used the same math to create a graph showing network statistics for the past 24 hours. I then compared that graph with Unisphere’s network activity graph and they matched.

The graph that I put together using the values from the database and the formula introduced earlier:

Unisphere network activity graph:

Conclusions

I really hope that EMC will bring more statistics to the GUI or introduce an easier way to export the data in a readable format. From what I’ve heard, Clint Kitson from EMC has already written some scripts for pulling the stats from the VNXe, but they are not yet published for customers. Digging into the databases is a quick and dirty way to get more statistics out of the VNXe, but it seems to work.


VNXe 3300 performance follow up (EFDs and RR settings)


In my previous post about VNXe 3300 performance I presented results from the performance tests I had done with the VNXe 3300. I will use those results as a comparison for the new tests that I ran recently. In this post I will compare the performance with different Round Robin policy settings. I also had a chance to test the performance of EFD disks on the VNXe.

Round Robin settings

In the previous post all the tests were run with the default RR settings, which means that ESX sends 1000 commands through one path before changing the path. I observed that with the default RR settings I was only getting the bandwidth of one link on the four-port LACP trunk. I got feedback from Ken advising me to change the default RR IO operation limit setting from 1000 to 1 to get two links’ worth of bandwidth from the VNXe. So I wanted to test what kind of effect this change would have on performance.

Arnim van Lieshout has a really good post about configuring RR using PowerCLI and I used his examples to change the IO operation limit from 1000 to 1. If you are not comfortable running the full PowerCLI scripts Arnim introduced in his post, here is how the RR settings for an individual device can be changed using the GUI and a couple of simple PowerCLI commands:

1. Change the datastore path selection policy to RR (from the vSphere client – select host – Configuration – Storage Adapters – iSCSI sw adapter – right click the device and select Manage Paths – for Path Selection choose Round Robin (VMware) and click Change)

2. Open PowerCLI and connect to the server

Connect-VIServer -Server [servername]

3. Retrieve esxcli instance

$esxcli = Get-EsxCli

4. Change the device IO Operation Limit to 1 and set the Limit Type to Iops. [deviceidentifier] can be found in the vSphere client’s iSCSI sw adapter view and is in the format naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

$esxcli.nmp.roundrobin.setconfig($null,"[deviceidentifier]",1,"iops",$null)

5. Check that the changes were completed.

$esxcli.nmp.roundrobin.getconfig("[deviceidentifier]")
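If you would rather do this from the ESX console than through PowerCLI, the equivalent ESX 4.1 esxcli commands should look roughly like this (treat it as a sketch and check the esxcli nmp roundrobin help on your build first; the device identifier is again the naa.xxx name):

esxcli nmp roundrobin setconfig --device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --type "iops" --iops 1
esxcli nmp roundrobin getconfig --device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx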

Results 1

For these tests I used the same environment and Iometer settings that I described in my Hands-on with VNXe 3300 Part 6: Performance post.

Results 2

For these tests I used the same environment, except that instead of a virtual Win 2003 I used a virtual Win 2008 (1 vCPU and 4GB memory) and the following Iometer settings (I picked up these settings from the VMware Community post Open unofficial storage performance thread):

Max Throughput-100%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 100% Read distribution
  • 5 minute run time

Max Throughput-50%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 32KB transfer request size
  • 100% sequential distribution
  • 50% read/write distribution
  • 5 minute run time

RealLife-60%Rand-65%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 40% sequential / 60% random distribution
  • 35% read / 65% write distribution
  • 5 minute run time

Random-8k-70%Read

  • 1 Worker
  • 8000000 sectors max disk size
  • 64 outstanding I/Os per target
  • 500 transactions per connection
  • 8KB transfer request size
  • 100% random distribution
  • 30% read / 70% write distribution
  • 5 minute run time

[Updated 11/29/11] After I had published this post Andy Banta gave me a hint on Twitter:

“You might squeeze more by using a number like 5 to 10, skipping some of the path change cost.”

So I ran a couple more tests, changing the IO operation limit to values between 5 and 10. With the 28-disk pool there was no big difference between the values 1 and 5-10. With the EFDs the magic number seemed to be 6, and with it I managed to get 16 MBps and 1100 IOps more out of the disks with specific workloads. I added the new EFD results to the graphs.


Conclusions

Changing the RR IO operation limit from the default of 1000 IOs to 1 IO really makes a difference on the VNXe. With a random workload there is not much difference between the two settings, but with a sequential workload the difference is significant. Sequential write IOps and throughput more than double with certain block sizes when using the 1 IO setting. If you have ESX hosts connected to a VNXe with an LACP trunk, I would recommend changing the RR IO operation limit from 1000 to a value between 1 and 10. Like I already mentioned, Arnim has a really good post about configuring RR settings using PowerCLI. Another good post about multipathing is A “Multivendor Post” on using iSCSI with VMware vSphere by Chad Sakac.

Looking at the results it is obvious that the EFD disks perform much better than the SAS disks. With a sequential workload the 28-disk SAS pool’s performance is about the same as the 5-disk EFD RG’s, but with a random workload the EFDs perform about two times better than the SAS pool. There was no other load on the disks while these tests were run, so under additional load I would expect the EFDs to perform much better with sequential workloads as well. Better performance doesn’t come without a bigger price tag: EFD disks are still over 20 times more expensive per TB than SAS disks, but then again SAS disks are about 3 times more expensive per IO than EFD disks.

Now if only EFDs could be used as cache on VNXe.

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


VNXe Operating Environment 2.1.1


[Edited 10/12/2011] Apparently EMC has pulled the VNXe OE MR1 SP1 (2.1.1.14913) release and it is not available for download anymore. According to EMC, a new image and release notes will be available soon.

EMC has released VNXe Operating Environment version 2.1.1.14913 (2.1 SP 1) for the VNXe 3100 and 3300. And how did I find out about the new update? Well, my VNXe told me about it:

Release notes and software are available on supportbeta.emc.com. The first thing I noticed in the short description of the upgrade package was that the VNXe OE has to be 2.1.0.14097 or higher before upgrading to 2.1.1. In the release notes I couldn’t find any mention of this. The only mention of a mandatory upgrade is that the VNXe should be upgraded to version 2.0.2 or later within 42 days of the initial VNXe installation, or otherwise the system has to be powered off before the upgrade (KB article emc265195). I also mentioned this issue in my previous post about the VNXe software update. So I contacted support using chat and quickly got confirmation that the VNXe has to be on 2.1.0.14097 before upgrading to 2.1.1.

Here is a quick peek at the new features, enhancements and fixes. Full descriptions can be found in the release notes.

New features and enhancements

6+1 RAID5 is now supported on VNXe 3100 with SAS drives and user-defined pools. Automatic configuration will still use 4+1 RAID5 for SAS drives.

EFD drives and 1TB NL SAS drives are now supported on VNXe 3300 DPE and DAE.

There have also been improvements to Unisphere performance.

Fixed

  • Looping problem that might cause an SP to reboot when a network or power cable is disconnected
  • Issues with email alerts
  • Issues with the password reset button causing an SP to reboot
  • Error with hidden shared folders
  • VMFS datastore creation issues

There is also a long list of known problems and limitations. A couple of them concern VMware integration and are good to keep in mind:

  • A VMFS datastore created from VNXe will be VMFS3 and use an 8MB block size.
  • A manual rescan for storage is required after deleting a datastore from a standalone ESX server.


Hands-on with VNXe 3300 Part 7: Wrap up


When I started writing this series my plan was to make just four quick posts about my experience with the EMC VNXe 3300. After I started going further into the details of the configuration I realized that I couldn’t fit everything I wanted to say into four posts, so I ended up adding three more posts to the series. Still, with this series I’m just scratching the surface of the VNXe 3300. There are many features that I didn’t go through or didn’t even mention. That was also my intention with this series: to write about my own experience and look into the features that I would actually implement.

  1. Initial setup and configuration
  2. iSCSI and NFS servers
  3. Software update
  4. Storage pools
  5. Storage provisioning
  6. VNXe performance
  7. Wrap up

Simple, Efficient, Affordable

Those are the words EMC uses to market the VNXe, and I can mostly agree that they are accurate adjectives for it. In the first part I also wanted to add the adjective “easy” to that list. A user can do the initial setup and have the VNXe operational in less than an hour, depending on the software version. The Unisphere UI layout is also very user friendly and illustrative. Configuration and software updates are easy and simple.

Customers buying a VNXe based purely on the marketing material might face a big surprise when looking into the actual memory configuration. Yes, the VNXe has 12GB of memory per SP, but only 256MB is dedicated to read cache and 1GB to write cache.

Configuration considerations

Even though it is easy and simple to get the VNXe up and running and to start provisioning storage, this doesn’t mean that the planning phase can be ignored. A user can easily end up in a really bad situation where the only way out is to delete all datastores and do the proper reconfiguration. Creating only one iSCSI server and putting all datastores on that one server creates a situation where all the I/O goes through one SP while the other SP is idle. Depending on the ESX iSCSI settings, only one network port on the VNXe might be utilized even if a four-port trunk was configured. Fixing this problem is not as easy as creating it: the VNXe doesn’t allow changing a datastore’s iSCSI server after the datastore is created, so to assign a different iSCSI server to a datastore it has to be deleted and recreated. This is, again, one issue that I’m hoping will be fixed.

When using four 1Gb ports my suggestion would be to configure NIC aggregation on the VNXe as I described in part 2. For the ESX configuration I would suggest reading the detailed comment Ken posted on part 6 about the ESX iSCSI configuration. As for the VNXe iSCSI and datastore configuration, I ended up creating an equal number of datastores on each SP and dedicating one iSCSI server per datastore to get the most out of the four-port trunk.

Issues

The issues I faced during the configuration were mostly minor usability flaws, and some of them were already fixed in the latest software version. The biggest issue I found was that the VNXe had to be powered off before a software update if it had been running for more than 42 days. I’ve discussed these issues with EMC and hopefully they will be fixed in future releases.

Conclusions

Despite all the criticism I think the VNXe 3300 is a great product and it will be even better once the few small flaws are fixed. I’m really looking forward to seeing what kind of new features will be introduced in future software releases. Chad Sakac hinted on his blog that FAST VP support is coming to the VNXe at some point. He also mentioned that VAAI (file) and SRM support will be coming out later this year.

I can see some new VNXe 3300 related blog posts in my near future, but I think it is time to close this series and keep the new posts separate. If you have any questions about my experience with the VNXe, or other general questions about it, please leave a comment.


Hands-on with VNXe 3300 Part 6: Performance


Now that the VNXe is installed and configured, and some storage has been provisioned to the ESXi hosts, it is time to look at the performance. As I mentioned in the first post, I had already gathered some test results from a CX4-240 using Iometer and I wanted to run similar tests against the VNXe so the results would be comparable.

Hands-on with VNXe 3300 series:

  1. Initial setup and configuration
  2. iSCSI and NFS servers
  3. Software update
  4. Storage pools
  5. Storage provisioning
  6. VNXe performance
  7. Wrap up

Test environment CX4-240

  • EMC CX4-240
  • Dell PE M710HD Blade server
  • Two 10Gb iSCSI NICs with a total of four paths between storage and ESXi. Round robin path selection policy enabled for each LUN with two active I/O paths
  • Jumbo Frames enabled
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)

Test environment VNXe 3300

  • EMC VNXe 3300
  • Dell PE M710HD Blade server
  • Two 10Gb iSCSI NICs with a total of two paths between storage and ESXi. Round robin path selection policy enabled for each LUN with two active I/O paths (see Trunk restrictions and Load balancing)
  • Jumbo Frames enabled
  • ESXi 4.1U1
  • Virtual Win 2003 SE SP2 (1vCPU and 2GB memory)

Iometer Configuration

I used the Iometer setup described in VMware’s Recommendations for Aligning VMFS Partitions document (page 7).

Disk configuration

I had to shorten the explanations on the charts so here are the definitions:

  • CX4 FC 15D
    • 15 15k FC Disk RAID5 Pool on CX4-240 connected with iSCSI
  • CX4 SATA 25D
    • 25 7.2k SATA Disk RAID5 Pool on CX4-240 connected with iSCSI
  • VNXe 21D 2.0.3
    • 21 15k SAS Disk RAID 5 (3×6+1) Pool on VNXe 3300 connected with iSCSI. VNXe Software version 2.0.3
  • VNXe 28D 2.0.3
    • 28 15k SAS Disk RAID 5 (4×6+1) Pool on VNXe 3300 connected with iSCSI. VNXe Software version 2.0.3
  • VNXe 28D 2.1.0
    • 28 15k SAS Disk RAID 5 (4×6+1) Pool on VNXe 3300 connected with iSCSI. VNXe Software version 2.1.0
  • VNXe RG 2.1.0
    • 7 15k SAS RAID5 (6+1) RG on VNXe connected with iSCSI. VNXe Software version 2.1.0
  • VNXe 28D NFS
    • 28 15k SAS RAID 5 (4×6+1) Pool on VNXe 3300 connected with NFS. VNXe Software version 2.1.0

A 100GB thick LUN was created on each pool and RG, and a 20GB virtual disk was stored on it. This 20GB virtual disk was presented to the virtual Windows server used to conduct the tests. The partition on this disk was created using the diskpart command ‘create partition primary align=1024’ and was formatted with a 32K allocation size.
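For reference, the alignment and formatting steps inside the guest went roughly like this (a sketch; the disk number and drive letter depend on the VM):

diskpart
select disk 1
create partition primary align=1024
assign letter=E
exit
format E: /FS:NTFS /A:32K /Q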

Trunk restrictions

Before I go through the results I want to address a limitation of the trunk between the VNXe and the 10Gb switch it is connected to. Even though there is a 4Gb (4x1Gb) trunk between the storage and the switch, the maximum throughput is only the throughput of one physical port.

While I was running the tests I had an SSH connection open to the VNXe and I ran the netstat -i -c command to see what was going on with the trunk and the individual ports. The first screen capture was taken while the 8k sequential read test was running. You can see that all the traffic is going through one port:

The second screen capture was taken while the VNXe was in production and several virtual machines were accessing the disks. In this case the load is balanced randomly between the physical ports:

Load balancing

The VNXe 3300 is an active/active array but doesn’t support ALUA. This means that a LUN can only be accessed through one SP. One iSCSI/NFS server can only have one IP, and this IP can only be tied to one port or trunk. A LUN can also only be served by one iSCSI/NFS server, so there will be only one path from the switch to the VNXe. The round robin path selection policy can be enabled on the ESX side, but this will only help balance the load between the ESX NICs. Even without the trunk, round robin can’t be used to balance the load between the four VNXe ports.

Test results

Each Iometer test was run twice and the results are the average of those two runs. If the results were not similar enough (i.e. several hundred IOps difference) then a third test was run and the results are the average of the three runs.

Same as previous but without NFS results:

Average wait time

Conclusions

The first thing that caught my eye was the difference between the VNXe 28-disk pool and the 7-disk RG in the random write test. A quote from my last post about the pool structure:

When a LUN is created it will be placed on the RAID group (RG) that is the least utilized from a capacity point of view. If the LUN created is larger than the free space on an individual RG, the LUN will be extended across multiple RGs, but there is no striping involved.

The tests were run on a 100GB LUN, so it should fit in one RG if the information I got was correct. Comparing the pool results with the one-RG random write results, it seems that even smaller LUNs are divided across multiple RGs.

Another interesting detail is the difference between software versions 2.0.3 and 2.1.0. Looking at these results it is obvious that the software version has a big effect on performance.

NFS storage performance with random writes was really bad, but with the 1k sequential read it surprised me by giving out 30000 IOps. Based on these tests I would stick with iSCSI and maybe look at NFS again after the next software version.

Overall the VNXe is performing very well compared to the CX. With this configuration the VNXe is hitting the limits of one physical port. This could be fixed by adding 10Gb I/O modules. It would be nice to run the same tests with the 10Gb I/O modules.

We are coming to the end of my hands-on series and I’ll be wrapping it up in the next post.

[Update 2/17/2012] Updated NFS performance results with VNXe 3100 and OE 2.1.3.16008: VNXe 3100 performance

Disclaimer

These results reflect the performance of the environment that the tests were run in. Results may vary depending on the hardware and how the environment is configured.


Hands-on with VNXe 3300 Part 5: Storage provisioning


When provisioning storage on the VNXe there are many options depending on the host type: Microsoft Exchange, Shared folders, Generic iSCSI, VMware or Hyper-V. Because this VNXe is only going to serve VMware ESXi hosts, I’m going to concentrate on that part. I will go through provisioning storage from the VNXe and also using the VSI Unified Storage Management plug-in.

Hands-on with VNXe 3300 series [list updated 9/23/2011]:

  1. Initial setup and configuration
  2. iSCSI and NFS servers
  3. Software update
  4. Storage pools
  5. Storage provisioning
  6. VNXe performance
  7. Wrap up

iSCSI datastore

In the last part a storage pool was created. The next step is to create datastores on it and provision them to the hosts. There are two options when provisioning datastores for VMware from the VNXe: VMware datastore or generic iSCSI. When using the VMware storage option, the VNXe automatically configures the iSCSI targets on the selected ESX hosts and also creates the VMFS datastore. If generic iSCSI is used, all those steps have to be done manually on each ESX host. Of these two options I really recommend using the VMware storage. VMware storage can be created from Storage – VMware storage.

At this point only an iSCSI server was configured, so the only option was to create a VMFS datastore.

The next step is to select the iSCSI server and the size for the datastore. When creating the first datastore it doesn’t really matter which iSCSI server is selected. If the iSCSI_A server (on SP A) is selected for the first datastore, then iSCSI_B (on SP B) should be selected for the second datastore to balance the load between the SPs. Selecting an iSCSI server on SP A means that all the datastore I/O will go through SP A. If all the datastores are placed on one SP, there could be a situation where the whole VNXe’s performance is impacted because all I/O goes through that SP while the other SP is idle. So it is important to balance the datastores between the SPs. The VNXe does not do this automatically, so the user has to do it manually when creating datastores. If the datastores are distributed between the SPs and one SP fails, all the datastores on the failed SP’s iSCSI server are moved to the other SP. When the failed SP comes back online, the datastores originally located on it are moved back.

There is an option to configure protection for the storage. Because this datastore is only for testing, I chose not to configure protection.

Step 5 is to select the hosts that the datastore will be attached to. The datastore can be connected to a specific iSCSI initiator on an ESX server by expanding the host and selecting Datastore access for the specific IQN. If Datastore access is selected at the host level, the VNXe targets are added to all iSCSI initiators on the ESX host.

After these steps are completed the VNXe starts creating the datastore, adds the iSCSI targets to the iSCSI initiators on the selected ESX hosts, and finally mounts and creates the VMFS datastore.

iSCSI datastore issues

After creating the first datastore I noticed in vCenter that “Add Internet SCSI send targets” and “Rescan all HBAs” tasks kept showing up on the hosts I had selected for the datastore. After watching those tasks loop for 15 minutes with the datastore still not showing up on the ESXi servers, I figured out that there was something wrong with the configuration.

I found out that the ESXi server also had datastores connected from other storage units that use CHAP authentication. On the ESXi iSCSI initiator the CHAP settings were set to “Inherit from parent”, which meant all the new targets would also inherit these CHAP settings. After disabling the inheritance the new datastore was connected and the VMFS datastore was created on it. I haven’t tried to use CHAP authentication with VNXe datastores, so I don’t know if those settings are automatically configured on the ESX side. The VNXe already has the ability to manipulate the ESX server configuration, so I would imagine it could also change the iSCSI target option to “Do not use CHAP” when a VMware datastore is created on a VNXe without CHAP authentication. Maybe in the next software version?

Another issue I had was that a VMFS datastore was not always created during the process of creating a VMware datastore. The VMware Storage wizard on the VNXe indicated that creating the VMware datastore was completed, but the “create VMFS datastore” task was never initiated on the ESX host. I’ve created over 20 datastores using this method and I would say that in about 50% of the cases the VMFS datastore was also created. Not a big thing, but still an annoying little glitch.

NFS datastore

Creating an NFS datastore is very similar to creating an iSCSI datastore. The only differences are steps 2 and 5 in the VMware Storage wizard. In step 2 NFS is selected instead of VMFS (iSCSI). On this page there are two “hidden” advanced options: Deduplication and Caching. These options are hidden under the “Show advanced” link, similar to what I criticized in the iSCSI server configuration. In my opinion these should be shown by default.

In step 3 the same rule applies to selecting the NFS server as with the iSCSI server: the user has to manually balance the datastores between the SPs.

In step 5 the datastore access (no access, read-only or read/write) is chosen for the host or for a specific IP address on the host.

When all the steps in the wizard are done, the VNXe creates the datastore and mounts it to the selected host or hosts. I only gave one host access to the new NFS datastore, but I could see that the VNXe tried to do something NAS related on all the hosts in the vCenter connected to the VNXe and gave some errors:

VSI

In a nutshell, VSI Unified Storage Management (USM) is a plug-in that integrates with VMware vCenter and can be used to provision storage through the vCenter UI. There is lots of good documentation on the EMC Unified Storage Community and Powerlink, so I’m not going to dig any deeper into it. VSI USM can be downloaded from Powerlink – Support – Software Downloads and Licensing – Downloads T-Z – Virtual Storage Integrator (VSI). I recommend reading the VSI USM Read Me First document to see what else needs to be installed to make the VSI plug-in work.

After the VSI USM plug-in and all the other needed packages have been installed, the VNXe has to be connected to vCenter. This is done from vCenter Home – Solutions and Applications – EMC – Unified Storage Management – Add. A wizard walks through the steps needed to connect vCenter and the VNXe.

Now storage can be provisioned to a cluster or to an individual ESX host by right clicking cluster/host and selecting EMC – Unified Storage – Provision Storage.

The wizard follows the same steps as the VMware Storage Wizard when provisioning storage from VNXe.

Storage type:

Storage Array:

Storage Pool:

iSCSI Server:

Storage Details:

When using VSI to provision storage, the iSCSI initiators and targets are configured automatically and the VMFS datastore is also created in the process.

Conclusions

Again, the suitable word to describe storage provisioning would be simple, if only it worked every time. After provisioning several datastores I noticed that a VMFS datastore wasn’t always created when the iSCSI storage was provisioned from the VNXe. There were also issues if CHAP wasn’t used on the VNXe but was used on the ESX host for other datastores. This happens whether VNXe or VSI storage provisioning is used.

Storage provisioning from the VNXe is easy, but it is even easier using VSI. Once the initial setup is done, the iSCSI/NFS servers are configured and the storage pool(s) created, there is no need to log in to the VNXe anymore to provision storage if VSI is in use. This of course requires vCenter and all the necessary plug-ins to be installed.

Some users might never see the issues that I found, but for others these might be show stoppers. Not all businesses have vCenter in use, so they have to use the Unisphere UI to provision storage, and then the VMFS datastore might or might not be created. I can imagine how frustrated users can be when facing these kinds of issues.

Also, users shouldn’t be responsible for deciding which SP a new datastore is placed on. This should be something that the VNXe decides.

Don’t get me wrong: the integration between the VNXe and vCenter/ESX is smooth, and it will be even better after these issues have been fixed.

In the next part of my hands-on series I will look into the performance of VNXe 3300 and I will also post some test statistics.


Hands-on with VNXe 3300 Part 4: Storage pools


When EMC announced that the VNXe would also utilize storage pools, my first thought was that it would be similar to what the CX/VNX has: a storage pool would consist of five-disk RAID 5 groups and LUNs would be striped across all of these RAID groups to utilize all spindles. After some discussions with EMC experts I found out that this is not how the pool works in the VNXe. In this part I will go a bit deeper into the pool structure and also explain how a storage pool is created.

Hands-on with VNXe 3300 series [list edited 9/23/2011]:

  1. Initial setup and configuration
  2. iSCSI and NFS servers
  3. Software update
  4. Storage pools
  5. Storage provisioning
  6. VNXe performance
  7. Wrap up

Pool Structure

The VNXe 3300 can be furnished with SAS, NL-SAS or Flash drives. The one I was configuring had 30 SAS disks, so there were two options when creating storage pools: 6+1 drive RAID 5 groups or 3+3 RAID 1/0 groups. I chose to create one big pool with 28 disks (four 6+1 RAID 5 groups) and one hot spare disk (EMC recommends one hot spare disk for every 30 SAS disks). EMC also recommends not putting any I/O intensive load on the first four disks because the PSL (Persistent Storage Layout) is located on those disks. I wanted to test the storage pool performance with all the disks that were available, so I ignored this recommendation and used the first four disks in the pool too.

When a LUN is created it will be placed on the RAID group (RG) that is the least utilized from a capacity point of view. If the LUN created is larger than the free space on an individual RG, the LUN will be extended across multiple RGs, but there is no striping involved. So depending on the LUN size and pool utilization, a new LUN could reside either on one RG or on several RGs. This means that only one RG is used for sequential workloads, while a random workload could be spread over several RGs. Now if disks are added to the storage pool, those newly added RGs are the least utilized and will be used first when new LUNs are created. So a storage pool on the VNXe can be considered more of a capacity pool than a performance pool.

Before I wrote this post I was in contact with an EMC Technology Consultant (TC) and an EMC vSpecialist to get my facts right. Both of them confirmed that the LUNs in a VNXe pool are not striped across RGs, and the pool structure was explained to me by the EMC TC. However, looking at the test results that I posted in part 6 and at the feedback that I got, the description above is not accurate. Here is a quote from Brian Castelli’s (EMC employee) comment:

 “When provisioning storage elements, such as an iSCSI LUN, the VNXe will always stripe across as many RAID Groups as it can–up to a maximum of four.”

Based on Brian’s comment, LUNs in a VNXe pool are striped across multiple RGs. [Edited 9/15/2011]

Creating Storage Pools

Storage pools are configured and managed from System – Storage Pools. If no pools have been configured, only the Unconfigured Disk Pool is shown.

Selecting Configure Disks starts the disk configuration wizard, and there are three options to select from: Automatically configure pools, Manually create a new pool, and Manually add disks to an existing pool. It is quite easy to understand what each option stands for. I chose the Automatically configure pools option. When using the automatic configuration option, 6+1 disk RAID 5 groups are used to create the pool.

The next step is to select how many disks to add to the new pool, and you can see that the options are multiples of seven (6+1 RAID 5).

A hot spare pool will also be created when using the automatic pool configuration option.

When selecting Manually create a new pool there is a list of alternatives (see picture below) based on the desired purpose of the pool. This makes creating a storage pool easy because the VNXe suggests the RAID level based on the user’s selection. Further down in the wizard there is also an option where the user can select the number of disks used and the RAID level (Balanced Perf/Cap R5 or High Performance R1/0).

Conclusions

It feels a little disappointing to find out that the pool structure wasn’t what I expected it to be. But maybe my expectations were too high in the first place.

Creating a storage pool is in line with one of EMC’s definitions for the VNXe: simple. When the automatic configuration option is selected, Unisphere takes care of deciding which disks are used in the pool and what the correct number of hot spares is, based on EMC’s best practices.

The next part will cover storage provisioning from VNXe and also using EMC’s VSI plug-in for vCenter.


VMworld – Long time no see


It is good to see you again this year, my good friend! It has been two years since I last attended VMworld in Las Vegas, and this is going to be my fourth VMworld (2006 LA, 2008 LV, 2009 SF and 2011 LV).

It’s been a hectic couple of weeks and I haven’t had time to prepare as thoroughly as I would have liked to. But I’m happy with the sessions that I added to my schedule weeks ago, because now all of them are full. Most if not all sessions will be recorded, so if you attend the conference you can watch the recordings later. But if you have any questions about the topic, or if you want to talk to the presenter after the session, then it is worth attending that specific session. If you are only planning to listen, you could just as well watch the recording.

Sessions 

Here is my list of sessions that I’m planning to attend:

  • VSP1628 VMware vSphere Clustering Q&A
  • VSP1926 Getting Started with VMware vSphere Design
  • VSP3205 Technology Preview: VMware vStorage APIs for VM and Application Granular Data
  • VSP1956 The VMware ESXi Quiz Show
  • VSP1425 Ask the Expert vBloggers
  • VSP1956 Protecting SMBs Using Site Recovery Manager 5.0 with VMware vSphere Replication
  • BCA1995 Design, Deploy, Optimized SQL Server on VMware ESXi 5
  • VSP1823 VMware Storage Distributed Resource Scheduler
  • VSP3116 VMware vSphere 5.0 Resource Management Deep Dive
  • VSP2384 Distributed Datacenters with Multiple vCenter Deployments: Best Practices

Networking, parties and meet ups

The whole conference is one big networking opportunity. When you are not attending sessions you should walk around the expo floor and talk to people, ask questions and interact. This is your chance to meet people face to face, get your questions answered and maybe answer someone else’s questions too. I’m really looking forward to meeting new people and seeing old friends, and I might bump into old colleagues too. If you see me wandering around, I’m always up for tech talk. I’m also fairly easy to recognize wearing my shirts:

Where there is a conference there are parties. Vendors usually organize customer appreciation parties, but there are also some individuals and small groups doing the same. These are really good networking opportunities. Here is my party/meetup list:

Tips and links

Here are a few links to good blog posts about VMworld happenings and tips. My advice is to wear shoes that you know are comfortable, so no new shoes. A pedometer is also nice to carry around if you want to track how much you walk during the week.

See you at VMworld!

