
Hidden VNXe performance statistics revised


In my previous post covering the VNXe hidden statistics I explained where to find the “hidden” statistics files and how to extract the data into a usable format. It now seems that EMC has changed the statistics gathering interval from 5 minutes to 30 minutes. I started playing around with the new data and created a spreadsheet template that generates graphs for IOps and MB/s for the past 2 months, 1 month, 2 weeks, 1 week and 24 hours. In this post I will share the template and explain how to use it.

Exporting data from SQLite database and importing it into the spreadsheet

The first two steps are explained in more detail in my previous post.

  • Get stats_basic_summary.db and old.stats_basic_summary.db files from VNXe
  • Export data from those files to stats_basic_summary.txt and old.stats_basic_summary.txt files.
  • Download the spreadsheet template from here. (I have tested it with OpenOffice and Microsoft Excel and it works better with Excel. I was planning to use Google Docs but there is just too much data on the spreadsheet so it didn’t work.)
  • Import the stats_basic_summary.txt content into the spreadsheet’s stats_basic_summary.db sheet, starting from A1 and using the Delimited data type with tab + comma delimiters

  • Import the old.stats_basic_summary.txt content into the spreadsheet’s old.stats_basic_summary.db sheet, starting from A1 and using the Delimited data type with tab + comma delimiters
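For anyone who wants to check the exported text outside the spreadsheet, the "tab + comma" delimiter setting can be imitated with a few lines of Python. This is only a minimal sketch: the function name `split_fields` is made up for illustration, and it assumes every tab and every comma separates two fields, which may not match how a given spreadsheet treats consecutive delimiters.

```python
import re

def split_fields(line):
    """Split one exported line on tabs and commas (hypothetical helper)."""
    # Each tab or comma is treated as a field separator, like the
    # "Delimited" import with both delimiters ticked.
    return [field.strip() for field in re.split(r"[\t,]", line.rstrip("\n"))]
```

Calling `split_fields` on each line of stats_basic_summary.txt gives one list of cell values per row.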

If you now go to the “2 months” sheet the graphs might look like this:

Removing zeros from the statistics

Sometimes the VNXe seems to fail to update the statistics in the database and only zeros are added. Using the data without taking the zeros out produces graphs like the ones shown above. Rows 43-2772 on the imported sheets are used for the statistics, and it is important to find all the zero rows to get the graphs working properly.

Data from the previous row should be copied to replace the zeros from column D onward. All the statistics values seem to be cumulative counters, and for the graphs the previous value is subtracted from the current one. So replacing the zeros with the values from the previous row makes that particular timestamp show as zero on the graph.

After all the zeros are replaced, the statistics and graphs will in most cases show the correct values.
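The zero replacement described above can also be done in code before the data goes into the spreadsheet. The following is a rough sketch, not part of the template: `fill_zero_rows` and its `first_counter_col` parameter are hypothetical names, rows are assumed to be lists with numeric counters from column D (index 3) onward, and a row is only treated as a failed update when all of its counters are zero.

```python
def fill_zero_rows(rows, first_counter_col=3):
    """Copy the previous row's counters over rows whose counters are all zero.

    first_counter_col=3 is spreadsheet column D (0-based index); this
    default is an assumption based on the description in the post.
    """
    cleaned = []
    for row in rows:
        row = list(row)  # work on a copy so the input is not modified
        counters = row[first_counter_col:]
        if cleaned and counters and all(value == 0 for value in counters):
            # Repeat the previous counters so the difference for this
            # timestamp becomes zero instead of a huge negative number.
            row[first_counter_col:] = cleaned[-1][first_counter_col:]
        cleaned.append(row)
    return cleaned
```

A zero row at the very start of the data has no previous row to copy from, so it is left as it is.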

Performance counters reset during the data gathering

If the statistics and graphs are still not showing the correct values after the zeros have been removed from the data, the issue might be that the performance counters were reset during the data gathering period. When this happens there might be a row of zeros before the reset.

To fix this, replace the zero row as described above and delete the row below the zeros.
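Finding a reset by eye in a couple of thousand rows is tedious. Since the values appear to be cumulative counters, one possible way to spot a reset is to look for a sample that is lower than the one before it. This is a sketch under that assumption; `find_counter_resets` is a hypothetical helper and it only reports candidate rows, it does not apply the fix described above.

```python
def find_counter_resets(values):
    """Return the indices where a cumulative counter drops below the previous sample."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]
```

It is best run on one counter column after the zero rows have been replaced, because an untouched zero row would be reported as a drop too.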

Disclaimer

This method for gathering and presenting the statistics is not approved or confirmed by EMC. This is something I have found and it seems to work in the environments I work with. So the statistics might not be accurate.


Hidden VNXe performance statistics


The latest Operating Environment upgrades have already brought some improvements to the statistics that are shown through the Unisphere GUI. The first VNXe OE that I worked with was showing only CPU statistics. Then, along with update 2.1.0, Network Activity and Volume Activity statistics became available. I was still hoping to get some more statistics. IOps and latency graphs would have been nice additions. So I did some digging and found out that there are actually lots of statistics parameters that the VNXe gathers, but those are just stored in the database, maybe for support purposes.

Where is the data stored?

When you log in to the VNXe via SSH using the service account and list the content of the folder /EMC/backend/perf_stats, you will see that there are several db-files in that folder.

Opening one of the files with Notepad makes it quite clear what kind of databases those are:

How to read the data?

Now that we know that the data is stored in SQLite databases, the next thing is to export the data to a readable format. To do this the SQLite shell is needed. SQLite is really simple to use: just download the shell and run a couple of commands.

Opening the database, selecting the output file and exporting all the data takes only three commands:

Now all the content of the database is exported to stats_basic_summary.txt. The data can now be imported into a spreadsheet or another database.
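The same kind of export can also be scripted with Python's built-in sqlite3 module instead of the SQLite shell. The sketch below is an alternative, not the three shell commands used in this post: `export_database` is a hypothetical function, it makes no assumption about table names (it reads them from `sqlite_master`), and it writes tab-separated rows, so the output layout will differ from what the shell produces.

```python
import csv
import sqlite3

def export_database(db_path, out_path):
    """Write every table of a SQLite file to one tab-separated text file."""
    conn = sqlite3.connect(db_path)
    try:
        # Ask SQLite itself which tables exist instead of guessing names.
        tables = [name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
        with open(out_path, "w", newline="") as out:
            writer = csv.writer(out, delimiter="\t")
            for table in tables:
                quoted = '"' + table.replace('"', '""') + '"'
                cursor = conn.execute("SELECT * FROM " + quoted)
                writer.writerow(["# table", table])                     # marker row per table
                writer.writerow([col[0] for col in cursor.description])  # column names
                writer.writerows(cursor)                                 # data rows
    finally:
        conn.close()
    return tables
```

For example, `export_database("stats_basic_summary.db", "stats_basic_summary.txt")` would write all tables of a copied database file into one text file and return the list of table names it found.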

What data is stored in the databases?

Actually there are a lot of parameters and data in those databases. Here are just a few of the parameters.

DART parameters in stats_basic_default.db:

SysClockUnixms
NetBasicBytesIn
NetBasicBytesOut
NetInPackets
NetOutPackets
TCPInPackets
TCPOutPackets
UDPInPackets
UDPOutPackets
StoreReadBytes
StoreWriteBytes
StoreReadRequests
StoreWriteRequests

DART parameters in stats_basic_summary.db:

NetBasicBytesIn
NetBasicBytesOut
NetInPackets
NetOutPackets
TCPInPackets
TCPOutPackets
UDPInPackets
UDPOutPackets
StoreWriteBytes
StoreReadBytes
StoreReadRequests
StoreWriteRequests
KernelBufCacheHits
kernelBufCacheLookups
CifsActiveConnections
CifsTotalConnections
CifsBasicReadBytes
CifsBasicReadOpCount
CifsBasicWriteBytes
CifsBasicWriteOpCount
FsDnlcHits
FsDnlctotal
FsOfCachehits
FsOfCachetotal
NfsActiveConnections
NfsBasicReadBytes
NfsBasicReadOpCount
NfsBasicWriteBytes
NfsBasicWriteOpCount
iSCSIBasicReads
iSCSIReadBytes
iSCSIBasicWrites
iSCSIWriteBytes

FLARE_SP parameters in stats_basic_summary.db:

HardErrorCount
HighWaterMarkFlushOff
IdleFlushOn
LowWaterMarkFlushOff
writeCacheFlushes
writeCacheBlocksFlushed
ReadHitRatio
SPTimestamp
SumOfQueueLengths
arrivalsToNonzeroQueue
SumOfLUNBlkRead
SumOfLUNBlkWrite
SumOfLUNDiskRead
SumOfLUNDIskWrite
SumOfLUNDiskBlkRead
SumOfLUNDiskBlkWrite
SumOfFRUBlkRead
SumOfFRUBlkWrite
SumOfFRUReadCount
SumOfFRUWriteCount

How can that data be used?

I will take the StoreReadRequests parameter from stats_basic_default.db as an example. Some of the parameters have descriptions and this is one of those:

Total number of read requests on all DART volumes

Here is the format that the data is in after being imported to a spreadsheet:

There is a time stamp and also a value for StoreReadRequests. It seems that the number of read requests recorded during the five minute period is added to the old value and then inserted as a new entry in the database. So by subtracting the earlier value from the new one we get the total number of read requests for all DART volumes for that specific five minute period:

4267021177 – 4266973002 = 48175

Now if we divide that result by 300 (seconds) we get the average number of read requests per second on all DART volumes during that five minute period:

48175 / 300 = 160.58
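The same arithmetic can be written as a small function and applied to a whole column of counter values. This is just a sketch of the calculation above: `per_second_rates` and `interval_seconds` are made-up names, and the samples are assumed to be in time order with a fixed gathering interval.

```python
def per_second_rates(samples, interval_seconds=300):
    """Turn cumulative counter samples into average per-second rates.

    Each result is (later sample - earlier sample) / interval length.
    """
    return [(later - earlier) / interval_seconds
            for earlier, later in zip(samples, samples[1:])]

rates = per_second_rates([4266973002, 4267021177])
print(round(rates[0], 2))  # prints 160.58
```

With the newer 30 minute gathering interval the same function would be called with `interval_seconds=1800`.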

With some spreadsheet magic it is easy to create a nice “requests per second” graph from the data:

How can I be sure that my theory is correct?

Well, the NetBasicBytesIn and NetBasicBytesOut parameter values in stats_basic_default.db are also growing with every time stamp. These are also described in the database: Total Bytes DART received/sent from all NICs. So I used the same math to create a graph showing network statistics for the past 24 hours. I then compared that graph with Unisphere’s network activity graph and the two matched.

The graph that I put together using the values from the database and the formula introduced earlier:

Unisphere network activity graph:

Conclusions

I really hope that EMC will bring more statistics to the GUI or introduce an easier way to export the data to a readable format. From what I’ve heard, Clint Kitson from EMC has already written some scripts for pulling the stats from the VNXe, but they are not yet published for customers. Digging into the databases is kind of a quick and dirty way to get more statistics out of the VNXe, but it seems to be working.


ESX 4.x losing statistics


Just noticed some weird behaviour on ESXi 4.1 U1 (build 348481) statistics while rescanning datastores. Wanted to share my findings and maybe also find an explanation or a fix for this problem.

I started “rescan for datastores” on a cluster that has two ESXi hosts in it and noticed that while the scan was running the realtime statistics weren’t showing any graph. The issue was the same on both hosts, both when using Virtual Center and when connected directly to the ESXi. There was also no graph on the VMs during the rescan. I did some tests with ESX/ESXi 4.1 (build 260247) using different storage systems and reproduced the problem on those too.

Gaps on the graph while rescanning:

I also noticed that all of ESXi’s CPU was consumed during the rescan.

During the scan:

After the scan:

One host was showing zero CPU usage while rescanning:

I don’t know whether this is a bug or something else, but I hope I can find a fix for it soon.

Has anyone seen this happen before?

