I previously posted about the limits on iSCSI connections when using Equallogic arrays and MPIO. If you have lots of Datastores and lots of ESX hosts with multiple paths, the number of connections multiplies pretty quickly. Now with VAAI support in the Equallogic 5.0.2 firmware (hopefully no recalls this time), the number of Virtual Machines per Datastore is not important. Among other improvements, the entire VMFS volume will no longer be locked. As I understand VAAI, only the blocks (or files, maybe?) are locked when exclusive access is needed.
Let's look at the improvement when using fewer, larger EQ volumes:
Old way (with 500GB Datastores for example):
8 Hosts x 2 (vmkernel connections) x 10 (Datastores) = 160 connections (already too many for the smaller arrays, such as the PS4000).
VAAI (with 1.9 TB* Datastores):
8 Hosts x 2 (vmkernel connections) x 3 (Datastores) = 48 connections
The scalability for Equallogic is much better with VAAI when trying to stay under the connection limits.
*Limit for VMFS is 2TB minus 512B so 1.9TB works out nicely.
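If you want to plug your own numbers into that math, here is a minimal sketch in Python. The pool connection limit below is just a placeholder for illustration; check the actual limit for your PS Series model and firmware.

```python
# Rough iSCSI connection math for an EqualLogic pool:
# connections = hosts x vmkernel ports (MPIO paths) x datastores
def iscsi_connections(hosts, vmkernel_ports, datastores):
    return hosts * vmkernel_ports * datastores

# Placeholder limit for illustration only -- look up the real per-pool
# connection limit for your array model and firmware.
POOL_CONNECTION_LIMIT = 512

old_way = iscsi_connections(hosts=8, vmkernel_ports=2, datastores=10)  # 160
vaai_way = iscsi_connections(hosts=8, vmkernel_ports=2, datastores=3)  # 48

for label, total in (("500GB datastores", old_way), ("1.9TB datastores", vaai_way)):
    status = "over" if total > POOL_CONNECTION_LIMIT else "under"
    print(f"{label}: {total} connections ({status} the assumed limit)")
```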
I am wondering about the following.
Beginning with Equallogic version 5.0, the PS Series Array Firmware supports VMware vStorage APIs for Array Integration (VAAI) for VMware vSphere 4.1 and later. The following new ESX functions are supported:
• Hardware Assisted Locking – Provides an alternative means of protecting VMFS cluster file system metadata, improving the scalability of large ESX environments sharing datastores.
Does this mean the previous Best Practice NO LONGER APPLIES? i.e., the Best Practice of limiting each volume to 500GB, putting a maximum of 20 VMs per volume, and creating MULTIPLE volumes to gain more performance. So with VAAI and FW 5.0.2, can I simply create just ONE big volume, say 1.9TB, and put as many VMs on it as I want? Is that true? Or does the old Best Practice still apply (i.e., multiple smaller volumes of 500GB each with a max of 20 VMs per volume)?
Btw, I have a section on my blog dedicated to VMware and Equallogic. http://www.modelcar.hk/?cat=26 Enjoy.
Thanks,
Jack
I think the old best practice has been slowly going away since the release of vSphere. VAAI has completely changed the way we should think about our storage, and I am not the only one saying this. I would say try it. You have Storage vMotion. Create a 1.9 TB datastore with 50 VMs. Monitor the queue depths on the SAN and the ESX server. If you have problems, svMotion the machines back to the 500GB volumes. With VAAI the move will be faster.
Check this out.
http://virtualgeek.typepad.com/virtual_geek/2010/10/top-5-best-practices-5-exceptions-and-5-cool-things.html
I think it applies to anyone using a VAAI enabled SAN.
Jon,
Check this out.
http://www.modelcar.hk/?p=2912
Update: Official Answer from Equallogic
Good morning,
So, the question is: does VMware's ESX v4.1 VAAI API allow you to have one huge volume, versus the standard recommendation of multiple smaller volumes, while still maintaining the same performance?
The answer is NO.
Reason: The same reasons that made it a good idea before still remain. You are still bound by how SCSI works. Each volume has a negotiated command tag queue depth (CTQ), and VAAI does nothing to mitigate this. Also, until every ESX server accessing that mega volume is upgraded to ESX v4.1, SCSI reservations will still be in effect, so periodically one node will lock that one volume and ALL other nodes will have to wait their turn. Multiple volumes also allow you to be more flexible with our storage tiering capabilities. VMFS volumes, RDMs, and storage direct volumes can be moved to the most appropriate RAID member.
i.e., you could create storage pools with SAS, SATA, or SSD drives, then place the volumes in the appropriate pool based on the I/O requirements for that VM.
So do you mean that if we are running ESX version 4.1 on all ESX hosts, then we can safely use one big volume instead of several smaller ones from now on?
Re: 4.1. No. The same overall issue remains. When all ESX servers accessing a volume are at 4.1, then one previous bottleneck, SCSI reservations (and only that issue), is removed. All the other issues I mentioned still remain. Running one mega volume will not produce the best performance and long term will be the least flexible option possible. It would be similar in concept to taking an eight-lane highway down to one lane.
In order to fully remove the SCSI reservation, you need VAAI, so the combination of ESX v4.1 and array FW v5.0.2 or greater will be required.
As a side note, here’s an article which discusses how VMware uses SCSI reservations.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005009
Here’s a brief snippet from the KB.
There are two main categories of operation under which VMFS makes use of SCSI reservations.
The first category is for VMFS datastore-level operations. These include opening, creating, resignaturing, and expanding/extending a VMFS datastore.
The second category involves acquisition of locks. These are locks related to VMFS specific meta-data (called cluster locks) and locks related to files (including directories). Operations in the second category occur much more frequently than operations in the first category. The following are examples of VMFS operations that require locking metadata:
* Creating a VMFS datastore
* Expanding a VMFS datastore onto additional extents
* Powering on a virtual machine
* Acquiring a lock on a file
* Creating or deleting a file
* Creating a template
* Deploying a virtual machine from a template
* Creating a new virtual machine
* Migrating a virtual machine with VMotion
* Growing a file, for example, a Snapshot file or a thin provisioned Virtual Disk
Follow these steps to resolve/mitigate potential sources of the reservation:
a. Try to serialize the operations on the shared LUNs; if possible, limit the number of operations on different hosts that require a SCSI reservation at the same time.
b. Increase the number of LUNs and try to limit the number of ESX hosts accessing the same LUN.
c. Reduce the number of snapshots, as they cause a lot of SCSI reservations.
d. Do not schedule backups (VCB or console based) in parallel from the same LUN.
e. Try to reduce the number of virtual machines per LUN. See vSphere 4.0 Configuration Maximums and ESX 3.5 Configuration Maximums.
f. What targets are being used to access LUNs?
g. Check if you have the latest HBA firmware across all ESX hosts.
h. Is the ESX host running the latest BIOS (to avoid conflicts with HBA drivers)?
i. Contact your SAN vendor for information on SP timeout values, performance settings, and storage array firmware.
j. Turn off 3rd-party agents (storage agents) and RPMs not certified for ESX.
k. MSCS RDMs (the active node holds a permanent reservation). For more information, see ESX servers hosting passive MSCS nodes report reservation conflicts during storage operations (1009287).
l. Ensure the correct Host Mode setting on the SAN array.
m. LUNs removed from the system without rescanning can appear as locked.
n. When SPs fail to release the reservation, either the request did not come through (hardware, firmware, or pathing problems) or 3rd-party apps running on the service console did not send the release. Busy virtual machine operations are still holding the lock.
Note: Use of SATA disks is not recommended in high-I/O configurations, or when the above changes do not resolve the problem while SATA disks are in use. (i.e., using SAS 10K or 15K, or even SSD, should help greatly!)
In your official answer from EQL, the letter refers to CTQ as a bottleneck, mentioning that VAAI helps with snapshots but that "all the other issues I mentioned still remain." The only "other" issue mentioned is queue depth. In my comments to you I mentioned that you should monitor the queue depth, and if there is a machine actually causing queuing, move it to a dedicated datastore. That machine probably needs to be in a pool with a specific I/O (spindle speed) workload anyway, as he mentioned.
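To make the queue depth concern a bit more concrete, here is a back-of-the-envelope sketch. None of these numbers come from EqualLogic or VMware; the per-LUN queue depth and per-VM outstanding I/O figures are assumptions purely for illustration.

```python
# Back-of-the-envelope command tag queue (CTQ) pressure estimate.
# All numbers are assumptions for illustration, not EqualLogic or VMware specs.
def queue_pressure(vms, outstanding_io_per_vm, volumes, lun_queue_depth):
    """Average outstanding I/Os landing on each volume, and how that
    compares to the per-LUN queue depth on the host."""
    per_volume_io = (vms * outstanding_io_per_vm) / volumes
    return per_volume_io, per_volume_io / lun_queue_depth

# Assume 50 VMs averaging 4 outstanding I/Os each, and an assumed
# per-LUN queue depth of 64 on the ESX host.
for volumes in (1, 4):
    io, ratio = queue_pressure(vms=50, outstanding_io_per_vm=4,
                               volumes=volumes, lun_queue_depth=64)
    print(f"{volumes} volume(s): {io:.0f} outstanding I/Os per LUN "
          f"({ratio:.1f}x the queue depth)")
```

If only one or two machines are responsible for most of the queuing, that is the signal to svMotion them to a dedicated datastore, as discussed above.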
An updated post is coming.
It appears you have to be careful when doing the calculation when you have multiple arrays in the pool. E.g., you might have 2 physical NICs on the server connecting to the EQL pool; then for each volume in a 3-array pool, this becomes six connections.
So if you had 4 hosts with 2 NICs and 40 volumes across three shelves in a pool, it becomes approximately 4 x 2 x 3 x 40 = 960... so pretty much at the limit at that point, and your backup has not kicked in yet.
We had some pretty good reasons to run with a lot of volumes, but this limit is a bit of a killer as we scale out a pool to even 3 shelves.
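For what it's worth, the pool-member multiplier from that comment looks something like this (again, treat the "limit" comparison as a placeholder and confirm the per-pool connection limit for your firmware):

```python
# Connection math when the pool has multiple members (arrays/shelves):
# each volume can end up with a connection per NIC per pool member.
def pool_connections(hosts, nics_per_host, pool_members, volumes):
    return hosts * nics_per_host * pool_members * volumes

total = pool_connections(hosts=4, nics_per_host=2, pool_members=3, volumes=40)
print(total)  # 960 -- pretty much at the pool limit before backups even kick in
```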