Storage Caching vs Tiering Part 2

Recently I had the privilege of being a Tech Field Day Delegate. Tech Field Day is organized by Gestalt IT. If you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visit sponsor the event. The delegates are under no obligation to review the sponsoring companies favorably or unfavorably.

After jumping in last week with a post on tierless caching, I wanted to follow up with my thoughts on a second Tech Field Day vendor. Avere gave a very interesting and technical presentation. I appreciated being engaged on an engineering level instead of with a marketing pitch.

Avere tiers everything. It is essentially a scale-out NAS solution (they call it the FXT Appliance) that can front-end any existing NFS storage. Someone else described it to me as file acceleration. The Avere cluster stores data internally across its own NAS units. The “paranoia meter” lets you set how often the mass storage device behind them is updated. If you need more availability or speed, you add Avere devices. If you need more disk space, you add to your mass storage. In their benchmarking tests they basically used some drives connected to a CentOS machine running NFS, front-ended by Avere’s NAS units. They were able to get the required IOPS at a fraction of the cost of NetApp or EMC.
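To picture the “paranoia meter,” think of a write-back buffer with a configurable flush interval. The sketch below is purely my own illustration of that general idea in Python, not Avere’s implementation; the class, names, and interval are made up. Writes land on the fast tier immediately and get pushed back to mass storage on whatever schedule you are comfortable with.

```python
import time

class WriteBackTier:
    """Toy write-back tier: fast tier in front, mass storage flushed on an interval."""

    def __init__(self, mass_storage, flush_interval_s=30):
        self.mass_storage = mass_storage          # stand-in for the NFS mass storage
        self.flush_interval_s = flush_interval_s  # the "paranoia meter": lower = safer
        self.dirty = {}                           # writes held on the fast tier
        self.last_flush = time.monotonic()

    def write(self, path, data):
        self.dirty[path] = data                   # acknowledged from the fast tier
        if time.monotonic() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        self.mass_storage.update(self.dirty)      # push dirty data back to mass storage
        self.dirty.clear()
        self.last_flush = time.monotonic()
```

Turning the interval down trades performance for safety, which is exactly the knob the paranoia meter exposes.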

The Avere Systems blog provides some good questions on Tiering.

The really good part of the presentation was how they write between the tiers. Everything is optimized for the particular type of media: SSD, SAS, or SATA.
When I asked about NetApp’s statements about tiering (funny that both presentations were on the same day), Ron Bianchini responded that “when you sell hammers, everything is a nail.” I believe him.

So how do we move past all the marketing speak and get down to the truth when it comes to caching and tiering? I am leaning toward thinking of any location where data lives for any period of time as a tier. I think a cache is a tier. Really fast cache for reads and writes is for sure a tier. Different kinds of disks are tiers. So I would say everyone has tiers. The value comes in when the storage vendor innovates and automates the movement and management of that data.

My questions/comments about Avere:

1. Slick technology. I would like to see it work in the enterprise over time. People might be scared off because it is not one of the “big names”.
2. Having come from Spinnaker, is the plan to go long term with Avere, or to build something to be purchased by one of the big guys?
3. I would like to see how the methods used by the Avere FXT Appliance could be applied to block storage. There are plenty of slow, inexpensive iSCSI products that would benefit from a device like this on the front end.

Storage Caching vs Tiering Part 1

Recently I had the privilege of being a Tech Field Day Delegate. Tech Field Day is organized by Gestalt IT. If you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visit sponsor the event. The delegates are under no obligation to review the sponsoring companies favorably or unfavorably.

The first place hosting the delegates was NetApp. I have worked with several different storage vendors, but I must admit I had never really experienced NetApp before, except for Storage vMotioning virtual machines from an old NetApp (I don’t even know the model) to a new SAN.

Among the 4 hours of slide shows I learned a ton. One great topic was Storage Caching vs Tiering. Some of the delegates have already blogged about the sessions here and here.

So I am going to give my super quick summary of caching as I understood it from the NetApp session, followed by a post about tiering as I learned it from our subsequent session with Avere.

1. Caching is superior to Tiering because Tiering requires too much management.
2. Caching outperforms tiering.
3. Tiering drives cost up.

The NetApp method is to use really quick flash memory to speed up the performance of the SAN. Their software attempts to predict what data will be read and keeps that data available in the cache. This “front-ends” a giant pool of SATA drives. The cache cards provide the performance and the SATA drives provide a single large pool to manage. With a simplified management model and just one type of big disk, the cost is driven down.
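To make the caching idea concrete, here is a toy read cache sitting in front of a slow pool. This is just my own sketch of the general concept in Python; NetApp’s actual software tries to predict what will be read, rather than simply keeping the most recently used blocks the way this does.

```python
from collections import OrderedDict

class ReadCache:
    """Toy LRU read cache front-ending a slow backing pool (illustration only)."""

    def __init__(self, backing_store, capacity=1024):
        self.backing = backing_store   # stand-in for the big pool of SATA drives
        self.capacity = capacity       # how many blocks fit in the flash cache
        self.cache = OrderedDict()

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)    # hit: served from flash
            return self.cache[block_id]
        data = self.backing[block_id]           # miss: go out to the SATA pool
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)      # evict the least recently used block
        return data
```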

My Takeaway on Tierless Caching

This is a solution that has a place and would work well in many situations, but it is not the only solution. All in all the presentation was very good. The comparisons against tiering, though, were really set up against a “straw man”: a multi-device tiered solution requiring manual management of all the different storage tiers is of course a really hard sell. It could cost more to obtain and could be more expensive to manage. I asked about fully automated virtual tiering solutions, the kind that manage your “tiers” as one big pool. These would seem to solve the problem of managing tiers of disks while keeping the cost down. The question was somewhat deflected because those solutions move data on a schedule. “How can I know when to move my data up to the top tier?” was the question posed by NetApp. Of course this is not exactly how a fully automated tiering SAN works, but it is a valid concern.

My Questions for the Smartguys:

1. How can NetApp’s caching software make better (or worse) choices than the tiering software from companies that have been making these decisions for several years?
2. If tiering is so bad, why does Compellent’s stock continue to rise in anticipation of an acquisition by someone big?
3. Would I really want to pay NetApp-sized money to send my backups to a NetApp pool of SATA disks? Would I be better off with a more affordable SATA solution for backup to disk, even if I have to spend slightly more time managing the device?

Fast Don’t Lie – Tech Field Day

Apologies to the new Adidas Basketball YouTube campaign. I am going to steal their title for this post.

Time has flown by and it is now time to get going to Gestalt IT’s Tech Field Day. Thursday and Friday will be full of some pretty exciting companies. I have some familiarity with three of them: Solarwinds, NetApp and Intel. I am excited to get some in depth information from them though.

Then there are Aprius, Avere Systems, Actifio, and Asigra, companies I have never really heard anything about, so it will be interesting to see what they do and how they fit into my perspective as a Virtualization dude.

For now I have one question on my list (I will come up with others): Is it fast? Watch the videos, because when we talk about the cloud, fast don't lie.

I’m Fast

I’m Fast 2

Fast Don’t Lie

Equallogic, VAAI and the Fear of Queues

Previously I posted on how using bigger VMFS volumes helps Equallogic reduce their scalability issues when it comes to total iSCSI connections. There was a comment asking whether this means we can have a new best practice for VMFS size. I quickly said, “Yeah, make ’em big or go home.” I didn’t really say that, but something like it. The commenter then responded with a long statement from Equallogic saying VAAI only fixes SCSI locking and that all the other issues with bigger datastores still remain. All the other issues being “queue depth.”

Here is my order of potential IO problems with VMware on Equallogic:

  1. Being spindle bound. You have an awesome virtualized array that will send IO to every disk in the pool or group. Unlike some others, you can take advantage of a lot of spindles. Even then, depending on the types of disks, some IO workloads are going to use up all your potential IO.
    Solution(s): More spindles are always a good solution if you have an unlimited budget, which is not always practical. Put some planning into your deployment. Don’t just buy 17TB of SATA. Get some faster disk, break your group into pools, and separate the workloads into something better suited to their IO needs (see the back-of-envelope sketch after this list).
  2. Connection limits. The next problem you will run into, if you are not having IO problems, is the total number of iSCSI connections. In an attempt to get all of the IO you can from your array, you have multiple vmk ports using MPIO. This multiplies the connections very quickly. When you reach the limit, connections drop and bad things happen.
    Solution: The new 5.02 firmware increases the maximum number of connections. Additionally, bigger datastores mean fewer connections. Do the math.
  3. Queue depth. There are queues everywhere: the SAN ports have queues, each LUN has a queue, the HBA has a queue. I will defer to this article by Frank Denneman (a much smarter guy than myself), who argues that balanced storage design is the best course of action.
    Solution(s): Refer to problem 1. Properly designed storage is going to give you the best solution for any potential (even though unlikely) queue problems. In your great storage design, make room for monitoring. Equallogic gives you SAN HQ. USE IT!!! See how your front-end queues are doing on all your ports. Use ESXTOP or RESXTOP to see how the queues look on the ESX host. Most of us will find that queues are not a problem when problem one is properly taken care of. If you still have a queuing problem, then go ahead and make a new datastore. I would also ask Equallogic (and others) to release a Path Selection Policy plugin that uses a Least Queue Depth algorithm (or something smarter). That would help a lot.
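Here is the kind of back-of-envelope math I mean for problem 1. The per-spindle IOPS numbers and the RAID write penalty below are my own rough assumptions, not Equallogic figures; plug in whatever matches your disks and RAID level.

```python
# Rough per-spindle IOPS (assumptions; check your own disks)
ROUGH_IOPS = {"SATA 7.2K": 80, "SAS 10K": 140, "SAS 15K": 180}

def pool_iops(spindles, disk_type, read_pct=0.7, raid_write_penalty=2):
    """Very rough usable IOPS for a pool, accounting for the RAID write penalty."""
    raw = spindles * ROUGH_IOPS[disk_type]
    write_pct = 1 - read_pct
    # Writes cost more than reads, so effective IOPS shrink as the write % grows.
    return raw / (read_pct + write_pct * raid_write_penalty)

# 16 SATA spindles vs 16 15K SAS spindles for a 70/30 read/write workload
print(round(pool_iops(16, "SATA 7.2K")))   # ~985
print(round(pool_iops(16, "SAS 15K")))     # ~2215
```

If the workload needs more than the pool can deliver, no amount of queue tuning is going to save you.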

So I will repeat my earlier statement that VAAI allows you to make bigger datastores and house more VMs per datastore. I will add a caveat: if you have a particular application with a high IO workload, give it its own datastore.

Gestalt IT – Tech Field Day

I am honored to be included in the upcoming Gestalt IT Tech Field Day. It looks like a great group from the community will be in attendance. I am looking forward to the collection of presenters. With how busy I have been delivering solutions lately, it will be really good to dedicate some time to learning what is new and exciting. I plan to take good notes and share my thoughts here on the blog. For more information on the Field Day check it out right here: http://bit.ly/ITTFD4

Random picture of my dog.

How VAAI Helps Equallogic

I previously posted about the limits on iSCSI connections when using Equallogic arrays and MPIO. If you have lots of datastores and lots of ESX hosts with multiple paths, the number of connections multiplies pretty quickly. Now with VAAI support in the Equallogic 5.02 firmware (hopefully no recalls this time), the number of virtual machines per datastore is not as important. Among other improvements, the entire VMFS volume will not lock. As I understand VAAI, only the blocks (or files maybe?) are locked when exclusive access is needed.

Let’s look at the improvement when using fewer, larger EQ volumes:
The old way (with 500GB datastores, for example):
8 Hosts x 2 (vmkernel connections) x 10 (Datastores) = 160 connections (already too many for the smaller arrays, like the PS4000).

VAAI (with 1.9 TB* Datastores):
8 Hosts x 2 (vmkernel connections) x 3 (Datastores) = 48 connections

The scalability for Equallogic is much better with VAAI when trying to stay under the connection limits.

*Limit for VMFS is 2TB minus 512B so 1.9TB works out nicely.
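If you want to run the numbers for your own environment, the connection math is simple enough to script. This is just my own helper; the 128-connection limit in the example is a placeholder, so use the actual limit for your array model and firmware.

```python
def iscsi_connections(hosts, vmk_ports_per_host, datastores):
    """Each host opens one iSCSI connection per vmkernel port per datastore."""
    return hosts * vmk_ports_per_host * datastores

def max_datastores(hosts, vmk_ports_per_host, connection_limit):
    """How many datastores fit under the array's connection limit?"""
    return connection_limit // (hosts * vmk_ports_per_host)

print(iscsi_connections(8, 2, 10))   # 160 - the old way with 500GB datastores
print(iscsi_connections(8, 2, 3))    # 48  - fewer, larger 1.9TB datastores
print(max_datastores(8, 2, 128))     # 8   - with a hypothetical 128-connection limit
```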

Update Manager Problem after 4.1 Upgrade

A quick note to help publicize a problem I ran into, which I see is already being discussed in the VMware Community Forums.

After building a new vCenter Server and upgrading the vSphere 4.0 databases for vCenter and Update Manager, I noticed I could not scan hosts that were upgraded to 4.1. To be fair, by upgraded I mean rebuilt with a fresh install but with the exact same names and IP addresses. It seems the process I took to upgrade has some kind of weird effect on the Update Manager database. The scans fail almost immediately. I searched around the Internet and found a couple of posts on the VMware Forums about the subject. One person was able to fix the problem by removing Update Manager and, when reinstalling, selecting the option to install a new database. I figured I didn’t have anything important in my UM database, so I gave it a try and it worked like a champ.

Right now there are not any new patches for vSphere 4.1, but I have some extension packages that need to be installed (Xsigo HCA drivers). I wanted to note that I like the ability to upload extensions directly into Update Manager. This is a much cleaner process than loading the patches via the vMA for tracking and change control purposes.

ESXi 4.1 pNICs Hard Coded to 1000 Full

I have recently made the transition to using ESXi for all customer installs. One thing I noticed was that after installing with a couple of different types of media (ISO and PXE install), the servers come up with the NICs hard coded to 1000/Full. I have always made it a practice to keep Gigabit Ethernet at auto-negotiate. I was told by a wise Cisco engineer many years ago that GigE and auto/auto is the way to go. You can also check the Internet for articles and best practices around using auto-negotiate with Gigabit Ethernet. Even the VMware “Health Analyzer” recommends using auto. So it is perplexing to me that ESXi 4.1 would default to a hard-set speed. Is it just me? Has anyone else noticed this behavior?

The only reason I make an issue of it is that I was ready to call VMware support a couple of weeks ago because nothing in a DRS/HA cluster just built with 4.1 would work. One vMotion would be successful, the next would fail. Editing settings on the hosts would fail miserably when done from the vSphere Client connected to vCenter. After changing all the pNICs to auto (matching the switches), everything worked just fine.

Hit me up in the comments or on twitter if you have noticed this.

Vote for the top VMware Blogs

The vSphere Land top 25 is up for vote once again. I am low on the list of bloggers; I just want to get close enough to see the shoes of the guy at #25. Like in the picture I took in San Francisco during VMworld, I can barely see the top of the hill. Hey though, I am very excited to be on the ballot once again. Get on over and vote. Vote for me if you like the blog.


Here are some of my top blog posts from the last few months.
1. The mini ESXi 4 Portable Server
2. Storage IO Control An Idea
3. You Might be a vDiva if…
4. Adaptive Queuing in ESX

VMworld 2010 Recap – Five Session Highlights

I thought I would get more into posting my thoughts on each session. To be completely honest, I was in some really good and some really bad sessions. My goal was to find sessions that would potentially benefit my day-to-day work, not just sessions about features we may or may not see in the next year. More of that knowledge came from doing the labs. Next year I will make more time to check out all the labs. I do not really learn well listening to someone speak anyway; I am more of a hands-on learner.

I debated how I would address the sessions I didn’t like. I think the best way to comment is to just say there were some sessions that were not helpful at all. Others were really good. So instead I wanted to list five good lessons I learned in the VMworld 2010 breakout sessions.

1. A common theme for me was that the distributed virtual switch (dvSwitch) is required to do anything advanced. This convinced me to push more toward using the dvSwitch on deployments when possible. I figure more and more network features will depend on the dvSwitch. Features available now include Network IO Control and Private VLANs (needed for cross-host network fences, and important for cloud networking in vSphere and vCloud Director).

2. Innovation is coming to the network. Converged networking from Xsigo and Cisco is just the beginning of virtualizing the network and I/O.

3. Doing VDI and having happy users is going to be harder than Server Virtualization.

4. VMware is working hard to get View deployments right. The View Benchmarking tool is going to help validate the deployments in order to provide scale. Hoping for good things here.

5. There are so many moving parts in a virtual datacenter solution. Architecture, when it comes to VMware, is basically knowing how to account for everything involved. Seeing how the lab datacenter was put together was encouraging; even the rock star architects at VMware face the same challenges as the everyday folks. They did a great job, because in my opinion the labs rocked.

I learned a great deal during VMworld. It was once again a great experience. At the same time I hope the words “deep dive” are not misused like they were this year. VMware did a great job this year and hopefully will do better next year. See you all at PEX 2011 in Orlando?