Easy Storage Monitoring – Setting Up PureELK with Docker

[UPDATE June 2016: It appears this works with Ubuntu only, or maybe another Debian flavor. I am hearing RHEL is problematic for getting the dependencies working.]

I have blogged in the past about setting up vROps (vCOps) and Splunk to monitor a Pure Storage FlashArray using the REST API. Scripts and GETs and PUTs are fun and all, but what if there were a simple tool you could install to get your own on-site monitoring and analytics for your FlashArrays?

Enter PureELK. Some super awesome engineers back in Mountain View wrote this integration between Pure and ELK, packaged it in an amazingly easy installation, and released it on GitHub! Open source and ready to go!
https://github.com/pureelk

and

http://github.com/pureelk/pureelk

Don’t know Docker? Cool, we will install it for you. Don’t know Kibana or Elasticsearch? Got you covered. One line on a fresh Ubuntu install (I used Ubuntu, but I bet your favorite flavor will suffice).

Go ahead and try:

curl -s https://raw.githubusercontent.com/pureelk/pureelk/master/pureelk.sh | bash -s install

(URL fixed to reflect that the script is no longer in the dev branch)

This will download and install Docker, set up all the dependencies for PureELK, and let you know where to point your browser to configure your FlashArrays.

I had one small snag:

Connecting to the Docker Daemon!


My user was not in the right group to connect to Docker the first time. A non-automated Docker install actually tells you to add your user to the “docker” group so you can run Docker commands without sudo:

$ sudo usermod -aG docker [username]

Logging out and back in did the trick. If you know a better way for the change to be recognized without logging out, let me know in the comments.
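
One trick I have seen mentioned, though I have not tested it myself, is to refresh the group membership in your current session with:

newgrp docker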

I re-ran the install:
curl -s https://raw.githubusercontent.com/pureelk/pureelk/master/pureelk.sh | bash -s install

In about 4 minutes I was able to hit the management IP and start adding FlashArrays!
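
If you are curious whether everything came up, a quick sanity check (assuming a standard Docker setup) is:

docker ps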

Quickly add all your FlashArrays


Click the giant orange PLUS button.

This is great if you have more than one FlashArray. If you only have one, it still works. Everyone should have more flash though, right?


Fill in your FlashArray information. You can choose the time-to-live for the metrics and how often to pull data from the FlashArray.

Success!


I added a couple of arrays for fun and then clicked “Go to Kibana.” I could also have gone straight to:
https://[server ip]:5601

Data Already Collecting


This is just the beginning. In the next post I will share some of the pre-packaged dashboards and also some of the customizations you can make in order to visualize all the data PureELK is pulling from the REST API. Have fun with this free tool. It can be downloaded and set up in less than 10 minutes on a Linux machine, 15 minutes if you need to build a new VM.

PureStorage + REST API + Splunk = Fun with Data about Data

A few months back I posted a PowerShell script to post Pure Storage data directly into VMware vCenter Operations Manager (now called vRealize Operations). Inspiration hit me like a brick when a big customer of mine said, “Do you have a plugin for Splunk?”

He had already written some scripts in Python to pull data from our REST API. He just said, “Sure wish I didn’t have to do this myself.” I took the hint. Now I am not a Python person, so I did the best I could with the tools I have.
You will notice that the script is very similar to the one I wrote for vCOps. That is because open REST APIs rock; if you don’t have one for your product, you are wrong. 🙂

The formatting in WordPress ALWAYS breaks scripts when I paste them. So head over to GitHub and download the script today.
https://github.com/2vcps/post-rest2splunk/tree/master

Like before, I schedule this as a task that runs every 5 minutes. That seems not to explode the tiny Splunk VM I am running in VMware Fusion to test this out.
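
If you want to do the same, a scheduled task along these lines should work. This is just a sketch; the task name and script path are placeholders:

schtasks /create /tn "post-rest2splunk" /tr "powershell.exe -ExecutionPolicy Bypass -File C:\Scripts\post-rest2splunk.ps1" /sc minute /mo 5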

Dashboards. Check.


Some very basic dashboards I created. I am not a Splunk ninja; perhaps you know one? I am sure people who have been doing this for a while can pull much better visuals out of this data.


Pivot Table


Stats from a lab array, with some averages computed by Splunk.

Gauge report of max latency (that is microseconds).


1,000 of these is 1 millisecond 🙂 pretty nice.

From Wikipedia:
A microsecond is an SI unit of time equal to one millionth (0.000001, 10^-6, or 1/1,000,000) of a second. Its symbol is μs. One microsecond is to one second as one second is to 11.574 days. A microsecond is equal to 1000 nanoseconds or 1/1,000 of a millisecond.

Even if everything else didn’t help you at least you learned that today. Right?

The link to GitHub again: https://github.com/2vcps/post-rest2splunk/tree/master

Provision vSphere Datastores on Pure Storage Volumes with PowerShell

A week or so ago our Pure Storage PowerShell guru Barks @themsftdude sent out some examples of using PowerShell to get information via the Pure Storage REST API. My brain immediately started thinking about how we could combine this with PowerCLI to get a script that creates the LUN on the Pure side and then the datastore on vSphere. So now provision away with PowerShell! You know, if that is what you like to do. We also have a vCenter plugin if you like that better.

So now you can take this code and put it into a file called New-PSDataStore.ps1.

What we are doing:

1. Log in to vCenter and the REST API for the array.
2. Create the volume on the FlashArray.
3. Place the new volume in the host group containing your ESX cluster.
4. Rescan the host.
5. Create the new datastore.

Required parameters:

-FlashArray The name of your array
-vCenter Name of your vCenter host
-vCluster Name of the cluster your hosts are in. If you don’t have clusters (what?) you will need to modify the script slightly.
-HostGroup The name of the host group on the Pure FlashArray
-VolumeName Name of the volume and datastore
-VolumeSize Size of the volume. This requires denoting the G for Gigabytes or T for Terabytes
-pureUser The Pure FlashArray username
-purePass The Pure FlashArray password

[powershell]
# example usage
#.\new-PSdatastore.ps1 -FlashArray "Array" -vCenter "vcenter" -vCluster "clustername" -HostGroup "HostGroup" -VolumeName "NewVol" -VolumeSize 500G -pureUser pureuser -purePass purepass
#On the VolumeSize parameter you must include the letter after the number. I have tested <number>G for Gigabytes and <number>T for Terabytes.
#Special thanks to Barkz www.themicrosoftdude.com @themsftdude for the kickstart on the API calls.
#Find me @jon_2vcps on the twitters. Please make this script better.
# If you do not have a stored PowerCLI credential you will be prompted for the vCenter credentials.
#Not an official supported Pure Storage product, use as you wish at your own risk.
#

Param(
[Parameter(Mandatory=$true)]
[ValidateNotNullOrEmpty()]
[string] $FlashArray,
[string] $VCenter,
[string] $vCluster,
[string] $HostGroup,
[string] $VolumeName,
[string] $VolumeSize,
[string] $pureUser,
[string] $purePass

)

Add-PSSnapin VMware.VimAutomation.Core

#cls
$vname=$VolumeName
$vSize=$VolumeSize
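# Trust all certificates so the FlashArray’s self-signed cert does not break Invoke-RestMethod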
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }
$FlashArrayName = $FlashArray
$vCenterServer = $VCenter
$esxHostGroup = $HostGroup
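# Step 1: Log in to vCenter (you will be prompted if no PowerCLI credential is stored)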
Connect-VIServer -Server $vCenterServer

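# Use the first host in the cluster to do the rescan and create the datastore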
$workHost = Get-VMHost -Location $vCluster | Select-Object -First 1

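# Trade the FlashArray username/password for an API token, then open a REST session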
$AuthAction = @{
password = $purePass
username = $pureUser
}
$ApiToken = Invoke-RestMethod -Method Post -Uri "https://${FlashArrayName}/api/1.1/auth/apitoken" -Body $AuthAction

$SessionAction = @{
api_token = $ApiToken.api_token
}
Invoke-RestMethod -Method Post -Uri "https://${FlashArrayName}/api/1.1/auth/session" -Body $SessionAction -SessionVariable Session

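# Steps 2 and 3: Create the volume, then add it to the host group so the cluster can see it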
Invoke-RestMethod -Method POST -Uri "https://${FlashArrayName}/api/1.1/volume/${vname}?size=${vSize}" -WebSession $Session
Invoke-RestMethod -Method POST -Uri "https://${FlashArrayName}/api/1.1/hgroup/${esxHostGroup}/volume/${vname}" -WebSession $Session
$volDetails = Invoke-RestMethod -Method GET -Uri "https://${FlashArrayName}/api/1.1/volume/${vname}" -WebSession $Session
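# Steps 4 and 5: Rescan HBAs, find the new LUN by the NAA built from the volume serial, and format it as VMFS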
$rescanHost = $workHost | Get-VMHostStorage -RescanAllHba
$volNAA = $volDetails.serial
$volNAA = $volNAA.substring(15)
$afterLUN = $workHost | Get-ScsiLun -CanonicalName "naa.624*${volNAA}"
New-Datastore -VMhost $workHost -Name $vname -Path $afterLUN -VMFS
[/powershell]


Virtual Storage Integrator 5.6 – What’s New

The Virtual Storage Integrator, or VSI, has been around for a while. It seems every release adds something new and exciting that customers have asked for. VSI 5.6 (released 9/13/2013) is the latest version of the EMC plugin that helps streamline and simplify interactions between the vSphere client and the EMC storage used to support your Virtual Data Center/Private Cloud/Software Defined Data Center.

The VSI plugin can be downloaded at no extra charge if you have a current support.emc.com account (BTW, so glad it is not Powerlink anymore).

VSI Support and Downloads Page

You may just want to post a question on the EMC Community about the VSI. You can do that here.

Yeah community!

Enough background already; what is new in version 5.6?

XtremIO Support


Awesome provisioning and visibility for the new all-flash array from EMC. Ready now for the people with XtremIO, and for the many waiting to get one: coming soon!

Here is a quick demo of the XtremIO functionality. Select 720p for better viewing.

VPLEX Support


Our data mobility team is super excited about now supporting VPLEX provisioning in the VSI plugin. So now you are able to create VPLEX datastores straight from the vSphere client. Very cool.
Update 9/23/13 Demo of VPLEX Provisioning with VSI

VMAX Provisioning with Striped Meta

We were all very excited when VMAX provisioning was added to the VSI plugin, and now it is able to use striped meta volumes, which is a big deal for some VMAX users. This is now an option, and you can select either method when provisioning to the VMAX.

Update 9/19/13 -> a demo from @drewtonnesen

Did you hear there is a new VNX?


The newest versions of the VNX are supported in VSI 5.6 and, as you see in the slide, some of the coolest new features of the VNX will be available for use with the new VSI 5.6.

I hope you are as excited as I am about the newest release of the plugin. Remember that it supports vSphere 5.5 too!

If you have any questions please leave a comment, or better yet start a thread on the community.

Some Reality for us Infrastructure Peeps or Apps are cool too

Don’t you just love double titles?

For many years I have been an infrastructure guy. I really liked how the cables and processors and memory and blinking lights worked. Applications were often the necessary evil tolerated so that I could play with cool technology. During my own journey toward learning about the cloud, it has become increasingly important to consider the function of the application. Six-years-ago me would totally punch me in the face right now. Traitor. 🙂

1 – Don’t get your App messed up in my resource buckets of awesomeness


So the reality check for the infrastructure geek in me is this: the application teams really think of what you do as the network. That is why when anything is ever wrong it is always “the network’s” fault. What we love to do is getting abstracted more and more. I will still contend that it is very important and very hard to do. Whether you are building reference architectures or deploying a converged infrastructure appliance, almost no one but us cares. They just want the data to do their jobs. So while we have really great discussions about speeds and feeds, the guy in the picture below just wants the app. From the hypervisor down we need to design with the application in mind, or we risk becoming like that goth dude locked in the server room on The IT Crowd.


2 Honey badger don’t care about FCoE

My next post will get into what I have been researching regarding what is out there and hopefully help us (infra. peeps) understand our App/Dev brothers better.

You are probably an Infrastructure person if:

  1. You read this blog.
  2. You work mainly with virtualization.
  3. You are a storage admin.
  4. You are a network admin.
  5. You like to make fun of DBAs.


Extents vs Storage DRS

I was meeting with a customer today and had to stop for a second when they said they were using 10 TB datastores in vSphere 4.1.

At first I was going through my head: maybe NFS? No, they are an all-block shop. Oh wait, yeah: extents. They were using 2 TB minus 512 byte LUNs to create a giant datastore. I asked why. The answer was simple: “so we only manage one datastore.”

I responded with, well, check out Storage DRS in vSphere 5! It gives you that one point to manage and automatic placement across multiple datastores. Additionally, you can actually find which VM lives where, and use Storage Maintenance Mode to do storage-related maintenance. Right now they are locked into using extents. If they change their datastores into a datastore cluster, they gain flexibility while not losing the ease of management.

I wanted to use the opportunity to list some information I think about Extents with VMware.

  1. Extents do not equal bad. Just have the right reason to use them, and running out of space is not one.
  2. If you lose one extent you don’t lose everything, unless that one is the first extent.
  3. VMware places blocks on extents in some sort of even fashion. It is not spill and fill. While not really load balancing, you don’t kill just one LUN at a time.

A datastore with extents is like a stack of LUNs. Don’t knock out the bottom block!


Some points about Storage DRS.

  1. Storage DRS places VMDKs based on I/O and space metrics.
  2. Storage DRS and SRM 5 don’t play nice, last time I checked (2/13/12).
  3. Combine Storage DRS with storage policies and you have a really easy way to place and manage VMs on the storage. Just set the policy and check if it is compliant.

A Storage DRS cluster is multiple datastores appearing as one.
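
If you want to play with this in PowerCLI (5.1 or later), a rough sketch looks like the following. The datacenter, cluster, and datastore names are placeholders, so adjust to taste:

[powershell]
# A sketch, not a supported script: build a datastore cluster and move datastores into it
$dc = Get-Datacenter "DC01"
$dsc = New-DatastoreCluster -Name "DatastoreCluster01" -Location $dc
Get-Datastore "datastore1","datastore2" | Move-Datastore -Destination $dsc
# Turn on fully automated Storage DRS for the new cluster
Set-DatastoreCluster -DatastoreCluster $dsc -SdrsAutomationLevel FullyAutomated
[/powershell]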

Some links on the topics:

Some more information from VMware on Extents
More on Storage DRS (SDRS)

In conclusion, SDRS may be removing some of the last reasons to use an extent (getting multiple-LUN performance with a single point of management). Add that to being able to have up to 64 TB datastores with VMFS-5, and using extents will become even rarer than before. Unless you have another reason? Post it in the comments!

Storage Caching vs Tiering Part 2

Recently I had the privilege of being a Tech Field Day delegate. Tech Field Day is organized by Gestalt IT. If you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visited sponsored the event. The delegates are under no obligation to review the sponsoring companies, good or bad.

After jumping in with a post last week on tierless caching, I wanted to follow up with my thoughts on a second Tech Field Day vendor. Avere gave a very interesting and technical presentation. I appreciated being engaged on an engineering level and not with a marketing pitch.

Avere tiers everything. It is essentially a scale-out NAS solution (they call it the FXT appliance) that can front-end any existing NFS storage; someone else described it to me as file acceleration. The Avere NAS stores data internally on a cluster of NAS units. The “paranoia meter” lets you set how often the mass storage device is updated. If you need more availability or speed, you add Avere devices. If you need more disk space, you add to your mass storage. In their benchmarking tests they basically used some drives connected to a CentOS machine running NFS, front-ended by Avere’s NAS units. They were able to get the required IOPS at a fraction of the cost of NetApp or EMC.

The Avere Systems blog provides some good questions on Tiering.

The really good part of the presentation is how they write between the tiers. Everything is optimized for the particular type of media: SSD, SAS, or SATA.
When I asked about NetApp’s statements about tiering (funny, they were on the same day), Ron Bianchini responded that when you sell hammers, everything is a nail. I believe him.

So how do we move past all the marketing speak to get down to the truth when it comes to caching and tiering? I am leaning toward thinking of any location where data lives for any period of time as a tier. I think a cache is a tier. Really fast cache for reads and writes is for sure a tier. Different kinds of disks are tiers. So I would say everyone has tiers. The value comes in when the storage vendor innovates and automates the movement and management of that data.

My questions/comments about Avere:

1. Slick technology. I would like to see it work in the enterprise over time. People might be scared because it is not one of the “big names.”
2. Having come from Spinnaker, is the plan to go long term with Avere, or to build something to be purchased by a big guy?
3. I would like to see how the methods used by the Avere FXT appliance can be applied to block storage. There are plenty of slow, inexpensive iSCSI products that would benefit from a device like this on the front end.

Storage Caching vs Tiering Part 1

Recently I had the privilege of being a Tech Field Day delegate. Tech Field Day is organized by Gestalt IT. If you want more detail on Tech Field Day, visit right here. In the interest of full disclosure, the vendors we visited sponsored the event. The delegates are under no obligation to review the sponsoring companies, good or bad.

The first place hosting the delegates was NetApp. I have worked with several different storage vendors, but I must admit I had never experienced NetApp in any way before, except for Storage vMotioning virtual machines from an old NetApp (I don’t even know the model) to a new SAN.

Among the 4 hours of slide shows I learned a ton. One great topic was storage caching vs. tiering. Some of the delegates have already blogged about the sessions here and here.

So I am going to give my super quick summary of caching as I understood it from the NetApp session, followed by a post about tiering as I learned it from one of our subsequent sessions with Avere.

1. Caching is superior to Tiering because Tiering requires too much management.
2. Caching outperforms tiering.
3. Tiering drives cost up.

The NetApp method is to use really quick flash memory to speed up the performance of the SAN. Their software attempts to predict what data will be read and keeps that data available in the cache. This “front-ends” a giant pool of SATA drives. The cache cards provide the performance, and the SATA drives provide a single large pool to manage. With a simplified management model and just one type of big disk, the cost is driven down.

My Takeaway on Tierless Caching

This is a solution that has a place and would work well in many situations, but it is not the only solution. All in all the presentation was very good. The comparisons against tiering, though, were really set up against a “straw man.” A multi-device tiered solution requiring manual management of all the different storage tiers is of course a really hard solution. It could cost more to obtain and could be more expensive to manage. I asked about fully virtualized, automated tiering solutions: solutions that manage your “tiers” as one big pool. These would seem to solve the problem of managing tiers of disks while keeping the cost down. The question was somewhat deflected because these solutions move data on a schedule. “How can I know when to move my data up to the top tier?” was the question posed by NetApp. Of course this is not exactly how a fully automated tiering SAN works, but it is a valid concern.

My Questions for the Smartguys:

1. How can the NetApp caching software choices be better/worse than software that makes tiering decisions from companies that have done this for several years?
2. If tiering is so bad, why does Compellent’s stock continue to rise in anticipation of an acquisition from someone big?
3. Would I really want to pay NetApp-sized money to send my backups to a NetApp pool of SATA disks? Would I be better off with a more affordable SATA solution for backup to disk, even if I have to spend slightly more time managing the device?

B.Y.O.P – The Alternative Vblock

In college I would often be invited to a get-together flagged with the letters BYOB, Bring Your Own Beer. Sometimes a cookout would be BYOM, Bring Your Own Meat (or meat alternative for the vegetarians). So today I want to leverage this to push my new acronym B.Y.O.P., Bring Your Own Pod. Lately I have been seeing people talk about Vblocks. If I can venture a succinct definition, a Vblock is a pre-configured set of Cisco, EMC, and VMware products tested by super smart people, approved by those people to work together, and then supported by these organizations as a single entity. Your reseller/solutions provider really should already be doing this very thing for you. You may choose to buy just the network piece, or the hypervisor, but your partner should be able to verify a solution works end to end and provide unified support.

So You can’t call it BYOPCVCEP

Why not Vblock? This might get me blacklisted by the Elders of the vDiva Council, but VCE doesn’t exist to make your life in the datacenter easier; they exist to sell you more VMware, Cisco, and EMC. Vblock for sure simplifies your buying experience. I believe they are all great products and may very well do just what you need. Without competition, though, the only winner is VCE. Do not be forced into a box by the giant vendors. Find someone who can help determine your end goal, provide vendor-neutral analysis of the building blocks needed to achieve it, and then provide the correct vendors and unified support to Build Your Own Pod.

So What is the Alternative Vblock?

Originally I was going to draw up a sweet solution of 3PAR, Xsigo, and Dell R610s and say, “Hey everyone! This is some cool stuff. Try to quiet the overwhelmingly loud voice calling from VCE and give this Alternative Vblock a try.” As I thought more and more about it, though, I realized doing that would be contrary to my main point. I would rather provide the discussion points, and some possible products among others, that can be used to Build Your Own Pod. I am a firm believer in getting what is right for your datacenter needs. So here are a few links to help begin the discussion.

Xsigo and Pod – Jon Toor
3par and iBlocks – Marc Farley

Adaptive Queuing in ESX

While troubleshooting another issue a week or two ago I came across this VMware knowledge base article. Having spent most of my time with other brands of arrays in the past, I thought this was a pretty cool solution versus just increasing the queue depth of the HBA. I would recommend setting this on your 3PAR BEFORE you get QFULL problems. Additionally, NetApp has an implementation of this as well.

Be sure to read the note at the bottom especially:

If hosts running operating systems other than ESX are connected to array ports that are being accessed by ESX hosts, while the latter are configured to use the adaptive algorithm, make sure those operating systems use an adaptive queue depth algorithm as well or isolate them on different ports on the storage array.

I do need to dig deeper into how this affects performance as the queue begins to fill; I am not sure if one method is better than another. Is this the new direction that many storage vendors will follow?

Until then, the best advice is to do what your storage vendor recommends, especially if they say it is critical.

Here is a quick run-through for you.

In the vSphere Client


Select the ESX host, go to the Configuration tab, and click Advanced Settings under Software.

In the Advanced Settings


Select the Disk option and scroll down to QFullSampleSize and QFullThreshold.
Change the values to the 3PAR-recommended values:
QFullSampleSize = 32
QFullThreshold = 4
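
If you would rather not click through every host, something along these lines in PowerCLI should set both values across a cluster. This is a sketch; the cluster name is a placeholder, and newer PowerCLI releases replace Set-VMHostAdvancedConfiguration with Get/Set-AdvancedSetting:

[powershell]
# Set the adaptive queuing values on every host in the cluster
Get-Cluster "Prod01" | Get-VMHost | ForEach-Object {
    Set-VMHostAdvancedConfiguration -VMHost $_ -Name "Disk.QFullSampleSize" -Value 32
    Set-VMHostAdvancedConfiguration -VMHost $_ -Name "Disk.QFullThreshold" -Value 4
}
[/powershell]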