Pure ELK Dashboards

Previously I blogged about getting PureELK installed with Docker in just a couple of minutes. After setting up your initial arrays, you may ask: what is next?

Loading a preconfigured PureELK Dashboard

media_1450716163834.png

Click Load Saved Dashboard and select one of the Pure Storage dashboards.

media_1452264626343.png

Remember, there is a second page of dashboards.

Pure Main Dashboard

media_1452264823321.png

Top 10 Volumes

media_1452264957200.png

Alert Audit

media_1452265011249.png

Max vs Average Pure Performance

media_1452265066174.png

Pure Space Analysis

media_1452265121910.png

Space Top – Bar Charts

media_1452265175795.png

Volume List View – Space and Performance

media_1452265229517.png

You can see there are several pre-made dashboards you can take advantage of. But what if you want to make your own dashboard?

Create your Own Dashboard

media_1452265517171.png

To create your own Dashboard:
1. Click the Plus to Add a Visualization
2. Select a visualization
3. Once you have all of your visualizations added, click the Save icon to keep your new dashboard for later use.

A few tips: you can resize and place each visualization anywhere you like on the dashboard; just remember to click Save. You can also use the powerful search feature to build tables of exactly the information you are looking for.
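As a quick illustration, a query in the Kibana search bar might look something like the line below. The field names here are hypothetical; check which fields PureELK actually indexes for your arrays.

array_name:"pure01" AND vol_name:SQL*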


Dynamic Cluster Pooling

Dynamic Cluster Pooling is an idea that Kevin Miller (@captainstorage) and I came up with one day while we were just rapping out some ideas on the whiteboard. It is an incomplete idea, but it may have the beginnings of something useful. The idea is that clusters can be dynamically sized depending on expected workload. Today a VMware cluster is sized based on capacity estimates from something like VMware Capacity Planner. The problem is that this method requires you to apply one workload profile across all time periods and situations. What if only a couple of days in the month require the full capacity of the cluster? Could those resources be used elsewhere the rest of the month?

Example Situation
Imagine a virtual infrastructure with multiple clusters. Cluster “Gold” has 8 hosts. Cluster “Bronze” has 8 hosts. Gold is going to require additional resources on the last day of the month to process reports from a database (or something like that). In order to provide additional resources to Gold, we will take an ESX host away from the Bronze cluster. This lets us deploy additional virtual machines to crunch through the process, or simply leaves less contention for the existing machines.

You don’t have to be a PowerCLI guru to figure out how to vMotion all the machines off of an ESX host and place it in maintenance mode. Once the host is in maintenance mode it can be moved to the new cluster, taken out of maintenance mode, and the VMs can be redistributed by DRS.

Sample code, more to prove the concept than anything else:
#Connect to the vCenter
Connect-VIServer [vcenterserver]

#Identify the hosts; you should pass the host or hosts you want to vacate into a variable
Get-Cluster Cluster-Bronze | Get-VMHost

#Find the least loaded host (skipping for now)

#vMotion the machines to somewhere else in that cluster
Get-VMHost lab1.domain.local | Get-VM | Move-VM -Destination [some other host in the bronze cluster]

#Move the host
Set-VMHost lab1.domain.local -State Maintenance
Move-VMHost lab1.domain.local -Destination Cluster-Gold
Set-VMHost lab1.domain.local -State Connected

#Rebalance the VMs with DRS
Get-DrsRecommendation -Cluster Cluster-Gold | Apply-DrsRecommendation

I was able to make this happen manually in our lab. Maybe if this sparks any interest, someone who is good with “the code” can make this awesome.

All out of HA Slots

A few weeks ago I was moving a customer from an old set of ESX servers (not HA clustered) to a new infrastructure of clustered ESX hosts. After building, testing, and verifying the hosts we started moving the VMs. It became apparent after a little while that there were some resource issues. After just a few VMs were moved, an alert appeared saying we could not start any new machines. I started looking at the cluster and there was plenty of extra memory and CPU. Still, nothing would start.
I said to myself, “Self, we have read about this before,” and thought back to the HA Deep Dive article by Duncan Epping.
Let’s check the HA slots! (On a side note, if you use HA and have never read the Deep Dive, go do it now!)

media_1276972861425.png

As you can see here, the slot size is rather giant. We have the largest CPU and memory reservation plus some overhead (for simplicity), and that blows the size of the slot way up. I didn’t set the reservations, but they were surely there: 8 GB of reserved memory and 4000 MHz of CPU. Ouch. Where did that come from? It followed the VMs from the old hosts to the new ones. One of the reasons I was there was to set up a new cluster, since the older hosts were performing so slowly on local storage. It seems someone tried to help some critical VMs along the way by adding the reservations. I removed the reservations and had plenty of slots, as you see below.

media_1276973677553.png

Yeah! I was able to power on another VM!

The new cluster blew away the old one. We went from older Xeons to six-core Nehalems, and from local disks to 48 disks of EqualLogic storage. The reservations were no longer needed.

Lessons:
1. Be careful with reservations; they can impact your failover capacity.
2. Reservations set on a machine will follow it to a new host.
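If you want to check for lurking reservations before they blow up your slot size, a quick PowerCLI sketch like this (the cluster name is just an example) will list any VMs carrying CPU or memory reservations:

#List VMs in the cluster with a CPU or memory reservation
Get-Cluster "Production" | Get-VM | Get-VMResourceConfiguration | Where-Object { $_.CpuReservationMhz -gt 0 -or $_.MemReservationMB -gt 0 } | Select-Object VM, CpuReservationMhz, MemReservationMB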

Operational Readiness

One thing I am thinking about, due to the VCDX application, is operational readiness. What does it mean to pronounce a project or solution good to go? In my world it means testing that each feature does exactly what it should be doing. Most commonly this will be failover testing, but it could reach into any feature, or be as big as a DR plan that involves much more than the technical parts doing what they should. Some things I think need to be checked:

Resources

Are the CPU, memory, network, and storage doing what they should? Load-generating programs like Iometer are fine for testing network and storage performance, and CPU-busy programs can verify that resource pools and DRS are behaving the way they should.

Failover

You have redundant links, right? Start pulling cables. Do the links fail over for virtual machines, the Service Console, and iSCSI? How about the redundancy of the physical network? Even more cable to pull! Also test that the storage controllers fail over correctly. Finally, I will make sure HA does what it is supposed to: instantly power off a host and make sure some test virtual machines start up somewhere else on the cluster.
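Before pulling cables, a quick PowerCLI check (just a sketch, assuming standard vSwitches) can confirm every vSwitch actually has more than one uplink to fail over to:

#Show the physical uplinks behind each standard vSwitch on every host
Get-VMHost | Get-VirtualSwitch | Select-Object VMHost, Name, Nic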

Virtual Center Operations

Deploying new virtual machines, host and storage vMotion, deploying from a template, and cloning a VM are all things we need to make sure are working. If this is a big enough deployment, make sure the customer can use the deployment appliance if you are making use of one. Make sure the alarms send traps and emails too.
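A few of these checks are easy to script. Here is a rough PowerCLI sketch (the template, host, and datastore names are placeholders) that deploys a test VM from a template, powers it on, and then moves it to another host to exercise vMotion:

#Deploy a test VM from a template
New-VM -Name "ops-test01" -Template (Get-Template "BaseTemplate") -VMHost esx01.domain.local -Datastore "Datastore01"
#Power it on, then vMotion it to another host
Start-VM "ops-test01"
Get-VM "ops-test01" | Move-VM -Destination (Get-VMHost esx02.domain.local)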

Storage Operations

Create new LUNs, test replication, test storage alarms, and make sure the customer understands thin provisioning if it is in use. Make sure you are getting the IO you designed for from the storage side, and make use of the SAN tools to be sure the storage is doing what it should.

Applications

You can verify that each application is working as intended within the virtual environment.

There must be something I am missing, but the point is to test everything so you can tell that this virtualization solution is ready to be used.

Ask Good Questions

This happened a long time ago. I arrived at a customer site to install View Desktop Manager (it may have been version 2). This was before any cool VDI sizing tools like Liquidware Labs. While installing ESX and VDM I casually asked, “What apps will you be running on this install?” The answer was, “Oh, web apps like YouTube, Flash, and some Shockwave stuff.” I thought “ah dang” in my best Mater voice. This was a case of two different organizations each thinking someone else had gathered the proper information. Important details sometimes fall through the cracks. Since that day, I try to uncover most of this stuff before I show up on site.

Even though we have great assessment tools now, remember to ask some questions and get to know your customer’s end goal.

Things I learned that day, as related to VDI:

1. Know what your client is doing, “What apps are you going to use?”

2. Know where your client wants to do that thing from, “So, what kind of connection do you have to that remote office with 100+ users?”

This is not the full list of questions I would ask, just some I learned along the way.

VMware View – User Profile Options

All the technology and gadgets for managing desktops are worthless if your users complain about their experience with the desktop; that is something I learned administering Citrix Presentation Server. Different methods exist to keep the technical presentation of the desktop usable, for example keeping the mouse in sync and making sure the right pixels show the right colors. Also included in the user experience is a consistent environment where personal data and settings are where they should be. Here are a few methods for managing those bits when using VMware View.

Mandatory Profiles
This profile is kept on a central file share. The profile is copied to the machine at login, and when the user logs out the changes are not kept. It is a great way to keep a consistent profile on kiosk-type and data entry desktops; where customization is not needed, and most likely not wanted, mandatory profiles are worth exploring. The main change is that you set up the profile just the way you want it and then rename NTUSER.DAT to NTUSER.MAN. A lot exists on the internet about setting up mandatory profiles.
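As a sketch (the share path here is just an example), the rename that turns a configured profile into a mandatory one is a one-liner in PowerShell:

#NTUSER.DAT is hidden, so -Force is required
Rename-Item "\\fileserver\profiles\mandatory\NTUSER.DAT" -NewName "NTUSER.MAN" -Force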

Local Profiles
If you go through life never changing a thing in your Windows environment, you are using a local profile. That is not to say you don’t change settings, save files, or customize your background; you just have Windows running as the default. This is an option I will usually discourage because it is hard to back up data that is often kept in the local profile. VMware View will redirect user data to a User Data Disk (or whatever it is called today) on persistent desktop pools, which is a good way to get the data onto another VMDK. This introduces problems when looking at data recovery. There are solutions, but it is something you will need to remember to look into.

Roaming Profiles
Roaming profiles are a great way to redirect current profiles to a central location, and in theory this works great. In a View environment you can keep a local copy of a user’s desktop profile and the changes are copied back and forth. I have often seen this work just fine. Then, from time to time, the profile will become corrupt; many times it does not unload correctly when users disconnect or log out. Then you may have to pick through folders trying to find their “My Documents”. This is why I would suggest combining this with Group Policy and folder redirection, which I will cover next.

Redirecting Folders
You may end up using a folder redirection group policy. This will move folders like the Desktop and My Documents for a user to a file server. This slims down the roaming profile, since those locations are redirected outside of the profile and the data is not copied from the machine to the server over and over. More information here.

Other Options
Immidio Flex Profiles
I really liked this option; it was a way to combine mandatory profiles and a roaming profile. The program runs some scripts at logon and logoff to save files and settings. A really great paper on how to use it can be found here. Just like any great program that takes a new approach to an annoying old problem, it is no longer free.

RTO Virtual Profiles
I have never implemented this solution before, though I have used it as part of a few training labs and liked the feel. Now that VMware has purchased this software from RTO, the website redirects to a transition page. So I am looking for a way to test it in the lab, hoping the next set of View bits includes RTO. Check this FAQ out for more information.

Maybe once it is built into View this will no longer be a serious issue. Profiles will be one of those things we tell stories to young padawan VM admins about, “We used to have to fight profiles, they were big and slow, and sometimes they would disappear!” Until that day…

Random Half Thoughts While Driving

So I often have epiphany teasers while driving long distances or stuck in traffic. I call them teasers because they are never fully developed ideas and often disappear into thoughts about passing cars, or yelling at the person on their cell phone going 15 MPH taking up 2 lanes.

Here are some I was able to save today (VMware related):

1. What if I DID want an HA cluster to be split between two different locations? Why?
2. Why must we over-subscribe iSCSI vmkernel ports to make the best use of 1 GbE physical NICs? Is it just the software iSCSI in vSphere? Is it just something that happens with IP storage? I should test that sometime…
3. If I had 10 GbE NICs I wouldn’t use them for the Service Console or vMotion; that would be a waste. No wait, vMotion ports could use them to speed up your vMotions.
4. Why do people use VLAN 1 for their production servers? Didn’t their Momma teach ’em?
5. People shouldn’t fear using extents, they are not that bad. No, maybe they are. Nah, I bet they are fine; how often does just one LUN go down? What are the chances of it being the first LUN in your extent? OK, maybe it happens a bunch. I am too scared to try it today.

VMware View and Xsigo

*Disclaimer – I work for a Xsigo and VMware partner.

I was in the VMware View Design and Best Practices class a couple of weeks ago. Much of the class is built on the VMware View Reference Architecture. The picture below is from that PDF.

It really struck me how many IO connections (network or storage) it would take to run this POD. The minimum (in my opinion) would be 6 cables per host; with ten 8-host clusters that is 480 cables! Let’s say that 160 of those are 4 Gb Fibre Channel and the other 320 are 1 Gb Ethernet. That is 640 Gb of bandwidth for storage and 320 Gb for the network.

Xsigo currently uses 20 Gb InfiniBand, and best practice would be to use two cards per server. The same 80 servers in the above clusters would have 3,200 Gb of bandwidth available. Add in the flexibility and ease of management you get using virtual IO. The cost savings from the director-class fibre switches and datacenter switches you no longer need would, I would think, pay for the Xsigo Directors; I don’t deal with pricing, though, so that is pure contemplation and I will stick with the technical benefits. Being in the datacenter, I like any solution that makes provisioning servers easier, takes less cabling, and gives me unbelievable bandwidth.

Just as VMware changed the way we think about the datacenter, virtual IO will once again change how we deal with our deployments.

iSCSI Connections on EqualLogic PS Series

EqualLogic PS Series Design Considerations

VMware vSphere introduces support for multipathing for iSCSI, and EqualLogic released a recommended configuration for using MPIO with iSCSI. I have a few observations after working with MPIO and iSCSI. The main lesson: know the capabilities of the storage before you go trying to see how many paths you can have with active IO.

  1. EqualLogic defines a host connection as 1 iSCSI path to a volume. At VMware Partner Exchange 2010 I was told by a Dell guy, “Yeah, gotta read those release notes!”
  2. EqualLogic limits connections to 128 per pool or 256 per group on the 4000 series (see the table below for a full breakdown) and to 512 per pool or 2048 per group on the 6000 series arrays.
  3. The EqualLogic MPIO recommendation mentioned above can consume many connections with just a few vSphere hosts.

I was under the false impression that by “hosts” we were talking about physical connections to the array, especially since the datasheet says “Hosts Accessing PS Series Group”. It actually means iSCSI connections to a volume. Therefore, if you have 1 host with 128 volumes each connected via a single iSCSI path, you are already at your limit (on the PS4000).

An example of how fast vSphere iSCSI MPIO (Round Robin) can consume available connections can be seen in this scenario: five vSphere hosts, each with 2 network cards on the iSCSI network. If we follow the whitepaper above, we will create 4 vmkernel ports per host, and each vmkernel port creates an additional connection per volume. Therefore, if we have ten 300 GB volumes for datastores, we already have 200 iSCSI connections to our EqualLogic array. That is really no problem for the 6000 series, but the 4000 will start to drop connections. I have not even added the connections created by the vStorage API/VCB-capable backup server. So here is a formula*:

N – number of hosts

V – number of vmkernel ports

T – number of targeted volumes

B – number of connections from the backup server

C – number of connections

(N * V * T) + B = C
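Plugging in the scenario above, five hosts with four vmkernel ports each and ten volumes, and no backup server yet, gives (5 * 4 * 10) + 0 = 200 connections.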

EqualLogic PS Series Array Connections (pool/group)
4000E 128/256
4000X 128/256
4000XV 128/256
6000E 512/2048
6000S 512/2048
6000X 512/2048
6000XV 512/2048
6010,6500,6510 Series 512/2048

Use multiple pools within the group in order to avoid dropped iSCSI connections and provide scalability. The trade-off is that this reduces the number of spindles you are hitting with your IO. Taking care to know the capacity of the array will help avoid big problems down the road.

*I have seen the actual number of connections be higher, and I can only figure this is because of the way EqualLogic does iSCSI redirection.

New VMware KB – zeroedthick or eagerzeroedthick

Due to the performance hit while zeroing mentioned in the Thin Provisioning Performance white paper, this article in the VMware knowledge base could be of some good use.

I would suggest using eagerzeroedthick for any high-IO, tier 1 type of virtual machine. This can be done when creating the VMDK from the GUI by selecting the “Support clustering features such as Fault Tolerance” check box.
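If you would rather script it, here is a hedged PowerCLI sketch (the VM name and size are placeholders, and it assumes your PowerCLI version supports the -StorageFormat parameter) that adds an eager-zeroed thick disk to an existing VM:

#Add a 50 GB eager-zeroed thick disk to an existing VM
Get-VM "sql01" | New-HardDisk -CapacityGB 50 -StorageFormat EagerZeroedThick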

So go out and check your VMDKs.