Posts

Showing posts from 2016

Validating LUN Path Consistency via PowerCLI

One of my customers needed some help with making some zoning changes on their Fibre Channel switches after standing up a batch of new ESXi servers.  I already had a script to create 1:1 Fibre Channel zones on Brocade switches, so that part was easy, but zoning changes to an existing environment are a little scary.  As in, if you really mess it up, the storage is going to disappear and every VM is going to crash, scary.  Fortunately, you've got to really mess it up to cause an issue, and so this customer was willing to allow changes during business hours as long as we promised not to cause an outage ;) So, how can I enforce that promise?  Well, I've got my script to create accurate zones for the new hosts, but that's not really the dangerous part.  If that's messed up, it just means that the new hosts won't work... and since they're still being configured, they're obviously not in production yet.  The dangerous part is when you enable the new zones, in case you so

Memory Leak on the April HP ESXi Image

One of my customers had a whole collection of ESXi 6 hosts that were all installed from the April 2016 HP ESXi ISO image... and they hadn't been patched since then.  Well, one day they called me because their Splunk server had started sending out alerts from an alarm that we'd set up to monitor the ESXi hosts for memory leaks. So, I logged into one of the affected hosts to try and figure out what was going on.  After poking around in a bunch of logs and a fair amount of Google work, I came across this article about a memory leak over at CPU Ready.  It sure looked promising.  So, I followed their instructions to check the version of the Broadcom driver in one of the affected ESXi hosts and, sure enough, it was an older version.  Fortunately, HP has a fix available, so I just needed to get it installed on all of the ESXi hosts (since full-on ESXi patching wasn't necessarily available, unfortunately). I needed some way to figure out exactly which hosts needed this new dri

Finding VMs with Duplicate MAC Addresses

At one of my customers' sites today, I saw an error message that I've not seen before: VM MAC Conflict.  "Well, that's certainly not good," I thought, as I poked around at the error message.  To my chagrin, I could only find that error message for a single VM in the environment, and that error message wouldn't tell me with which other VM it was conflicting.  So, I could only think of one way to figure out what was going on with this conflict: look at the MAC Address assigned to every NIC on every VM in the environment, and figure out what was causing the conflict.  Easy! No, really, it was easy.  Had I done it by hand, I would certainly have driven myself crazy, but PowerCLI made it nice and easy.  I just used this command: (Get-VM | Get-NetworkAdapter | ? {$_.MacAddress -eq "<offending MAC Address>"}).Parent Lo and behold, it returned 2 VMs.  One was the known VM that had flagged the error and the other was a powered-off VM.  Maybe that
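When the offending MAC address is not known in advance, the same idea can be extended by grouping every adapter by its MAC address.  This is only a minimal sketch: the sample objects below are hypothetical stand-ins for live `Get-VM | Get-NetworkAdapter` output, and the VM names and addresses are made up for illustration.

```powershell
# Hypothetical sample data standing in for: Get-VM | Get-NetworkAdapter
$adapters = @(
    [pscustomobject]@{ Parent = 'vm-app01'; MacAddress = '00:50:56:aa:bb:01' }
    [pscustomobject]@{ Parent = 'vm-app02'; MacAddress = '00:50:56:aa:bb:02' }
    [pscustomobject]@{ Parent = 'vm-old01'; MacAddress = '00:50:56:aa:bb:01' }
)

# Group adapters by MAC address and keep only the MACs used more than once
$duplicates = $adapters |
    Group-Object -Property MacAddress |
    Where-Object { $_.Count -gt 1 }

# List each conflicting MAC alongside the VMs that share it
$report = foreach ($group in $duplicates) {
    [pscustomobject]@{
        MacAddress = $group.Name
        VMs        = ($group.Group.Parent -join ', ')
    }
}
$report
```

Against a real environment, the sample array would presumably be replaced with the adapter objects returned by PowerCLI; the grouping logic itself does not depend on vCenter.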

Using Parallel Operations in PowerShell to Write a Port Scanner

Recently, I've written several scripts that need to perform relatively simple operations on a large set of objects (such as moving a bunch of VMs onto a given Port Group or reconfiguring NTP for a bunch of ESXi hosts).  In general, I approach these challenges by generating a list of all of the objects that I want to manipulate, and then I ForEach my way through that list until I've finished all of my work. This approach obviously works just fine; it's the way that we've written scripts for ages.  Just as you might expect from something that's been done the same way for a long time (particularly something IT related...), that's not really the best way to do it any more.  With PowerShell version 3, Microsoft introduced the concept of Parallel operations.  Starting with PowerCLI 6, VMware changed PowerCLI to make it much easier to use with PowerShell Parallel operations. So, what is a parallel operation?  Well, a simple (and very practical!) example is that For

vCenter Server Appliance Crash due to Full /Storage/SEAT Partition

One of my customers recently had one of their vCenter 6 Server Appliances go offline.  The VM was still running and responding to pings, but the service wasn't working.  I established an SSH session to the server and went through the basics, and what do you know, "df -h" revealed that the /storage/seat partition was 100% full. Well, VMware has a fine KB Article about a full seat partition and how to solve it.  At least, mostly how to solve it.  The problem that I ran into is that the truncate commands (that free up space) were failing to run because there wasn't enough space on the partition.  When I tried to execute them, I got the following message: "ERROR: could not extend file ... No space left on device" "Hint: Check free disk space." I'll admit to chuckling when I saw the "hint" line.  So, I had to free up some disk space so that I could free up some disk space.  I did a bit of research into how to free up some space on

Adding vMotion VMKernel Interfaces En Masse

Here's a quick one-liner that I've been particularly pleased by.  I put this together when I was reconfiguring several c7000 chassis at a time and wanted to minimize the amount of typing that I was doing per host.  This is a big ugly one liner, but what it does is to create a new VMK interface on the specified Port Group, assign it a valid IP Address and enable vMotion for that interface.  The command performs those tasks on all ESXi hosts in the "unconfigured" folder. How does it get the valid IP Address?  Well, it takes the first 2 octets of the host's management IP address, uses a customized 3rd octet, then appends the last octet from the host's management IP address.  So, for example, if the host is managed at 192.168.1.101, the script will use 192.168.#.101 for the vMotion interface. In this example, it uses a port group called "vMotion" on a VDSwitch that ends with Intranet, and has a standard /24 network with 2 in the third octet for vMoti
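The address derivation described above can be sketched on its own, without any PowerCLI calls.  This is an illustrative sketch rather than the original one-liner: the function name `Get-VMotionAddress` and its parameter names are assumptions introduced here.

```powershell
# Build a vMotion IP from a management IP by swapping in a new third octet.
# Function and parameter names are illustrative, not taken from the original script.
function Get-VMotionAddress {
    param(
        [Parameter(Mandatory)][string]$ManagementIP,
        [Parameter(Mandatory)][ValidateRange(0, 255)][int]$ThirdOctet
    )
    $octets = $ManagementIP.Split('.')
    if ($octets.Count -ne 4) {
        throw "Expected a dotted-quad IPv4 address, got '$ManagementIP'"
    }
    # Keep octets 1, 2 and 4 from the management address; replace octet 3
    '{0}.{1}.{2}.{3}' -f $octets[0], $octets[1], $ThirdOctet, $octets[3]
}

# A host managed at 192.168.1.101 with 2 as the vMotion third octet
$vmotionIP = Get-VMotionAddress -ManagementIP '192.168.1.101' -ThirdOctet 2
$vmotionIP   # 192.168.2.101
```

Keeping the last octet identical across networks makes it easy to match a vMotion address back to its host at a glance.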

Speeding Up your PowerCLI Scripts

So, I post a lot of scripts here, and I'm sure that you can see the progress that I've made as I've learnt more and more about PowerCLI, PowerShell and scripting in general.  One of the things that I've recently been considering is how to make my scripts run faster.  I've got some scripts that are designed to make lots of changes, like changing the Port Group assignment of every VM in an environment.  And they take a long time to run.  As in, depending on the size of the environment, several hours.  But, they don't necessarily have to take so long to run... I just wasn't clever enough when I wrote them. There are two major techniques that I'm trying to learn that would seriously speed up those scripts.  The first one is parallel execution of For Each loops... I'm still learning about that one, so will write more about it once I've learnt something worth sharing.  The other though, is much easier, and can generally be worked into any script with

ESXi Root Partition Full inode Table

One of my customers recently experienced a strange issue.  One of their ESXi hosts had entered a problem state where Storage vMotion and vMotion were failing for all VMs on the host (vMotion was failing at 13%, which is an interesting spot).  We initially noticed the issue when Storage vMotion repeatedly threw an error for one of their VMs: A general system error occurred: Failed to create journal file provider: Failed to open "/var/log/vmware/journal/..." for write: There is no space left on the device. Well, that error seemed self explanatory.  I connected to the host's CLI (SSH was already enabled, which made that easier) and did a "vdf -h" to look at its file system.  I was surprised to find that none of the partitions were full, so I dug deeper. I decided to take a look at the vmkwarning log file, which is frequently a gold mine when troubleshooting ESXi host issues.  So, I did a quick "tail /var/log/vmkwarning.log" and, lo and  behold, we h

Migrating from vCenter 5.5 on Windows to the vCenter 6 VCSA

Towards the start of the year, I made a post about my vCenter to vCenter migration process, which can be used to update from vCenter 5.5 to vCenter 6 (or for that matter, to downgrade, or whatever).  Since then, I've done a lot of these migrations and the scripts/procedure have matured.  Jeff and I have fixed many bugs, but the reason that I'm posting about this (again) is that I recently added a new feature and have polished our migration process. The latest versions of the migration scripts (the ones that get the settings from the source vCenter and then recreate those settings on the destination vCenter) now support DRS Rules!  The native support for DRS Rule manipulation through PowerCLI is a bit lacking... but fortunately, the community has solved that problem!  Matt Boren and Luc Dekens have created a module called DRSRule that is built to help with the reading and manipulation of DRS Rules (pretty self explanatory, right?).  I took advantage of that module to mo

Detecting and Grouping Ungrouped VMs

One of my customers uses DRS groups heavily.  The environment contains a set of specialized ESXi servers for a specific subset of machines and a set of general ESXi servers for everything else.  However, in order to maximize availability, all of these ESXi servers are part of one big cluster.  That means that every VM in their environment must be a member of at least one DRS group, placing the VM on either these specialized ESXi servers or the general ESXi servers.  Until we get an automated solution in place (probably through Orchestrator), we need to depend on the administrators to place the VM in the correct group (almost always the general group) upon machine creation.  As you might expect, that's not going too well. So, I put together a quick PowerCLI script to detect any VMs that are not grouped and (optionally) place them into the specified group.  This is what I came up with! The script is easy to use, but it requires a non-standard PowerCLI module.  To get it working,

Testing Virtual Machine Network Connectivity En-Masse

Last year, I wrote a post with a quick one-liner about how to ping all VMs on a given ESXi host.  Since then, I've been doing a lot of work with vSphere 6 upgrades, which has involved migrating many VMs between various switches.  As part of that process, I like to have an established basic test that I can run before and after the migration, so that I can record that the migration was successful. So, needless to say, I've expanded on that old one-liner a little bit... and now it's a full-blown script.  I've built this script based around my migration validation use case, and so it has some very specific behaviors based around that.  It accepts four parameters, but really only two of them are required: -vmHost and -results.  As you certainly expect, -vmHost is the name of the host on which you want to ping all VMs and -results is a path to a file where the script will store its results (I just use a .txt file, myself). When the script is executed for the first time,

Import-ValidCSV Powershell Function

I often find myself working with CSV files when I'm writing PowerShell or PowerCLI scripts.  Of course, PowerShell has the native Import-CSV cmdlet, which works well... but it doesn't have much error checking.  After writing a bunch of script-specific error checking, I've finally broken down and put together a function to do it for me.  Now, I can either add this function to my PowerShell profile, or just include it in any scripts that need it. The function is called Import-ValidCSV.  It basically just calls the normal Import-CSV cmdlet, but accepts a second argument: -RequiredColumns.  The -RequiredColumns argument is set up to accept an array of strings, listing all of the column headers that are required (duh!).  The function iterates through the list of RequiredColumns and checks that each column exists in the supplied CSV file.  If a column does not exist, it throws an error and quits.  If all columns exist (really, just if it doesn't throw an error and quit), i
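The behavior described above can be approximated in a few lines.  The following is a hedged sketch, not the author's actual Import-ValidCSV: the name `Import-ValidCsvSketch`, the `-Path` parameter, and the choice to throw on an empty file are all assumptions made for illustration.

```powershell
# Sketch of a CSV import that fails fast when required columns are missing.
# The function name and the empty-file behavior are assumptions, not the original code.
function Import-ValidCsvSketch {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string[]]$RequiredColumns = @()
    )
    $rows = @(Import-Csv -Path $Path)
    if ($rows.Count -eq 0) {
        throw "No data rows found in '$Path'"
    }
    # Column headers surface as property names on each imported row
    $headers = $rows[0].PSObject.Properties.Name
    foreach ($column in $RequiredColumns) {
        if ($headers -notcontains $column) {
            throw "Required column '$column' is missing from '$Path'"
        }
    }
    $rows
}

# Usage against a small temporary file (hypothetical column names)
$csv = Join-Path ([System.IO.Path]::GetTempPath()) 'hosts-example.csv'
"VMHost,IP`nesx01,192.168.1.101" | Set-Content -Path $csv
$rows = Import-ValidCsvSketch -Path $csv -RequiredColumns 'VMHost', 'IP'
```

Checking the headers once up front means the rest of a script can assume the columns exist instead of testing each row.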

Restarting VMs after a Datacenter Down Event

One of my customers recently had a catastrophic thermal event in one of their datacenters and so had to shut down all of their infrastructure at that site.  After the cooling issue was resolved, we were asked to help them to get their infrastructure back online.  Fortunately, we have included several small details as best practices in our vSphere designs, and one of those really paid off for us.  We always create a VM to Host affinity rule that keeps one Domain Controller, the vCenter server, its PSC and its Database (if external) on a known host in the management cluster. So, after the SAN was powered back on and we restarted the physical ESXi servers, we knew exactly what to do.  I fired up the vSphere client and logged into that ESXi server in the management cluster as root.  From there, I was able to easily find those core infrastructure VMs and powered them all on.  Once they were running, I logged into vCenter... and found that I had an interesting challenge. We needed to tur

Deploying VCSA via the CLI

I was recently deploying a series of VMware vCenter Server Appliances for a customer who wanted to migrate to that platform from their Windows-based vCenter 5.5 environment.  Rather than deploying all of these by hand, we figured that this was an excellent time to check out the VCSA command line install options. The first thing that we had to do was to figure out our architecture.  In this case, it was pretty easy.  Each site was getting a vCenter appliance and a PSC appliance.  We decided that we wanted our PSC appliances to replicate with each other, so that we could use the enhanced Linked Mode functionality in vSphere 6, and so designed a ring topology to reduce the impact of a given site being offline. The next thing that we had to do was to prepare some JSON answer files.  To do that, open your vCenter Server Appliance install ISO and browse to vcsa-cli-installer\templates\install and examine the bounty contained within. Copy the example .JSON file(s) that most closely m

Parsing HP CLI Output

I've been working on an environment audit for one of my customers (expect to see some of those scripts popping up in the not-too-distant future).  In addition to auditing ESXi host configurations, I've been looking at their HP C7000 Chassis configurations and comparing them against enterprise standards.  Since there are a lot of chassis in this environment, I've been leveraging scripts to collect data via plink. It's not always easy to parse data from an SSH session into PowerShell objects, but HP gave us a great tool in their CLI (way better than adjusting column size, which is what I was looking for when I came across this).  It's already well documented, but it saved me so much time that I figured that it was worth mentioning anyway... the HP CLI commands support this great option: -output=script2.  There's actually a couple of "script#" options, but I'm particularly fond of #2.  So, what's it do?  Well, it turns this style of human frie

On the vCenter HTML 5 Web Client

I've been using the HTML 5 web client for the past couple of weeks now, and I wanted to post my impressions so far.  It's obviously not feature-complete yet, but most of the "day to day" functions are there.  I'm really impressed by what I've seen: it's snappy and intuitive, everything that we wanted the old web client to be. It kind of reminds me of the whole Windows 8 to 8.1 to 10 progression.  In Windows 8, Microsoft introduced a lot of new ideas... but they really changed the way people worked and they did so in ways people didn't want.  It was too new.  In 8.1, they scaled it back and then in 10 they scaled it back again, until I feel like Windows 10 has a nice balance of new features/philosophy while still respecting the way that we do our work. The Flash Client was too new; it was Windows 8.  We had completely new work flows that felt awkward and alien.  They weren't bad, but they didn't respect the way that we had all learnt to work.

App Volumes 2.10 vs. 3.0

One of my customers recently asked us to pilot App Volumes for them.  We decided to go with the latest and greatest, and so used App Volumes 3.0 ... and that's turned out to have been a bit of a mistake.  Unfortunately, it appears that the 3.0 release may have been a bit premature, as it has some issues.  Specifically, you can't delete an AppStack once created (without a phone call with VMware technical support)... which is particularly difficult, as there's no way to upgrade an AppStack, instead the work flow is to copy the existing one and modify the copy.  Also, and this may have been an issue that was site specific, but we found that whenever we restarted our Capture machine, our AppStack creation process failed. After a few weeks of wrestling with these issues, we decided to roll back to version 2.10 (ie. reinstall and go back to the older version).  2.10 worked much better for our purposes (we were able to capture applications that required reboots and could delete

Changing vSphere Clients

Well, we all knew that this was coming, but VMware has announced that, as of the next release of vSphere, the C# client will no longer be supported.  So, what can you do if you hate Flash?  Well, there's a bit of good news on that front: VMware is moving to an HTML5 web client, instead of the Flash web client that we currently use.  And, more good news, if you'd like to try it out today, VMware has released an HTML5 Client Fling!  It can't do everything yet, but it can do many of the most common administrative actions and will hopefully see the rapid addition of functionality. Just like all Flings, this is not supported by VMware.  That doesn't mean that using it will cancel your support or something, but it does mean that if you run into issues support is going to ask you to go back to the Flash client.  Since the HTML5 client Fling is available in addition to the Flash client, that should be rather painless. Of course, other management tools (like PowerCLI) will

Displaying a list of ESXi Hosts and their Syslog Configurations

One of my customers recently found that their ESXi hosts were not uniformly configured for syslog shipping, so asked me to help them audit their environment.  They have hundreds of ESXi hosts, so going through the advanced settings in the GUI for each of their hundreds of hosts didn't seem like the best approach... PowerCLI to the rescue! In this case, I put together a quick one-liner.  It gets all of the VMHosts, then generates a string in the format of "<ESXi Hostname>,<Configured SysLog Host>".  That string is then parsed by the convertfrom-csv cmdlet, which adds the "VMHost" and "SysLogServer" headers to the line.  That line is then output and the system moves on to the next host in the list.  When the command is completed, the system has generated a nice table with a "VMHost" column that is populated by ESXi hostnames and a "SysLogServer" column that is populated with their configured Syslog servers. Importantly
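The string-to-table step described above can be tried in isolation.  This is a minimal sketch under stated assumptions: the hashtables below are hypothetical sample data standing in for real host objects and their configured syslog targets, and the host names are made up.

```powershell
# Hypothetical sample data standing in for ESXi hosts and their syslog settings
$hosts = @(
    @{ Name = 'esx01.example.local'; Syslog = 'udp://syslog01.example.local:514' }
    @{ Name = 'esx02.example.local'; Syslog = '' }
)

$table = foreach ($h in $hosts) {
    # Build "<ESXi Hostname>,<Configured SysLog Host>" and let ConvertFrom-Csv
    # attach the column headers to that single line
    "$($h.Name),$($h.Syslog)" | ConvertFrom-Csv -Header 'VMHost', 'SysLogServer'
}
$table
```

One caveat of the string approach: a value that itself contains a comma would be split into extra fields unless it is quoted, so hosts with several comma-separated syslog targets would need extra care.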

Replacing the PSC and/or vCenter in an Enhanced Linked Mode vSphere 6 Environment

One of my customers is planning on upgrading to vSphere 6.  They have many sites and require that each site be able to operate independently, but want them to be centrally managed under normal conditions.  So, they have a great use case for Enhanced Linked Mode! We're exploring different architectures and how to handle various situations.  One of the situations that we wanted to explore very thoroughly was the loss of a vCenter or PSC at one of the sites and how the recovery process might work.  To that end, we've set up three VCSA + PSC pairs in a single SSO domain and have been ripping them apart and generally abusing them. We've come up with a generally good procedure that covers, start to finish, how to decommission a PSC all the way to how to restore a replacement PSC/vCenter to full functionality.  We've come up with this process by combining the steps from a few VMware KB articles, and so I've decided to go ahead and cover it here in a single workflow.

Finding Unused Active Directory Accounts

One of my customers recently asked me for some help developing a script to search his Active Directory for user accounts that hadn't been used for more than 90 days.  He had already found that the get-aduser "LastLogon" parameter was domain controller specific, meaning that whichever DC is responding to the request will tell you when it last authenticated that user account.  Of course, since you have multiple DCs (you do, right!?), that isn't guaranteed to give you their actual last logon time. So, we put together a script that will get a list of all active AD accounts from a particular OU, then query each DC (filterable to a given site by DC name, if necessary) for each account's last logged in time.  Whichever DC returns the most recent last logged in date is the winner, and that date is stored.  At the end, the script returns a list of all users who haven't logged in to the network in X days (we used 90 days).  The script returns some basic info about the
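The "most recent answer wins" logic can be sketched without touching Active Directory.  The per-DC answers, the DC names, and the fixed reference date below are all hypothetical sample values, used so the example runs anywhere; a real script would presumably collect these answers from each domain controller.

```powershell
# Hypothetical last-logon answers for one account, one per domain controller
$answers = @(
    [pscustomobject]@{ DC = 'dc01'; LastLogon = [datetime]'2016-01-10' }
    [pscustomobject]@{ DC = 'dc02'; LastLogon = [datetime]'2016-02-20' }
    [pscustomobject]@{ DC = 'dc03'; LastLogon = [datetime]'2015-11-05' }
)

# Whichever DC reports the most recent logon is the winner
$latest = $answers | Sort-Object -Property LastLogon -Descending | Select-Object -First 1

# Compare against a cutoff; a fixed date is used here instead of Get-Date
# so that the result is repeatable
$asOf    = [datetime]'2016-06-01'
$cutoff  = $asOf.AddDays(-90)
$isStale = $latest.LastLogon -lt $cutoff

[pscustomobject]@{
    WinningDC = $latest.DC
    LastLogon = $latest.LastLogon
    Stale     = $isStale
}
```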

ESXi 5.5 u3b Compatibility

We came across an unexpected situation recently at a customer site that I wanted to briefly discuss.  Typically, when applying ESXi host updates, you're safe to install whatever patches are available for your current version of ESXi.  The December 2015 patch, ESXi550-201512001, as described in this VMware blog entry, does not fall into that category.  You may have heard about the POODLE exploit; this patch updates the OpenSSL implementation and disables SSL version 3 in order to block that vulnerability. Blocking vulnerabilities is well and good, but this patch requires many vSphere administrators to modify their normal patching workflow.  If you just apply the available ESXi updates, your ESXi hosts will lose their connection to vCenter and enter an unmanaged state.  The proper way to apply this update is to update vCenter first, then apply updates to the ESXi hosts.  That workflow is well understood when updating ESXi versions (say, from 5.1 to 5.5 or 5.5 to 6.0)... but as t

HA fails to restart a virtual machine error...

One of my customers recently had a host throw a PSOD.  It's a large environment with appropriate spare capacity, so that wasn't a major issue.  I mean, we never want a host to go down, but we designed the environment to accommodate that situation and it largely responded well... except for 2 VMs.  They errored out, with a message saying that "vSphere HA unsuccessfully failed over this virtual machine... ...Reason: An error occured during host configuration".  We found a related event that said, "Operation failed, diagnostics report: Failed to open file /vmfs/volumes/<Datastore UUID>/.dvsData/<DVS UUID>/<Port Number> Status (bad0003)= Not found".  True to the error message, that file was not there. VMware has a KB Article about this issue, but it's related to vSphere 5.0, when Storage vMotion fails to move a file that HA requires for tracking the VM on the Distributed vSwitch.  This customer is on vSphere 5.5 (and has been for a signi

vExpert 2016

We at ENS, Inc. got some good news this year!  Jeff, Dennis and I were all awarded vExpert status for 2016!  I'm proud that this is my third year as a vExpert and am looking forward to posting more of the scripts and tips that have gotten me so far.

Migrating from one vCenter to Another, Improved

9/15/2016 Update:  I've updated these scripts with some new functionality.  I just posted a description of the new functionality as well as how to use the scripts in the context of a vCenter Upgrade. A couple years ago, I made a blog post about a process that I use to migrate from one vCenter to another, and then I later created some scripts to facilitate that process.  Those scripts worked fine for me, but others have found some weaknesses.  Namely, they don't work if there are multiple folders in the inventory with the same name (such as VM\Project1\Production and VM\Project2\Production, where there are two folders named Production).  Also, I never thought about VMs in vApps, which don't technically have a parent folder. So, I've been working on updated scripts.  I'm a little hesitant to post these, as I haven't actually used them to perform a migration yet, but there seem to be a lot of people who need them.  I've put together two scripts again -

VDI Layering Naming and Change Control

A few of my customers have large teams managing their VDI and all of the desktop layers that make up those desktops.  One of them asked for help creating a policy to track changes to those layers, as they had run into trouble with accidentally rolling out desktops using a layer version that was still undergoing testing or not knowing what changes had been made on a specific version of a specific layer.  This is a little outside of the normal content that I like to post about on this blog, but we came up with a solid policy that has helped them to get their environment under control, and so I’m going to do my best to explain what we came up with here. The first important detail to the policy was a strong naming standard.  We decided that layer names should be based on two factors, a group specific notation and the name of the application (or set of applications) in the layer.  Most applications are not group specific and so do not need a group notation, but occasionally there will be

Remap Dedicated Desktops

One of my customers needed to do a complicated migration recently.  They wanted to move their View + Unidesk VDI environment from vCenter 5.1 on Windows to the vCenter Server Appliance running vCenter 6.  We moved the ESXi hosts and used them to carry the desktops over, just like we do with servers in this sort of situation, but the View stuff added to the complexity.  On the Unidesk side, there's a nice KB article that we followed, but there isn't such guidance for View. You may not have noticed it (I sure didn't until this migration), but on the list of desktop pools, there's a column that lists the vCenter server on which that pool was created.  It turns out that the pools are irrevocably associated with the vCenter server itself and so, when you migrate vCenter servers, you have to recreate all of your desktop pools.  This customer had about 2 dozen pools; an arduous task, but ultimately not all that time consuming... until you think about the ramifications of de