Posts

Showing posts from 2017

Moving Clustered VMs with Shared Physical Mode RDMs

This is probably one of those articles that's only going to apply to a tiny percentage of people within an already minuscule niche subset of a small population... but I'm proud of my work and so am going to post it here anyway.  One of my customers needs to move a bunch of VMs off of one SAN onto another.  Storage vMotion for the win, end of story, right?  Yes*

* 99% of the problem is absolutely solved with Storage vMotion, but I'm not in the business of leaving 1% unfinished.  In this case, that 1% was a bunch of older SQL Servers, set up in two-node pairs using Microsoft Clustering Services via shared Physical Mode RDMs.  Yikes.

In theory, this process isn't too bad.  Just record the vital statistics about the RDMs, then detach them from the VMs, move the VMs (during an outage window, obviously), create new RDMs using the recorded data, and power everything back up.  This process depends on the New-HardDisk cmdlet... but given that I'm writing about it here, you…

How to use SSH and SCP with VCSA

I was replacing some vCenter Server Appliance (VCSA) self-signed certificates with signed certs from an Active Directory Certificate Authority and I came across a minor issue that I wanted to document here.  I was using the /usr/lib/vmware-vmca/bin/certificate-manager tool to generate the CSR, and then PSCP to download the CSR and hand it off to the security team.

When I first tried to use pscp to get the file, I encountered an error that I hadn't seen before:

Fatal: Received unexpected end-of-file from server

Some quick googling didn't turn up any hits on this issue, but I thought of something as I was poking around.  When I connected to the VCSA via SSH, it didn't drop me to a BASH shell until I did the usual "shell.set --enabled True" "shell" operation that it prompts you with.  Since PSCP (and SCP in general) is just establishing an SSH connection to the host and then doing a copy command, I figured that my issue was probably that the default root s…

Parsing Palo Alto Config XML into PowerShell Objects

One of my customers is converting into an NSX-based network design.  In order to facilitate this conversion, they need to understand the rules that exist on their Palo Alto firewall and then recreate those desired behaviors in the NSX microsegmentation.  Their challenge was that their Palo Alto had a fairly complex ruleset, one that no one wanted to try and recreate by hand in NSX.  I'm sure that you can see where this is going.

Before we could create anything in NSX (via the ever-evolving PowerNSX module), we had to understand the configuration of the existing firewall.  When I asked about exporting the configuration, the networking team told me that they had two options: JSON or XML.  Not knowing what I was likely to get working, I asked for them both, then tried ConvertFrom-Json and Import-Clixml on the provided files.  Neither worked, so I had to do some digging.

After banging my head into a wall for a while, one of my coworkers gave me a copy of a script that he got from Palo…

PowerCLI's RunAsync Parameter Rocks!

I've recently been playing around with the -RunAsync parameter in some of my PowerCLI scripts, and I'm super impressed!  I'm also super late to the party; I mean, LucD was writing about it back in 2010, but still!  So, what's it do?  It speeds up tasks that don't need to be run sequentially, that's what it does.

For example, if I have a list of VMs that all need to move into a new folder, I could do it like this:

$folder = get-folder "New Folder"
$vmNames = get-content MyList.txt
foreach ($vmname in $vmNames){
    get-vm $vmname | move-vm -destination $folder
}

That moves one VM, then the next, then the next, and so on.  Depending on the number of VMs, it could take a really long time because, the way this script is written, the system waits for each "move" to complete before initiating the next.  That's where -RunAsync comes in.

$folder = get-folder "New Folder" $vmNames = get-content MyLi…

Testing Many Suspected Root Passwords on Many vCenter Appliances

One of my customers ran into a situation where they had lost track of the root passwords for their vCenter and Platform Services Controller appliances.  As they logged into devices with expired passwords, they changed them, but they had lost track of which devices had had their passwords changed and which password each device was using.  Since there was a decent sized list of potential passwords and quite a few devices, I decided that we'd all be better served by writing a script to test them for us, rather than trying them all by hand.  Aside from the boredom that would come from running the tests by hand, I was concerned about human error introducing false negatives to our results.

Well, such a script is pretty trivial - I can just make an array of server names and an array of potential passwords, then nest a foreach inside of another foreach to try each password against each server.  And that's true, but then I got to thinking about security.  I really didn't want to ty…

Truth in PowerShell

PowerShell does a lot of hand-holding for you, which generally makes using it really easy.  For example, the concept of "true" is very important when building logical structures in a script, and PowerShell does its best to help you out.  And it generally does a good job, but there are some details that you should probably be aware of.  The below are all true statements:

"true" -eq $TRUE
"false" -eq $FALSE
"false" -ne $TRUE
if("true"){$TRUE}
if("false"){$TRUE}

So, what's happening with that last one, if we know that "false" does not -eq $TRUE?  Well, -eq is your buddy, and when you ask it if a string that reads "true" is equal to the boolean $TRUE condition, it says, "sure!".  Same thing, when you ask it if a string that reads "false" is a boolean $FALSE, it knows what you're asking and will tell you that it is indeed $FALSE.  It reads your string and figures that, if you're askin…

Transporting Custom VM Fields Between vCenter Servers

One of my customers recently migrated a bunch of VMs between two vCenter servers.  When they finished, they realized that none of their VMs' custom attributes followed the VMs into the new vCenter environment; apparently those attributes are owned by the vCenter server, rather than stored in the VMX file.  Fortunately, they had an export of their VM inventory that included those custom fields, but they had no idea how to repopulate that data across their horde of VMs.

You can see where this is going... and, as expected, I helped them put together a quick PowerCLI script that would set things straight.  This script takes 2 parameters as input: a CSV file that contains the data from the VM inventory (including each VM Name and columns for each of the custom attributes), and an array of which custom attributes need to be set (since that inventory export file is probably going to contain a lot of columns that aren't custom attributes).

Once it's executed, the script goes throu…

Finding All VMs with Multiple IPv4 Addresses

Here's a quick PowerCLI one-liner.  I recently had to find all of the VMs in a customer's environment that had multiple IPv4 IP Addresses assigned to them.  Here's the command that I ended up using:

Get-VM | Get-VMGuest | select VM, IPAddress | ? {($_.ipaddress = $_.ipaddress | ? {$_ -match '\.' -and !($_ -match '^169\.254\.')}).count -gt 1}

That guy's a little dense, so let's break it down.  Get-VM gives me a list of all VMs in the environment, which is piped to Get-VMGuest which returns the VM name and all IPs associated with it (looking at the VM's extensiondata.guest.ipaddress only shows the first address).  From there, we select the VM and its IP Address and pass that to a Where clause that has some logic.

That Where clause is going to return each VM that has more than 1 IPv4 address that is not an APIPA address.  It does that by finding only the IP Addresses that have a '.' in them (since IPv4 addresses use . delimiters whereas IPv6 add…

vRealize Network Insight Overview

"It's the network!" seems like the battle-cry for some server teams.  I come from a system administration background; I get it.  When all of your services are running and the event logs look clear, it must be something external to the system... which is either the network, or something being talked to through the network.  I've seen so many server guys chant that mantra, toss the problem over the fence and then wash their hands of the whole situation that I want to scream.  That said, I've also been in plenty of situations where I've brought a problem to the network team, asking them for help when I can't find anything wrong with a system.

One reason I like to go to the network team when I have a seemingly intractable issue is perspective.  Within a single server, I can drill down deep and get a good idea about what the applications on that server are doing, but it's much more difficult to get a picture of a whole solution.  The network team, with …

VMware Logon Monitor

When rolling out a VDI solution (or really, anything that touches on the user experience), it's crucial to understand how the change might impact the users and to ensure that they are left with a good impression of the solution.  They say that first impressions are most lasting, and the first impression that your users are going to see (for most solutions) is logon time.  That means that it's crucial that your solution does not negatively impact logon times, as that will color the entire experience.  So, how do you accurately measure it?

Well, VMware released a Fling called Logon Monitor (and it's now baked into the Horizon 7.2 agent).  It's a tool whose sole purpose is to measure the logon process and to report on what's happening during a user logon.  After it's installed, it logs (with excruciating detail) everything that occurs during a logon, storing the file in a default location of C:\ProgramData\VMware\VMware Logon Monitor\Logs
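Here's a minimal sketch for peeking at the most recent files in that default location.  The folder path comes from the text above; the variable name and the choice to show five files are just assumptions for illustration, and nothing here depends on how the log files are named.

```powershell
# Default Logon Monitor log folder, as given above.
$logDir = 'C:\ProgramData\VMware\VMware Logon Monitor\Logs'

if (Test-Path -LiteralPath $logDir) {
    # Show the five most recently written files (five is an arbitrary choice).
    Get-ChildItem -LiteralPath $logDir -File |
        Sort-Object -Property LastWriteTime -Descending |
        Select-Object -First 5 -Property Name, LastWriteTime, Length
}
else {
    Write-Warning "Log folder not found: $logDir"
}
```

Run it on the desktop where the agent is installed; on any other machine it just prints the warning.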

It creates a file f…

vSAN RAID Levels and Fault Domains

One of my customers is considering implementing vSAN, so I've been researching it quite a bit lately.  The interactions of vSAN RAID levels (for all-flash configurations) and Fault Domains are fairly complex, so I figured that I should post some notes about what I've learned here.

First, the concept of RAID is a little different in vSAN than it is in a traditional array.  Traditionally, RAID specifies the algorithm used to spread data (or parity data) across a set of disks.  For example, RAID 5 specifies that data will be striped across all of the disks in a set, with a single disk's capacity used for parity.  This means that a 3 disk RAID 5 set will store data on roughly 67% (two thirds) of its disks' capacity.  A 5 disk RAID 5 set will store data on 80% of its disks' capacity.
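The traditional RAID 5 arithmetic above boils down to (n - 1) / n for a set of n equally sized disks.  Here's a quick sketch of that calculation; the function name Get-Raid5UsableFraction is made up for this example, and it assumes every disk in the set is the same size.

```powershell
# Hypothetical helper: fraction of raw capacity left for data in a
# traditional RAID 5 set, where one disk's worth of capacity holds parity.
function Get-Raid5UsableFraction {
    param(
        [ValidateRange(3, 1024)]   # RAID 5 needs at least 3 disks
        [int]$DiskCount
    )
    ($DiskCount - 1) / $DiskCount
}

foreach ($n in 3, 5) {
    # {1:P0} formats the fraction as a whole-number percentage.
    '{0} disk RAID 5: {1:P0} of raw capacity holds data' -f $n, (Get-Raid5UsableFraction -DiskCount $n)
}
```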

vSAN treats RAID differently.  There are 3 different RAID types that vSAN supports: RAID 1, 5 and 6.  Like in a traditional array, these RAID levels describe the data redundancy algorithm used, but the members a…

Group Policy Loopback Processing on Windows Server 2012

Every now and then (especially in a VDI situation), I need to enable Group Policy Loopback Processing.  This Group Policy setting can do a lot of things; I usually use it to allow me to create Group Policy Objects that contain User Configuration settings that only apply when the users log into a certain subset of computers (such as my VDI desktops).  When that setting is enabled, it basically instructs Windows to process its computer GPOs again at user logon, so as to catch any User Configurations that are specified.

This is a setting that I configure once for each VDI deployment that I do, and I always need to look up where it is (who bothers to memorize where specific settings are amongst the thousands of options!?).  No problem, that's literally what Google was made for.  So, a quick search for Group Policy Loopback Processing is in order, which brings me to a TechNet article about Windows Server 2003 that calls the setting simply Loopback processing.  Well, Loopback Processin…

Finding Stale Brocade Zone Configurations

I recently wrote about a situation where I was creating a zoning configuration and had to figure out which Fibre Channel devices were active.  After we finished that project, we decided that we should go through and actually remove the inactive aliases and zones.  We had a list of active devices, so we were all ready to move forward and say "delete everything that isn't on this list!".  That'll work great, right?

Of course not; we needed the opposite.  "Delete everything that is on this list" is a far better instruction that is way less likely to lead to painful mistakes.  Even better is "run these commands to delete all of these unnecessary objects" and I know one good way to generate such a list of commands: a script (I feel like I'm developing a battle-cry...).

I put together a script that does a few basic things.  First, it uses the nsshow command to get a list of all of the active WWNs on a given Brocade fiber switch.  Then, it compares a…

Decommissioning Specific SAN Datastores En Masse

One of my customers recently purchased a new SAN, with the goal of decommissioning the old one.  They used Storage vMotion to migrate all of their VMs over to the new SAN and adjusted all of the ESXi hosts to put their scratch space on a new LUN, and were ready to proceed.

Many people, at this point, would just turn off the old SAN... and they might be ok.  Maybe.  At that point, the ESXi hosts are going to seriously freak out, because they just encountered an unexpected SAN failure... and we've all seen that sometimes, ESXi doesn't respond well to losing datastores unexpectedly.

So, the more cautious people would right click on each datastore and unmount it, then turn off the old SAN.  While not as bad as just turning off the SAN, the ESXi hosts still expect those LUNs to be there (even if they're no longer mounted as datastores) and can still run into issues.

Miss Manners insists that people follow the procedure detailed in KB 2004605.  That article includes a lot of im…

Getting Active Brocade Fiber Switch Aliases via PowerShell

A while back, I posted a quick script to create commands for 1:1 Zoning on a Brocade Fiber Switch.  I was recently helping someone go through that exact process on a set of switches that had a lot of aliases already defined on them.  Their challenge was that they weren't sure which aliases were for their current SAN vs. a retired SAN.  Rather than just creating zones for both SANs, I decided to put together a quick script that would scrape their current Aliases and check which ones have active WWNs currently in the system.  They could then use this information to prune the Aliases that are no longer needed, in addition to only creating the required zones for our project.

In order to do this task, I had to do some quick string parsing into PowerShell objects.  Good thing I already know how to do that ;)  So, I put together this script which does two things:
1) It parses the Brocade Fiber Switch's configuration to look for any Aliases
2) It checks the WWNs for those Aliases agai…

Nested Progress Bars in PowerShell

I've been working on some scripts lately and just learned about nested progress bars, which are really cool!  In fact, progress bars are a tool that I'm going to use far more often in my scripts, for a few reasons.  First though, let's talk about script output.  In my opinion, there are three basic types of output that a script generates: information as the result of the script, run-time errors, and information about the progress of the script.  We're going to ignore the run-time errors and just talk about the output that is generated by a successful script execution: information about what the script is doing right now, and information that the script has retrieved as a result of its actions.

In general, it's super helpful to be able to store the results of a script in a variable for future manipulation/archival/whatever.  I do this by using syntax like $a = ./myScript.ps1.  That will take whatever output the script generates and store it in the variable $a, which…

Parsing GPOs for Drive Mappings

One thing that we always have to do (and people often overlook) when planning a VDI project is to understand the user environment and how to gracefully recreate their current desktop environment on the virtual desktop.  This is a big challenge, as you can tell from the fact that there are so many tools available to solve it.

In my experience the best solution is usually a combination of purpose built tools, of Group Policy Objects, and of the occasional login script.  Before you can even start figuring out which combination of tools and techniques might be most appropriate, you need to understand what currently exists in the environment... and you need a fairly accurate picture of that.  If the environment is already sophisticated with heavy use of GPOs for drive mappings, printer mappings, and critical registry settings, transitioning into VDI will be far easier than if new desktops are configured by an IT guy walking over and making all of those things by hand.  Of course, most organi…

Port Mirroring by SPAN or RSPAN on an HP C7000 Blade

This is just a heads-up to hopefully save someone else a bit of time and pain... but the HP Virtual Connect doesn't support SPAN or RSPAN to mirror traffic from a physical device into the chassis to, for example, a Virtual Machine.  Basically, Port Mirroring, such as through SPAN or RSPAN, uses unicast to duplicate network traffic from a source port or ports onto a destination port.  This technique is useful for troubleshooting, in case you can't get a packet capture running on either end of a network flow, or for monitoring (as was our intended use case).

IANANG (I Am Not a Networking Guy), but my understanding is that the problem is due to the nature of a SPAN port and how those packets look to the Virtual Connect.  When Port Mirroring is configured, all traffic is duplicated and sent out to the Virtual Connect.  These packets are not changed in this process, keeping their original source and destination MAC addresses; the SPAN port is forwarding these packets to the VC desp…

PowerShell String Manipulation of Formatted Text in Columns

Every now and then, I find myself needing to use a utility like plink in order to interface with a system, such as a switch or a chassis, during a script.  If I'm just sending configuration commands (and am taking it on faith that they worked...), then it's nice and easy, but if I actually want to extract information from the device, then I've got a bit of a challenge, because those devices (via plink) are not going to give me back an object that PowerShell understands.

For example, if I use get-vm in PowerShell, I will get back a vm object that has a bunch of properties, which I can easily access using dot notation.  If I use plink to pull a brocade switch configuration, all I'm going to get back (from PowerShell's perspective) is a great big long string with lots of New Line characters, tabs and spaces.  So, how do I extract data from a formatted text string, in order to more easily work with it in PowerShell?  Well, there's a lot of different tricks availabl…

HP c7000 Chassis Administration Tips and Tricks

Several of my customers use HP C7000 Blade Chassis for their ESXi hosts.  I've picked up a few tips and tricks for working with that chassis over the years, so I figured that I'd post them here.

The Virtual Connect (the blade chassis's networking component) has a feature that can prevent pause frames from flooding a network by disconnecting a blade that is sending an excessive number of them.  Unfortunately, every now and then, it detects an ESXi host's uplink as sending such a number of pause frames and so disconnects that network adapter.  Fortunately, it's really easy to allow traffic to flow through that port once again.  Just SSH into the Virtual Connect (you can get the address by looking at the "Virtual Connect Manager" link in the Onboard Administrator interface).  Once you're connected, use the show port-protect command to see if there are any ports that are in a blocking state.  If so, you can use the reset port-protect command to reset the p…

Checking Distributed Switch PNICs for Invalid VLAN Traffic

4/26/17 Update: I changed this script so that it no longer uses the min/max VLAN numbers and instead discovers a list of valid VLANs based on the Port Groups that are defined on the VDS.  It then alerts if it sees any VLANs that are not in that list.

One of my customers has several physical uplinks going into their ESXi hosts, each carrying different sets of VLANs.  They recently had an issue where an uplink with one set of VLANs was accidentally attached to a VDS that was configured for the other set of VLANs.  This wasn't a catastrophic issue, as the VDS didn't have port groups defined for those invalid VLANs and so any traffic was dropped into the bit bucket, but it did mean that 1 of the links going into that switch was useless.

After we corrected the issue, we decided that we should audit the environment to see if this problem had occurred anywhere else but not been detected.  We decided that the best way to perform an initial scan of the environment would be to leverage …

Getting VM EVC Mode Requirements via PowerCLI

One of my customers was preparing to do some major ESXi host reconfiguration and so needed to shift VM workload from one cluster to another.  They had a challenge in that their clusters were running with different EVC modes, and they wanted to move VMs from the newer cluster to the older cluster.  "Impossible!" the strawman says, "it can't be done!"

Well, yes and no.  It's absolutely correct that you can't vMotion a VM that powered up on an Ivy Bridge CPU back onto an ESXi host with a Sandy Bridge processor.  The reason for this is that the VM, during its power on operation, scans the CPU of its host for a list of CPU features that are available and begins potentially using those features, which means that it can't be moved to a processor that doesn't have those features.  The VM, in effect, inherits its host's EVC Mode for the lifespan of this power cycle.  Until the VM goes through a complete new power cycle (not a reboot from within the…

2017 vExpert

I'm proud to announce that I've been selected as a 2017 vExpert!  Thanks for the recognition and congrats to all of the other vExperts, particularly my coworkers Jeff and Dennis!

Invalid VDS PortID Preventing vMotion

One of my customers had an issue where a bunch of VMs were not able to vMotion, despite the hosts being configured correctly in all regards (other VMs using the same VDS Port Groups, for example, could vMotion onto and off of the host where these VMs were running).  When DRS (or an administrator) attempted a vMotion, a generic "A general system error occurred: vim.fault.NotFound" error message would be displayed.

When I took a look at these VMs, I noticed something interesting (besides the fact that they were all on the same host); their VDS Port numbers were universally high, like in the 5000s.  This was particularly interesting because when I looked at the VDS itself, the highest numbered port on it was 4378.  I supposed that these ephemeral ports had somehow been assigned invalid port numbers, which was causing vMotion to fail when the new destination was unable to reserve that invalid number on the VDS.  Interestingly, all of these VMs were communicating just fine on the…

PSODs and the iovDisableIR Setting

One of my customers recently came across an issue where their ESXi hosts were randomly crashing with a PSOD.  They had recently applied the latest SPP from HP and the latest ESXi 6.0 patches, and were now occasionally seeing these crashes with messages like "LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed.  This may be a hardware problem..."

As the PSOD implied, they had called HP support for help, but weren't making much progress.  I did some googling and found a really interesting blog post from Jason Whitelock about a recent ESXi update causing HP servers to PSOD.  He had come across the exact same issue and had tracked it down to the value of the iovDisableIR setting, which had changed in this latest ESXi update.  When he set it back to its original setting, the PSOD issue went away.
As VMware explains it, Interrupt Remapping (the technology that's affected by this setting) enables more efficient IRQ routing and thus improves performance.  Unfortunatel…

Using PowerCLI to get a Datastore from an NAA ID

This is just a quickie (mainly for my own notes in the future): if you ever need an easy way to figure out which datastore is being referenced by a given naa number (like if you're troubleshooting datastore access issues and the logs all reference that ID type), you can use this command to search it out:

get-datastore | ? {$_.ExtensionData.Info.Vmfs.Extent.Diskname -match "NAA NUMBER"}
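If you find yourself running that lookup a lot, here's one way it could be wrapped into a reusable function.  This is only a sketch: the function name Get-DatastoreByNaa and its -NaaId parameter are made up for this example, and it assumes that PowerCLI is loaded and that you're already connected to a vCenter server.  The property path is the same one used in the command above.

```powershell
# Hypothetical wrapper around the one-liner above.
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
function Get-DatastoreByNaa {
    param(
        [Parameter(Mandatory)]
        [string]$NaaId
    )

    # Escape the ID so characters like '.' are matched literally by -match.
    $pattern = [regex]::Escape($NaaId)

    # Non-VMFS datastores have no Vmfs extent info, so they never match.
    Get-Datastore | Where-Object {
        $_.ExtensionData.Info.Vmfs.Extent.DiskName -match $pattern
    }
}

# Usage (substitute the ID from your logs):
# Get-DatastoreByNaa -NaaId 'NAA NUMBER'
```

Escaping the ID is a small design choice: -match treats its right-hand side as a regular expression, so an unescaped '.' would match any character.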

Creating VICredentialStore Items without Typing Your Password into the Command Line

I use PowerCLI a lot.  Like, when VMware said to stop using the C# client, I just started using PowerCLI instead of learning the Flash-based web client.  As such, I log into many vCenter servers many times each day, and creating a VICredentialStore item for each vCenter that I use is one trick that saves me a lot of typing and therefore time.

The New-VICredentialStoreItem cmdlet, which creates these credential store items, is super easy to use.  Once you have an item created, those credentials get used automatically when you connect to a vCenter server, making the logon faster and easier.  To use it, just follow this syntax:

New-VICredentialStoreItem -Host vCenterServer -User JColeman -Password SuperSecretPassword

And there you go: the next time you use Connect-VIServer vCenterServer, it will automatically pass JColeman as the username and SuperSecretPassword as the password.

Of course, no one ever wants to do this.  Who in their right mind would want to type their password, in plain text…