2017 vExpert

I'm proud to announce that I've been selected as a 2017 vExpert!  Thanks for the recognition and congrats to all of the other vExperts, particularly my coworkers Jeff and Dennis!

Invalid VDS PortID Preventing vMotion

One of my customers had an issue where a number of VMs could not vMotion even though the hosts were configured correctly in all regards (other VMs using the same VDS Port Groups, for example, could vMotion onto and off of the host where these VMs were running).  When DRS (or an administrator) attempted a vMotion, it failed with a generic "A general system error occurred: vim.fault.NotFound" error message.

When I took a look at these VMs, I noticed something interesting (besides the fact that they were all on the same host): their VDS Port numbers were all unusually high, in the 5000s.  This stood out because when I looked at the VDS itself, the highest numbered port on it was 4378.  I suspected that these ephemeral ports had somehow been assigned invalid port numbers, and that vMotion was failing because the destination host was unable to reserve that invalid number on the VDS.  Interestingly, all of these VMs were communicating just fine on the…
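The check described above (comparing each VM's port number against the ports that actually exist on the VDS) can be sketched in a few lines of Python.  This is only an illustration under assumptions: the function name, the VM names, and the sample port numbers other than 4378 are hypothetical, port IDs are treated as plain integers, and how you export the data from your own inventory is left out.

```python
def find_invalid_ports(vm_ports, valid_port_ids):
    """Return {vm_name: port_id} for VMs whose port ID does not exist on the VDS."""
    valid = set(valid_port_ids)
    return {vm: port for vm, port in vm_ports.items() if port not in valid}


# Hypothetical data: a VDS whose highest numbered port is 4378,
# and three VMs with the port IDs they report.
vds_ports = range(0, 4379)
vm_ports = {"vm-a": 5012, "vm-b": 120, "vm-c": 5340}

suspects = find_invalid_ports(vm_ports, vds_ports)
print(suspects)
```

Any VM that shows up in the result is attached to a port the VDS does not know about, which matches the pattern I saw on the affected host.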

PSODs and the iovDisableIR Setting

One of my customers recently came across an issue where their ESXi hosts were randomly crashing with a PSOD.  They had recently applied the latest Service Pack for ProLiant (SPP) from HP and the latest ESXi 6.0 patches, and were now occasionally seeing these crashes with messages like "LINT1/NMI (motherboard nonmaskable interrupt), undiagnosed.  This may be a hardware problem..."

Since the PSOD message pointed to hardware, they had called HP support for help, but weren't making much progress.  I did some googling and found a really interesting blog post from Jason Whitelock about a recent ESXi update causing HP servers to PSOD.  He had come across the exact same issue and had tracked it down to the value of the iovDisableIR setting, which had changed in this latest ESXi update.  When he set it back to its original value, the PSOD issue went away.

As VMware explains it, Interrupt Remapping (the technology that's affected by this setting) enables more efficient IRQ routing and thus improves performance.  Unfortunatel…
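Because the problem came from a setting whose value changed during patching, one way to spot affected hosts is to compare the value recorded before the update with the value seen afterwards.  The sketch below is a hedged illustration only: the function name, host names, and the placeholder values "old-value" and "new-value" are hypothetical, and it assumes you have already collected the iovDisableIR value per host by whatever means you normally use.

```python
def hosts_with_changed_setting(baseline, current):
    """Return {host: (old, new)} for hosts whose value differs from the baseline."""
    changed = {}
    for host, new_value in current.items():
        old_value = baseline.get(host)
        # Skip hosts with no recorded baseline; there is nothing to compare against.
        if old_value is not None and old_value != new_value:
            changed[host] = (old_value, new_value)
    return changed


# Hypothetical data: values recorded before and after patching.
# "old-value" and "new-value" are placeholders, not real setting values.
before = {"esx01": "old-value", "esx02": "old-value"}
after = {"esx01": "new-value", "esx02": "old-value", "esx03": "new-value"}

print(hosts_with_changed_setting(before, after))
```

Hosts that appear in the result are the ones where the setting no longer matches what was in place before the update, so they are the candidates for setting it back.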