Deploying Nutanix with GPU cards installed is no different from deploying without them: you still image the nodes with Foundation just as you would otherwise. In this case, each site was configured with 2 Nutanix clusters, one for server VMs and a second dedicated to VDI. The VDI cluster was a 3-node cluster of NX-3055-G5 nodes running Horizon View 7.2.0.
I’ll touch on some details of the M60 card below, then get into the places where I had a few issues with the deployment and how I fixed them, and finally cover some host/VM configuration and validation commands.
NVIDIA M60 Card and Requirements
The installed M60 cards from NVIDIA are workhorses. Each 3055-G5 node was fitted with 2 cards, and we used the M60-1B profile to maximize density while still providing the application performance required.
The M60 card provides many different profiles that can be chosen based on the type of end user, and the profile can vary from VM to VM and desktop pool to desktop pool.
There were a few items regarding deployment with the M60 card (and the M10, for that matter) that differ from the older K1 cards from NVIDIA.
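To make the density trade-off concrete, here is a rough back-of-the-envelope sketch. The hardware figures are my assumptions rather than numbers from this deployment: an M60 board with 2 physical GPUs and 8 GB of frame buffer each, and 1 GB of frame buffer per VM for the M60-1B profile. The function name is made up for illustration.

```shell
#!/bin/sh
# Rough vGPU density estimate per node.
# Assumed figures (not taken from the deployment notes):
#   - 2 physical GPUs per M60 card, 8 GB of frame buffer per GPU
#   - the profile's frame buffer size is passed in as whole GB
vgpu_vms_per_node() {
    cards=$1          # M60 cards installed in the node
    profile_gb=$2     # frame buffer per VM for the chosen profile, in GB
    gpus_per_card=2
    fb_per_gpu_gb=8
    # VMs per physical GPU is the frame buffer divided by the profile size
    echo $(( cards * gpus_per_card * (fb_per_gpu_gb / profile_gb) ))
}

vgpu_vms_per_node 2 1   # two cards, 1 GB profile -> prints 32
```

Under those assumptions a 2-card node tops out at 32 vGPU-enabled desktops with a 1 GB profile, and moving to a 2 GB profile would halve that, which is why the profile choice drives density so directly.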
First, the M60 card requires that users connecting to VMs with vGPU capabilities obtain a license. Accompanying the M60 card in an installation is the NVIDIA License Server, a low-usage VM that runs the NVIDIA licensing component on either Windows or Linux. Within the VDI VMs, the NVIDIA Control Panel software is configured to point at the License Server (in my case, a backup License Server was also installed to make licensing highly available). Downloading the License Server software and getting it installed and configured on a VM is pretty easy, with nothing unexpected there.
The second deployment item is that the M60 supports 2 different operating modes: compute mode and graphics mode. The modes exist for different configuration options, and the mode of the GPU is established at power-on. Graphics mode is what’s typically used where graphics are the main requirement, as opposed to compute mode, which is used in certain HPC scenarios. Per NVIDIA, graphics mode should be used in the following scenarios:
- GPU passthrough with hypervisors that do not support large BARs. At the time of publication, this includes Citrix XenServer 6.2, 6.5, VMware ESXi 5.1, 5.5, 6.0, Red Hat Enterprise Linux 7.0, 7.1.
- GPU passthrough to Windows VMs on Xen and KVM hypervisors.
- GRID Virtual GPU deployments.
- VMware vSGA deployments.
To change the operating mode of the M60 card, NVIDIA provides a utility called gpumodeswitch that switches the card from compute to graphics mode. The utility can be installed directly on an ESXi host using the NVIDIA-provided .vib file, which makes the change pretty easy.
Modifying the Operating Mode
One of the nice things about the Nutanix imaging process is that it detects the model of the nodes being imaged and installs all of the components, including VMware ESXi and the associated .vib files, which also include the NVIDIA_Host_Driver and GPUModeSwitch_Driver .vibs. GREAT!
When I went to run the gpumodeswitch utility, though, the results were odd: I know the M60 cards are in there, but the command wasn’t showing them.
Enter NVIDIA Support’s KB entry… While Nutanix is awesome at taking care of installing the associated .vib files for us, you cannot run the gpumodeswitch utility while the NVIDIA host driver is loaded with the OS, because the OS has taken control of the card and the utility cannot modify the GPU mode.
Since Nutanix took care of the imaging for us, the VMware NVIDIA Host Driver was already installed, and that was the reason the GPU mode couldn’t be changed. So to fix this, we now have to:
- Remove the NVIDIA Host Driver
- Run the gpumodeswitch utility to change the operating mode
- Reinstall the NVIDIA Host Driver
I’ve included the commands I ran thru SSH on each of the VMware hosts to be able to get the utility to run and change the operating mode.
vim-cmd hostsvc/maintenance_mode_enter
esxcli software vib remove -n NVIDIA-VMware_ESXi_6.5_Host_Driver
esxcli software vib remove -n NVIDIA-VMware_ESXi_6.0_GpuModeSwitch_Driver
reboot
esxcli software vib install -v /tmp/NVIDIA-GpuModeSwitch-1OEM.600.0.0.2494585.x86_64.vib --no-sig-check
gpumodeswitch --gpumode graphics --auto
reboot
esxcli software vib install -v /tmp/NVIDIA-VMware_ESXi_6.5_Host_Driver_384.73-1OEM.6188.8.131.5298673.vib
reboot
vim-cmd hostsvc/maintenance_mode_exit
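Before running the mode switch, it can help to confirm that the host driver really is out of the way. The following is only a sketch of a read-only pre-check, not something from NVIDIA's or Nutanix's documentation: the function names are made up, and it assumes the host driver VIB name contains "Host_Driver" (as it does in the commands above) and that your gpumodeswitch build supports the --listgpumodes option.

```shell
#!/bin/sh
# Read-only pre-check, meant to be run over SSH on the ESXi host.
# Function names are hypothetical; the VIB name pattern is an assumption.

# Succeeds (exit 0) while an NVIDIA host driver VIB is still installed.
host_driver_installed() {
    esxcli software vib list | grep -i 'nvidia' | grep -qi 'host_driver'
}

# Lists the current GPU modes only when the host driver VIB is gone.
check_gpu_modes() {
    if host_driver_installed; then
        echo "NVIDIA host driver still installed: remove it and reboot first"
        return 1
    fi
    gpumodeswitch --listgpumodes
}

# Usage on the host: check_gpu_modes
```

Running check_gpu_modes again after the final reboot of the mode change is a quick way to confirm the cards report graphics mode before the host driver goes back on.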
vSphere Host Modification
Now that we’ve got the host prepared and ready to be used (host driver reinstalled, correct GPU mode), there were still a few steps that we needed to take on the host itself to ensure that the M60 cards can be fully utilized.
To be able to take advantage of our M60 GPU, we need to modify the graphics configuration on each host from Shared to Shared Direct, and either reboot the host or restart the xorg service by issuing the command /etc/init.d/xorg restart.
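If you would rather make this change over SSH than in the vSphere client, something like the sketch below should work on ESXi 6.5. Treat it as an assumption to verify on your build: I believe "Shared Direct" in the UI corresponds to the SharedPassthru value in esxcli, and the function name and the XORG_INIT override are hypothetical names introduced only for this example.

```shell
#!/bin/sh
# Sketch: switch the host default graphics type to Shared Direct from the CLI.
# Assumes the "esxcli graphics host" namespace (ESXi 6.5) and that
# "Shared Direct" in the UI maps to the SharedPassthru value.
# XORG_INIT is a hypothetical override for the xorg init script path.

set_shared_direct() {
    esxcli graphics host set --default-type SharedPassthru
    # Restart xorg so the new graphics type takes effect without a reboot
    "${XORG_INIT:-/etc/init.d/xorg}" restart
    # Print the resulting host graphics settings for confirmation
    esxcli graphics host get
}

# Usage on the host: set_shared_direct
```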
VM Configuration for GPU
Now that we have our GPU cards in the proper mode and our hosts configured to allow GPU passthru, the final step was to prepare the master images to have GPU capabilities. These steps are pretty simple, however there were 2 items that caught me off guard with the VM configuration requirements.
To provide vGPU capabilities to our VMs, it’s as simple as adding a Shared PCI device to the VM, and then selecting the NVIDIA Grid vGPU and profile we want to use.
The first item that caught me off guard was the fact that to enable the GPU to support NVIDIA Grid vGPU, you must reserve 100% of the memory for the VM. Simple to do, yet I just wasn’t expecting this requirement.
The second item that caught me off guard was the fact that once the VM is powered on and using the NVIDIA vGPU capabilities, one cannot connect to the VM console thru vCenter anymore, as a black screen is shown. The workaround for this is to also install the Horizon View Direct Connect Agent on the VMs that will be using the vGPU capabilities (think your Master Image here!).
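The Shared PCI device and the full memory reservation described above both end up as entries in the VM's .vmx file, so a quick grep can confirm a master image is set up as expected. This is a sketch under assumptions: I'm assuming the vGPU profile is stored as pciPassthru0.vgpu and that ticking "Reserve all guest memory" writes sched.mem.pin = "TRUE" (a manually entered reservation would look different). The function name and the example path are made up.

```shell
#!/bin/sh
# Sketch: check a .vmx file for the vGPU profile and full memory reservation.
# Assumed keys (verify against one of your own VMs):
#   pciPassthru0.vgpu = "<profile>"   added with the Shared PCI device
#   sched.mem.pin = "TRUE"            written by "Reserve all guest memory"
check_vgpu_vmx() {
    vmx=$1        # path to the VM's .vmx file
    profile=$2    # expected profile string, e.g. grid_m60-1b
    if ! grep -qi "^pciPassthru0\.vgpu *= *\"$profile\"" "$vmx"; then
        echo "vGPU profile $profile not found"
        return 1
    fi
    if ! grep -qi '^sched\.mem\.pin *= *"TRUE"' "$vmx"; then
        echo "memory is not fully reserved"
        return 1
    fi
    echo "ok"
}

# Usage (hypothetical path):
#   check_vgpu_vmx /vmfs/volumes/datastore1/master/master.vmx grid_m60-1b
```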
Validating our GPU Usage
One of the benefits of the M60 card is being able to get the utilization of the card (memory and processes) via the CLI. The biggest benefit I see from this capability is being able to see the individual VMs using the GPU resources thru the nvidia-smi command.
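As a rough sketch of how I'd wrap those checks for repeated use on a host: the wrapper names and the NVIDIA_SMI override variable are hypothetical, and I'm assuming the host driver's nvidia-smi supports both the --query-gpu CSV output and the vgpu subcommand, which may depend on the GRID release installed.

```shell
#!/bin/sh
# Sketch: small wrappers around nvidia-smi on the ESXi host.
# NVIDIA_SMI is a hypothetical override for the binary name or path.
NVIDIA_SMI=${NVIDIA_SMI:-nvidia-smi}

# Per-GPU utilization and frame buffer usage, one CSV line per physical GPU.
gpu_usage() {
    "$NVIDIA_SMI" --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader
}

# vGPU view: which VMs currently hold a vGPU on which physical GPU.
vgpu_usage() {
    "$NVIDIA_SMI" vgpu
}

# Usage on the host: gpu_usage; vgpu_usage
```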
Nutanix has a great table of troubleshooting commands when dealing with vGPU installations.
To recap, here are a few lessons learned from deploying Nutanix nodes with VMware and the NVIDIA GRID cards.
- While the Nutanix imaging process is great, you have to remember to remove the NVIDIA host driver so you can run the gpumodeswitch utility
- This takes a few reboots, so plan accordingly
- Remember to do the host and VM modifications to take advantage of the vGPU capabilities
- DON’T FORGET the NVIDIA License Server
- Not touched on above, but also DON’T FORGET to install the NVIDIA drivers in the VMs that will use vGPU capabilities
- Last but not least, here are some links (some from above) that helped me with this process; most of the credit goes to folks who’ve gone thru this process before:
Thanks for reading! As this was my first time running thru an NVIDIA M60 installation with Nutanix, I might not have done this the most efficient way, so feel free to drop me a note if you’ve got any feedback or questions!