Updating NVIDIA drivers for CUDA 12.4 on a Brev machine

Steps:

Get an H100 on Brev. It has to be the FluidStack one that you can reboot (the driver install below needs a reboot). Yes, it only has 100 GB of disk space.
Get your keys (this installs NVIDIA's CUDA repo keyring):

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update

Just blow away the preinstalled drivers and install newer ones:

sudo apt-get purge 'nvidia-.*'
sudo apt-get install cuda-drivers-550 nvidia-container-toolkit -y
sudo reboot
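
Once the machine is back up, it is worth confirming that the new driver actually loaded before touching Docker. Here is a minimal sketch, assuming that any driver from the 550 branch or newer is good enough for CUDA 12.4 (that threshold is inferred from the cuda-drivers-550 package above, and driver_is_new_enough is a hypothetical helper name, not part of any NVIDIA tool):

```shell
# Sketch: check that the post-reboot driver is from the 550 branch or newer.
# The 550 default is an assumption based on the cuda-drivers-550 package.

# driver_is_new_enough VERSION [MIN_MAJOR]
# Returns 0 if the major part of VERSION (e.g. 550.54.15) is >= MIN_MAJOR.
driver_is_new_enough() {
    local version="$1"
    local min_major="${2:-550}"
    local major="${version%%.*}"   # keep only the text before the first dot
    # Reject empty or non-numeric input (e.g. nvidia-smi printed an error).
    case "$major" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$major" -ge "$min_major" ]
}

# Only query the GPU if nvidia-smi is actually on PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_is_new_enough "$installed"; then
        echo "driver $installed looks fine"
    else
        echo "driver '$installed' is missing or older than the 550 branch" >&2
    fi
fi
```

If this complains, plain nvidia-smi usually shows more detail about what went wrong.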

configure docker

sudo nvidia-ctk runtime configure --runtime=docker
sudo rm -f /etc/cdi/nvidia.yaml
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
sudo systemctl restart docker
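
To check that Docker picked up the NVIDIA runtime, something like the following may help. It is a rough sketch: has_nvidia_runtime and gpu_smoke_test are hypothetical helper names, the grep is a crude text match rather than a real JSON parse, and the ubuntu image is just an assumption (any small image works, since the toolkit mounts nvidia-smi into the container).

```shell
# has_nvidia_runtime [FILE]
# Returns 0 if the Docker daemon config mentions an "nvidia" runtime entry.
# Defaults to /etc/docker/daemon.json, which nvidia-ctk runtime configure edits.
has_nvidia_runtime() {
    local config="${1:-/etc/docker/daemon.json}"
    [ -f "$config" ] && grep -q '"nvidia"' "$config"
}

# gpu_smoke_test
# Runs nvidia-smi in a throwaway container; needs the driver and toolkit installed.
gpu_smoke_test() {
    sudo docker run --rm --gpus all ubuntu nvidia-smi
}
```

Running has_nvidia_runtime && gpu_smoke_test should print the usual nvidia-smi table if everything is wired up.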

You may be good to go!
But if you get errors like

RuntimeError: cuDNN Frontend error: [cudnn_frontend] Error: No execution plans support the graph.

or

Could not load library libcuda.so. Error: libcuda.so: cannot open shared object file: No such file or directory

then you need to link libcuda.so inside your Docker container. (I've seen this on some machines but not all of them.) If so, start a shell in your container with cog -p 5000 run bash or docker run ..., and then run:

ln -s /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so
python -m cog.server.http  # or whatever command you actually want to run
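
If you end up doing this on more than one machine, a guarded version of the symlink step can be run unconditionally. This is a sketch only: ensure_libcuda_link is a hypothetical helper name, and the default directory is the one from the ln command above.

```shell
# ensure_libcuda_link [LIB_DIR]
# Creates libcuda.so -> libcuda.so.1 in LIB_DIR if libcuda.so is missing.
ensure_libcuda_link() {
    local lib_dir="${1:-/usr/lib/x86_64-linux-gnu}"
    if [ -e "$lib_dir/libcuda.so" ]; then
        return 0   # already resolvable, nothing to do
    fi
    if [ ! -e "$lib_dir/libcuda.so.1" ]; then
        echo "no libcuda.so.1 in $lib_dir; is the driver mounted into the container?" >&2
        return 1
    fi
    # -f replaces a dangling libcuda.so link if one is lying around.
    ln -sf "$lib_dir/libcuda.so.1" "$lib_dir/libcuda.so"
}
```

Call ensure_libcuda_link inside the container before starting your actual command; it does nothing when the link is already there.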

liveblog, preserved for posterity:

Section for Ubuntu here: https://docs.nvidia.com/datacenter/tesla/driver-installation-guide/index.html#ubuntu-installation