On the fourth day of Christmas my true love gave to me… four CUDA tips…

This post is from a blog called Adventures in GraphicsLand that I’m writing with two fellow CS grad students, Chris Gibson and Ryan Schmitt. Articles about anything related to my graduate work in graphics or my thesis will be posted there and then cross-posted here. Articles about handy tips (like fixing bugs with VirtualBox or software setup on Fedora) will remain here. This post, which I wrote for AIGFX, originally appeared here.

Learning CUDA has definitely been an interesting experience. As much as they make it sound like it’s simple to get started (and for the most part, it is), there are lots of little traps that can keep you frustrated for hours… or days. Here are four tips for things that stumped me during initial development of Haste (which is now on GitHub!) that might be helpful to you.

Long running kernels on a desktop workstation

In Linux, X’s driver watchdog will kill a process that leaves a driver hanging for too long, so a kernel launched on a GPU that is also driving your display has to return within a short time limit (typically a few seconds at most) or it gets killed. (This happens in Windows, too, but I’m working mainly in Linux at the moment.) However, you might want to test long-running kernels on your workstation. The way around this is to switch to a text-only terminal before running your CUDA program. On most Linux distributions, you can swap between terminals using Ctrl-Alt-F2 through Ctrl-Alt-F6, where each is a different terminal. If you hit Ctrl-Alt-F1 in Fedora 14, it will take you back to your X session (you’re still logged in and everything).

So, all you need to do is write code in your graphical desktop, compile, hit Ctrl-Alt-F2 to switch to a text-only terminal, then run your program for testing. When you want to go back to graphical mode to fix bugs, just Ctrl-Alt-F1 back and off you go.
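If you’re not sure whether a given GPU is subject to the watchdog at all, the runtime can tell you. Here is a minimal sketch that reads the kernelExecTimeoutEnabled member of cudaDeviceProp for every device. The helper name hasKernelTimeout is made up for illustration, and I’m assuming the flag reflects the same run-time limit described above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if kernels on this device are subject to a
// run-time limit, 0 if they are not, and -1 if the query failed.
int hasKernelTimeout(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                device, cudaGetErrorString(err));
        return -1;
    }
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Report the watchdog status of every device in the machine.
    for (int i = 0; i < count; ++i) {
        int limited = hasKernelTimeout(i);
        if (limited < 0) continue;
        printf("Device %d: kernel run-time limit %s\n",
               i, limited ? "enabled" : "disabled");
    }
    return 0;
}
```

If a device reports the limit as disabled, you should be able to run long kernels on it without leaving your graphical session.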

Slow device info queries

If you’re doing development on a headless compute box (like our Tesla machine at Cal Poly), you might have noticed that querying device information takes a long time. This is compounded if it’s a multi-device machine. Our box at Poly has four Tesla GPUs, and Haste startup was frustratingly slow. All we did was query the device list once, then query each device individually using cudaGetDeviceProperties(). It usually took on the order of 30 to 45 seconds at program startup to get all the device information and allocate memory before we were off to the races launching kernels.

The problem is that the NVIDIA drivers normally maintain a lot of state about the GPUs in memory. However, this state is only there if there’s some resident process keeping it there, like X. If X is not running (or not even installed, like on our headless compute box), that state will need to get reinitialized every time you make a call that requires it. This can be excruciatingly slow, especially on multi-device machines.

The solution? Well, the easiest one is to just install and leave X running, even on a headless machine. Just make sure it’s not driving a display, or better yet switch it over to a text-only terminal with Ctrl-Alt-F2 to keep X around but not have it interfere with your kernels.
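To check whether this is what’s slowing down your own startup, you can time the device queries with and without X running. The following is a rough sketch of the query pattern described above (one cudaGetDeviceCount() call, then one cudaGetDeviceProperties() call per device) wrapped in a wall-clock timer. The function name queryAllDevices is hypothetical, and this is not the actual Haste startup code.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: queries every device once and returns the device
// count, or -1 if the device list could not be read.
int queryAllDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, i);
        if (err != cudaSuccess) {
            fprintf(stderr, "Device %d query failed: %s\n", i, cudaGetErrorString(err));
            continue;
        }
        printf("Device %d: %s (compute capability %d.%d)\n",
               i, prop.name, prop.major, prop.minor);
    }
    return count;
}

int main() {
    // Time the whole query phase with a monotonic wall clock.
    auto start = std::chrono::steady_clock::now();
    int count = queryAllDevices();
    auto stop = std::chrono::steady_clock::now();

    if (count < 0) return 1;
    double seconds = std::chrono::duration<double>(stop - start).count();
    printf("Queried %d device(s) in %.2f s\n", count, seconds);
    return 0;
}
```

Run it once on the bare headless box and once with X resident. If the driver state is the culprit, the second number should be much smaller.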

Printing debug info in device kernels

I must admit, while debuggers are neat, I tend to like printf() debugging. It’s not that I don’t see the value of debuggers; for some problems they’re really the only way to solve things. Maybe it has something to do with the fact that cuda-gdb inexplicably crashes on every machine and kernel I try to run it on.

With the Fermi architecture, available in cards of compute capability 2.0 and higher, you can actually call printf() directly from your device code now, without having to jump through any strange library hoops. Initially, however, I was never able to get it to work. I couldn’t find which CUDA header I needed to include to get things off the ground, and even when it seemed to compile it didn’t print anything.

Well, it sounds silly, but just #include <stdio.h> and away you go. I never tried this initially because I thought that didn’t make any sense. The C standard library doesn’t have CUDA device code! As best I can tell, nvcc is rewriting these standard calls from device code behind the scenes.
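Here is a minimal sketch of a kernel that prints from the device. The names helloKernel and launchHello are made up for illustration. It assumes you compile for compute capability 2.0 or higher and a toolkit that provides cudaDeviceSynchronize(). Device-side printf() output is buffered and only shows up once the host reaches a synchronization point, so a missing synchronize is one possible reason for seeing no output at all.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread prints its own block and thread index.
__global__ void helloKernel() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

// Hypothetical helper: launches the kernel and waits for it to finish.
cudaError_t launchHello() {
    helloKernel<<<2, 4>>>();

    // Catch launch-time errors such as an invalid launch configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Wait for the kernel; this is also where the device printf buffer
    // gets flushed to the host's stdout.
    return cudaDeviceSynchronize();
}

int main() {
    cudaError_t err = launchHello();
    if (err != cudaSuccess) {
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Don’t rely on the order of the printed lines; threads in different blocks can print in any order.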

The device info’s maxThreadsPerBlock lies!

This one really irks me. If you query a device’s properties, it reports the maximum number of threads per block in a cudaDeviceProp struct member called, shockingly, maxThreadsPerBlock. The problem is that this is not necessarily the number of threads you can actually launch. That depends on how many registers and how much shared memory your kernel uses, the same figures that determine its occupancy, which you can work out using the difficult-to-find occupancy calculator spreadsheet. You’ll also want to compile your kernel with the nvcc option --ptxas-options=-v to see the shared memory and register usage for your kernel. You’ll need it in the spreadsheet.

The occupancy limit doesn’t bug me so much as the fact that this is not mentioned anywhere in the documentation where maxThreadsPerBlock is mentioned. One would think that would be a great place to throw up a warning flag, letting developers know that that number is purely speculative, and that they need to do some real benchmarking of their kernel to find the best occupancy and thread launch combination. Essentially, the maxThreadsPerBlock element is entirely superfluous, since its only real use would be in scaling kernel launch sizes by number of device threads available. However, instead we should apparently embed the Excel worksheet in our program and have the device properties chug through the macros to provide any runtime adjustments based on the hardware we’re running on. (</sarcasm>) Yeesh.
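One way to get a per-kernel number at runtime is cudaFuncGetAttributes(), which reports the largest block size a specific compiled kernel can be launched with, along with its register and static shared memory usage. The sketch below compares that value with the device-wide one and uses it to size a launch. The kernel scaleKernel and the helpers blocksFor and kernelThreadLimit are hypothetical names for illustration. Note that this only gives the launch limit, not the best-performing block size, so benchmarking is still on you.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only to have something to inspect and launch.
__global__ void scaleKernel(float* data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Ceiling division: number of blocks needed to cover n elements.
int blocksFor(int n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Returns the largest block size scaleKernel can be launched with on the
// current device, or -1 if the query failed.
int kernelThreadLimit() {
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    printf("registers per thread: %d, static shared memory: %zu bytes\n",
           attr.numRegs, attr.sharedSizeBytes);
    return attr.maxThreadsPerBlock;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;

    int limit = kernelThreadLimit();
    if (limit <= 0) return 1;
    printf("device limit: %d threads/block, kernel limit: %d threads/block\n",
           prop.maxThreadsPerBlock, limit);

    const int n = 1 << 20;
    float* d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));

    // Size the launch from the per-kernel limit, not the device-wide one.
    scaleKernel<<<blocksFor(n, limit), limit>>>(d_data, 2.0f, n);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "Kernel failed: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

For a tiny kernel like this one the two limits will probably match; the difference shows up with register-hungry kernels.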

Hopefully these tips help you out. As I continue to bang my head against the wall and find new tidbits I’ll be keeping track of them on my GitHub wiki page. Happy holidays!

Murmur (Mumble server) on Fedora 13

Just a quick note if you’re finding yourself stumped when installing murmur (the server component of Mumble) via yum on Fedora 13.

The version currently in the yum repositories has a broken init script, so if you try to sudo service murmur start you’ll get all sorts of nasty errors. The version currently in updates-testing works great, though. Install it from there like so:

sudo yum --enablerepo=updates-testing install murmur

That should do it. If you still have problems, try installing the redhat-lsb and qt-sqlite packages and see if that helps.

Make vim save files with Ctrl-s like Windows

Have you ever been working in vim over ssh and hit Ctrl-s by accident? This happens all the time to folks who also work on Windows, because Ctrl-s is the standard Windows keyboard shortcut for saving a file. What happens over ssh is that you issue a terminal stop command, and your ssh session appears to lock up.

The good news is, it’s not lost — just frozen. You can “unstop” the terminal by hitting Ctrl-q. Good as new! But we can do more…

If you’re a Windows user who ssh’s into *nix boxes frequently, we can actually make Ctrl-s in vim save the file like you’re intending. First, add the following to your .bashrc file to disable terminal stopping:

stty stop ''

You’ll notice that Ctrl-s doesn’t lock up vim anymore, but it doesn’t do anything yet. Let’s add that functionality now with two mappings in our .vimrc file.

nnoremap <C-s> :w<CR>
inoremap <C-s> <C-o>:w<CR>

Boom, done! Now when you hit Ctrl-s in vim, rather than locking up your terminal, it saves the file. In normal mode, it just executes the traditional :w command, and in insert mode, Ctrl-o runs a single normal-mode command (the :w) and then drops you back into insert mode right where you left off.

I’m not a vim expert by any means, so if anyone has a better way to do it, I’m all ears.

Installing TrueType Fonts in Fedora

I haven’t written recently because I was completely bogged down finishing up my bachelor’s degree and applying to grad schools, but here’s a quick tip for those of you looking for a painless way to install third-party TrueType fonts in Fedora. This may work in other Linux distros, but Fedora is my distro of choice so that’s why it’s used here.

If you need to install fonts accessible to all users on the system, you have to do some more complicated voodoo. I hate voodoo, so I’m not going to cover that.

If all you need is to install a font for your own user, it’s just this simple:

Put your myfont.ttf file in your home directory, under a directory called .fonts. So in other words, your font lives at:

/home/youruser/.fonts/myfont.ttf

Create the directory if it doesn’t already exist. Finally, restart any application you want to use that font in, and you should see it show up. You’re good to go.

Fixing the Fedora 12 VirtualBox Guest Additions problem

I’m a frequent VirtualBox user, and as I’ve noted in my previous posts, I’m an avid fan of Fedora as well.

However, there is a nasty bug in the most recent version of VirtualBox (3.1.2) when combined with Fedora 12. After installing the Guest Additions kernel modules as per the user docs, the system boots to a black screen with a cryptic error message that looks like a SELinux labeling problem (it’s not).

type=1305 audit(12587840002.571:32444): audit_enabled=0 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:readahead_t:s0 res=1

The problem is actually with the Guest Additions video driver, the one that gives you the nice resizable desktop window. Once the driver is built and installed, for some reason the X server can’t find any screens and refuses to start.

Until the bug gets fixed in the video driver, here’s how you can fix the system so that it will boot correctly, although you’ll lose the dynamic resizing ability. You’ll have to stick with fixed, predefined resolutions for now.

  1. Mount a Fedora 12 ISO, such as the full or network install discs, and boot to it. Boot into Rescue Mode from the GRUB bootloader screen.
  2. Breeze through the language and network options, but be sure to have it mount your hard disk image (it will mount under /mnt/sysimage).
  3. Drop into a shell and change into your hard disk’s X11 config directory, so that would be:
    cd /mnt/sysimage/etc/X11
  4. Edit your xorg.conf file… but wait! In Fedora 12, they switched to HAL for X configuration, so there is no xorg.conf file! Never fear, you just need to create one and it will override the HAL:
    vi xorg.conf
  5. Now, use the following settings for the new xorg.conf file:
    Section "Device"
        Identifier "Configured Video Device"
        Driver "vboxvideo"
    EndSection
    
    Section "Monitor"
        Identifier "Configured Monitor"
    EndSection
    
    Section "Screen"
        Identifier "Configured Screen"
        Monitor "Configured Monitor"
        Device "Configured Video Device"
        SubSection "Display"
            Depth 24
            Modes "1440x900" "1680x1050"
        EndSubSection
    EndSection
    
    Section "InputDevice"
        Identifier "vboxmouse"
        Driver "vboxmouse"
        Option "CorePointer"
        Option "Device" "/dev/input/mice"
    EndSection
    
    Section "ServerLayout"
       Identifier   "Default Layout"
       Screen      "Configured Screen"   0 0
       InputDevice   "vboxmouse"
    EndSection
    
  6. You’ll see I’ve defined two resolutions, 1440×900 and 1680×1050. What this allows me to do is work windowed at 1440×900 and if I want to go full screen (remember, dynamic resizing won’t work) I can hit the full screen shortcut in VirtualBox (Host+F) and change the resolution within Fedora to match my screen res.
  7. Save from vi (:wq) and reboot the system. Remember to unmount the install disc! The system should boot correctly now, albeit without dynamic resizing.

A huge thanks goes out to Jits in the VirtualBox forums for this fix!