<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>SIGTTOU &#187; C/C++</title>
	<atom:link href="http://sigttou.com/category/c/feed" rel="self" type="application/rss+xml" />
	<link>http://sigttou.com</link>
	<description>Just another background process...</description>
	<lastBuildDate>Wed, 22 Dec 2010 00:12:06 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>On the fourth day of Christmas my true love gave to me&#8230; four CUDA tips&#8230;</title>
		<link>http://sigttou.com/four-cuda-tips</link>
		<comments>http://sigttou.com/four-cuda-tips#comments</comments>
		<pubDate>Wed, 22 Dec 2010 00:12:06 +0000</pubDate>
		<dc:creator>Bob Somers</dc:creator>
				<category><![CDATA[Adventures in GraphicsLand]]></category>
		<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://sigttou.com/?p=282</guid>
		<description><![CDATA[This post is from a blog called Adventures in GraphicsLand that I&#8217;m writing with two fellow CS grad students, Chris Gibson and Ryan Schmitt. Articles about anything related to my graduate work in graphics or my thesis will be posted &#8230; <a href="http://sigttou.com/four-cuda-tips">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><strong><em>This post is from a blog called <a href="http://aigfx.com">Adventures in GraphicsLand</a> that I&#8217;m writing with two fellow CS grad students, Chris Gibson and Ryan Schmitt. Articles about anything related to my graduate work in graphics or my thesis will be posted there and then cross-posted here. Articles about handy tips (like fixing bugs with VirtualBox or software setup on Fedora) will remain here. This post that I wrote for AIGFX, originally appeared <a href="http://www.aigfx.com/2010/12/four-cuda-tips/">here</a>.</em></strong></p>
<p>Learning CUDA has definitely been an interesting experience. As much as they make it sound like it&#8217;s simple to get started (and for the most part, it is), there are lots of little traps that can keep you frustrated for hours&#8230; or days. Here are four things that stumped me during initial development of Haste (which is <a title="Haste on GitHub" href="https://github.com/cphaste/haste" target="_blank">now on GitHub</a>!), along with how I got around them, in case they&#8217;re helpful to you.</p>
<h3>Long running kernels on a desktop workstation</h3>
<p>In Linux, when a GPU is driving an X display, the driver&#8217;s watchdog will kill any kernel that leaves the driver hanging for too long, so you can&#8217;t run a GPU kernel that takes more than a few seconds to return. (This happens in Windows, too, but I&#8217;m working mainly in Linux at the moment.) However, you might still want to test long-running kernels on your workstation. The way around this is to switch to a text-only terminal before running your CUDA program. On most Linux distributions, Ctrl-Alt-F2 through Ctrl-Alt-F6 each switch you to a different text terminal. In Fedora 14, Ctrl-Alt-F1 takes you back to your X session (you&#8217;re still logged in and everything).</p>
<p>So, all you need to do is write code in your graphical desktop, compile, hit Ctrl-Alt-F2 to switch to a text-only terminal, then run your program for testing. When you want to go back to graphical mode to fix bugs, just Ctrl-Alt-F1 back and off you go.</p>
<h3>Slow device info queries</h3>
<p>If you&#8217;re doing development on a headless compute box (like our Tesla machine at Cal Poly), you might have noticed that querying device information takes a long time, and it&#8217;s even worse on a multi-device machine. Our box at Poly has four Tesla GPUs, and Haste startup was frustratingly slow. All we did was query the device list once, then query each device individually using <code>cudaGetDeviceProperties()</code>. It usually took on the order of 30 to 45 seconds at program startup to get all the device information and allocate memory before we were off to the races launching kernels.</p>
<p>The problem is that the NVIDIA drivers normally maintain a lot of state about the GPUs in memory. However, this state is only there if there&#8217;s some resident process keeping it there, like X. If X is not running (or not even installed, like on our headless compute box), that state will need to get reinitialized every time you make a call that requires it. This can be excruciatingly slow, especially on multi-device machines.</p>
<p>The solution? Well, the easiest one is to just install and leave X running, even on a headless machine. Just make sure it&#8217;s not driving a display, or better yet switch it over to a text-only terminal with Ctrl-Alt-F2 to keep X around but not have it interfere with your kernels.</p>
<h3>Printing debug info in device kernels</h3>
<p>I must admit, while debuggers are neat, I tend to like <code>printf()</code> debugging. It&#8217;s not that I don&#8217;t see the value of debuggers; for some problems they&#8217;re really the only way to solve things. Maybe it has something to do with the fact that <a title="cuda-gdb goes kaboom" href="http://forums.nvidia.com/index.php?s=91b8cd119e65d54ab921f4415fc4fcfc&amp;showtopic=188223" target="_blank">cuda-gdb inexplicably crashes</a> on every machine and kernel I try to run it on.</p>
<p>With the Fermi architecture, available in cards of compute capability 2.0 and higher, you can actually call <code>printf()</code> directly from your device code, without having to jump through any strange library hoops. Initially, however, I was never able to get it to work. I couldn&#8217;t find which CUDA header I needed to include to get things off the ground, and even when it seemed to compile it didn&#8217;t print anything.</p>
<p>Well, it sounds silly, but just <code>#include &lt;stdio.h&gt;</code> and away you go. I never tried this initially because it didn&#8217;t seem to make any sense: the C standard library doesn&#8217;t contain CUDA device code! As best I can tell, <code>nvcc</code> is rewriting these standard calls from device code behind the scenes.</p>
<h3>The device info&#8217;s maxThreadsPerBlock lies!</h3>
<p>This one really irks me. If you query a device&#8217;s properties, it reports the maximum number of threads per block in a <code>cudaDeviceProp</code> struct member called, shockingly, <code>maxThreadsPerBlock</code>. The problem is that this is not the actual number of threads you can launch. The real limit depends on how many registers and how much shared memory your kernel uses, which also determine its occupancy, and you can work it out using the difficult-to-find <a href="http://developer.download.nvidia.com/compute/cuda/3_2_prod/sdk/docs/CUDA_Occupancy_Calculator.xls" target="_blank">occupancy calculator spreadsheet</a>. You&#8217;ll also want to compile your kernel with the <code>nvcc</code> option <code>--ptxas-options=-v</code> to see its shared memory and register usage, since the spreadsheet needs both numbers.</p>
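<p>For a concrete sense of why the reported number can&#8217;t be trusted, here is a minimal C++ sketch of just the per-block part of what the spreadsheet works out. <code>MaxThreadsForKernel()</code> is a hypothetical helper, not part of CUDA; it assumes you feed it <code>maxThreadsPerBlock</code>, <code>regsPerBlock</code>, <code>warpSize</code> and <code>sharedMemPerBlock</code> from <code>cudaDeviceProp</code>, plus the per-thread register count that <code>--ptxas-options=-v</code> reports. It ignores register allocation granularity, fixed per-block shared memory and multiprocessor occupancy entirely, so treat its answer as an upper bound, not a tuned launch size.</p>
<pre class="syntax cpp">#include &lt;algorithm&gt;
#include &lt;cassert&gt;

// Hypothetical helper: an upper bound on the threads per block a kernel can
// actually be launched with, given its register and shared memory usage.
// Simplified: ignores register allocation granularity, so the true limit
// can be somewhat lower.
int MaxThreadsForKernel(int maxThreadsPerBlock,    // cudaDeviceProp::maxThreadsPerBlock
                        int regsPerBlock,          // cudaDeviceProp::regsPerBlock
                        int warpSize,              // cudaDeviceProp::warpSize
                        int sharedMemPerBlock,     // cudaDeviceProp::sharedMemPerBlock
                        int regsPerThread,         // from nvcc --ptxas-options=-v
                        int sharedBytesPerThread)  // your kernel's per-thread usage, 0 if none
{
    assert(warpSize &gt; 0);
    int limit = maxThreadsPerBlock;
    if (regsPerThread &gt; 0)
        limit = std::min(limit, regsPerBlock / regsPerThread);
    if (sharedBytesPerThread &gt; 0)
        limit = std::min(limit, sharedMemPerBlock / sharedBytesPerThread);
    // Launch whole warps only: round down to a multiple of the warp size.
    return (limit / warpSize) * warpSize;
}

// Compute capability 2.0 (Fermi) numbers: maxThreadsPerBlock says 1024, but
// a kernel using 40 registers per thread tops out at 800 threads per block.
// MaxThreadsForKernel(1024, 32768, 32, 49152, 40, 0) == 800</pre>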
<p>The occupancy limit doesn&#8217;t bug me so much as the fact that it isn&#8217;t mentioned anywhere in the documentation where <code>maxThreadsPerBlock</code> is. One would think that would be a great place to throw up a warning flag, letting developers know that the number is only an upper bound, and that they need to do some real benchmarking of their kernel to find the best occupancy and thread launch combination. Essentially, the <code>maxThreadsPerBlock</code> member is entirely superfluous, since its only real use would be scaling kernel launch sizes to the number of device threads available. Instead, apparently we should embed the Excel worksheet in our program and have the device properties chug through the macros to make any runtime adjustments based on the hardware we&#8217;re running on. (&lt;/sarcasm&gt;) Yeesh.</p>
<p>Hopefully these tips help you out. As I continue to bang my head against the wall and find new tidbits I&#8217;ll be keeping track of them on my <a title="GitHub wiki" href="https://github.com/bobsomers/haste/wiki/Cuda-notes" target="_blank">GitHub wiki page</a>. Happy holidays!</p>
]]></content:encoded>
			<wfw:commentRss>http://sigttou.com/four-cuda-tips/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Include dependencies</title>
		<link>http://sigttou.com/include-dependencies</link>
		<comments>http://sigttou.com/include-dependencies#comments</comments>
		<pubDate>Tue, 22 Dec 2009 06:56:46 +0000</pubDate>
		<dc:creator>Bob Somers</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Code]]></category>

		<guid isPermaLink="false">http://sigttou.com/?p=139</guid>
		<description><![CDATA[Generally when I write software, I try to keep things relatively well organized. Inevitably, however, things are going to get a bit messy, especially if you&#8217;re working on a large, disorganized codebase that you didn&#8217;t write to begin with&#8230; say, &#8230; <a href="http://sigttou.com/include-dependencies">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Generally when I write software, I try to keep things relatively well organized. Inevitably, however, things are going to get a bit messy, especially if you&#8217;re working on a large, disorganized codebase that you didn&#8217;t write to begin with&#8230; say, oh&#8230; something like the <a href="http://developer.valvesoftware.com">Source SDK</a>.</p>
<p>Frequently you have some class that is composed inside another class, but occasionally needs to access the class it&#8217;s composed inside of. Basically, the two classes need to refer to each other, even though the composition really only makes sense in one direction. Confused yet?</p>
<p>In this example, we&#8217;ll use a Refrigerator class which stores inside it an instance of a Cheese class. Why cheese, you ask? Because cheese is delicious. Also, our refrigerator is from the future and can slice and serve cheese just like the built-in ice maker and water dispenser. It&#8217;s a pretty sweet fridge.</p>
<p>Now, we were all taught to keep our <code class="syntax cpp">#include</code>s in our header files, not the implementation files, so like good little programmers we construct our classes like so:</p>
<p><strong>refrigerator.h</strong></p>
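<pre class="syntax cpp">#ifndef REFRIGERATOR_H
#define REFRIGERATOR_H

#include &lt;stdio.h&gt;  // for printf() in refrigerator.cpp
#include &quot;cheese.h&quot;

class Refrigerator {
private:
    Cheese *pCheese;
    int temp = 35;

public:
    void ServeCheese();
    int GetTemp();
};

#endif</pre>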
<p><strong>refrigerator.cpp</strong></p>
<pre class="syntax cpp">#include &quot;refrigerator.h&quot;

void Refrigerator::ServeCheese() {
    printf(&quot;Now dispensing %s cheese!\n&quot;, pCheese-&gt;GetFlavor());
}

int Refrigerator::GetTemp() {
    return temp;
}</pre>
<p><strong>cheese.h</strong></p>
<pre class="syntax cpp">#ifndef CHEESE_H
#define CHEESE_H

class Cheese {
private:
    Refrigerator *pFridge;
public:
    const char *GetFlavor();
    void CheckTemp();
    void BeginMolding();
};

#endif</pre>
<p><strong>cheese.cpp</strong></p>
<pre class="syntax cpp">#include &quot;cheese.h&quot;

void Cheese::CheckTemp() {
    if (pFridge-&gt;GetTemp() &gt; 45) {
        BeginMolding();
    }
}

const char *Cheese::GetFlavor() {
    return &quot;cheddar&quot;;  // string literals are const, hence const char *
}
</pre>
<p>I&#8217;ve left out the constructors in this example for brevity, but let&#8217;s assume that they get the pointers set up correctly so that our instance of the Refrigerator class has a correct pointer to an instance of the Cheese class and vice versa.</p>
<p>Now, at this point you may be screaming that this needs to be refactored and reorganized. Yes, it probably does. But there are many instances where you simply can&#8217;t, and in fact the abstraction really only makes sense one way. The fridge has cheese in it, but the cheese certainly doesn&#8217;t have a fridge in it. We just need that pesky reference around so we can check the temperature of the fridge every so often.</p>
<p><em>(Yes, I am aware that the fridge could push its temperature down to all the items in it, &#224; la the <a href="http://en.wikipedia.org/wiki/Observer_pattern">Observer Pattern</a>. Yes, I am aware that would be a better solution. But this is a contrived example anyway, so stick with me here.)</em></p>
<p>Now, the code given above doesn&#8217;t compile, because the Cheese class has no idea what the heck this Refrigerator class is, so we either need to include it or forward declare it. If we try to do this:</p>
<p><strong>cheese.h</strong></p>
<pre class="syntax cpp">#include &quot;refrigerator.h&quot;</pre>
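<p>our compiler is going to get very angry at us, because now the two headers include each other. Without include guards, the preprocessor recurses until it hits its nesting limit; with them, whichever header gets processed first ends up referring to a class the compiler hasn&#8217;t seen yet. The solution is a forward declaration:</p>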
<p><strong>cheese.h</strong></p>
<pre class="syntax cpp">class Refrigerator;

class Cheese {
    // ...etc...
};</pre>
<p>Basically what this does is tell the compiler, &#8220;Hey! There&#8217;s this class called Refrigerator that I might talk about. I&#8217;m not telling you what&#8217;s in it yet, but trust me, it exists!&#8221;</p>
<p>The problem, though, is that this is rather limiting. Within the Cheese class, we can declare pointers to the Refrigerator class, no problem. Pointers are of fixed size, so the compiler doesn&#8217;t much care what it&#8217;s a pointer <strong><em>to</em></strong>; it already knows how much memory a pointer needs. When we try to access members of the class, though, like properties or methods, it falls apart, because as far as the compiler knows, Refrigerator is an <em>incomplete type</em>: it knows the name exists, but nothing about the class&#8217;s members or size. After all, we haven&#8217;t told it anything else about the Refrigerator class.</p>
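<p>Here is a short sketch of what the compiler will and won&#8217;t accept when all it has is the forward declaration. The names <code>gFridge</code>, <code>Inspect()</code> and <code>HasFridge()</code> are hypothetical, made up just for illustration; the commented-out lines at the bottom are the ones that fail to compile.</p>
<pre class="syntax cpp">class Refrigerator;  // forward declaration: Refrigerator is an incomplete type from here on

// Fine with only the forward declaration:
Refrigerator *gFridge = nullptr;              // declaring pointers to it
void Inspect(Refrigerator &amp;fridge);         // declaring functions that take references
bool HasFridge(const Refrigerator *fridge) {  // passing pointers around and comparing them
    return fridge != nullptr;
}

// Errors until the full class definition is visible:
// Refrigerator fridge;               // needs to know how big a Refrigerator is
// int temp = gFridge-&gt;GetTemp();    // needs to know Refrigerator's members</pre>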
<p>So if we can&#8217;t <code class="syntax cpp">#include</code> it and forward declaring it doesn&#8217;t give us what we want, what can we do?</p>
<p>Well, we can do <strong>both</strong>. Kind of.</p>
<p>The solution is to forward declare in your header file, and <code class="syntax cpp">#include</code> in your implementation file. This avoids the preprocessor headaches of the chicken-and-egg <code class="syntax cpp">#include</code>, while still letting us access the members of the forward-declared class in the implementation. In other words, here&#8217;s the fix:</p>
<p><strong>cheese.h</strong></p>
<pre class="syntax cpp">class Refrigerator;

class Cheese {
    ...
};</pre>
<p><strong>cheese.cpp</strong></p>
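<pre class="syntax cpp">#include &quot;cheese.h&quot;
#include &quot;refrigerator.h&quot;  // full definition, so pFridge-&gt;GetTemp() now compiles

// ...Cheese::CheckTemp() and Cheese::GetFlavor() as before...
</pre>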
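<p>If you want to see the whole pattern compile before trying it on a real codebase, here is a minimal single-file sketch that mimics the header/implementation split with comments. The constructors, <code>SetCheese()</code>, <code>SetTemp()</code>, <code>IsMolding()</code> and the <code>molding</code> flag are hypothetical additions (standing in for the constructors and <code>BeginMolding()</code> left out above) so the example can actually run.</p>
<pre class="syntax cpp">#include &lt;cassert&gt;
#include &lt;cstdio&gt;

// --- cheese.h: forward declare Refrigerator instead of including its header ---
class Refrigerator;

class Cheese {
private:
    Refrigerator *pFridge;  // a pointer to an incomplete type is fine
    bool molding;
public:
    explicit Cheese(Refrigerator *fridge) : pFridge(fridge), molding(false) {}
    const char *GetFlavor() const { return &quot;cheddar&quot;; }
    void CheckTemp();       // defined below, once Refrigerator is complete
    bool IsMolding() const { return molding; }
};

// --- refrigerator.h: would #include &quot;cheese.h&quot;; here Cheese is already defined ---
class Refrigerator {
private:
    Cheese *pCheese;
    int temp;
public:
    Refrigerator() : pCheese(nullptr), temp(35) {}
    void SetCheese(Cheese *cheese) { pCheese = cheese; }
    void SetTemp(int t) { temp = t; }
    int GetTemp() const { return temp; }
    void ServeCheese() {
        assert(pCheese != nullptr);
        std::printf(&quot;Now dispensing %s cheese!\n&quot;, pCheese-&gt;GetFlavor());
    }
};

// --- cheese.cpp: both classes are complete here, so member access compiles ---
void Cheese::CheckTemp() {
    assert(pFridge != nullptr);
    if (pFridge-&gt;GetTemp() &gt; 45) {
        molding = true;  // stands in for BeginMolding()
    }
}</pre>
<p>Wiring them up looks like <code>Refrigerator fridge; Cheese cheese(&amp;fridge); fridge.SetCheese(&amp;cheese);</code>, after which raising the fridge temperature above 45 and calling <code>CheckTemp()</code> flips the cheese into molding mode.</p>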
<p>Again, it goes without saying that the better solution is to refactor or rearchitect your code if you can. These kinds of hacks can get really out of hand and are usually a <a href="http://en.wikipedia.org/wiki/Code_smell">code smell</a>, a sign that something needs to be fixed at a deeper level. However, if you&#8217;re working on a large codebase that you can&#8217;t restructure, this can help out a lot.</p>
]]></content:encoded>
			<wfw:commentRss>http://sigttou.com/include-dependencies/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Removing entities in the Source SDK</title>
		<link>http://sigttou.com/removing-entities</link>
		<comments>http://sigttou.com/removing-entities#comments</comments>
		<pubDate>Sat, 24 Oct 2009 23:54:49 +0000</pubDate>
		<dc:creator>Bob Somers</dc:creator>
				<category><![CDATA[C/C++]]></category>
		<category><![CDATA[Code]]></category>
		<category><![CDATA[Gaming]]></category>

		<guid isPermaLink="false">http://sigttou.com/?p=133</guid>
		<description><![CDATA[I haven&#8217;t written for a while, mainly because I&#8217;ve been busy with classes and studying for the GRE for my grad school applications, but here&#8217;s a quick tip for those of you meddling around with the Source SDK. It&#8217;s well &#8230; <a href="http://sigttou.com/removing-entities">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t written for a while, mainly because I&#8217;ve been busy with classes and studying for the GRE for my grad school applications, but here&#8217;s a quick tip for those of you meddling around with the Source SDK.</p>
<p>It&#8217;s well documented how to go about spawning entities, but I couldn&#8217;t find a good place explaining how to <em>remove</em> spawned entities through code.</p>
<p><strong>Don&#8217;t</strong> try meddling with the global entity list (<code class="syntax cpp">gEntList</code>) or calling its <code class="syntax cpp">RemoveEntity()</code> method. It doesn&#8217;t do what you want.</p>
<p><strong>Instead</strong>, use one of the super-handy <code class="syntax cpp">UTIL_*</code> functions. Given a pointer to the entity you want to remove, simply use:</p>
<pre class="syntax cpp">
UTIL_Remove(pEntity);
</pre>
<p>Poof. Entity gone. Remember, entities are created and destroyed on the <strong>server side only.</strong> The server will automatically broadcast any changes to the entity list to its connected clients for you.</p>
]]></content:encoded>
			<wfw:commentRss>http://sigttou.com/removing-entities/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

