Know your text editor

Let’s examine why it’s important as a programmer to really know your text editor, but in a different light than usual. Traditionally, programmers are concerned with learning all the tricks of their editor so that they can make use of them. Instead, let’s look at an example of why you should know what your editor does so it doesn’t screw you over.

Today at work a coworker received a flood of joke emails mocking his coding style, with links to example code telling him to “learn how to program”. This was in response to what he thought was a simple 6-line change, but was actually a 22,000-line “oops”.

He opened up a file that’s roughly 25,000 lines (let’s not get into why it’s that long…) to make a simple change. After adding about 6 lines, he saved, quit, and committed the change to the repository. His editor of choice is vim (as is mine).

For those of you unfamiliar with vim, it’s a text-based editor with a fairly steep learning curve. It takes years to truly master, but the power it offers you is incredible. In vim you start in command (normal) mode, where keystrokes are treated as commands rather than text, and typing a colon opens a command line where you can enter commands much like you would in a shell. You can switch to insert mode to actually insert text into your document, or to other modes for selecting text and whatnot.

In this case, the programmer accidentally brushed the Escape key while he was typing, which took him out of insert mode and back into command mode. What he thought he was typing into a comment in his code was actually being fed to vim as commands. He must have brushed some other keys as well, because here’s the unfortunate key sequence he ended up entering:

:% <<<<<<<<

For those of you not well-versed in vim commands, here’s an explanation. The colon starts a command, and the % symbol is a range meaning “every line in the file”, so the command applies to the entire file instead of just the current line or selection. Each < character shifts the lines in the range one indentation level (one shiftwidth) to the left. Similarly, the > character shifts them one level to the right. It’s phenomenally handy for quickly adjusting how far a block of code is indented.

However, what the command above does is shift every line in the file eight indentation levels to the left, which for almost any code is enough to crush every line against the left margin. What you end up with is every line of your beautifully indented code smashed down to column 0. No indentation. None.

As vim was writing this change to the swap file (remember, there were 25,000 lines), he saved and quit. The file had been updated, but his screen had not, so he had no idea he’d just nuked the indentation of the entire file.

It was his commit to the repository that raised the red flag. This was supposed to be a small bug fix. However, the commit email, sent out to the entire mailing list, contained a 22,000-line diff. Whoops.

After looking at the changes and chuckling over the fact that the file was now completely unreadable, many people on the coding team sent him joke emails about his “crappy coding style”, with links telling him to “learn how to program”.

Naturally, this took only about 2 seconds to fix, since the code was under source control. But the point still stands. Don’t just learn your editor; watch it with a careful eye while you work with it, lest all those features meant to increase productivity turn against you.

Ban programmers, not functions

So my daily travels around the intertubes landed me on a very interesting blog post by Microsoft’s Security Development Lifecycle team (which they call SDL, not to be confused with the arguably more useful Simple DirectMedia Layer library). The post centered around them adding memcpy() to the banned functions list in favor of their more “secure” variant, memcpy_s(), which takes and checks the size of the destination buffer.

Before I explain why I think this is another example of Microsoft spending their time doing something incredibly useless instead of innovating, let me explain that all these blasted _s functions are one of the reasons I detest the Windows API so much.

I had the unfortunate “pleasure” of digging rather deep into the Windows API for a project I was working on this past spring quarter. For those of you who haven’t ventured into the Windows API, let me say this: It’s so incredibly confusing that it doesn’t even look like C anymore.

Almost everything uses custom types, even when there’s no logical reason to do so. The Linux API does this to some extent, but nowhere near as badly as Microsoft’s.

Second, there seems to be no rhyme or reason to how these types are named. Some are named as ALL_CAPITALS_TYPE, others _use_this_strange_underscore_prefix, and some use the standard type_t. If you use almost any standard C library function, the compiler will inevitably tell you that you’re doing it wrong and should use strcpy_s(), or _strcpy_s(), or _s_t_r_c_p_y_s_(). Seriously, their API has got more underscores than Bill Gates has dollar bills.

What this gives you is this strange, alien language that vaguely resembles C, but is so ugly and hideous that you’re afraid to touch it. Apple has Objective-C. Microsoft has Franken-C.

So let me back up and explain this blog post I mentioned earlier. I’m a bit behind on this one (I’ll admit I’m not often found venturing into the MSDN blogs) but back in May the SDL announced that they were adding memcpy() to their banned functions list, to join strcpy(), strcat(), strncpy(), strncat(), gets(), and others.

They announced its replacement, memcpy_s() (soon to be replaced by _memcpy_s() and _m_e_m_c_p_y_s_(), I’m sure), which takes one additional argument: the size of the destination buffer.

This is aimed at making uses of memcpy() more secure: if you ask to copy more bytes than the destination buffer can hold, memcpy_s() refuses, zeroes the destination, and returns an error instead of overrunning the buffer. You go from using this:

memcpy(dst, src, len);

to using this (where dst is an array, so that sizeof(dst) is its size in bytes):

memcpy_s(dst, sizeof(dst), src, len);

This sounds reasonable, except most Windows programmers will just do this:

memcpy_s(dst, len, src, len);

which makes your “secure” version useless.

The problem here is not that memcpy() doesn’t check the size of the destination buffer, but that some programmers use it without thinking. A .50 caliber sniper rifle is a very powerful tool in the hands of a marksman, but in a cage full of chimpanzees, the results could be disastrous.

If nothing else, memcpy_s() makes you think about the size of the target buffer.

That is, unless you’re one of the mindless programmers who were already using memcpy() unsafely, in which case you’ll learn the new and improved mindless version, memcpy_s(dst, len, src, len), and continue on your merry way.

My point here is that banning functions that are the common source of security vulnerabilities doesn’t fix the problem, because the problem isn’t with the functions. These functions are well documented and we know exactly how they work and what their dangers are. The problem is with the programmers.

You’ve got to teach your programmers how to use these functions securely, or at least how to recognize when they should ask someone to review their code. If training isn’t an option, there’s still a better approach than banning these functions.

Ban programmers who use them wrong. Yes, banish them to the land of C# and other fluffy managed languages with garbage collectors and infinite buffers. They’ll do far less harm there.

The key to doing memory management correctly (which includes using memcpy(), strcpy(), etc. in safe ways) is to completely engage your brain when you’re doing it. You cannot zone out when writing memory managing code. Although given the quality of code coming out of Redmond, I would not be surprised if most of the programmers have their brains permanently switched off.

As classic-Microsoft as this blog post was, the best line was the last one:

I wonder when Larry, Steve and Linus will start banning strcpy() in their products?

Words cannot express the hilarity that ensued when I read this line. Maybe, just maybe, the reason they haven’t found the need to ban them is because they’re using them correctly. Perhaps if Microsoft tried that every once in a while, they would churn out more secure products themselves without having to resort to Franken-C.

Simulating Monopoly

Last summer I spent a good deal of time playing Monopoly. It was my final summer of marching drum corps, and we were taking Amtrak out to the Midwest for the last leg of our summer tour. There isn’t really much to do on a train for 3 days except eat, sleep, and sit in the observation car playing Monopoly.

One of the other corps members was absolutely destroying us. He clearly knew what he was doing when it came to fictional property management. He let me in on a little secret, though. Not all properties are created equal.

What he meant was that certain properties are statistically more likely to be landed on than others. It made sense when he explained it, but I wasn’t sure just how much of an actual difference it made. When I got home from the summer tour, I sat down and wrote some code to find out.

Thankfully, in the game of Monopoly, a player’s movement around the board is very much decoupled from their financial transactions. By that I mean they keep moving around the board the same way regardless of which properties they choose to buy, how much they have to pay other players, or which properties they trade. The only exception is when a player goes bankrupt and stops circling the board.

This allowed me to write a very simple simulation in a few hours without having to worry about realistic AI. The players didn’t have to make financial transactions at all. They merely had to roll the dice and move according to the rules of Monopoly. I did need to account for the Chance and Community Chest cards, but only when they affected a player’s position on the board or their future turns.

Certain spaces on the board can also affect a player’s position and future turns, so those had to be accounted for as well. For example, landing on Go to Jail sends a player directly to the Jail square, and a player in jail has to go through a series of dice rolls (trying for doubles) or use a Get Out of Jail Free card to move again.

The game board used in a game of Monopoly.

As I was finishing up writing the simulation, I began to suspect that the advice I had received on the train was correct. The claim was that the orange properties (St. James Place, Tennessee Avenue, and New York Avenue) were the most lucrative on the board because players were more likely to land on them. This makes sense: the Go to Jail square, a Chance card, and a Community Chest card all send you directly to jail, so the most common starting point for a dice roll is Jail. The most common totals for two six-sided dice fall in the 6-8 range, with 7 the single most likely, and the orange properties sit 6, 8, and 9 squares past Jail, so it only makes sense that they are among the most commonly landed-on properties.

Indeed, this is exactly how it plays out. My simulation played 2000 games with 6 players and 1000 turns per game, and dumped the “landed-on” percentage for each space along with a heat map overlaid onto the above Monopoly board. Let’s take a look at the results.

Raw results from the Monopoly simulation.

Squares landed on more often are "hotter" (tinted red) and squares landed on less often are "cooler" (tinted blue).

The orange property tract is certainly the most lucrative. But when you look at the raw percentages, it’s a little disappointing. The most landed-on space on the board (Community Chest #2, exactly 7 spaces past Jail) is less than 1% more trafficked than the least landed-on property (Mediterranean Avenue).

Keep in mind, though, that those numbers are averaged over 1000 turns, which is a massively long game of Monopoly. On top of that, they’re averaged over 2000 games. All of this smooths out the heat map quite a bit.

In a single game of Monopoly, the percentages will swing much more widely. The orange properties could be red hot. They could also be colder than Mediterranean. However, since in the long run they are hotter than the other properties, it doesn’t hurt to try to score the orange tract. More often than not, it will see more action than any other tract on the board.