2013-06-24

Bitmask Basics

Working directly with bits is one of my favorite things to do as a programmer. I just find it fun! I haven't used these techniques recently in any of my projects, but I wanted to take some time to explain how bitmasks work, and some of their uses and advantages. Hopefully you'll agree they're very useful, and you'll see why I like them so much!

Higher level bitwise operations (also known as bit masking, bit flagging, etc) are great for a number of reasons. I find them quite fun, in addition to being useful and more efficient than other methods of Boolean comparisons and assignments. But before going into the methods and uses of bitmasking I should give a little bit of background on the topic. I'll be giving my examples in C++ because that's the language I'm most comfortable and familiar with, but bitmasking is possible in many languages.

First of all, what is a bitmask (or, what do I mean when I say bitmask)? In C++ (or any typed language), there are many data types that you are probably familiar with. These data types represent collections of bits, organized into bytes, stored in memory. We normally think about our variables and constants in terms of a given type (for example, int, double, or maybe even float), but behind the scenes an int or double still represents a contiguous set of bits.

Let's take the char data type in C++ as an easy example (it only represents a single byte, 8 bits). We can manipulate and compare data types such as chars in terms of their constituent bits. One way of terming such a collection of bits in this context is as a bitmask. This is often the case when using a char or int not to represent an actual character or integer, but rather to represent a condensed set of Boolean values.

At this point, you may be wondering: why is this useful? Isn't it simpler to just make a bunch of bool variables and compare those? Well, yes; that would be simpler, and easier for a human to read in the code. But it would necessitate long, convoluted argument lists if those bools ever needed to be passed around; not to mention that the bool type is very wasteful in languages like C++. Remember, the smallest size for a data type is one byte, 8 bits. So for C++ to store a bool in memory, it must allocate a full byte - 8 bits - even though only one bit is really needed to represent a Boolean value. Then, when making comparisons, an entire operation is wasted on each individual check against each individual bool. Holy smokes!

Alternatively, if we have a large number of Boolean variables that we need to track, instead of wasting an entire byte on a bool for each Boolean variable we want to track, we can track up to 8 Boolean values using a char's 8 individual bits, in the form of a bitmask! (Or, to expound on that idea, up to 16 values in a short int, 32 in a standard int, etc.) In addition to the efficiency achieved by storing several Boolean values' worth of information under a single primitive data type, this also means we can save a lot of time by comparing or manipulating several Boolean values in one go, as well! And of course, we can pass a bitmask as a single arg, which communicates a lot of Boolean data in a single variable, as opposed to needing to go through the hassle of passing multiple bools or encapsulating them in a container of some kind. So how does all this work...?

Example Time

Let's start by visualizing a C++ char's bits.

As an empty bitmask, with no bits turned on, our char would look like this:
00000000

If all the bits were flipped / turned on, it would look like this:
11111111

And with some bits on and some off, the bitmask would look like this:
00101101


Comparing two bitmasks against each other, or checking if a bitmask is either zero or non-zero works more or less as you would probably expect. Keep in mind, the following bitmasks are contained in char type variables. That's why they have 8 bits each.


C++ doesn't know or care that we're using our char as a bitmask. It just so happens that we are able to use bitwise operations and comparisons on our char. This means that while we may choose to perform bitwise operations and comparisons on our char, typical C++ comparisons like those on line 10 (comparing identical bitmasks) and line 14 (comparing differing bitmasks) return true and false results just as we would expect. Of course, zero / non-zero checks (such as on line 20) work as we are used to. And finally, assignment works as you would expect as well (such as on line 25).

The real value in bitmasks starts to become clear when we look at some of the special operators we can use to compare bitmasks against each other. These are the ampersand, the bar, and the carrot (& | ^). Let's take a look at what each of these operators does.

When using these operators with two bitmasks, the bit in each position of the first operand will be compared to the corresponding bit of the second operand. I call this vertical comparison, because it looks like this:
00101101
vvvvvvvv
10110110
vvvvvvvv
????????
The return value (the question marks) will be a new bitmask, whose bits will be flipped off or on (to 0 or 1 respectively) depending on the rules associated with the operator you are using. The ampersand (and, '&') checks to see if the corresponding bits in both bitmasks are both switched on, and flips the resultant bit on if so. The bar (or, '|') checks to see if the corresponding bits in either (or both) of the compared bitmasks are switched on, and flips the resultant bit on if so. Finally, the carrot (exclusive or, '^') checks specifically to see if one of the compared bitmasks has the bit in a given position switched on, while the corresponding bit in the other bitmask is switched off, and sets the resultant bit on if so.

This is a bit tricky to explain concisely, and a bit awkward to digest if you're not used to these concepts, so check out this Gist snippet for examples and visual representations. Here I show how each of these very useful operators works, and can be combined with a variety of comparisons and checks:

Real-World Example


Now that we have some idea of the mechanics of bitmask assignment and comparison, let's look at some real-world application. One challenge that lends itself particularly well to handling via bitmask is control input checking in video games. Video games, especially action-intensive games, must run at high frame rates, so performance is key and wasting a full bool data type's worth of data on each individual check is a monumental waste of space. As you'll see below, we can efficiently contain and check all the control input Boolean values we need in a single bitmask.

In this simple example, imagine a simple game with a perspective looking down onto the top of the player, who can move up, down, left or right, who can perform a regular attack and a special attack. Let's check out how we can manage the input for this control scheme using a bitmask! Note the use of constants here; having these constants available for use in assignment and checks is what makes bitmasks usable as arrays of Boolean values. Without the constants to check against, we wouldn't know what the individual bits in a bitmask represent! Also note the use of hexadecimal expressions to define the constants. I'll talk about this more in the example, but for now, just bear in mind that this is the easiest way (that I know of) to define a bitmask with a single bit flipped on in the position you want. (If you want to understand why I'm using hexadecimal the way I am, you need an understanding of how integers are represented in binary, which I won't explore in this post. An understanding of hexadecimal isn't required to understand this bitmask example, though!)
There is a lot of omitted error checking and pseudocode in this example, but it should give you an idea of how bitmasks can be useful in a real-world application.

Thanks for reading!
-- Steven Kitzes

3 comments:

Anonymous said...

Hey there, nice article.

just one little thing, that might make things easier.

Instead of
[code]const char MOVE_LEFT = 0x01;
const char MOVE_RIGHT = 0x02;[/code]

i would use

[code]const char MOVE_LEFT = 1 << 0;
const char MOVE_RIGHT = 1 <<1;
// ...
// ...[/code]

That makes it even easier to see what bit is used by a given mask.


regards.

Steven Kitzes said...

You're right! Thanks for reminding me! That's a great technique, and would make things a lot more clear. I'm going to update my post with this. Thanks again!

Anonymous said...

if you use gnu compiler, then this is better:
const char MOVE_LEFT = 0b00000001
const char MOVE_RIGHT = 0b00000010
const char MOVE_UP = 0b00000100
const char MOVE_DOWN = 0b00001000