2013-09-03

Understanding Java from a C++ Background

A short note of thanks before I begin. Some friends showed me a lot of shortcomings in the way I was thinking about this problem. That led to a huge, sweeping revision of this article. Hopefully it is more correct, coherrent and usable now.



I was born and raised in a world of C++, which treats class objects and primitives the same way. You can pass any variable around between functions either by value, or by reference. My dependency on this paradigm never became more apparent to me than when I finally made the switch to another language: Java.

C++

In C++, you can instantiate a class object, let's call it MrObject. You can then also create a pointer to it. Let's call that MrPointer. If you then decide you want access to MrObject's data in another function, there are a lot of ways you can do this. You can:
Pass a reference to MrObject into the function using the '&' operator, giving you direct access to the original instance of MrObject himself.

void MrFunction(MrObjectClass &MrObjectRef) {...}

Within MrFunction, you can use MrObjectRef just the same as if it were MrObject himself, because this is literally just creating another alias for the same exact object in memory.
You can also:
Pass by value, creating a new instance of a MrObjectClass inside MrFunction that is a duplicate of MrObject.

void MrFunction(MrObjectClass MrObjectDupe) {...}

When you pass MrObject into this function by value like this, a new class object called MrObjectDupe is created within the function. The original MrObject stays where he is, and is isolated from anything that is going on within MrFunction.
Things get a little more tricky when you create pointers to MrObject and try passing those into functions. For example, let's say you make a pointer to MrObject, like so:
MrObjectClass * MrPointer = &MrObject;

If you're not fluent in C++, what we are doing here is not creating a new MrObject. The asterisk (*) tells us that we are creating a pointer to an object of type MrObjectClass. The name of the pointer is MrPointer. The assignment here (=), combined with the ampersand (&), tells us that the specific object in memory that MrPointer points to is the one named MrObject (which must be declared elsewhere, but can, if you wanna be fancy, be defined through MrPointer via the new keyword, for example).
Now, if you want to pass MrPointer into a function, C++ also allows you to pass that either by value, or by reference. In other words, you can:
Pass a pointer by value; create a new pointer with the same value as the pointer you passed into the function. In other words, a new pointer, but one that points to the same address as the pointer that was passed into the function.

void MrFunction(MrObjectClass * MrPointerDupe) {...}

The asterisk must still be included so that C++ knows MrPointerDupe is a pointer. The side effect of this is that we can access MrObject through this duplicated pointer, but if we point MrPointerDupe at a new address (to a different object of MrObjectClass, or a new one, or to null), then the original MrPointer will remain unaffected, still pointing at (and modifying, if dereferenced) our original MrObject.
Alternatively, you can:
Pass a pointer by reference; in other words, we are not creating a new pointer to MrObject, we are using the same original MrPointer; we are just creating a new alias for MrPointer to be used within MrFunction.

void MrFunction(MrObjectClass * &MrPointerAlias) {...}

The asterisk in this case tells us that we are declaring a pointer to an object of type MrObjectClass, and the ampersand tells us that the pointer we are creating is not a new pointer, but just an alias (another name) for the existing pointer that is getting passed in to MrFunction.

Java

The nature of Java variables may not seem clear at first, coming from a C or C++ background. In Java, every variable you declare is a reference (except for primitives like int or boolean). Even primitive types such as int have Java class equivalents that give you reference type variables for handling primitive types (such as Java's Integer class).

Why is this so important? Just taking one important example, in C++, you might be used to creating a new instance of a class object, then being able to copy it into another second class object, like so:
// C++ code to create a new instance of MrObjectClass called MrObject.
MrObjectClass MrObject();

// C++ code to create a second, separate instance of MrObjectClass called
// MrDuplicate, by copying MrObject. Note that this results in two
// distinct class objects in memory.
MrObjectClass MrDuplicate = MrObject;
By contrast, in Java, when you instantiate a class object like so:
MrObjectClass MrObject = new MrObjectClass();
you are not creating a class object called MrObject. You are implicitly creating a reference to an object of type MrObjectClass, and that reference is called MrObject. In the following Java snippet, the two declarations result in only one distinct class object instance, which has two reference variables referring or "pointing" to it. (Though these variables are not pointers, I found it useful to think about them as such in the beginning, as a sort of "training wheels" approach to getting going in Java. However, of course you should try to build an understanding of Java's implicit referential nature as early as possible.)
// Java instantiation of a reference called MrObject to a nameless object
// of type MrObjectClass
MrObjectClass MrObject = new MrObjectClass();

// Java instantiation of a second reference called MrDuplicate, which
// actually refers or "points to" the same nameless object as MrObject
// refers to, resulting in only a single nameless MrObjectClass object
MrObjectClass MrDuplicate = MrObject;
It's very important to understand this so that you will understand what happens the first time you attempt to duplicate a class object, then modify your new variable only to see the changes reflected in your first variable as well! In other words, in Java, you cannot create a MrObject that you have "direct" C-style access to. Also, perhaps most importantly, it's critical to remember that reassigning MrObject or MrDuplicate does not overwrite any data! MrObject and MrDuplicate are references, and reassigning them simply points them to new memory locations. This can, in turn, potentially leave data unreferenced, in which case Java automatically cleans it up (but this unpredictable, sometimes seemingly sporadic process, often called garbage collection is another topic for another day).

Another reason this concept is so important to grasp early on in a programmer's exposure to Java, is because Java does not support pass-by-reference! In other words, even if you get used to the fact that everything in Java is a reference, you then have to accept that all functions in Java are pass-by-value! In other words, if I pass my MrObject reference from the example above into a function, the result is that within that function I cannot directly manipulate the original MrObject, ever!

What's happening within the function is we are creating a new reference variable whose value (referred or "pointed" address) is the same as MrObject's, because that is what's passed by value. Using the new reference inside the function, we can access our referred, nameless class object's public members. But if I try to reassign my reference variable from within my function, only the duplicated reference variable within the function will refer to the new destination address (nameless class object instance) I designate (including, possibly, null); the original MrObject reference outside the function will still point to (and protect from garbage collection) the original address (nameless class object) it originally referred to in the first place.

Are there ways to simulate pass-by-reference for those instances where you really want that behavior? Sort of (storing reference variables in an array or container class and passing the array or container itself into functions by value to preserve the original pointer-like variables within is the closest I can come up with so far, though these can feel like clumsy substitutes for true pass-by-reference). But when possible, it's better to just be aware of Java's idiosyncrasies and think in a Java-like way; its lack of pass-by-reference, and its requirement that all non-primitive class object variables are references; and to design around these features to the extent possible.

Thanks for reading!
-- Steven Kitzes

No comments: