A little while back I answered a post on the PH forums about why you might need to use casts, and particularly why you might need to upcast. This article is a tidied up and extended version of my answer.
Variable vs. value types
Suppose we write a class B that inherits from class A. We say that B is a subclass of A, which means that an object of type B can be used anywhere an object of type A can.
To understand upcasting and downcasting better, we need to introduce the idea of container or variable types and contrast them with value types. A variable may contain a value. For reference types, the variable has a type that states what type of value it can contain. However, the type of the value may be different; namely, it may be a subtype of the variable type. Let's clarify this with some examples.
A foo = new A(); // Variable type A, value type A
B bar = new B(); // Variable type B, value type B
A baz = new B(); // Variable type A, value type B
B gah = new A(); // Error, A is not a subclass of B
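To see the two types side by side at runtime, here is a minimal sketch. The empty class bodies and the helper names TypesDemo and ValueTypeName are assumptions made for illustration; GetType() is the standard way to ask an object for its value type.

```csharp
using System;

class A { }
class B : A { }

static class TypesDemo
{
    // The parameter's variable type is always A; GetType() reports the value type.
    public static string ValueTypeName(A variable) => variable.GetType().Name;

    static void Main()
    {
        A foo = new A(); // Variable type A, value type A
        A baz = new B(); // Variable type A, value type B

        Console.WriteLine(ValueTypeName(foo)); // prints "A"
        Console.WriteLine(ValueTypeName(baz)); // prints "B"
    }
}
```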
Downcasting
Take the case where we have a value of type B stored inside a variable of type A.
A baz = new B(); // Variable type A, value type B
Because the variable baz is of type A, we can only call methods and access properties that A declares. If B overrides one of A's methods, the call still runs B's version; the restriction is on which members we can name, namely those that A and B have in common.
If we want to do things that an object of type B can do but an object of type A cannot, we have to put the value into a B variable. That requires a runtime (dynamic) check that the value in baz really is of type B, so we have to write a downcast:
B ok = (B)baz;
You often need to downcast stuff when using non-generic collections; thankfully, generics save us from that. It is also possible to downcast without having to use an intermediate variable if you just want to do one thing with baz when you are treating it as a B:
((B)baz).DoSomething();
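Since the downcast is checked at runtime, it can fail. The sketch below shows one way to guard against that, assuming a compiler that supports C# 7 pattern matching. The body of DoSomething and the helper names DowncastDemo, TryDoSomething and PlainCastThrows are made up for the example.

```csharp
using System;

class A { }
class B : A
{
    public string DoSomething() => "B did something";
}

static class DowncastDemo
{
    // 'is' tests the value type and, on success, gives us a B variable to use.
    public static string TryDoSomething(A item) =>
        item is B b ? b.DoSomething() : "not a B";

    // A plain cast throws InvalidCastException when the value is not a B.
    public static bool PlainCastThrows(A item)
    {
        try
        {
            _ = (B)item;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(TryDoSomething(new B()));  // prints "B did something"
        Console.WriteLine(TryDoSomething(new A()));  // prints "not a B"
        Console.WriteLine(PlainCastThrows(new A())); // prints "True"
    }
}
```

Checking first is handy when you genuinely don't know what the value is; when you are sure, the plain cast is fine and the exception acts as a safety net.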
Upcasting
Earlier I said that you can use a B wherever you can use an A without having to write a cast. If this is the case, why would you ever need to upcast? Let's write two classes with public fields of the same name and put them in an inheritance relationship.
class A {
public int x;
}
class B : A {
public new int x; // 'new' states that hiding A.x is intentional and avoids a compiler warning
}
Now, as we did before, we will look at various combinations of variable and value types.
A foo = new A();
B bar = new B();
A baz = new B();
What happens if we try and access the field x for each of these?
- foo.x is the int x defined in A
- bar.x is the int x defined in B
- baz.x is the int x defined in A (because of the container type)
The first two are fairly obvious, but the third is more of a surprise. We know that we instantiated an object of type B, but when we access a field it is the variable type A that determines which field we get. This is different from virtual methods, where the most derived override is called no matter what the container type is.
We'll look at why in a moment, but first let's deal with the practical problem that leads to upcasting. If we have:
B bar = new B();
Then bar.x will always refer to the field x defined in the class B. What if we need to get at the field x in class A, though? In that case, we can use an upcast.
((A)bar).x
Here we have said "let's treat bar as if it's an A", and then accessed the field x that was defined in A.
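Here is a small runnable sketch of the same idea. The initial values 1 and 2 are assumptions added so that the two fields can be told apart, and FieldDemo, ViaB and ViaUpcast are hypothetical helper names.

```csharp
using System;

class A
{
    public int x = 1;
}

class B : A
{
    public new int x = 2; // hides A.x; both fields exist in every B object
}

static class FieldDemo
{
    // Variable type B, so this reads the field declared in B.
    public static int ViaB(B bar) => bar.x;

    // The upcast changes the variable type to A, so this reads the field declared in A.
    public static int ViaUpcast(B bar) => ((A)bar).x;

    static void Main()
    {
        B bar = new B();
        Console.WriteLine(ViaB(bar));      // prints 2
        Console.WriteLine(ViaUpcast(bar)); // prints 1
    }
}
```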
Object layout in memory
The last section may have left you wondering, "why on earth does it work that way?" The answer lies in the way that fields of objects are stored in memory.
Suppose we have the following classes:
class A {
int x;
}
class B : A {
int y;
}
class C : B {
int x;
int y;
int z;
}
If we instantiate classes A, B and C, in memory they will be laid out like this.
     A              B              C
+---------+    +---------+    +---------+
| v-table |    | v-table |    | v-table |
+---------+    +---------+    +---------+
| A.x     |    | A.x     |    | A.x     |
+---------+    +---------+    +---------+
               | B.y     |    | B.y     |
               +---------+    +---------+
                              | C.x     |
                              +---------+
                              | C.y     |
                              +---------+
                              | C.z     |
                              +---------+
Basically, we store the fields of the most derived class last and the fields of the least derived class first. The clever part is that if you were to store an instance of C in a variable of type A or B, then the fields of A and B are at the same position in memory, relative to the start of the object, as if the object was of type A or B too. This is why we can safely use a subclass in place of its parent.
The compiler cannot always know what type of object will be stored in a variable. However, it always knows the type of the variable. Therefore, when the program is compiled, field offsets are always computed based upon the variable type.
The final question is: why are methods different? The answer lies in the v-table. This is a lookup table of methods that depends on the type of the object rather than the type of the variable. When a virtual method is called, the method to run is looked up in the v-table. The offset of a method in the v-table is always known at compile time, but which method sits at that offset is not known until runtime. This is essential for polymorphism.
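The contrast between fields and methods can be sketched in a few lines. The member names Name and Hidden, the field values, and the DispatchDemo helper are assumptions made up for this example. Note that in C# only methods marked virtual go through the v-table; a method hidden with new is chosen by the variable type, just like a field.

```csharp
using System;

class A
{
    public int x = 1;
    public virtual string Name() => "A.Name"; // dispatched through the v-table
    public string Hidden() => "A.Hidden";     // non-virtual: bound at compile time
}

class B : A
{
    public new int x = 2;
    public override string Name() => "B.Name";
    public new string Hidden() => "B.Hidden";
}

static class DispatchDemo
{
    // Each helper takes a variable of type A, whatever the value type is.
    public static int Field(A baz) => baz.x;
    public static string Virtual(A baz) => baz.Name();
    public static string NonVirtual(A baz) => baz.Hidden();

    static void Main()
    {
        A baz = new B();
        Console.WriteLine(Field(baz));      // prints 1 (variable type decides)
        Console.WriteLine(Virtual(baz));    // prints "B.Name" (value type decides)
        Console.WriteLine(NonVirtual(baz)); // prints "A.Hidden" (variable type decides)
    }
}
```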
Silly casts
There are some casts that are neither upcasts nor downcasts. Instead, they try to cast between two types that have no inheritance relationship, and can therefore never succeed. The compiler rejects many of them outright; those that slip past it (for example, by going via object) fail at runtime. These are often known as silly casts.
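As a rough illustration, the sketch below gets a silly cast past the compiler by routing it through object. The class Unrelated and the helper names are hypothetical; writing (A)new Unrelated() directly would not compile at all.

```csharp
using System;

class A { }
class Unrelated { } // no inheritance relationship with A

static class SillyCastDemo
{
    // The variable type object hides the real type from the compiler,
    // so the cast is only checked (and rejected) at runtime.
    public static bool SillyCastFails()
    {
        object o = new Unrelated();
        try
        {
            _ = (A)o;
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(SillyCastFails()); // prints "True"
    }
}
```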
Coercions
Note that some things that look like casts are actually coercions, not casts. This can be confusing since the syntax for casting and coercion is the same. The following code demonstrates type coercion.
float a = 4.5f; // the f suffix is required; a bare 4.5 is a double literal
int b = (int)a;
Here we take a float and transform it into an integer. Floats are usually stored in IEEE 754 single-precision floating-point format, whereas integers are usually stored in two's complement. Additionally, integers have no fractional part. Therefore, the actual representation of the data in memory changes.
Casts don't actually change the data - they just change the type that we are viewing the data as being. Coercions differ in that they will change the underlying representation, and this conversion may be lossy.
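To make the change of representation visible, this sketch prints the raw bits before and after the coercion. CoercionDemo, Coerce and Bits are hypothetical helper names; BitConverter is the standard .NET class for reinterpreting bytes.

```csharp
using System;

static class CoercionDemo
{
    // The coercion itself: the fractional part is dropped (truncation towards zero).
    public static int Coerce(float a) => (int)a;

    // The raw 32 bits of a float, shown as hex, without converting the value.
    public static string Bits(float a) =>
        BitConverter.ToInt32(BitConverter.GetBytes(a), 0).ToString("X8");

    // The raw 32 bits of an int, shown as hex.
    public static string Bits(int b) => b.ToString("X8");

    static void Main()
    {
        float a = 4.5f;
        int b = Coerce(a);

        Console.WriteLine(b);       // prints 4
        Console.WriteLine(Bits(a)); // prints 40900000
        Console.WriteLine(Bits(b)); // prints 00000004
    }
}
```

The two bit patterns have nothing in common, and the .5 is gone for good, which is what makes this a coercion rather than a cast.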
Summary
Here's a quick recap on the main points.
- A variable and the value that it holds may not have the same type, but the value's type will always be the variable's type or a subtype of it
- If B is a subclass of A, you can use an object of type B anywhere that you can use an object of type A
- Downcasts move us down the inheritance hierarchy, while upcasts move us up it
- When accessing fields, the field to access is decided based upon the variable type and not the value type
- Virtual method calls, however, depend on the value type
- Casts do not change representation or data in any way, just the way we view it
- Coercions do change the representation as well as the type