Pointers and Arrays - Part II
by Vijay Kumar R. Zanvar
4. Address Arithmetic
4.1 Address Constants
An expression is said to be a constant expression if it can be evaluated during translation (compilation) rather than at runtime. An address constant, therefore, is the address of an object that is known during translation.
Following are the address constants in the C programming language:
- A null pointer. The macro '''NULL''' expands to a null pointer constant. It is '''#defined''' in <stddef.h>, <stdio.h>, <locale.h>, <stdlib.h>, <string.h>, <time.h> and <wchar.h>.
The macro NULL is generally defined as:
#define NULL 0
or
#define NULL ((void*) 0)
- A pointer to an '''lvalue''' of an object of static storage duration. For example, in the following artificial sequence, the addresses of a and b can be determined during translation.
int a;
void func ( void )
{
static int b;
int *ptr = &a;
...
ptr = &b;
}
- A function designator, such as '''func''' in the example above, has a constant address during the program's life.
- An integer constant cast to a pointer type. In the example below, the expression (unsigned long*) 0x1000 is an address constant.
/* reg_x: a fictitious memory-mapped register */
const volatile unsigned long *reg_x = (unsigned long*) 0x1000;
- The unary '''&''' (address) operator also creates an address constant. See the function '''func''' above.
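The kinds of address constants listed above can be seen together in the initializers of objects with static storage duration, since such initializers must be constant expressions. The following is a minimal sketch; the names counter, read_counter, counter_ptr, reader and read_through_constants are made up for illustration and do not come from the article.

```c
#include <stddef.h>

int counter = 42;                        /* object with static storage duration */

int read_counter(void)                   /* read_counter is a function designator */
{
    return counter;
}

/* Each initializer below has to be a constant expression, because the
   objects being initialized have static storage duration themselves. */
int *const null_ptr = NULL;              /* a null pointer constant */
int *const counter_ptr = &counter;       /* address of a static object */
int (*const reader)(void) = read_counter;/* address of a function */

/* Uses all three pointers; returns -1 only if null_ptr were not null. */
int read_through_constants(void)
{
    if (null_ptr != NULL)
        return -1;
    return *counter_ptr + reader();
}
```

Replacing counter with a local (automatic) variable would make the initializer of counter_ptr invalid, because the address of an automatic object is not known during translation.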
4.2 Why Should Pointers Have Data Types?
Let us assume that an address in a hypothetical machine is 32 bits long. The addressing of a byte or word will, therefore, require a 32-bit address. This suggests that a pointer (as pointers store addresses) should be able to store at least a 32-bit value, no matter whether it points to an integer or a character. This raises a question: why should pointers have data types when their size is always 4 bytes (on a 32-bit machine), irrespective of the target they point to?
Before we see why pointers should have data types, it would be beneficial to understand the following points.
The C programming language:
- has data types of different size; i.e., objects of different types will have different memory requirements
- supports uniformity of arithmetic operations across different (pointer) types
- does not maintain data type information, unlike C++, in the object or executable image
When objects of a given data type are stored consecutively in memory (that is, in an array), each object is placed at a certain offset from the previous object, if any, depending on its size. A compiler that generates code for pointer arithmetic over these objects needs to know the size of each object in order to compute the offset. The data type of the pointer provides this information. This explanation gives a good reason for point '''1''' above.
The point '''2''', above, is also a reason why pointers should have data types. Sizes of various data types are basically decided by the machine architecture and/or the implementation. And, if arithmetic operations were not uniform, then the responsibility of generating proper offset for accessing array elements would completely rest on the programmer, which, in turn, has the following drawbacks:
- A programmer is prone to commit mistakes: typo mistakes, providing wrong offsets, etc.
- Porting the code to other implementations would require changes if data type sizes differ. This would lead to portability issues.
Once the translation of the C code completes, the compiler does not put the data type information into the final object code.
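The role of the pointer's type in generating offsets can be observed directly. In the sketch below, the macro STEP_BYTES and the three helper functions are hypothetical names introduced for illustration; the macro measures, in bytes, how far p + 1 lies from p.

```c
#include <stddef.h>

/* Number of bytes between p and p + 1, measured through char pointers. */
#define STEP_BYTES(p) \
    ((size_t)((const char *)((p) + 1) - (const char *)(p)))

size_t char_step(void)
{
    char buf[2] = { 0 };
    char *p = buf;
    return STEP_BYTES(p);        /* always 1 */
}

size_t int_step(void)
{
    int buf[2] = { 0 };
    int *p = buf;
    return STEP_BYTES(p);        /* sizeof (int) */
}

size_t double_step(void)
{
    double buf[2] = { 0 };
    double *p = buf;
    return STEP_BYTES(p);        /* sizeof (double) */
}
```

The source expression p + 1 is identical in all three functions; only the declared type of p tells the compiler how many bytes to add.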
4.3 Operations on Pointers
4.3.1 Multiplicative Operations on Pointers
Multiplicative operations (*, / and %) on pointers or arrays are not allowed. Pay careful attention to the following description (given by Barry Schwarz), which is a dependable answer to a question on pointer multiplication.
On Tue, 6 Jan 2004 12:37:08 +0530, "Vijay Kumar R Zanvar"
<vijoeyz@hotpop.com> wrote:
>Hi,
>
> Why multiplication of pointers is not allowed?
>Till now I only know this, but not the reason why!
>
>PS: As a rule, I searched the FAQ, but could not
>find an answer.
Adding an int to a pointer results in pointing to something a
specified distance further to the "right" in memory.
Subtracting an int from a pointer results in pointing to something a
specified distance further to the "left" in memory.
Subtracting one pointer from another results in how far apart the two
memory locations are.
If your program and data were to be magically relocated as a unit in
memory, each of the above expressions would still produce the same
result.
Until you can define a concept of either adding two pointers or
multiplying two pointers that meets the constraint in the previous
paragraph, the two operations make no sense. (Hint: others have
thought this through and decided such a definition is either not
possible or of no programming value.)
It is, however, possible to cast a pointer type to an integral type before a multiplicative operation, but this is not an advisable thing to do. See Section 6.1.
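The relocation argument in the quoted answer can be imitated in a small experiment: copy an array to a different place in memory and compare pointer differences in the original and in the copy. This is only a sketch of the idea; the function name and array sizes are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the distance between two elements is the same in the
   original array and in a copy of it stored elsewhere in memory. */
int difference_survives_relocation(void)
{
    int original[8] = { 0 };
    int relocated[8];

    memcpy(relocated, original, sizeof original);

    const int *p = &original[2], *q = &original[6];
    const int *rp = &relocated[2], *rq = &relocated[6];

    ptrdiff_t before = q - p;    /* 4 elements apart */
    ptrdiff_t after = rq - rp;   /* still 4 elements apart */

    return before == 4 && after == 4;
}
```

No comparable invariant exists for a hypothetical sum or product of p and q, since either would change with the location of the arrays.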
4.3.2 Additive Operations on Pointers
When one operand of the + (plus) operator has a pointer type, the other operand ''must'' be of integer type; the type of the result is that of the pointer operand. The pointer operand may point to an array element or to a non-array object; the latter case, however, is rarely useful, because such an object behaves like an array of length one.
In the following example, '''ip''' behaves like a pointer to the only element of an array of length one.
int i = 0;
int *ip = &i;
ip++; /* OK: ip now points one past i */
*ip = 1; /* undefined behaviour: ip does not point to an object */
Suppose '''ia''' is declared as '''int ia[4]''', so that its last element is '''ia[3]''', and '''ptr''' is a pointer to '''int'''. The following code fragment illustrates a few facts:
ptr = &ia[4]; /* See point 1 */
if ( *ptr > 1 ) /* See point 2 */
{ /* ... */ }
The address of one past the last element:
- can be taken for computation purposes (point 1)
- cannot be used to access that location, whether for reading or for modification; doing so is undefined behaviour (point 2)
An integer can be subtracted from a pointer; the result has the type of the pointer operand. For well-defined behaviour, the result must point to an element of the same array or to one past its last element. One pointer can be meaningfully subtracted from another if and only if both point to elements of the same array (or to one past its last element); the result is not a pointer but a signed integer of type '''ptrdiff_t''', defined in <stddef.h>, that gives the number of elements between the two.
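A one-past-the-end pointer is most commonly used as a loop bound that is compared against but never dereferenced. The sketch below shows this, together with pointer subtraction; sum_range and index_of are hypothetical helper names.

```c
#include <stddef.h>

/* Sums the elements in [first, last). last may be the one-past-the-end
   address; it is only compared against, never dereferenced. */
long sum_range(const int *first, const int *last)
{
    long total = 0;
    for (const int *p = first; p != last; ++p)
        total += *p;
    return total;
}

/* Index of elem within the array starting at base. Both pointers must
   point into (or one past the end of) the same array. */
ptrdiff_t index_of(const int *base, const int *elem)
{
    return elem - base;
}
```

For an array declared as int ia[4], the call sum_range(ia, ia + 4) visits ia[0] through ia[3], and index_of(ia, ia + 4) yields 4 without ever reading the location ia + 4.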
5. Multi-dimensional Arrays
C provides multi-dimensional arrays in addition to one-dimensional arrays. A multi-dimensional array can be visualized as a matrix with rows and columns. The following statement, for instance, is a declaration of a two-dimensional array of '''int''', which has three rows and four columns.
int ia[3][4];
A pictorial visualization of the array above is shown below:
http://www.programmersheaven.com/articles/vijoeyz/Pointers-and-Arrays-part-2/image001.gif
Multi-dimensional arrays are actually ''arrays of arrays''. The common interpretation of '''ia''' as a two-dimensional array of '''int'''s with three rows and four columns is imprecise, if not wrong. The next section, in addition to giving the proper interpretation of '''ia''', describes various aspects of multi-dimensional arrays in more detail.
See also: Sections '''3.2, 5.1.'''
5.1 How Are Multi-dimensional Arrays Stored in Memory?
Technically speaking, the only category of arrays the C language has is one-dimensional arrays.
The array '''ia''' above (Section 5) is actually a one-dimensional array (of arrays). Its actual internal interpretation is like this: '''ia''' is a one-dimensional array of three elements, and each of the three elements is an array of four '''int'''s. (That is, '''ia''' is a one-dimensional array three of array four of '''int'''s.)
A matrix representation of a multi-dimensional array offers an easy picture to the mind; however, the array is not organized as a matrix in memory, but as a one-dimensional array. From Section 3.2, we know how a one-dimensional array is organized in memory. Hence, the following is how '''ia''' is stored in memory:
http://www.programmersheaven.com/articles/vijoeyz/Pointers-and-Arrays-part-2/image002.gif
C employs the row-major method; that is, the rightmost subscript varies fastest as memory is traversed.
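The row-major layout means that the element ia[row][col] lies (row * 4 + col) elements from the start of ia. The sketch below checks this by measuring byte offsets; ROWS, COLS, byte_offset and expected_offset are names assumed for illustration.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 4

/* Byte offset of ia[row][col] from the start of the whole array. */
size_t byte_offset(size_t row, size_t col)
{
    static int ia[ROWS][COLS];
    return (size_t)((const char *)&ia[row][col] - (const char *)ia);
}

/* The offset predicted by the row-major rule. */
size_t expected_offset(size_t row, size_t col)
{
    return (row * COLS + col) * sizeof(int);
}
```

The last element of one row is immediately followed by the first element of the next: byte_offset(0, 3) + sizeof (int) equals byte_offset(1, 0).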
5.2 Decaying of Multi-dimensional Arrays into Pointers
An array does not always decay into a pointer; see Section 3.3.4 for more information on this. However, when an array decays to a pointer, its interpretation is always pointer to the first element. The same concept applies to a multi-dimensional array (array of [array of [...] ] array, actually). The following table illustrates the decaying of arrays with a few examples:
| Declaration | Usage | Interpretation |
| int a[5] | a | int * |
| int a[4][5] | a | int (*)[5] |
| | a[1] | int * |
| | a[1][2] | int |
| int a[3][4][5] | a | int (*)[4][5] |
| | a[1] | int (*)[5] |
| | a[1][2] | int * |
| | a[1][2][3] | int |
| void fn ( int * ); | int a[5]; fn ( a ); | Decays to a pointer to the first element |
| void fn ( int (*)[] ); | int a[5]; fn ( &a ); | Pointer to the array itself; does not decay to a pointer. |
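The interpretations in the table can be checked with sizeof, since the size of what a decayed pointer points to reveals the pointer's type. The following is a small sketch; the function names are hypothetical.

```c
#include <stddef.h>

/* m has type int (*)[5]: a pointer to an array (row) of five ints.
   This is what int a[4][5] decays to when passed as an argument. */
long sum_rows(int (*m)[5], size_t nrows)
{
    long total = 0;
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < 5; c++)
            total += m[r][c];
    return total;
}

/* a decays to int (*)[5], so *a is one row: an array of 5 ints. */
size_t row_bytes_2d(void)
{
    int a[4][5];
    return sizeof *a;
}

/* b decays to int (*)[4][5], so *b is an array of 4 arrays of 5 ints. */
size_t plane_bytes_3d(void)
{
    int b[3][4][5];
    return sizeof *b;
}
```

A caller holding int a[4][5] simply writes sum_rows(a, 4); no cast is needed, because the decayed type of a already matches the parameter.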
6. Type Conversion Between Pointers and Other Types
In programming practice, a pointer often appears as an operand of an expression whose other operand has an incompatible type. The programmer should be familiar with how pointers behave in such situations, as described in the following subsections.
6.1 Pointers and Integers
In early, pre-standard C, pointers and integers were often treated as equivalent and, hence, interchangeable. This was because pointers happened to have the same size as some integer type. As C has now been implemented on many architectures, it is not portable to assume that pointers and integers are equivalent. On some architectures (for example, embedded micro-controllers) pointers can be wider than the common integer types.
K&R-II (page 102): "Pointers and integers are not interchangeable ..."
Note the following points:
- An integer converted to a pointer type yields an implementation-defined result. In addition, the resulting pointer may not be properly aligned and may not point to the intended memory location.
short s = 3;
long *lp;
lp = s;/* constraint violation: a cast is required */
lp = (long *) s;/* compiles, but the result is implementation-defined and may hide a potential bug */
- A pointer converted to an integer type also yields an implementation-defined result. In addition, a few other aspects to note are:
- if the result cannot be represented in the target integer type, the behaviour is undefined
- the result need not be in the range of values of any integer type
- the result may not properly be aligned
- The only integer value that can be safely converted to a pointer of any type is the constant 0, which yields a null pointer.
int *ip = 0;/* OK. */
- C99 recognizes this dilemma of integer and pointer conversions and, hence, provides a standard, portable way of inter-mixing them. A conforming implementation may provide the following optional integer types:
#include <stdint.h>
intptr_t
uintptr_t
A valid pointer to '''void''' can be converted to either of these types and back to pointer to '''void'''; the result compares equal to the original pointer. For example,
struct some_struct *sp = ... ;
intptr_t ip = (intptr_t) (void *) sp;
sp = (struct some_struct *) (void *) ip;/* sp still points to original object */
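The round trip through intptr_t can be written as a complete function. This is a sketch that assumes the implementation provides the optional type intptr_t; the member id and the function round_trip are hypothetical.

```c
#include <stdint.h>

struct some_struct {
    int id;                      /* hypothetical member, for illustration */
};

/* Converts a pointer to an integer and back again. The result compares
   equal to the original pointer on implementations providing intptr_t. */
struct some_struct *round_trip(struct some_struct *sp)
{
    intptr_t as_integer = (intptr_t) (void *) sp;
    return (struct some_struct *) (void *) as_integer;
}
```

Only the round trip is guaranteed; arithmetic performed on the intermediate integer value has no portable meaning.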
6.2 Pointers of Different Types
A pointer to '''void''' is the generic object pointer: a pointer to any object type can be converted to pointer to '''void''' and back again, and the result compares equal to the original pointer. This does not extend to function pointers. The following are some important points about pointer conversions:
- If a pointer to one type is required to be converted into a pointer to another type, an explicit cast is required.
int i, *ip = &i;
short s, *sp = &s;
sp = (short *) ip;/* OK */
- A null pointer can be converted into a pointer of any type.
char *cp = NULL;/* OK */
int *ip = NULL;
In the above, after the initialization, the types of the null pointers are pointer to '''char''' and pointer to '''int''', respectively.
- When converting a pointer type to another pointer type, the behaviour is undefined if the resulting pointer is not correctly aligned.
struct some_struct1 *sp1;
struct some_struct2 *sp2;
sp1 = (struct some_struct1 *) sp2;
- When a pointer to an object is converted to pointer to '''char''', the result points to the lowest addressed byte of the object. In the following example, if '''sizeof (si) == 2''' and the addresses of its two bytes are 0x1000 and 0x1001, then '''cp''' points to the byte at location 0x1000.
short int si;
char *cp;
cp = (char *) &si;
- Consider the following example:
char c = 0x10;
int *ip = (int*) &c;
*ip = 0x2030;
Let '''sizeof (int)''' == 4. The pointer '''ip''' points to an object of type '''char''', which occupies one byte, but the last statement accesses it as a 4-byte value. This may overwrite the adjacent memory locations, and the behaviour is undefined. On a Linux machine, this situation may result in a segmentation fault.
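When the bytes of an object have to be inspected or moved, going through a character pointer or memcpy avoids the problems described above. The sketch below shows one such approach; the function names are assumptions made for illustration, and this is not the only safe technique.

```c
#include <string.h>

/* Returns 1 if a char pointer obtained from &si refers to the same
   (lowest) address as the object itself. */
int char_pointer_is_lowest_address(void)
{
    short int si = 0;
    char *cp = (char *) &si;
    return (void *) cp == (void *) &si;
}

/* Stores value into a byte buffer of at least sizeof (int) bytes,
   instead of writing through a converted int pointer. */
void int_to_bytes(int value, unsigned char *bytes)
{
    memcpy(bytes, &value, sizeof value);
}

/* Reads an int back from a byte buffer filled by int_to_bytes. */
int int_from_bytes(const unsigned char *bytes)
{
    int value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

The buffer passed to these functions has to be large enough for an int; a single char, as in the example above, is not.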
6.3 Function Pointers of Different Types
- A conversion between pointers to functions that have different parameter-type information should be done with an explicit cast. If such a converted pointer is used to invoke the pointed to function, the behavior is undefined.
int func1 ( void );
int (*fptr1) ( void ) = func1;/* OK */
int func2 ( short int );
int (*fptr2) (short int) = func2;/* OK */
fptr1 = ( int (*) (void) ) func2;/* OK: the conversion itself is allowed */
(*fptr1) ();/* undefined behaviour: func2 takes
short int argument */
- A pointer to a function should not be converted into a pointer to an object or a pointer to '''void''', or vice versa; the Standard does not define such conversions.
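A converted function pointer becomes safe to call again once it is converted back to its original type. The sketch below stores a function pointer in a generic function pointer type and restores it before the call; generic_fn, twice and call_via_round_trip are hypothetical names.

```c
/* A function pointer type used only for storage, never for calling. */
typedef void (*generic_fn)(void);

int twice(short int n)
{
    return 2 * n;
}

int call_via_round_trip(short int n)
{
    generic_fn stored = (generic_fn) twice;              /* conversion is allowed */
    int (*restored)(short int) = (int (*)(short int)) stored;
    return restored(n);                                  /* defined: type matches twice */
}
```

Calling through stored directly, without converting it back, would be undefined behaviour for the reason given in the first point above.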
7. Miscellany
Topics not covered in the earlier sections are mentioned here.
7.1 Pointer Initialization
Initialization gives an object its starting value at the point where it comes into existence (before program startup, for objects of static storage duration), whereas an assignment changes the value of the object during execution. The following list itemizes various aspects of pointer initialization, including new C99 features:
- A pointer declared with automatic storage duration and without an initializer has an indeterminate value; hence it is recommended that such pointer variables be initialized to null. The pointer must still be made to point to a valid location before it is dereferenced, even if it was initialized to null.
K&R-II, Page 102: "C guarantees that zero is never a valid address for data ..."
char *cp; /* not recommended */
char *ncp = NULL; /* recommended */
- A pointer declaration with static storage duration has the following properties:
- if not initialized, it is initialized to a null pointer by the implementation
- if initialized, the initializer must be a compile-time constant expression
{
static char cas[] = "Avada Kedavara!";
static char *cps = { "Petrifucus Totalus!" };
char c;
static char *cp = &c; /* error: &c is not constant */
}
- The size of an array of unknown size is determined by the largest index that receives an initializer in the initializer list. If there are fewer elements in the initializer list than the size of an array, then the remainder of the array is filled with zeroes.
{
int ia[] = { 1, 2, 3, 4, }; /* array size is 4 */
int ib[3] = { 1, 2, }; /* ib[2] == 0 */
int ic[2] = { 1, 2, 3, }; /* error: initializer list size
exceeds array size */
}
- C99 introduces selective initialization of array elements by means of designated initializers. The following example illustrates this:
int ia[5] = { 0, 1, [4] = 4, };
In the above, '''ia[2]''' and '''ia[3]''' are zeroes.
- In C99 implementations, initialization of arrays can be done from both ends.
int ia[NUM] = { 0, 1, 2, [NUM-3] = 7, 8, 9, };
In the above, if '''NUM''' is greater than 6, then the elements indexed 3 through '''NUM-4''' are initialized to zeroes; if it is less than 6, some of the first initializers will get overridden.
- C99 introduces a new concept called compound literals. A compound literal creates an unnamed object of the given type, typically an aggregate or union type, from an initializer list. It is an lvalue, and it is modifiable unless its type is '''const'''-qualified.
struct type_t { int a, b; };
struct type_t s1 = ( struct type_t ) { 10, 20 };
struct type_t s2 = ( struct type_t ) { .a = 30, .b = 40 };
int *p = (int []) { 10, 20 }; /* p points to the first element
of an array of two ints */
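The C99 initialization features from the list above can be tried out together. In the sketch below, NUM is given the arbitrary value 8, and element_at, sum_fields and compound_literal_demo are hypothetical helpers.

```c
#include <stddef.h>

#define NUM 8                    /* arbitrary size chosen for this sketch */

/* Initialized from both ends: ia[3] and ia[4] are implicitly zero. */
const int ia[NUM] = { 0, 1, 2, [NUM - 3] = 7, 8, 9 };

int element_at(size_t i)
{
    return ia[i];
}

struct type_t { int a, b; };

int sum_fields(const struct type_t *s)
{
    return s->a + s->b;
}

int compound_literal_demo(void)
{
    int *p = (int []) { 10, 20 };    /* unnamed array of two ints */
    p[0] = 11;                       /* allowed: the literal is modifiable */

    /* The address of a compound literal can be passed directly. */
    return sum_fields(&(struct type_t) { .a = 30, .b = 40 }) + p[0] + p[1];
}
```

With NUM equal to 8, the designator [NUM - 3] places 7, 8 and 9 in ia[5], ia[6] and ia[7].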
7.2 Flexible Array Members
It is a common practice among advanced programmers to use a technique called structure hacking. In this technique, the last member of a structure is an array of one element of the given type. When allocating storage for an object of the structure type, an extra amount of storage is set aside, to be accessed through that last member. The following example illustrates this concept:
#define SIZE 5
struct s {
int i;
/* ... */
int ip[1]; /* a pointer member, int *ip, would not work here: it would be left uninitialized */
};
};
{
/* read the explanation below */
struct s *sp = malloc ( sizeof *sp + sizeof (int) * (SIZE-1) );
/* code to check and initialize sp */
for ( int i = 0; i < SIZE; i++ ) /* declaring i, here, is allowed in C99 */
sp -> ip[i] = i;
/* ... */
}
In the above, while allocating storage for the '''struct''' object pointed to by '''sp''', '''sizeof (int) * (SIZE-1)''' extra bytes are allocated beyond the structure itself. The memory layout of the object, then, looks like this:
http://www.programmersheaven.com/articles/vijoeyz/Pointers-and-Arrays-part-2/image003.gif
This method, in effect, creates a structure with a variable-size array. Semantically speaking, however, the structure has no storage beyond its last declared member, and accessing '''ip''' beyond its declared bound is undefined behaviour. Since structure hacking is not sanctioned by the Standard, an implementation is free to treat it in any way it prefers, which raises portability issues.
The C99 committee, keeping in mind the useful facilities that the structure hacking technique provides, has introduced a new feature called flexible array members.
A flexible array member is the last member of a structure with at least one other named member, and is an array of incomplete type. The following structure, which is portable, is equivalent to '''struct s''' above:
struct ss {
int i;
/* ... */
int ip[];
};
However, a few constraints apply to flexible array members. The following list summarizes them:
- There must be at least one other named member in the structure, and the flexible array member must be the last member.
- A structure having a flexible array member cannot occur in other structures or arrays.
- The '''sizeof''' operator ignores the flexible array member while calculating the size of the structure.
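Allocation of a structure with a flexible array member follows the same pattern as the structure hack, except that the full element count is added to the size of the structure. The following is a minimal sketch; struct int_vec and int_vec_create are hypothetical names, and the caller is assumed to release the result with free().

```c
#include <stdlib.h>

struct int_vec {
    size_t count;                /* number of elements in items */
    int items[];                 /* flexible array member, must be last */
};

/* Allocates a vector of count ints, filled with 0, 1, 2, ...
   Returns NULL if the allocation fails. */
struct int_vec *int_vec_create(size_t count)
{
    /* sizeof *v does not include items, so all elements are added. */
    struct int_vec *v = malloc(sizeof *v + count * sizeof v->items[0]);
    if (v == NULL)
        return NULL;

    v->count = count;
    for (size_t i = 0; i < count; i++)
        v->items[i] = (int) i;
    return v;
}
```

Because the structure and its elements live in a single block, one call to free() releases both.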
7.3 Dangling Pointers
A dangling pointer (also known as a wild pointer) is a pointer that does not point to a valid memory location. A location is valid only if it holds a live object that the running process is allowed to access; locations outside the address space of the process, or locations whose object has reached the end of its lifetime, are not valid.
A pointer not handled properly can produce serious bugs or a badly behaving program. Dangling pointers get, or can be, created in several ways. The following list gives you an idea about dangling pointers: their sources of creation, methods of prevention and effects.
- A straightforward example can be the following one:
{
char *cp = NULL;
/* ... */
{
char c;
cp = &c;
} /* The memory location, which c was occupying, is released here */
/* cp here is now a dangling pointer */
}
In the above, a better solution to avoid the dangling pointer is to make '''cp''' a null pointer after the inner block is exited.
- The design philosophy of C makes a compiler trust that the programmer knows what he is doing. Though a code analysis tool, like '''lint''', can help in finding potential programming mistakes, it is up to the programmer to ensure a well-behaved program. As stated earlier, misapplied pointers can create a badly behaving program. The following paragraph gives an example.
A dangling pointer in a program points to a memory location that the program no longer owns, which may even lie outside the address space of the process. The location pointed to by the dangling pointer may or may not contain a valid object. If it does, and the location is modified through the dangling pointer, the value of the valid object will change unexpectedly, disturbing the behaviour of the process owning the object. This condition is called memory corruption. It could lead the system into a vicious circle, ultimately crashing it.
A clear-cut technique to avoid dangling pointers is to set them to '''NULL''' when they are declared and again when they are no longer required.
- A common programming misstep that creates a dangling pointer is returning the address of a local variable.
char * func ( void )
{
char ca[] = "Pointers and Arrays - II";
/* ... */
return ca;
}
In the above, if it is required to '''return''' the address of '''ca''', declare it with the '''static''' storage specifier.
- A frequent source of dangling pointers is a jumbled combination of '''malloc()''' and '''free()''' library calls. A pointer becomes dangling when the block of memory it points to is freed.
#include <stdlib.h>
{
char *cp = malloc ( A_CONST );
/* ... */
free ( cp ); /* cp now becomes a dangling pointer */
cp = NULL; /* cp is no longer dangling */
/* ... */
}
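Two of the remedies discussed in this section can be packaged as small helpers. The sketch below is one possible approach, not the only one: free_and_clear resets the pointer right after freeing it, and copy_title fills a buffer owned by the caller instead of returning the address of a local array. Both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Frees the block *pp points to and resets *pp, so the caller is not
   left holding a dangling pointer. */
void free_and_clear(char **pp)
{
    free(*pp);
    *pp = NULL;
}

/* Copies the title into a caller-owned buffer of the given size.
   Returns dest, or NULL if the buffer is missing or too small. */
char *copy_title(char *dest, size_t size)
{
    static const char title[] = "Pointers and Arrays - II";

    if (dest == NULL || size < sizeof title)
        return NULL;
    memcpy(dest, title, sizeof title);   /* includes the terminating '\0' */
    return dest;
}
```

Note that free_and_clear resets only the pointer passed to it; any other copies of the same address held elsewhere in the program remain dangling.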