Computer Science 010

Lecture Notes 3

Pointers and Arrays

Comments on Practice 2

Variable declarations must precede statements within a function. (This is a C89 rule; C99 and later allow declarations and statements to be mixed.)

No input is sent to your C program until the user enters <CR>.

Last question. What does this code do?

int main () {
    int a[4], i;
   
    for (i = 1; i <= 5; i++) {
        a[i] = 0;
    }
}

The problem with this function is that it does not index the array properly. Legal values for the array index are 0 through 3, but the loop uses 1 through 5. Neither the C compiler nor the runtime system checks this, however. When the program modifies a[4] and a[5], it writes into memory following the array. With the program as written above, it crashes with a segmentation fault, meaning that the program accessed memory it was not allowed to access. Suppose we swap the order of the declarations, giving the following code:

int main () {
    int i, a[4];
   
    for (i = 1; i <= 5; i++) {
        a[i] = 0;
    }
}

Something different happens: the program runs forever. It turns out that the address it uses for a[5] is the address where the variable i is located. As a result, a[5] = 0 really assigns 0 to i. This causes the loop to continue infinitely, since i never gets bigger than 5! The exact behavior depends on how the compiler lays out variables in memory, so another compiler may behave differently.
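One way to avoid this class of bug is to start at index 0 and take the loop bound from the array itself rather than hard-coding it. The following is a minimal sketch, not part of the practice assignment; the macro ARRAY_LEN and the function zero_and_sum are names made up for this illustration.

```c
#include <stddef.h>

/* Hypothetical helper macro: number of elements in a true array.
   It only works where the array declaration itself is visible,
   not on a pointer or an array parameter. */
#define ARRAY_LEN(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Zeroes a local 4-element array using only in-range indices, then
   returns the sum of its elements (which is therefore 0). */
int zero_and_sum (void) {
    int a[4];
    size_t i;
    int sum = 0;

    /* Valid indices run from 0 through ARRAY_LEN (a) - 1. */
    for (i = 0; i < ARRAY_LEN (a); i++) {
        a[i] = 0;
    }

    for (i = 0; i < ARRAY_LEN (a); i++) {
        sum += a[i];
    }
    return sum;
}
```

If the size of a changes later, the loop bounds change with it.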

Pointers and Arrays

You've now seen arrays in C. Except for the fact that they do not know their own length, they are very much like arrays in Java.

You've also now seen pointers in C. Last time, we talked about pointers to structs.

Today, I'll show you that pointers and arrays are actually very similar. This should be surprising.

Let's look at some declarations:

int a1[10];
int *intPtr;

The first is an array of 10 integers. After the declaration, memory has been allocated to hold the 10 integers but the values inside the array are uninitialized. The second declaration is a pointer to an integer. Memory has been allocated to hold a pointer but the pointer value is uninitialized. I could initialize the pointer as:

intPtr = a1;

This doesn't look right, though. The type of the variable on the left is int *, while the type of the variable on the right is int [10]. It turns out that, in C, an array name used in an expression like this is treated as a pointer to the array's first element. You can think of an array as a pointer for which memory is allocated in its declaration. So, the assignment above is correct.
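The following sketch checks this claim. It assumes nothing beyond the two declarations above; the function names points_to_first and array_bytes are invented for the illustration. The second function shows one place where the array is not treated as a pointer: sizeof applied to the array name gives the size of the whole array.

```c
#include <stddef.h>

/* Returns 1 if a pointer assigned from an array name refers to the
   array's first element, and 0 otherwise. */
int points_to_first (void) {
    int a1[10];
    int *intPtr;

    intPtr = a1;               /* has the same effect as intPtr = &a1[0]; */
    return intPtr == &a1[0];
}

/* Returns sizeof applied to the array itself: room for 10 ints,
   not the size of a single pointer. */
size_t array_bytes (void) {
    int a1[10];

    return sizeof a1;
}
```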

What about:

a1 = intPtr;

This is not allowed. While a variable declared as an array is really an initialized pointer, it turns out that it is also a constant pointer. Its value cannot be changed and, as a result, it cannot appear on the left-hand side of an assignment statement.

So, you might wonder, if I can assign an array to a pointer variable, can I use an index with a pointer variable? The answer is yes. Something reasonable will happen if the subscript is valid with respect to the array assigned to it. Otherwise, bizarre things will happen. In the following, the first subscript is ok and the second is not:

intPtr = a1;
intPtr[0] = 0;   /*  Changes intPtr[0] and a1[0]  */
intPtr[10] = 0;  /*  Wrong.  Goes past end of array.  */
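As a small check of the valid case, the sketch below writes through the pointer and reads the value back through the array. The function name write_through_pointer and the value 42 are made up for the example.

```c
/* Writes to an element through a pointer and reads it back through
   the array name.  Both names refer to the same memory. */
int write_through_pointer (void) {
    int a1[10] = {0};      /* all 10 elements start at 0 */
    int *intPtr = a1;

    intPtr[3] = 42;        /* valid: subscripts 0 through 9 are in range */
    return a1[3];          /* sees the value written via intPtr */
}
```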

Walking through arrays using subscripts

You can walk through arrays in C using subscripts, just as you did in Java. Your answer to the previous practice assignment, which asked you to find the minimum integer value in an array, probably looked something like this:

int minarray (int a[], int size) {
  /* The minimum value found.  Initially, the minimum is just the
     first element. */
  int min = a[0];
   
  /* A counter to step through the array */
  int i;
   
  /* Walk the array.  We can start at 1 since we have already
     evaluated the 0th element. */
  for (i = 1; i < size; i++) {
    /* If the next value is smaller than the current minimum, remember
       it. */
    if (a[i] < min) {
      min = a[i];
    }
  }

  return min;
}

We indicated that when parameters are passed in C, the argument value is copied into the parameter. As a result, if I change the parameter, it has no effect on the original argument. So, that would suggest that if I have an array parameter, the array is copied when the function is called. I could then change the array contents without affecting the original. Let's make a silly change to our function above:

int minarray (int a[], int size) {
  /* The minimum value found.  Initially, the minimum is just the
     first element. */
  int min = a[0];
   
  /* A counter to step through the array */
  int i;
   
  /* Walk the array.  We can start at 1 since we have already
     evaluated the 0th element. */
  for (i = 1; i < size; i++) {
    /* If the next value is smaller than the current minimum, remember
       it. */
    if (a[i] < min) {
      min = a[i];
    }
    a[i] = -1;
  }
  size = size + 2;
  return min;
}

Notice the two new lines: a[i] = -1; inside the loop and size = size + 2; after it. I am setting every entry of the array parameter except the first to -1. I am also changing the value of the size parameter. Let's run a program that prints out the arguments on return and see what happens.

What you should observe is that the second argument is not changed but the first argument is changed!

What happens is that when we pass an array parameter, we are really passing the address of the array. After all, an array is really a constant pointer with memory allocated in the declaration. Since we are passing a pointer to a chunk of memory, any change to that memory affects the original array, because no copy has been made. The size parameter, on the other hand, is an ordinary int, so the function changes only its own copy.
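Here is one way such an experiment might look. It is a sketch only: it uses a version of the modified minarray that returns min, and the driver function show_arguments, the array values, and its contents are all made up for the illustration.

```c
#include <stdio.h>

/* The modified minarray: it overwrites every element but the first
   and changes its own copy of size. */
int minarray (int a[], int size) {
    int min = a[0];
    int i;

    for (i = 1; i < size; i++) {
        if (a[i] < min) {
            min = a[i];
        }
        a[i] = -1;
    }
    size = size + 2;
    return min;
}

/* Hypothetical driver: calls minarray and then prints the arguments
   so we can see which ones the call changed. */
void show_arguments (void) {
    int values[4] = {7, 3, 9, 5};
    int count = 4;
    int min, i;

    min = minarray (values, count);

    printf ("min = %d, count = %d\n", min, count);
    for (i = 0; i < count; i++) {
        printf ("values[%d] = %d\n", i, values[i]);
    }
}
```

Calling show_arguments () from main should report min = 3 and count = 4, with values now holding 7, -1, -1, -1.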

In fact, I could have declared my function as the following and left everything else the same.

int minarray (int *a, int size);

Arrays as return values

Some of you noted yesterday that you couldn't figure out how to declare a function that would return an array. In fact, there is no syntax in C to allow this. You can have the effect of returning an array, however, by declaring that the function returns a pointer. Your return statement could still give an array name to return.

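#include <stdio.h>   /* printf */
#include <stdlib.h>  /* EXIT_SUCCESS */

/* Returns its array argument, which is really a pointer. */
int *id (int a[]) {
  return a;
}

int main () {
  int a[] = {0, 1, 2};
  int *b = id (a);
  printf ("b = {%d, %d, %d}\n", b[0], b[1], b[2]);
  return EXIT_SUCCESS;
}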
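One caveat worth keeping in mind: the pointer a function returns must refer to memory that still exists after the function returns. An array declared as a local variable inside the function does not qualify, since its memory is released when the function exits. A common pattern, sketched here with a made-up function fill_sequence, is to have the caller supply the array and have the function fill it in.

```c
/* Hypothetical example: fills the caller's array with 0, 1, 2, ...
   and returns a pointer to that same array.  The memory belongs to
   the caller, so the pointer is still valid after the return. */
int *fill_sequence (int a[], int size) {
    int i;

    for (i = 0; i < size; i++) {
        a[i] = i;
    }
    return a;
}
```

A caller would declare int buf[3]; and then write int *p = fill_sequence (buf, 3); after which p and buf refer to the same three integers.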

A Point on Style

Some C hackers may claim that pointer arithmetic is better than subscripting to access elements in an array. It is true that it may save you a few assembly instructions depending on how it is used. Or it might not. Compilers with good optimization can do very well. Furthermore, desktop applications spend the vast majority of their time waiting for input from a human. The human will not notice any speed improvements on a modern computer due to pointer arithmetic unless you are manipulating truly huge arrays.

On the other hand, most programmers find subscripting easy to understand and pointer arithmetic less so. Since the vast majority of the expense of software is due to the cost of programming labor, anything that decreases the time spent programming is generally worth it. Furthermore, if anybody needs to maintain code it will generally be easier to understand if it uses subscripting rather than pointer arithmetic.

My advice is that unless you have real-time concerns or you have demonstrated that a chunk of code is too slow when using subscripts, you should use subscripts. Reserve pointer arithmetic for those rare situations where you absolutely need the speed (and then examine the assembly code generated by the compiler to see if you really have improved the speed!).
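To make the comparison concrete, here is the minimum-finding function written both ways. This is only a sketch for comparing readability; the names min_subscript and min_pointer are invented, and both versions assume size is at least 1.

```c
/* Subscript version: step an integer index through the array. */
int min_subscript (int a[], int size) {
    int min = a[0];
    int i;

    for (i = 1; i < size; i++) {
        if (a[i] < min) {
            min = a[i];
        }
    }
    return min;
}

/* Pointer-arithmetic version: step a pointer through the array.
   p++ advances p to the next int, and *p is the int it points to. */
int min_pointer (int *a, int size) {
    int min = *a;
    int *end = a + size;   /* points one past the last element */
    int *p;

    for (p = a + 1; p < end; p++) {
        if (*p < min) {
            min = *p;
        }
    }
    return min;
}
```

Both functions return the same result for the same arguments. Whether one is faster depends on the compiler, as discussed above.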

Unix utilities and pipes

The command grep searches a file for occurrences of a word. For example,

grep freund index.html

prints out all lines in index.html that contain "freund". The command wc counts the number of lines, words, and characters in a file. Using it is easy:

wc index.html

One of the strengths of Unix is how easy it is to connect programs to do more complicated things. You link the output of one program to the input of another with the pipe operator "|". For example,

grep freund index.html | wc

finds all lines in index.html containing freund and then counts them (wc reports the number of lines, words, and characters in the output of grep). You could also do

grep freund index.html | more

to scroll through the output of grep more easily. You can also send program output to a file with ">", as in

grep freund index.html > output.txt

And you can do even more complicated things:

grep freund index.html | grep wombat | wc > output.txt
