Computer Science 010

Lecture Notes 2

 Introduction to C and Pointers

Structs

Structs are as close as C gets to classes (not very close). A struct allows you to group together several data declarations so they can be treated as a group. Thus, a struct is like a class that has only instance variables, not methods. For example, in Java, you might define a Money class that has instance variables to hold dollars and cents:

public class Money {
   private int dollars;
   private int cents;
   
   public Money (int theDollars, int theCents) {
      dollars = theDollars;
      cents = theCents;
   }
   
   public int getDollars () {
      return dollars;
   }
   
   public int getCents () {
      return cents;
   }
   
   public void add (Money moreMoney) {
      cents = cents + moreMoney.getCents();
      if (cents >= 100) {
         cents = cents - 100;
         dollars++;
      }
      dollars = dollars + moreMoney.getDollars();
   }
}
   

In C, we could group the instance variables together in a struct.

typedef struct {
    int dollars;
    int cents;
} Money;
   

The constructor and methods are replaced with C functions as explained below.
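One simple way to stand in for the Java constructor is an ordinary function that fills in a struct and returns it by value. The sketch below uses a hypothetical function named makeMoney. It also assumes, purely for illustration, that extra cents should carry into dollars. Neither detail comes from the Java class above.

```c
#include <assert.h>

typedef struct {
    int dollars;
    int cents;
} Money;

/* Hypothetical "constructor": builds a Money value from dollars and cents.
   Assumes both arguments are non-negative; extra cents carry into dollars. */
Money makeMoney (int theDollars, int theCents) {
    Money m;
    assert(theDollars >= 0 && theCents >= 0);
    m.dollars = theDollars + theCents / 100;
    m.cents = theCents % 100;
    return m;   /* the whole struct is copied back to the caller */
}
```

Where Java would write new Money(3, 50), a caller could write Money price = makeMoney(3, 50);. The struct is copied back to the caller, so no memory management is involved.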

 

Functions

Another major difference between C and Java is that C does not have methods. Instead, statements are encapsulated into named units called functions. A function declaration is quite similar to a method declaration except that it does not occur within a class definition.

Functions are invoked in a manner similar to methods except that they are not sent to an object. Thus, the object.method syntax is replaced by a simple function name. Since functions are not sent to an object, there is no this identifier. You can think of the object that a method is sent to as an implicit parameter. Since it doesn't exist in C, typically a C function will have an additional parameter to make up for this missing implicit parameter. For example, we can write the methods above as:

   int getDollars (Money m) {
      return m.dollars;
   }
   
   int getCents (Money m) {
      return m.cents;
   }
   
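   Money add (Money someMoney, Money moreMoney) {
      someMoney.cents = someMoney.cents + getCents(moreMoney);
      if (someMoney.cents >= 100) {
         someMoney.cents = someMoney.cents - 100;
         someMoney.dollars++;
      }
      someMoney.dollars = someMoney.dollars + getDollars(moreMoney);
      return someMoney;   /* someMoney is a copy, so return the updated sum */
   }

Notice that each function has an additional parameter identifying the structure whose fields are being accessed. Also, notice that there are no "public" and "private" keywords in C.

There is one more important difference from the Java version. C passes a struct to a function by copying it, so any changes add makes to someMoney would be lost when the function returns. That is why this version of add returns the updated Money value, and the caller must use the returned value. The Pointers section below explains how passing an address lets a function modify the caller's variables directly.

Prototypes

It is generally a good idea to declare all of your functions at the top of your files and to define them later. C requires that all functions be declared before they are used. A function declaration is also known as a prototype. It simply specifies the return type of the function, the function name, and the types and names of the parameters (the parameter names are optional in a prototype, but they help document the function). This single line ends in a ; instead of being followed by the function body. Thus, the prototypes for the functions above are:

int getDollars (Money m);
int getCents (Money m);
Money add (Money someMoney, Money moreMoney);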
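Here is a minimal sketch of why prototypes matter. In this file, isRich calls totalCents before totalCents is defined, which is allowed only because its prototype appears at the top. Both functions and the $1000 threshold are hypothetical and exist only for illustration.

```c
#include <assert.h>

typedef struct {
    int dollars;
    int cents;
} Money;

/* Prototypes: declare the functions before any of them is used. */
int totalCents (Money m);
int isRich (Money m);

/* isRich uses totalCents, which is defined further down the file. */
int isRich (Money m) {
    return totalCents(m) >= 1000 * 100;   /* hypothetical threshold: $1000 */
}

/* Converts a Money value to a single count of cents. */
int totalCents (Money m) {
    assert(m.cents >= 0 && m.cents < 100);
    return m.dollars * 100 + m.cents;
}
```

If you remove the two prototype lines, gcc complains about an implicit declaration of totalCents when it reaches isRich.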

I/O

In order to do input and output, you must add the following line to the beginning of your program:

#include <stdio.h>

In a later class we will describe exactly what this means. For now, it is sufficient to know that it makes available to your program the declarations of the printf and scanf functions, which are required to do I/O.

printf

Both printf and scanf take similar arguments. The first argument is a formatting string. A formatting string is a string with literal characters, escape characters, and conversion specifications. Let's look at printf first:

printf ("Happy 2003!\n");

In this case, the string contains only literal characters and the escape sequence \n, which represents a newline. Executing this printf statement results in the following output:

Happy 2003!

More commonly, you will want to output the values of variables or expressions. To do this, you include conversion specifications in the formatting string and, after the string, extra arguments giving the expressions to be output. A conversion specification is a % followed by a character that identifies the type of the expression being output. For example, here is another way to output the same string:

year = 2003;
printf ("Happy %d!\n",year);

%d indicates that an integer will be output. year is the integer expression to output. It is possible to put more than one conversion specification in a printf as follows:

year = 2003;
punctuation = '!';
printf ("Happy %d%c\n",year, punctuation);

%c is the specification for a character.
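Putting the pieces together, here is a small sketch that declares the variables the fragments above assume. The function name printGreeting is hypothetical. It returns printf's result, which is the number of characters printed, so the output can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Prints, e.g., "Happy 2003!" followed by a newline.
   %d consumes the int argument and %c consumes the char argument, in order. */
int printGreeting (int year, char punctuation) {
    int count;
    assert(year >= 0);   /* assumption: only non-negative years */
    count = printf ("Happy %d%c\n", year, punctuation);
    return count;   /* printf returns the number of characters it wrote */
}
```

For example, printGreeting(2003, '!') prints the same line as the earlier printf and returns 12.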

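The C language does not require the compiler to check that each conversion specification matches the type of its expression, or even that the number of conversion specifications matches the number of arguments! A mismatched type or a missing argument makes the program's behavior undefined. gcc with -Wall does warn about many such mistakes, so pay attention to its warnings.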

scanf

scanf is similar syntactically to printf although its purpose is to read input from the keyboard. Again, the first parameter to scanf is a formatting string. The following arguments are variables to which the input values should be assigned. Also, note that the variable names must be preceded by & when using the integer and character types already discussed. For example,

scanf ("%d/%d/%d", &month, &day, &year);

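This expects the user to enter a date such as 1/7/2003. The % sequences define the type of each value to read in. The / characters are literal characters that scanf must match in the input. If the characters typed by the user cannot be interpreted as the appropriate type (or do not match a literal), scanf stops processing input and returns with only some of the variables set. scanf returns the number of variables it successfully assigned, so you can detect this case by checking whether the result equals the number you expected (3 here).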
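Because scanf reports how many variables it assigned, a program can detect bad input. To demonstrate this without typing at the keyboard, this sketch uses sscanf, a standard function declared in stdio.h that applies the same format rules to a string instead of to keyboard input. The function name parseDate is hypothetical. Its month, day, and year parameters already hold addresses, so they are passed to sscanf without &. The Pointers section below explains why addresses are needed.

```c
#include <assert.h>
#include <stdio.h>

/* Parses a date written as month/day/year, e.g. "1/7/2003".
   sscanf follows the same format rules as scanf but reads from a string.
   Returns 1 only if all three numbers were read and assigned. */
int parseDate (const char *text, int *month, int *day, int *year) {
    int assigned;
    assert(text != NULL);
    assigned = sscanf (text, "%d/%d/%d", month, day, year);
    return assigned == 3;
}
```

With input such as 1-7-2003, only the month is assigned before the - fails to match the literal /, so parseDate returns 0.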

getchar

getchar is another function that can be used to get keyboard input one character at a time. This function takes no arguments and returns the next input character as an int (the sample program below explains why it is an int rather than a char):

c = getchar();

The character returned might be whitespace (tab, newline, blank) or any other character. It is most useful when the input has little structure or when you are expecting a single character input.

Sample Program

/* 
   This is a program to draw a happy face or sad face on the screen.
   It takes input from the user.  h draws a happy face.  s draws a sad face.  
   The program quits when the user enters q.
*/

/* Comments enclosed in these characters work in every version of C.
   Unlike Java, the original C standard (C89) has no // one-line
   comment; // comments were added in C99, and gcc accepts them
   by default. */

/* stdio.h defines printf, getchar, and EOF */
#include <stdio.h>

/* Include some common definitions, including EXIT_SUCCESS */
#include <stdlib.h>

/* Define my own boolean type.  (The name bool is avoided because
   <stdbool.h> defines it in C99 and it is a keyword in C23.) */
typedef int boolean;
#define TRUE 1
#define FALSE 0

/* An enumerated type defining face types. */
typedef enum {
  happy,
  sad
} Mood;

/* A prototype of the function to draw faces. */
void face(Mood which);

/* This is how you declare the main program in C. */
int main() {
  int c;     /* The character indicating what type of face to draw. */
  int c2;    /* Character we read into until we find the end of the line. */

  /* Loop until the user enters q to quit the program. */
  while (TRUE) {
    /* Prompt the user for the next input and get their input. */
    printf("Enter h for a happy face, s for a sad face, and q to quit: ");
    c = getchar();

    /* Call the face function with the appropriate enum value based 
       upon the character input by the user.  */
    if (c == 'h') {
      face(happy);
    } else if (c == 's') {
      face(sad);
    } else if (c == 'q' || c == EOF) {
      /*  Exit the program on q or EOF (end-of-file).  Note that getchar
          returns an int, not a char: either the character read or the
          special value EOF, which is negative and differs from every
          character value.  That explains why c and c2 are declared as
          ints, not as chars. */
      return EXIT_SUCCESS;
    } else {
      /* No other user input has meaning so we output an error message. */  
      printf ("I don't know how to draw that face!\n");
    }

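    /* Read and discard the remaining characters on the line.  Skip this
       if c was already the newline, and stop at EOF so that we cannot
       loop forever. */
    if (c != '\n') {
      c2 = getchar();
      while (c2 != '\n' && c2 != EOF) {
        c2 = getchar();
      }
    }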

  }
 
  return EXIT_SUCCESS;

}

/* face draws a happy face or a sad face on the screen.
   Parameters:
   which - which type of face to draw.  Only happy and sad are valid values.
*/
void face (Mood which) {
  /* Determine which type of face to draw and draw it. */
  if (which == happy) {
    printf(":-)\n");
  } else if (which == sad) {
    printf(":-(\n");
  } else {
    /* If this function is used correctly, we should not get here. */
    printf("face: unknown face type\n");
  }
}

Compiling and Running C Programs

To compile a C program, use the following command line:

gcc -Wall -o <output-filename> <source-filename>

For example, if you place the above program in a file called face.c, you would compile it using:

gcc -Wall -o face face.c

This will create a file called face in your directory. If you list the directory with ls -F (some systems set this up by default), you will see a * at the end of face. The * just indicates that the file is executable and is not actually part of the filename. To execute the file, type ./ followed by the output filename at a shell prompt as follows:

-> ./face

Pointers

Up to this point, the features of C that we have covered are simple. Pointer manipulation and memory management are definitely where things get tricky. To understand pointers, you need some understanding of how memory is organized in a computer. Once you understand that concept, pointer manipulation becomes conceptually simple, yet remains extremely difficult to do correctly.

The memory of a computer consists of a very large number of bits. Bits are organized into 8-bit units called bytes. A byte is large enough to hold a char. Bytes are further grouped together to form words. In modern computers, words are typically either 4 or 8 bytes. A word is large enough to hold an int. You can think of the memory in a computer as an extremely long array of bytes. A memory address is essentially an index into this very long array. Variables are a mechanism by which a high-level programming language abstracts from the hardware memory being manipulated. A programmer identifies memory locations using variable names. The compiler and runtime system bind those names to memory locations. The computer hardware does not deal with variables, only with memory locations.
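To illustrate memory as a long array of bytes, the following sketch measures how many bytes apart two neighboring ints are in memory. It uses an array, a pointer cast, and sizeof (which gives the size of a type in bytes), none of which these notes have covered yet. The function name bytesBetweenInts is hypothetical. On many current machines the result is 4, but the code compares it with sizeof(int) rather than assuming a size.

```c
#include <assert.h>

/* Returns how many bytes apart two neighboring ints in an array are.
   Converting the addresses to char * makes the subtraction count bytes,
   treating memory as one long array of bytes. */
long bytesBetweenInts (void) {
    int pair[2];
    char *first = (char *) &pair[0];
    char *second = (char *) &pair[1];
    assert(sizeof(char) == 1);   /* a char always occupies exactly one byte */
    return (long) (second - first);
}
```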

In Java, we can have more than one variable refer to the same object. Each object occupies some part of this huge memory array. If two variables refer to the same object, then conceptually those variables are bound to the same chunk of memory. In reality, each of these variables is bound to its own chunk of memory. Otherwise, the bindings between variables and memory would need to change as a program executed. This would greatly slow down program execution. Instead, the chunk of memory that is bound to a variable contains the address of the chunk of memory where the object resides. Whenever the variable is used, the object is found by going to the address held in the variable's memory. These are pointers. The key difference between C and Java here is that in C you manipulate these pointers explicitly, while in Java the virtual machine handles them for you.

When you declare a variable in Java whose type is a class, you are automatically getting a pointer. When you declare a variable whose type is not a class (int, char, boolean, ...), you do not get a pointer. Since we need to manage pointers manually in C, we need to tell the compiler if a particular variable should be a pointer or it should be a value.

When you declare a pointer in C, you indicate the type of thing that is pointed to. Then add a * to indicate that it is a pointer, rather than the type itself. Here's an example:

Money *ptr;

This declares a pointer to a value whose type is the Money struct we declared earlier. At this point the pointer does not yet point to anything. Similarly, if we declare a variable in Java whose type is a class, initially it does not point to anything. We must initialize it first. In Java, we could initialize it by assigning the result of a constructor call or we could initialize it by assigning an existing object to it that is held in another variable. Today, we will look at the latter, how to assign an existing value to a pointer variable. Later we will learn how to do something analogous to calling a constructor.

So, suppose we have an existing instance of Money:

Money someMoney;

Note that in this declaration, there is no *. As a result, someMoney holds an actual Money struct, not a pointer to one. We can say:

someMoney.dollars = 100;
someMoney.cents = 1;

Now, we can take our money pointer and have it refer to this existing money struct:

ptr = &someMoney;

& is an operator that means "get the address of". So this assignment gets the address of the someMoney variable and assigns this address to ptr. ptr now points to the Money struct stored in someMoney. If we have a second pointer, we can assign between them directly, without using the address-of operator:

Money *ptr2;
ptr2 = ptr;

This works because the type of the variable on the right and the type of the variable on the left are both Money *.

In Java, after a variable is initialized, we access the instance variables and methods using . notation: variable.method (). C uses this notation to access parts of a struct when we have a variable whose type is the struct type. When the type is a pointer, we use a different syntax:

ptr->dollars

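The arrow is simply 2 characters: a dash followed by a greater-than sign. It looks like an arrow, though, and signifies that we are following a pointer. If we have not initialized the ptr variable, following it is an error that C does not catch for us: depending on the garbage address the variable happens to hold, the program may crash (for example, with a segmentation fault) or may silently read or overwrite some unrelated memory.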
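The following sketch combines the declarations above into one function (the name aliasDemo is hypothetical) to show that ptr, ptr2, and someMoney all refer to the same memory, much as two Java variables can refer to the same object. A change made through ptr2 is visible when reading someMoney directly.

```c
#include <assert.h>

typedef struct {
    int dollars;
    int cents;
} Money;

/* Shows two pointers referring to the same struct, like two Java
   variables referring to the same object.  Returns someMoney.dollars
   after a change made through the second pointer. */
int aliasDemo (void) {
    Money someMoney;
    Money *ptr;
    Money *ptr2;

    someMoney.dollars = 100;
    someMoney.cents = 1;

    ptr = &someMoney;   /* ptr holds the address of someMoney */
    ptr2 = ptr;         /* ptr2 now holds the same address */
    assert(ptr2 == &someMoney);

    ptr2->dollars = 250;        /* follow ptr2 and change the struct it points to */
    return someMoney.dollars;   /* the change is visible through the variable */
}
```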

Since we are in control of deciding when to use pointers, we can decide to use pointers with primitive types:

int *intptr;
int n = 10;
intptr = &n;

To dereference a pointer to a primitive type, we do not use the -> syntax since there would be nothing to write after the arrow! Instead, we use a prefix *:

int i = *intptr;

i gets the value pointed to by intptr, which is 10, the value of n.
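The prefix * can also appear on the left side of an assignment, where it stores a value into the memory the pointer refers to. Here is a brief sketch using the variables above (the function name intPointerDemo is hypothetical):

```c
#include <assert.h>

/* Reads and writes an int through a pointer.  Returns n after the write. */
int intPointerDemo (void) {
    int n = 10;
    int *intptr = &n;
    int i = *intptr;   /* read: i gets 10, the value stored in n */

    assert(i == 10);
    *intptr = 42;      /* write: stores 42 into n itself, through the pointer */
    return n;
}
```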

Now we can explain the syntax of scanf better. When you pass a parameter to a function, C copies the value in the argument and stores that value in the function's parameter. If the parameter is changed, it does not affect the original argument, since it is modifying a copy of the original value, not the original value directly. When we call scanf, we want the argument to be modified. What we must do then is pass the address of the argument. The address is copied and bound to a parameter inside scanf. scanf dereferences the pointer passed in and thus manipulates the memory that the original argument is bound to.
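Here is a minimal sketch of the difference, using two hypothetical functions. incrementCopy receives a copy of its argument. incrementThrough receives an address and modifies the caller's variable through it, which is the same technique scanf relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Receives a copy of the caller's int; changing it has no effect outside. */
void incrementCopy (int value) {
    value = value + 1;   /* modifies only the local copy */
}

/* Receives the address of the caller's int and changes the int itself. */
void incrementThrough (int *address) {
    assert(address != NULL);
    *address = *address + 1;   /* modifies the caller's variable */
}
```

Calling incrementCopy(x) leaves x unchanged, while incrementThrough(&x) adds 1 to x.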

You need to pass the address of a variable to a function if you want the function to change that variable's value. In the case of scanf, we want the arguments to have new values on return so we must pass in addresses:

scanf ("%d/%d/%d", &month, &day, &year);

The scanf function wants to modify the values of month, day, and year. To do that, it needs addresses where it can assign the values. Since month, day, and year are simply ints, we need to pass the addresses in.
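The same technique works for structs. As a sketch, here is addInPlace, a hypothetical variant of the earlier add function that is not part of these notes. It receives the first Money by address, so it can change the caller's struct directly instead of working on a copy. It uses -> because someMoney is a pointer.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int dollars;
    int cents;
} Money;

/* Adds moreMoney into the struct that someMoney points to.
   Because it receives an address, the caller's struct is changed. */
void addInPlace (Money *someMoney, Money moreMoney) {
    assert(someMoney != NULL);
    someMoney->cents = someMoney->cents + moreMoney.cents;
    if (someMoney->cents >= 100) {
        someMoney->cents = someMoney->cents - 100;
        someMoney->dollars++;
    }
    someMoney->dollars = someMoney->dollars + moreMoney.dollars;
}
```

A caller passes an address, just as with scanf. For example, if total and price are Money variables, addInPlace(&total, price); adds price into total.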

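A pointer that deliberately points to nothing is called a null pointer. Java provides a special constant called null so that you can test for that condition. C has no such keyword. Instead you use the value NULL, which is defined for you if you include stdlib.h (it is also defined in stdio.h and stddef.h). C does not automatically set a local pointer variable to NULL: an uninitialized pointer holds whatever garbage was in its memory, so initialize pointers to NULL yourself when you have nothing for them to point to: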

#include <stdlib.h>
int *ptr = NULL;
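Testing for NULL before following a pointer avoids one of the most common crashes. Here is a small sketch with a hypothetical helper, dollarsOrDefault. Returning a caller-supplied fallback for a null pointer is an assumption made for illustration, not a C convention.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int dollars;
    int cents;
} Money;

/* Returns the dollars in the Money that m points to, or the
   caller-supplied fallback if m is a null pointer. */
int dollarsOrDefault (const Money *m, int fallback) {
    if (m == NULL) {
        return fallback;   /* never follow a null pointer */
    }
    assert(m->cents >= 0 && m->cents < 100);   /* sanity check on the struct */
    return m->dollars;
}
```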

