@abainbridge
Last active November 2, 2024 14:39
Pointers in C: Part 1

Pointers and Arrays: Part 1

Pointers are a kind of variable and play a very important role in the C programming language. They have many uses, such as:

  • Strings
  • Dynamic memory allocation
  • Getting multiple results from a function
  • Building data structures
  • Pointing to functions

Beginners in C often have difficulty with the concept of pointers. The usual reason is that they have a weak understanding of variables, so we will start by making sure we understand how regular variables work.

Variables

A typical machine has an array of consecutively numbered or addressed memory cells that may be manipulated individually or in contiguous groups. We call a single cell a "byte".

For example here are the first few bytes of memory with the number 5 stored at address 3:

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+---
|   |   |   | 5 |   |   |   |   |
+---+---+---+---+---+---+---+---+---

A variable in a program has four parts:

  • A name.
  • A type. This defines the amount of storage space the variable occupies and what operations the variable supports.
  • A value.
  • A location in memory where the value is stored. Also called the address.

If we write:

char c = 5;

We've created a variable, where:

  • The name is c.
  • The type is char. Variables of type char occupy 1 byte of memory and support normal arithmetic operations such as addition and multiplication, as well as assignment.
  • The value is 5.
  • The location is usually beyond our control, but to match the diagram above, let's say the system decided the location is address 3.

We can store variables that require more than one byte too. For example, the type int stores a signed integer typically using 4 bytes.

int i = 1234;

For this line, the system might decide to store the variable at location 4. We say the variable is at address 4, because that's where it starts. In reality it is at addresses 4, 5, 6 and 7.

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+---
|   |   |   | 5 |     1234      |
+---+---+---+---+---+---+---+---+---
              c         i

Later, if we write:

i = 5678;

then at run time, when this statement is executed, the value 5678 will be placed in the memory reserved for the storage of the value of i.

Okay, now consider:

int j = 0;
int k = 0;
j = 7;    /* line 3 */
k = j;    /* line 4 */

At line 3, the compiler interprets the j as the address of the variable j and creates code to copy the value 7 to that location.

At line 4, however, the j is interpreted as the value stored at the location set aside for j, in this case 7. So, the compiler creates code to copy the contents of the memory location referred to by j into the memory location referred to by k.

This is an important point: whether the compiler uses the value or address of a variable depends on which side of the = sign the variable is written.

Pointers

Now, let's say that we have a reason for wanting a variable to hold a memory address. Such a variable is called a pointer variable.

The number of bytes needed to store a memory address depends on the machine architecture. When people talk about 32-bit and 64-bit machines, those numbers usually refer to the number of bits in a memory address, so pointers on those machines typically use 4 and 8 bytes respectively.

In C, to define a pointer variable we add an asterisk to the end of the type of the variable, e.g.:

char * ptr;

The type char * says that we intend to use our pointer variable to store the address of a char. Such a pointer is said to "point to" a char. We haven't given it an initial value, so it probably doesn't point to a valid char yet.

Say the system decided to store ptr at address 4, and that our machine has 32-bit memory addresses, then we have:

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+---
|   |   |   |   | unknown value |
+---+---+---+---+---+---+---+---+---
                       ptr

To create a valid pointer, we need to set the value of the pointer to the address of a valid variable. To get the address of a variable, we use the unary operator &. Let's create a char variable called c as before and set up a pointer to it:

char c = 5;
char * ptr = &c;

The & operator gets the address of c, even though c is on the right-hand side of the assignment operator '='. The compiler will generate code to copy that address into our ptr variable. After running these lines, the memory will look like this:

  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+---
|   |   |   | 5 |       3       |
+---+---+---+---+---+---+---+---+---
              c        ptr
              |\        /
               \_______/

Now, ptr is said to "point to" c.

Bear with us now, there is only one more operator we need to discuss: the "dereferencing operator", which is the asterisk. It is used as follows:

*ptr = 7;

This line of code will copy the value 7 to the location pointed to by ptr. That is, when we use the * this way we are referring to the location that ptr is pointing to, not the location of the pointer itself.

Similarly, we could write:

printf("%d\n", *ptr);

to print the number stored at the address pointed to by ptr.

The really confusing part here is that the asterisk does two different things depending on context:

  • In a variable declaration it says the variable is a pointer.
  • In a use of a variable it says to dereference this pointer. That is, work on the variable the pointer points to.

One way to see how all this fits together would be to run the following program and then review the code and the output carefully.

#include <stdio.h>

int main(void)
{
    int j = 1;
    int k = 2;
    int * ptr = &k;

    printf("j has the value %d and is stored at %p\n", j, (void *)&j);
    printf("k has the value %d and is stored at %p\n", k, (void *)&k);
    printf("ptr has the value %p and is stored at %p\n", (void *)ptr, (void *)&ptr);
    printf("The value of the integer pointed to by ptr is %d\n", *ptr);

    return 0;
}

When you run this on a real machine, you'll probably find that the addresses are pretty big numbers, and they'll be displayed in hexadecimal notation.


To review:

  • A variable is declared by giving it a type and a name (e.g. int k;)
  • A pointer variable is declared by giving it a type that includes an asterisk, and a name (e.g. int *ptr;), where the type before the asterisk tells the compiler what type the pointer is to point to (int in this case).
  • Once a variable is declared, we can get its address by preceding its name with the unary & operator, as in &k.
  • We can "dereference" a pointer, i.e. refer to the value of that which it points to, by using the unary * operator as in *ptr.

Wrinkles:

If you try to declare two pointer variables at once, like this:

int * x, y;

you'll define x to be a pointer as expected, but y will be a normal int. Instead, you have to write:

int * x, * y;

The rule in C for these multi-variable declarations is that each variable gets the same type, apart from the pointerness of the type, which you have to respecify for each variable. Yes, this rule is ridiculous.

You can arrange the whitespace however you like. These all do the same thing:

int *x, *y;
int* x, * y;
int*x,*y;