Basic Pointers – FussyLogic

I saw this question on reddit, and decided to help. It's a little more fundamental than I usually blog about, but I thought it might help someone one day who has only seen the abstractions that high-level languages present to us, never the nitty-gritty underneath.

So as a beginner programmer the line of code:
int *pApples = &apples;
means that I am creating "something" "somewhere" and this "something" points a finger at the memory address of apples. When I output the "somewhere" of pApples by typing in &pApples I get the same address as &apples. To me, it looks like two things are occupying the same space at the same time. I'm just fiddling with pointers at the moment. This is way beyond what I have been taught in class so far so sorry if this is a stupid question.

Literally "where" is an unanswerable question: the compiler, the CPU and the OS all have a say in it.

However, broadly, the pointer is, like every other variable, a block of memory, allocated on the stack, on the heap, or in global memory.

The particular CPU (and compiler) you are using determines how big a block of memory is allocated for any particular variable type. For example, an unsigned int on my system (and probably yours) is 32 bits wide, so it takes up four bytes. Let's say it happens, in one particular run of a program, to be allocated at 0x40000000 in memory, and is equal to 0xaabbccdd.

0x40000000 0x40000001 0x40000002 0x40000003
[0xdd]     [0xcc]     [0xbb]     [0xaa]

This particular CPU is little-endian (hence the least significant byte of the 32-bit unsigned int is stored first in memory); on big-endian CPUs, the opposite would be true. Big-endian is friendlier for humans to read, but it's often easier for a CPU, and for whatever algorithm you're writing, to start at the least significant end, so little-endian has ended up more common.
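If you want to see this layout on your own machine, you can copy an unsigned int's bytes out one at a time and print them. This is a small sketch rather than anything canonical: the helper names (bytes_of, print_layout, is_little_endian, check_layout) are made up for this post, and the 0xdd/0xaa check assumes a 4-byte unsigned int on an ordinary little- or big-endian CPU.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>
#include <string.h>   /* memcpy */

/* Hypothetical helper: copies the bytes of v into out[], lowest address
   first, and returns how many bytes that took. */
size_t bytes_of(unsigned int v, unsigned char out[sizeof(unsigned int)])
{
    memcpy(out, &v, sizeof v);
    return sizeof v;
}

/* Hypothetical helper: prints the bytes in address order, like the
   memory diagram above. */
void print_layout(unsigned int v)
{
    unsigned char b[sizeof(unsigned int)];
    size_t n = bytes_of(v, b);

    printf("unsigned int is %zu bytes (%zu bits)\n", n, n * CHAR_BIT);
    for (size_t k = 0; k < n; k++)
        printf("offset %zu: [0x%02x]\n", k, (unsigned)b[k]);
}

/* Returns 1 if the lowest-addressed byte of 1u holds the 1, i.e. the CPU
   stores the least significant byte first (little-endian). */
int is_little_endian(void)
{
    unsigned char b[sizeof(unsigned int)];
    bytes_of(1u, b);
    return b[0] == 1;
}

/* Self-check, assuming the 4-byte unsigned int from the diagram. */
void check_layout(void)
{
    unsigned char b[sizeof(unsigned int)];
    bytes_of(0xaabbccddu, b);
    if (sizeof(unsigned int) == 4)
        assert(b[0] == (is_little_endian() ? 0xdd : 0xaa));
}
```

Calling print_layout(0xaabbccddu) from main on a typical x86 desktop should list dd, cc, bb, aa in that order.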

It's actually better (and one of the advantages of using a higher-level language) not to write code that knows the endianness of its target, and to let the compiler sort out those details. We'll do the same by just labelling the whole memory block as one entity with our unsigned int in it.

0x40000000
[0xaabbccdd]

The compiler makes this bit of memory available to us, labelled and typed, so we'll treat it like this:

(unsigned int i)
@ 0x40000000
[0xaabbccdd]

With this information the compiler knows enough about i to be able to perform all your integer operations on it. So when we load i with a value:

unsigned int i;  // as the programmer we don't know that
                 // this is stored at 0x40000000
i = 0xaabbccdd;

The compiler then knows to write 0xdd into 0x40000000, 0xcc into 0x40000001, etc. (actually, on a 32-bit system it leaves it to the CPU to write it all in one operation, but imagine it's done byte-by-byte for now). Similarly, if you do

i = i / 10;

It knows to translate this to (making up some assembler language):

mov  [0x40000000], R1   ; put contents of 0x40000000 in register 1
mov  0x0000000a, R2     ; put 10 in register 2
call _unsigned_int_long_division  ; divide R1 by R2, answer in R3
mov  R3, [0x40000000]   ; answer goes from register back to memory

Importantly, if it were a signed int, the compiler would know to call a different subroutine, but your i = i / 10 line would be unchanged. The same goes for float, long long, or even uint8_t. Also note that, in this pseudo-assembly at least (and literally on CPUs without a hardware divide instruction), the division operation is really just another function call; it's just so fundamental that languages tend to give it a concise symbol.
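You can see the type choosing the operation from C itself. In this sketch (the function names are illustrative, not from any library), the same `/ 10` is applied to -10 stored as an int, and to that value converted to unsigned int. Because the types differ, the compiler uses a signed divide for one and an unsigned divide for the other, and the answers are wildly different.

```c
#include <assert.h>
#include <limits.h>  /* UINT_MAX */
#include <stdio.h>

/* Same source-level operator, signed arithmetic. */
int signed_div10(int x)
{
    return x / 10;
}

/* Same source-level operator, unsigned arithmetic. */
unsigned int unsigned_div10(unsigned int x)
{
    return x / 10;
}

void compare_divisions(void)
{
    int s = -10;
    /* C defines this conversion as UINT_MAX + 1 - 10; on two's-complement
       CPUs it is also exactly the same bit pattern as s. */
    unsigned int u = (unsigned int)s;

    assert(signed_div10(s) == -1);
    assert(unsigned_div10(u) == (UINT_MAX - 9u) / 10u);

    printf("as int:          %d / 10 = %d\n", s, signed_div10(s));
    printf("as unsigned int: %u / 10 = %u\n", u, unsigned_div10(u));
}
```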

Now, we've already seen pointers in this little bit of pseudo-assembly: we copied the number from memory to a CPU register so the CPU could work on it, then back into the memory location representing the variable. We can do similar things in C.

unsigned int i;   // an integer, stored at say 0x40000000
unsigned int *p;  // a pointer to an integer, stored at say 0x40000004

i = 0xaabbccdd;

// Point p at i
p = (unsigned int *)0x40000000;  // i's (pretend) address

You would never do this. In fact, you can't: it's only because this is a made-up example that we know the compiler/linker decided to store i at 0x40000000. Fortunately, we don't need to know; the compiler comes with a handy operator that lets us query its knowledge of where it chose to store a variable, the "address-of" operator, &.

p = &i;

Now, note that we're not doing anything different from when we assigned 0xaabbccdd to i earlier: we're just putting a number in a variable. It's just that this time the number has an additional meaning.

Let's look at the memory:

(unsigned int i)   (unsigned int *p)
@ 0x40000000       @ 0x40000004
[0xaabbccdd] <---- [0x40000000]

Both are 32 bits of storage (on this made-up 32-bit system), but of different types.
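The same setup as real C might look like the sketch below (point_p_at_i, print_memory_map and check_pointer are made-up names). Nothing in it assumes particular addresses: it asks the compiler for them with & and prints whatever it gets, so expect numbers other than the pretend 0x40000000 and 0x40000004. One caveat on the "both 32 bits" picture: on most 64-bit systems a pointer is 8 bytes, so sizeof p may well be bigger than sizeof i.

```c
#include <assert.h>
#include <stdio.h>

unsigned int i = 0xaabbccddu;  /* the compiler/linker decides where i lives */
unsigned int *p;               /* a separate variable with its own address */

void point_p_at_i(void)
{
    p = &i;  /* store i's address (just a number) in p */
}

void print_memory_map(void)
{
    /* %p expects a void *, hence the casts */
    printf("(unsigned int i)  @ %p  [0x%x]\n", (void *)&i, i);
    printf("(unsigned int *p) @ %p  [%p]\n", (void *)&p, (void *)p);
    printf("sizeof i = %zu, sizeof p = %zu\n", sizeof i, sizeof p);
}

void check_pointer(void)
{
    assert(p == &i);                   /* p holds i's address...          */
    assert((void *)&p != (void *)&i);  /* ...but p itself lives elsewhere */
}
```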

Remember how, when we performed operations on i, the compiler knew which subroutines to call because it also knew the variable's type? Exactly the same applies to pointers: the compiler knows that the number stored in p can have some "pointery" operations performed on it that can't be done on an unsigned int (and vice versa, in fact). In particular, there is the inverse of "address-of": "dereference", the unary operator *.

i = i / 10;   // legal, division is defined for `unsigned int`
p = p / 10;   // illegal, division is meaningless for a pointer

i = *p;       // legal, *p is an unsigned int
p = *i;       // illegal, we can't dereference an integer

The compiler is protecting us: even though p and i are both 32-bit numbers, dereferencing i is not possible because the compiler has been told it doesn't point at anything. If we were using assembly language there would be no such protection; the CPU doesn't know anything about data types (for the most part). So we can dereference a pointer, but what type should the dereferenced value have? We already told the compiler that when we declared p:

unsigned int *p;

We're saying that *p will be an unsigned int. Hence, while we can't perform division on a pointer, we can perform it on a dereferenced integer pointer:

*p = *p / 10;  // same effect as i = i / 10, since p points at i

Let's look at the pseudo-assembly:

mov  [0x40000004], R4   ; put the contents of p (the address of i) into register 4
mov  [R4], R1           ; put contents of the address in R4 (0x40000000) in register 1
mov  0x0000000a, R2     ; put 10 in register 2
call _unsigned_int_long_division  ; divide R1 by R2, answer in R3
mov  R3, [R4]           ; answer goes from register back to memory
                        ; pointed to by R4

The middle of this is the same as before, but we've added some indirection around the outside: the address now comes from p at run time instead of being baked into the instructions.
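In C, that whole listing corresponds to the single statement *p = *p / 10;. Here it is wrapped in a function (divide_by_10_through is a made-up name for this sketch) so you can see that the caller's variable is the one that changes, even though the function never mentions it by name:

```c
#include <assert.h>

/* Divides, in place, whatever unsigned int p points at. */
void divide_by_10_through(unsigned int *p)
{
    *p = *p / 10;  /* load through p, divide, store back through p */
}

void check_divide(void)
{
    unsigned int i = 0xaabbccddu;
    unsigned int *p = &i;

    divide_by_10_through(p);
    assert(i == 0xaabbccddu / 10u);  /* i changed via its address */
}
```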

Nothing stops you from continuing this idea. Since we can point at anything, and we have a way of telling the compiler the type of the thing we're pointing at, we can go again:

unsigned int i;
unsigned int *p;
unsigned int **pp;

p = &i;
pp = &p;

pp is now a pointer-to-a-pointer-to-an-unsigned-int. Same as before: we've told the compiler that **pp is an unsigned int. Equivalently, *pp is an unsigned int *, something that can point to an unsigned int.
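One common reason to want a pointer-to-a-pointer is to let a function change which thing a pointer points at. In this sketch (retarget, i and j are just illustrative names), assigning to *pp changes p itself, precisely because *pp is an unsigned int *:

```c
#include <assert.h>

/* Makes the pointer that pp points at refer to target instead. */
void retarget(unsigned int **pp, unsigned int *target)
{
    *pp = target;  /* *pp is an unsigned int *, so this assigns a pointer */
}

void check_retarget(void)
{
    unsigned int i = 1, j = 2;
    unsigned int *p = &i;
    unsigned int **pp = &p;

    assert(**pp == 1);  /* two dereferences reach i */
    retarget(pp, &j);
    assert(p == &j);    /* p itself was changed through pp */
    assert(**pp == 2);  /* pp still points at p, which now points at j */
}
```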

To go back to your question directly:

When I output the "somewhere" of pApples by typing in &pApples I get the same address as &apples.

Hopefully now you can see that &pApples is not equal to &apples: &pApples is a pointer to a pointer to an integer, and &apples is a pointer to an integer. To modify my last example a little to use your naming:

int apples;          // let's say this is at 0x40000000
int *pApples;        // let's say this is at 0x40000004
int **ppApples;      // let's say this is at 0x40000008

pApples = &apples;   // 0x40000004 now holds 0x40000000
ppApples = &pApples; // 0x40000008 now holds 0x40000004

// Note because the pointers point at apples regardless of what it
// holds, it doesn't matter that we assign this after we assign to
// the pointers
apples = 10;         // 0x40000000 now holds 10

Let's fill in our last pretend memory map:

(int apples)     (int *pApples)   (int **ppApples)
@ 0x40000000     @ 0x40000004     @ 0x40000008
[0x0000000a] <-- [0x40000000] <-- [0x40000004]

Now we can use those pointers…

// Dereferencing the pointer-to-the-integer gets us an integer
assert(*pApples == apples);
// Dereferencing the pointer-to-the-pointer once gets us a pointer to
// apples, pApples
assert(*ppApples == pApples);
// Since *ppApples equals pApples, dereferencing *ppApples is the same
// as dereferencing pApples... i.e. apples
assert(**ppApples == apples);
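Here is the whole thing as one complete function (apples_demo is a made-up name), with some printing added so you can compare addresses on your own machine. The values will be whatever your compiler, linker and OS chose, not 0x40000000; the useful thing is to see which of pApples (the address stored in the pointer) and &pApples (where the pointer itself lives) matches &apples.

```c
#include <assert.h>
#include <stdio.h>

/* Builds the apples / pApples / ppApples chain, prints it, checks it,
   and returns the value reached through two dereferences. */
int apples_demo(void)
{
    int apples;
    int *pApples;
    int **ppApples;

    pApples = &apples;
    ppApples = &pApples;
    apples = 10;  /* fine to assign after setting up the pointers */

    printf("&apples   = %p\n", (void *)&apples);
    printf(" pApples  = %p  (same as &apples)\n", (void *)pApples);
    printf("&pApples  = %p  (where the pointer itself lives)\n", (void *)&pApples);
    printf(" ppApples = %p  (same as &pApples)\n", (void *)ppApples);

    assert(*pApples == apples);
    assert(*ppApples == pApples);
    assert(**ppApples == apples);
    assert((void *)&pApples != (void *)&apples);

    return **ppApples;
}
```

Calling apples_demo() from main should print four addresses where the first two lines match each other and the last two lines match each other.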

The compiler picked the storage locations of these variables; we fetched those locations with the address-of operator and wrote them into other variables. Those other variables have to be typed so that the compiler will allow pointers to be assigned to them, but a pointer is just another variable once you scratch the surface.
