I saw this question on Reddit, and decided to help. It’s a little more fundamental than I usually blog about, but I thought it might help someone one day who has only seen the abstractions that high-level languages present to us, never the nitty-gritty underneath.
So as a beginner programmer the line of code:
int *pApples = &apples;
means that I am creating “something” “somewhere” and this “something” points a finger at the memory address of apples. When I output the “somewhere” of pApples by typing in &pApples I get the same address as &apples. To me, it looks like two things are occupying the same space at the same time. I’m just fiddling with pointers at the moment. This is way beyond what I have been taught in class so far so sorry if this is a stupid question.
Literally “where” is an unanswerable question. The compiler, CPU and OS all contribute to that.
However, broadly, the pointer is, like every other variable, a block of memory, allocated either on the stack, on the heap, or in global memory.
The particular CPU (and compiler) you are using determines how big a block of memory is allocated for any particular variable type. For example, an unsigned int on my system (and probably yours) is 32 bits wide, so takes up four bytes. Let’s say it happens, in one particular run of a program, to be allocated at 0x40000000 in memory, and is equal to 0xaabbccdd.
0x40000000  0x40000001  0x40000002  0x40000003
  [0xdd]      [0xcc]      [0xbb]      [0xaa]
This particular CPU is little-endian (hence the least significant byte of the 32-bit unsigned int is stored first in memory); on big-endian CPUs, the opposite would be true. Big-endian is more friendly for humans to read, but it’s often easier for a CPU and whatever algorithm you’re writing to start at the least significant end, so little-endian has ended up more common.
It’s actually better (and one of the advantages of using a higher-level language) to not write code that knows the endian-ness of its target and let the compiler sort out those details. We’ll do the same by just labelling the whole memory block as one entity with our unsigned int in it.
The compiler makes this bit of memory available to us, labelled and typed, so we’ll treat this as:
(unsigned int i)
@ 0x40000000
[0xaabbccdd]
With this information the compiler knows enough about i to be able to perform all your integer operations on it. So when we load i with something:
unsigned int i; // as the programmer we don't know that
                // this is stored at 0x40000000
i = 0xaabbccdd;
The compiler then knows to write 0xdd into 0x40000000, 0xcc into 0x40000001, etc. (actually on a 32-bit system it leaves it to the CPU to write it all in one operation, but imagine it’s done byte-by-byte for now). Similarly, if you do
i = i / 10;
It knows to translate this to (making up some assembler language):
mov [0x40000000], R1             ; put contents of 0x40000000 in register 1
mov 0x0000000a, R2               ; put 10 in register 2
call _unsigned_int_long_division ; divide R1 by R2, answer in R3
mov R3, [0x40000000]             ; answer goes from register back to memory
Importantly, if it was a signed int, then it would know to call a different subroutine, but your i = i / 10 line would be unchanged. Similarly for long long, or even uint8_t. Also note that the division operation is really just another function call; it’s just so fundamental that languages tend to give it a concise symbol.
Now, we’ve already seen pointers in this little bit of pseudo assembly as we copied the number from memory to a CPU register so the CPU could work on it, then back into the memory location representing the variable. We can do similar things in C though.
unsigned int i;  // an integer, stored at say 0x40000000
unsigned int *p; // a pointer to an integer, stored at say 0x40000004
i = 0xaabbccdd;
// Point p at i
p = 0x40000000;
You would never do this. You can’t, in fact: it’s only because this isn’t a real example that we even know that the compiler/linker has decided that i is stored at 0x40000000. Fortunately, we don’t need to know; the compiler comes with a handy operator which lets us query its knowledge of where it chose to store this variable: the “address-of” operator.
p = &i;
Now, note that we’re not doing anything different from when we assigned 0xaabbccdd to i earlier; we’re just putting a number in a variable. It’s just that this time the number has an additional meaning.
Let’s look at the memory:
(unsigned int i)      (unsigned int *p)
@ 0x40000000          @ 0x40000004
[0xaabbccdd]   <----  [0x40000000]
Both are 32 bits of storage, but they have different types.
Remember how the compiler knew which subroutines to call when we performed operations with i, because it also knew what type the variable was? Exactly the same applies to pointers: the compiler knows that the number stored in p can have some “pointery” operations performed on it that can’t be done on an unsigned int (and vice versa, in fact). In particular, there is the inverse of “address-of”: “dereference”, which is the unary operator “*”.
i = i / 10; // legal, division is defined for `unsigned int`
p = p / 10; // illegal, division is meaningless for a pointer
i = *p;     // legal, dereferencing p gives an `unsigned int`
p = *i;     // illegal, we can't dereference an integer
The compiler is protecting us: even though p and i are both 32-bit numbers, dereferencing i is not possible because the compiler has been told it doesn’t point at anything. Note that if we were using assembly language, there would be no such protection; the CPU doesn’t know anything about data types (for the most part). So, we can dereference a pointer, but what type should the dereferenced number have? We’ve already told the compiler that.
unsigned int *p;
We’re saying that “*p” will be an unsigned int. Hence, while we can’t perform division on a pointer, we can perform it on a dereferenced integer pointer.
i = *p / 10;
Let’s look at the pseudo-assembly:
mov [0x40000004], R4             ; put the contents of p into a register
mov [R4], R1                     ; put contents of 0x40000000 in register 1
mov 0x0000000a, R2               ; put 10 in register 2
call _unsigned_int_long_division ; divide R1 by R2, answer in R3
mov R3, [R4]                     ; answer goes from register back to memory
                                 ; pointed to by R4 (which is where i lives)
The middle of this is the same as before, but we’ve added some indirection around the outside.
Nothing stops you continuing this idea. Since we can point at anything, and we have a way of telling the compiler the type of the thing we’re pointing at, we can go again:
unsigned int i;
unsigned int *p;
unsigned int **pp;
p = &i;
pp = &p;
pp is now a pointer-to-a-pointer-to-an-unsigned int. Same as before: we’ve told the compiler that **pp is an unsigned int. Or, put another way, *pp must be an unsigned int *, something that points to an unsigned int.
To go back to your question directly.
When I output the “somewhere” of pApples by typing in &pApples I get the same address as &apples.
Hopefully now you can see that &pApples is not equal to &apples: &pApples is a pointer to a pointer to an integer, and &apples is a pointer to an integer. To modify my last example a little to use your naming:
int apples;      // let's say this is at 0x40000000
int *pApples;    // let's say this is at 0x40000004
int **ppApples;  // let's say this is at 0x40000008

pApples = &apples;    // 0x40000004 now holds 0x40000000
ppApples = &pApples;  // 0x40000008 now holds 0x40000004

// Note because the pointers point at apples regardless of what it
// holds, it doesn't matter that we assign this after we assign to
// the pointers
apples = 10;          // 0x40000000 now holds 10
Let’s fill in our last pretend memory map:
(int apples)        (int *pApples)      (int **ppApples)
@ 0x40000000        @ 0x40000004        @ 0x40000008
[0x0000000a]  <--   [0x40000000]  <--   [0x40000004]
Now we can use those pointers...
// Dereferencing the pointer-to-the-integer gets us an integer
assert(*pApples == apples);
// Dereferencing the pointer-to-the-pointer once gets us a pointer to
// apples, pApples
assert(*ppApples == pApples);
// Since *ppApples equals pApples, dereferencing *ppApples is the same
// as dereferencing pApples... i.e. apples
assert(**ppApples == apples);
The compiler picked the storage locations of these variables; and we fetched those locations with the address-of operator and wrote them to other variables. Those other variables have to be typed such that the compiler will allow assignment of pointers to them, but a pointer is just another variable once you scratch the surface.