{"id":1282,"date":"2014-11-06T00:00:00","date_gmt":"2014-11-06T00:00:00","guid":{"rendered":"https:\/\/www.fussylogic.co.uk\/blog\/?p=1282"},"modified":"2015-02-19T20:36:39","modified_gmt":"2015-02-19T20:36:39","slug":"basic-pointers","status":"publish","type":"post","link":"https:\/\/www.fussylogic.co.uk\/blog\/?p=1282","title":{"rendered":"Basic Pointers"},"content":{"rendered":"<p>I saw <a href=\"http:\/\/www.reddit.com\/r\/Cplusplus\/comments\/2lfchb\/so_where_exactly_in_memory_does_my_pointer_exist\/\">this<\/a> question on reddit and decided to help. It\u2019s a little more fundamental than I usually blog about, but I thought it might help someone one day who has only seen the abstractions that high-level languages present to us, never the nitty-gritty underneath.<\/p>\n<blockquote>\n<p>So as a beginner programmer the line of code:<\/p>\n<pre><code>int *pApples = &amp;apples;<\/code><\/pre>\n<p>means that I am creating \u201csomething\u201d \u201csomewhere\u201d and this \u201csomething\u201d points a finger at the memory address of apples. When I output the \u201csomewhere\u201d of pApples by typing in &amp;pApples I get the same address as &amp;apples. To me, it looks like two things are occupying the same space at the same time. I\u2019m just fiddling with pointers at the moment. This is way beyond what I have been taught in class so far so sorry if this is a stupid question.<\/p>\n<\/blockquote>\n<p>Literally \u201cwhere\u201d is an unanswerable question; the compiler, CPU and OS all contribute to the answer.<\/p>\n<p>Broadly, though, a pointer is, like every other variable, a block of memory, allocated on the stack, on the heap, or in global (static) memory.<\/p>\n<p>The particular CPU (and compiler) you are using determines how big a block of memory is allocated for any particular variable type. 
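<\/p>\n<p>You can ask your own compiler what it chose. The sketch below uses <code>sizeof<\/code> and <code>CHAR_BIT<\/code> (the number of bits in a byte, from <code>&lt;limits.h&gt;<\/code>). The helper names <code>uint_bits<\/code> and <code>uint_pointer_bytes<\/code> are made up for illustration. C itself only guarantees that an <code>unsigned int<\/code> has at least 16 bits, so the results depend on your platform.<\/p>\n<pre><code>#include &lt;limits.h&gt;  \/\/ CHAR_BIT: number of bits in a byte\n#include &lt;stddef.h&gt;  \/\/ size_t\n\n\/\/ Hypothetical helper: bits the compiler allocates for an unsigned int\n\/\/ on this target (at least 16; 32 is typical today).\nsize_t uint_bits(void)\n{\n    return sizeof(unsigned int) * CHAR_BIT;\n}\n\n\/\/ Hypothetical helper: bytes used to store a pointer to an unsigned int\n\/\/ (commonly 4 on 32-bit targets, 8 on 64-bit ones).\nsize_t uint_pointer_bytes(void)\n{\n    return sizeof(unsigned int *);\n}<\/code><\/pre>\n<p>On a typical 64-bit desktop system you would expect <code>uint_bits()<\/code> to return 32 and <code>uint_pointer_bytes()<\/code> to return 8, but nothing in the language promises those exact numbers.<\/p>\n<p>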
For example, an <code>unsigned int<\/code> on my system (and probably yours) is 32 bits wide, so it takes up four bytes. Let\u2019s say it happens, in one particular run of a program, to be allocated at 0x40000000 in memory, and that it holds the value 0xaabbccdd.<\/p>\n<pre><code>0x40000000 0x40000001 0x40000002 0x40000003\n[0xdd]     [0xcc]     [0xbb]     [0xaa]<\/code><\/pre>\n<p>This particular CPU is little-endian (hence the least significant byte of the 32-bit <code>unsigned int<\/code> is stored first in memory); on a big-endian CPU the bytes would be stored the other way round, with 0xaa first. Big-endian is friendlier for humans to read, but it\u2019s often easier for a CPU, and for whatever algorithm you\u2019re writing, to start at the least significant end, so little-endian has ended up more common.<\/p>\n<p>It\u2019s actually better (and one of the advantages of using a higher-level language) not to write code that depends on the endianness of its target, and to let the compiler sort out those details. We\u2019ll do the same by simply labelling the whole memory block as one entity with our <code>unsigned int<\/code> in it.<\/p>\n<pre><code>0x40000000\n[0xaabbccdd]<\/code><\/pre>\n<p>The compiler makes this bit of memory available to us, labelled and typed, so we\u2019ll draw it like this:<\/p>\n<pre><code>(unsigned int i)\n@ 0x40000000\n[0xaabbccdd]<\/code><\/pre>\n<p>With this information the compiler knows enough about <code>i<\/code> to perform all your integer operations on it. So when we store something in <code>i<\/code>:<\/p>\n<pre><code>unsigned int i;  \/\/ as the programmer we don&#39;t know that\n                 \/\/ this is stored at 0x40000000\ni = 0xaabbccdd;<\/code><\/pre>\n<p>The compiler then knows to write 0xdd into 0x40000000, 0xcc into 0x40000001, and so on (actually, on a 32-bit system it leaves the CPU to write all four bytes in one operation, but imagine it\u2019s done byte-by-byte for now). 
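<\/p>\n<p>If you\u2019d like to see the byte order of your own machine, one way is to look at an <code>unsigned int<\/code> through a pointer to <code>unsigned char<\/code>, which C permits for inspecting the bytes of any object. This is only a minimal sketch. The names <code>lowest_address_byte<\/code> and <code>is_little_endian<\/code> are invented for illustration, and as noted above, everyday code shouldn\u2019t need to care.<\/p>\n<pre><code>\/\/ Hypothetical helper: returns the byte stored at the lowest address\n\/\/ of v. C allows any object&#39;s bytes to be read through an\n\/\/ unsigned char pointer.\nunsigned char lowest_address_byte(unsigned int v)\n{\n    const unsigned char *bytes = (const unsigned char *) &amp;v;\n    return bytes[0];\n}\n\n\/\/ Hypothetical helper: 1 on a little-endian target (least significant\n\/\/ byte stored first), 0 otherwise.\nint is_little_endian(void)\n{\n    return lowest_address_byte(0xaabbccddu) == 0xdd;\n}<\/code><\/pre>\n<p>On an x86 PC, for example, <code>lowest_address_byte(0xaabbccdd)<\/code> would be expected to return 0xdd, matching the first diagram. In everyday code, though, you just write <code>i = 0xaabbccdd;<\/code> and the compiler takes care of the byte order.<\/p>\n<p>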
Similarly, if you do<\/p>\n<pre><code>i = i \/ 10;<\/code><\/pre>\n<p>It knows to translate this to (making up an assembly language):<\/p>\n<pre><code>mov  [0x40000000], R1   ; put contents of 0x40000000 in register 1\nmov  0x0000000a, R2     ; put 10 in register 2\ncall _unsigned_int_long_division  ; divide R1 by R2, answer in R3\nmov  R3, [0x40000000]   ; answer goes from register back to memory<\/code><\/pre>\n<p>Importantly, if it was a signed <code>int<\/code>, the compiler would know to call a different subroutine, but your <code>i = i \/ 10<\/code> line would be unchanged. The same goes for <code>float<\/code>, <code>long long<\/code>, or even <code>uint8_t<\/code>. Also note that, in this made-up assembly, division is just another subroutine call (on CPUs without a hardware divide instruction, that is literally what happens); it\u2019s simply so fundamental that languages give it a concise symbol.<\/p>\n<p>Now, we\u2019ve already seen pointers at work in this little bit of pseudo-assembly: the bracketed addresses told the CPU where in memory to fetch the number from so it could work on it in a register, and where to put the result back. We can do similar things in C.<\/p>\n<pre><code>unsigned int i;   \/\/ an integer, stored at, say, 0x40000000\nunsigned int *p;  \/\/ a pointer to an integer, stored at, say, 0x40000004\n\ni = 0xaabbccdd;\n\n\/\/ Point p at i (converting an integer to a pointer needs an explicit cast)\np = (unsigned int *) 0x40000000;<\/code><\/pre>\n<p>You would never do this in practice. We only know that the compiler\/linker has decided to store <code>i<\/code> at 0x40000000 because this isn\u2019t a real example. 
Fortunately, we don\u2019t need to know: the compiler provides a handy operator that lets us query its knowledge of where it chose to store a variable, the \u201caddress-of\u201d operator, <code>&amp;<\/code>.<\/p>\n<pre><code>p = &amp;i;<\/code><\/pre>\n<p>Note that we\u2019re not doing anything different from when we assigned 0xaabbccdd to <code>i<\/code> earlier; we\u2019re just putting a number in a variable. It\u2019s just that this time the number has an additional meaning.<\/p>\n<p>Let\u2019s look at the memory:<\/p>\n<pre><code>(unsigned int i)   (unsigned int *p)\n@ 0x40000000       @ 0x40000004\n[0xaabbccdd] &lt;---- [0x40000000]<\/code><\/pre>\n<p>Both are 32 bits of storage on this imaginary 32-bit system (on a typical 64-bit system the pointer would take 8 bytes), but they have different types.<\/p>\n<p>Remember how, when we performed operations on <code>i<\/code>, the compiler knew which subroutines to call because it knew the variable\u2019s <em>type<\/em>? Exactly the same applies to pointers: the compiler knows that the number stored in <code>p<\/code> can have some \u201cpointery\u201d operations performed on it that can\u2019t be done on an <code>unsigned int<\/code> (and vice versa, in fact). 
In particular, there\u2019s the inverse of \u201caddress-of\u201d: \u201cdereference\u201d, written with the unary operator <code>*<\/code>.<\/p>\n<pre><code>i = i \/ 10;   \/\/ legal, division is defined for `unsigned int`\np = p \/ 10;   \/\/ illegal, division is meaningless for a pointer\n\ni = *p;       \/\/ legal, dereferencing p gives an unsigned int\np = *i;       \/\/ illegal, we can&#39;t dereference an integer<\/code><\/pre>\n<p>The compiler is protecting us: even though <code>p<\/code> and <code>i<\/code> are both 32-bit numbers, dereferencing <code>i<\/code> is not allowed because the compiler has been told it doesn\u2019t point at anything. If we were using assembly language there would be no such protection; the CPU (for the most part) knows nothing about data types. So, we can dereference a pointer, but what type should the dereferenced value have? We\u2019ve already told the compiler that:<\/p>\n<pre><code>unsigned int *p;<\/code><\/pre>\n<p>We\u2019re saying that \u201c<code>*p<\/code>\u201d will be an <code>unsigned int<\/code>. Hence, while we can\u2019t perform division on a pointer, we can perform it on a dereferenced pointer-to-integer.<\/p>\n<pre><code>*p = *p \/ 10;<\/code><\/pre>\n<p>Let\u2019s look at the pseudo-assembly:<\/p>\n<pre><code>mov  [0x40000004], R4   ; put the contents of p (0x40000000) into register 4\nmov  [R4], R1           ; put contents of the address in R4 in register 1\nmov  0x0000000a, R2     ; put 10 in register 2\ncall _unsigned_int_long_division  ; divide R1 by R2, answer in R3\nmov  R3, [R4]           ; answer goes from register back to memory\n                        ; pointed to by R4<\/code><\/pre>\n<p>The middle of this is the same as before, but we\u2019ve added some indirection around the outside.<\/p>\n<p>Nothing stops you continuing this idea. 
Since we can point at anything, and we have a way of telling the compiler the type of the thing we\u2019re pointing at, we can go again:<\/p>\n<pre><code>unsigned int i;\nunsigned int *p;\nunsigned int **pp;\n\np = &amp;i;\npp = &amp;p;<\/code><\/pre>\n<p><code>pp<\/code> is now a pointer-to-a-pointer-to-an-<code>unsigned int<\/code>. As before, we\u2019ve told the compiler that <code>**pp<\/code> is an <code>unsigned int<\/code>; equivalently, <code>*pp<\/code> is an <code>unsigned int *<\/code>, a pointer to an <code>unsigned int<\/code>.<\/p>\n<p>To go back to your question directly:<\/p>\n<blockquote>\n<p>When I output the \u201csomewhere\u201d of pApples by typing in &amp;pApples I get the same address as &amp;apples.<\/p>\n<\/blockquote>\n<p>Hopefully now you can see that <code>&amp;pApples<\/code> is not equal to <code>&amp;apples<\/code>: <code>&amp;pApples<\/code> is a pointer to a pointer to an integer, while <code>&amp;apples<\/code> is a pointer to an integer. If you saw the same number for both, the likeliest explanation is that what was actually printed was <code>pApples<\/code> itself, whose value is <code>&amp;apples<\/code>. 
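<\/p>\n<p>To check this on your own machine, you could print the addresses involved. Below is a minimal sketch. <code>show_addresses<\/code> is a hypothetical helper invented for this example, and the addresses it prints will differ between machines and even between runs. <code>%p<\/code> expects a <code>void *<\/code>, which is why each pointer is cast.<\/p>\n<pre><code>#include &lt;stdio.h&gt;\n\n\/\/ Hypothetical helper: prints where apples and pApples are stored and\n\/\/ what pApples holds. Returns 1 when pApples holds the address of\n\/\/ apples and is itself stored somewhere else.\nint show_addresses(void)\n{\n    int apples = 10;\n    int *pApples = &amp;apples;\n\n    printf(\"&amp;apples  = %p\\n\", (void *) &amp;apples);   \/\/ where apples lives\n    printf(\"pApples  = %p\\n\", (void *) pApples);   \/\/ the value in pApples\n    printf(\"&amp;pApples = %p\\n\", (void *) &amp;pApples);  \/\/ where pApples lives\n\n    return pApples == &amp;apples &amp;&amp; (void *) &amp;pApples != (void *) &amp;apples;\n}<\/code><\/pre>\n<p>Called from <code>main<\/code>, it should print the same value for <code>&amp;apples<\/code> and <code>pApples<\/code>, and a different one for <code>&amp;pApples<\/code>.<\/p>\n<p>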
To modify my last example a little to use your naming:<\/p>\n<pre><code>int apples;          \/\/ let&#39;s say this is at 0x40000000\nint *pApples;        \/\/ let&#39;s say this is at 0x40000004\nint **ppApples;      \/\/ let&#39;s say this is at 0x40000008\n\npApples = &amp;apples;   \/\/ 0x40000004 now holds 0x40000000\nppApples = &amp;pApples; \/\/ 0x40000008 now holds 0x40000004\n\n\/\/ The pointers hold the address of apples, not its value, so it\n\/\/ doesn&#39;t matter that we assign to apples after setting up the\n\/\/ pointers\napples = 10;         \/\/ 0x40000000 now holds 10<\/code><\/pre>\n<p>Let\u2019s fill in our last pretend memory map:<\/p>\n<pre><code>(int apples)     (int *pApples)   (int **ppApples)\n@ 0x40000000     @ 0x40000004     @ 0x40000008\n[0x0000000a] &lt;-- [0x40000000] &lt;-- [0x40000004]<\/code><\/pre>\n<p>Now we can use those pointers (<code>assert<\/code> comes from <code>&lt;assert.h&gt;<\/code>):<\/p>\n<pre><code>\/\/ Dereferencing the pointer-to-the-integer gets us the integer\nassert(*pApples == apples);\n\/\/ Dereferencing the pointer-to-the-pointer once gets us a pointer to\n\/\/ apples, i.e. pApples\nassert(*ppApples == pApples);\n\/\/ Since *ppApples equals pApples, dereferencing *ppApples is the same\n\/\/ as dereferencing pApples, i.e. it gets us apples\nassert(**ppApples == apples);<\/code><\/pre>\n<p>The compiler picked the storage locations of these variables; we fetched those locations with the address-of operator and wrote them into other variables. Those other variables have to be typed so that the compiler will allow pointers to be assigned to them, but once you scratch the surface, a pointer is just another variable.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I saw this question on reddit and decided to help. It\u2019s a little more fundamental than I usually blog about, but I thought it might help someone one day who has only seen the abstractions that high-level languages present to us, never the nitty-gritty underneath. 
So as a beginner programmer the line of code: int *pApples\u2026 <span class=\"read-more\"><a href=\"https:\/\/www.fussylogic.co.uk\/blog\/?p=1282\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[65,135,136,100,6,137],"_links":{"self":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1282"}],"collection":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1282"}],"version-history":[{"count":3,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1282\/revisions"}],"predecessor-version":[{"id":1311,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1282\/revisions\/1311"}],"wp:attachment":[{"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1282"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1282"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.fussylogic.co.uk\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1282"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}