PostgreSQL and Row IDs

From a blog post

[PostgreSQL] allows for treating INSERT and UPDATE statements as table references, when used with the RETURNING clause. This is quite fancy, even if not very useful in everyday SQL.

â€œnot usefulâ€? ~~Madness I tell you~~ I respectfully disagree.

I only discovered RETURNING in the last year; and itâ€™s incredibly useful in everyday SQL. Iâ€™ll come back to what it does later.

Letâ€™s say you write a web app that edits a single table. One page letâ€™s you add a new row. After the new row is added, you want to redirect the browser to the edit page for that row. Your URIs would probably go like this in a RESTful style:

- <http://yourapp.com/rows/new> shows the editor and submits to...
- <http://yourapp.com/rows/insert> reads the `POST`ed form writes
  the record to the database, and redirects to...
- <http://yourapp.com/rows/edit/2029> shows the editor and submits
  to...
- <http://yourapp.com/rows/update/2029> reads the `POST`ed form writes
  the record to the database, and redirects to...
- <http://yourapp.com/rows/edit/2029> shows the editor and submits
  to...
- and so on

Or similar.

Iâ€™m interested in the /table/insert step. Weâ€™re going to run a query like this in that step (with appropriate escaping to prevent SQL injection):

INSERT INTO Rows
    (Field1, Field2, Field3)
    VALUES ('data#1', 'data#2', 'data#3');

Now weâ€™ll redirect to /table/edit/2029â€¦ exceptâ€¦ oops, where do we get the 2029 from? This is the problem Iâ€™ll discuss here.

Let me first show the schema for the table (PostgreSQL style):

CREATE TABLE Rows (
    ID serial PRIMARY KEY,
    Field1 text,
    Field2 text,
    Field3 text
);

Importantly, PostgreSQL automatically creates a sequence for the ID field and uses it as the default value for that field (sequences are atomic auto-incrementers, assuring that each time itâ€™s next value is fetched, that next value is returned only once â€“ multiple simultaneous INSERTs are therefore guaranteed to get unique IDs); PRIMARY KEY is, really, just a short way of saying â€œUNIQUE NOT NULLâ€, but itâ€™s also a way of indicating to the reader that this field is theâ€¦ primary key. That has the effect of automatically creating an index for that field.

We can see whatâ€™s happened in reality if we do a \d rows from the psql prompt.

database=> \d rows
                                   Table "public.database"
  Column   |           Type           |                     Modifiers                      
-----------+--------------------------+----------------------------------------------------
 id        | integer                  | not null default nextval('rows_id_seq'::regclass)
 field1    | text                     |
 field2    | text                     | 
 field3    | text                     | 
Indexes:
    "rows_pkey" PRIMARY KEY, btree (id)

The id column defaults to nextval('rows_id_seq'), being the next atomic value from the named sequence. The sequence might look like this:

database=> \d rows_id_seq
        Sequence "public.rows_id_seq"
    Column     |  Type   |        Value        
---------------+---------+---------------------
 sequence_name | name    | rows_id_seq
 last_value    | bigint  | 9
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 24
 is_cycled     | boolean | f
 is_called     | boolean | t

Before I knew about RETURNING I solved the problem by directly querying the sequence; noting the value and using that in the INSERT and the redirect. That meant that I would completely bypass the default modifier for the id column when I needed the ID; but when I didnâ€™t a normal insert would work fine.

import psycopg2

# You will need a particularly configured pg_hba.conf for this to
# work; along with, of course, the database and user already created.
dbh = psycopg2.connect("dbname='database' user='yourdbuser'")

# get next sequence value
cur = dbh.cursor()
cur.execute("SELECT nextval('rows_id_seq')")
nextid = cur.fetchone()[0]

cur = dbh.cursor()
cur.execute("""INSERT INTO rows (id, field1, field2, field3)
    VALUES (%s,%s,%s,%s)""", (nextid, 'data#1', 'data#2', 'data#3'))

redirectToEditPage(nextid)

This is horrible. Imagine we want to make a cascade of records in various intra-referenced tables and have to go through this every time. Thatâ€™s what I used to do; and I looked enviously at the one feature of MySQL that I wished I hadâ€¦ mysql_insert_id(). I neednâ€™t have been envious, PostgreSQL has had a better solution for years (since 8.2), I just didnâ€™t know about it â€” RETURNING. Itâ€™s use is simple:

import psycopg2

# You will need a particularly configured pg_hba.conf for this to
# work; along with, of course, the database and user already created.
dbh = psycopg2.connect("dbname='database' user='yourdbuser'")

cur = dbh.cursor()
cur.execute("""INSERT INTO rows (field1, field2, field3)
    VALUES (%s,%s,%s) RETURNING ID""", ('data#1', 'data#2', 'data#3'))
nextid = cur.fetchone()[0]

redirectToEditPage(nextid)

RETURNING makes INSERT, which normally returns nothing, behave a little like a SELECT on the table being altered. Here, RETURNING ID runs the INSERT then returns SELECT ID FROM rows.

(My defence is that I learned PostgreSQL when it was at version 7.1, so didnâ€™t support RETURNING.)

For me, thatâ€™s about as useful as it gets in â€œeveryday SQLâ€.

Leave a Reply Cancel reply