PostgreSQL and Row IDs

By | 2013-09-27

From a blog post

[PostgreSQL] allows for treating INSERT and UPDATE statements as table references, when used with the RETURNING clause. This is quite fancy, even if not very useful in everyday SQL.

“not useful”? Madness I tell you I respectfully disagree.

I only discovered RETURNING in the last year; and it’s incredibly useful in everyday SQL. I’ll come back to what it does later.

Let’s say you write a web app that edits a single table. One page let’s you add a new row. After the new row is added, you want to redirect the browser to the edit page for that row. Your URIs would probably go like this in a RESTful style:

- <http://yourapp.com/rows/new> shows the editor and submits to...
- <http://yourapp.com/rows/insert> reads the `POST`ed form writes
  the record to the database, and redirects to...
- <http://yourapp.com/rows/edit/2029> shows the editor and submits
  to...
- <http://yourapp.com/rows/update/2029> reads the `POST`ed form writes
  the record to the database, and redirects to...
- <http://yourapp.com/rows/edit/2029> shows the editor and submits
  to...
- and so on

Or similar.

I’m interested in the /table/insert step. We’re going to run a query like this in that step (with appropriate escaping to prevent SQL injection):

INSERT INTO Rows
    (Field1, Field2, Field3)
    VALUES ('data#1', 'data#2', 'data#3');

Now we’ll redirect to /table/edit/2029… except… oops, where do we get the 2029 from? This is the problem I’ll discuss here.

Let me first show the schema for the table (PostgreSQL style):

CREATE TABLE Rows (
    ID serial PRIMARY KEY,
    Field1 text,
    Field2 text,
    Field3 text
);

Importantly, PostgreSQL automatically creates a sequence for the ID field and uses it as the default value for that field (sequences are atomic auto-incrementers, assuring that each time it’s next value is fetched, that next value is returned only once – multiple simultaneous INSERTs are therefore guaranteed to get unique IDs); PRIMARY KEY is, really, just a short way of saying “UNIQUE NOT NULL”, but it’s also a way of indicating to the reader that this field is the… primary key. That has the effect of automatically creating an index for that field.

We can see what’s happened in reality if we do a \d rows from the psql prompt.

database=> \d rows
                                   Table "public.database"
  Column   |           Type           |                     Modifiers                      
-----------+--------------------------+----------------------------------------------------
 id        | integer                  | not null default nextval('rows_id_seq'::regclass)
 field1    | text                     |
 field2    | text                     | 
 field3    | text                     | 
Indexes:
    "rows_pkey" PRIMARY KEY, btree (id)

The id column defaults to nextval('rows_id_seq'), being the next atomic value from the named sequence. The sequence might look like this:

database=> \d rows_id_seq
        Sequence "public.rows_id_seq"
    Column     |  Type   |        Value        
---------------+---------+---------------------
 sequence_name | name    | rows_id_seq
 last_value    | bigint  | 9
 start_value   | bigint  | 1
 increment_by  | bigint  | 1
 max_value     | bigint  | 9223372036854775807
 min_value     | bigint  | 1
 cache_value   | bigint  | 1
 log_cnt       | bigint  | 24
 is_cycled     | boolean | f
 is_called     | boolean | t

Before I knew about RETURNING I solved the problem by directly querying the sequence; noting the value and using that in the INSERT and the redirect. That meant that I would completely bypass the default modifier for the id column when I needed the ID; but when I didn’t a normal insert would work fine.

import psycopg2

# You will need a particularly configured pg_hba.conf for this to
# work; along with, of course, the database and user already created.
dbh = psycopg2.connect("dbname='database' user='yourdbuser'")

# get next sequence value
cur = dbh.cursor()
cur.execute("SELECT nextval('rows_id_seq')")
nextid = cur.fetchone()[0]

cur = dbh.cursor()
cur.execute("""INSERT INTO rows (id, field1, field2, field3)
    VALUES (%s,%s,%s,%s)""", (nextid, 'data#1', 'data#2', 'data#3'))

redirectToEditPage(nextid)

This is horrible. Imagine we want to make a cascade of records in various intra-referenced tables and have to go through this every time. That’s what I used to do; and I looked enviously at the one feature of MySQL that I wished I had… mysql_insert_id(). I needn’t have been envious, PostgreSQL has had a better solution for years (since 8.2), I just didn’t know about it — RETURNING. It’s use is simple:

import psycopg2

# You will need a particularly configured pg_hba.conf for this to
# work; along with, of course, the database and user already created.
dbh = psycopg2.connect("dbname='database' user='yourdbuser'")

cur = dbh.cursor()
cur.execute("""INSERT INTO rows (field1, field2, field3)
    VALUES (%s,%s,%s) RETURNING ID""", ('data#1', 'data#2', 'data#3'))
nextid = cur.fetchone()[0]

redirectToEditPage(nextid)

RETURNING makes INSERT, which normally returns nothing, behave a little like a SELECT on the table being altered. Here, RETURNING ID runs the INSERT then returns SELECT ID FROM rows.

(My defence is that I learned PostgreSQL when it was at version 7.1, so didn’t support RETURNING.)

For me, that’s about as useful as it gets in “everyday SQL”.

Leave a Reply