Use PostgreSQL

By | 2010-09-21

The key factor for me when selecting a database is this: data integrity is more important than anything. I made my choice about ten years ago. I tried MySQL and I tried PostgreSQL and found that PostgreSQL was far stricter on data types, referential integrity, foreign key relationships and its adherence to the SQL standard. At the time, PostgreSQL was known to be marginally slower than MySQL.

Unfortunately, ten years ago was when far too many web developers were making their ill-informed choice about which database to use, and they picked MySQL. MySQL became the open source database of preference. It shouldn’t be.

Both MySQL and PostgreSQL have developed since then; MySQL has had backend after backend written for it, in an attempt to bring that much valued data integrity into the mix. Of course, because it was bolted on rather than designed in, that made MySQL’s much-lauded speed turn out to be a bit of an illusion. As data integrity went up, speed went down. PostgreSQL on the other hand has taken its solid core and improved it and improved it.

From this site, the read-write performance of the two databases can be seen to be (broadly) similar:

PostgreSQL R/W performance MySQL R/W performance

Let’s look at the read-only performance; remembering that MySQL is “the fast one”.

PostgreSQL R/O performance MySQL R/O performance

What was true ten years ago is no longer true.

What’s more, many of the more serious problems in the list of gotchas for MySQL 4 are still applicable to MySQL 5 (I’m not up to date with MySQL usage, so I’ll be pleased to remove any of the below that have now been fixed). The [equivalent PostgreSQL][postgresql-gotchas] is, by comparison, fairly trivial (many of them are complaints about PostgreSQL being overly strict about syntax: who cares about that?).

Some key failures for me personally, that give you a feel for how cavalier MySQL is with data — note particularly the tendency to silently alter and store invalid data.

  • NULL IS NOT NULL … If no DEFAULT value is specified for a column, MySQL automatically assigns one, as follows. If the column may take NULL as a value, the default value is NULL. If the column is declared as NOT NULL, the default value depends on the column type …

  • NULL inserts as Now() … The TIMESTAMP column type provides a type that you can use to automatically mark INSERT or UPDATE operations with the current date and time. If you have multiple TIMESTAMP columns, only the first one is updated automatically. …

  • SELECTing NULLs is inconsistent … For the benefit of some ODBC applications (at least Delphi and Access), the following query can be used to find a newly inserted row: … SELECT * FROM tbl_name WHERE auto IS NULL;. Subsequent identical queries return something different.

  • Invalid ENUMs silently degrade to NULL … when ENUM receives an invalid value, an empty string is inserted.

  • Case insensitivity by default … Values in CHAR and VARCHAR columns are sorted and compared in case-insensitive fashion, unless the BINARY attribute was specified when the table was created. The BINARY attribute means that column values are sorted and compared in case-sensitive fashion according to the ASCII order of the machine where the MySQL server is running. BINARY doesn’t affect how the column is stored or retrieved. …

  • VARCHARs don’t store trailing blanks … However, in contrast to CHAR, VARCHAR values are stored using only as many characters as are needed, plus one byte to record the length. Values are not padded; instead, trailing spaces are removed when values are stored. (This space removal differs from the SQL–99 specification.) …

  • DEFAULT Now(), et al. is not possible It is not possible to create a column with a default value which is a function or expression.

  • Divide by zero IS NULL MySQL (incorrectly) thinks dividing by zero is NULL, rather than raising an error as it should.

  • Concatenation operator is treated as logical OR In MySQL || is translated as a logical OR and not as a concatenation operator.

  • Out-of-bounds INSERTs insert truncated data An attempt to INSERT a record that does not conform to the bounds checks should fail noisily. MySQL truncates the data and inserts it anyway. What is actually INSERTed is data-dependent

  • February 31st MySQL will insert February 31st successfully, and will report it as such.

I was prompted to write the above because PostgreSQL 9.0 has just been released. Each iteration gets you more features, and a faster database. The big one for version 9.0 is built-in replication. You can now have a PostgreSQL cluster that keeps itself automatically backed up, and nominate hot-standby machines should your database master break down.

Let’s forget the MySQL versus PostgreSQL war; and instead wonder why, given that PostgreSQL is completely free, that anyone is using MSSQL with its per-client licensing and dismal performance?

I’ve got nothing.

(As an aside: it’s pretty hard to get performance comparisons from proprietary databases because their software license forbids it. However, when PostgreSQL was last compared against Oracle, it was found to be 15% slower. That was version 8.2. Version 8.3 PostgreSQL is over 100% faster than 8.2. Conclude what you will.)

Leave a Reply