Hiding, obfuscating or encrypting database IDs

There are certain cases in web development when you want to hide an object's database ID (primary key) before exposing it to the user (such as order ids in an e-commerce store).

Why you might need to hide database ids

First of all: it has nothing to do with database security.

The real reason to obfuscate database ids is to prevent users from guessing the number of objects in the database.

In most cases, this doesn't matter, but there are exceptions. I'll illustrate them using three examples:

Example 1

Suppose you run an e-commerce store.

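A competitor can easily estimate the number of sales you're making each month by looking at the order ids: placing one order at the start of the month and another at the end, then subtracting the two ids, is enough.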

This is very valuable data.

  • The competitor could adjust their selling strategy depending on this data.
  • They can find the correlation between your sales and a campaign you're running and copy what you're doing.

Example 2

A little subjective, but longer order ids look more "trustworthy".

An invoice with Order ID: #10 may make a user think, "Wait, only ten people have bought from them? Is there something wrong with their products?".

Whereas Order ID: #147-295-8700 makes it seem that you're popular and get a lot of orders.

Example 3

This example is similar to Example 1. Using an object's id in the URL will give the users or competition a hint about the number of objects.

In most cases, this doesn't matter. For example, anybody can see how many blog posts a website has, so there's no point in hiding blog post ids.

But when the data is private, it's probably a good idea to obfuscate the database ids.

Techniques for hiding database IDs

There are a few techniques for obfuscating database ids.

Let's compare them all:

1. Generating random numbers

A random number (or alphanumeric string) is generated and saved in the database. This is just a bad solution and should almost always be avoided.

Pros:

  • Simple and straight-forward

Cons:

  • Bad performance: you need to query the database before saving each new id to ensure uniqueness.
  • Generated ids need to be saved in the database (needs an extra column in the table).
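
As a rough illustration of this technique, here is a minimal Python sketch using an in-memory SQLite database. The table layout, the `public_id` column, the 10-digit id range and the `create_order` helper are all assumptions made for the example, not part of any particular framework.

```python
import secrets
import sqlite3


def create_order(conn: sqlite3.Connection, item: str) -> int:
    """Insert an order under a random 10-digit public id (hypothetical helper)."""
    while True:
        # Pick a random 10-digit number as the candidate public id.
        public_id = 10**9 + secrets.randbelow(9 * 10**9)
        # Extra query on every insert: make sure the id is not already taken.
        taken = conn.execute(
            "SELECT 1 FROM orders WHERE public_id = ?", (public_id,)
        ).fetchone()
        if taken is None:
            conn.execute(
                "INSERT INTO orders (public_id, item) VALUES (?, ?)",
                (public_id, item),
            )
            return public_id


conn = sqlite3.connect(":memory:")
# The extra public_id column sits next to the real integer primary key.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, public_id INTEGER UNIQUE, item TEXT)"
)
print(create_order(conn, "book"))  # different on every run
```

The loop makes the cost visible: each new object needs at least one extra query before it can be saved.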

2. Using UUID

This technique uses a UUID as the random string instead of generating one yourself.

Pros:

  • Simple and straight-forward

Cons:

  • Generated UUIDs need to be saved in the database (needs an extra column in the table).
  • Not suitable for order ids. Might be too long to print on an invoice/receipt and more prone to mistakes when typing it out.
  • Slower db lookup than an integer primary key.
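
For a sense of what these ids look like, here is a small Python sketch using the standard `uuid` module. The `new_public_id` helper name is made up for the example.

```python
import uuid


def new_public_id() -> str:
    """Return a random (version 4) UUID as a string (hypothetical helper)."""
    return str(uuid.uuid4())


public_id = new_public_id()
print(public_id)       # different on every run
print(len(public_id))  # 36 characters, including the 4 hyphens
```

At 36 characters, the result shows why this is awkward to print on a receipt or type out by hand.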

3. Using hashes such as MD5 or SHA1

This technique involves hashing an object's current primary key and then saving the hash in a new column. All in all, this is quite similar to using UUID.

Pros:

  • Simple and straight-forward

Cons:

  • Hashes need to be saved in the database (needs extra column in the table).
  • Not suitable for order ids. Might be too long to print on an invoice/receipt and more prone to mistakes when typing it out.
  • Slower db lookup than Integer primary key
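
A minimal Python sketch of the hashing approach, using the standard `hashlib` module. The `hashed_id` helper and its optional `secret` argument are assumptions added for illustration; the technique as described above hashes only the primary key.

```python
import hashlib


def hashed_id(primary_key: int, secret: str = "") -> str:
    """Return the SHA-1 hex digest of the primary key, prefixed by an optional secret."""
    data = f"{secret}{primary_key}".encode("utf-8")
    return hashlib.sha1(data).hexdigest()


public_id = hashed_id(10)
print(public_id)
print(len(public_id))  # 40 hex characters for SHA-1
```

One design note: without a secret, anyone can hash 1, 2, 3 and so on and compare the results with the ids they see, so mixing in a private value is probably worth it if you go this route.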

4. Encrypting the existing IDs

This technique involves encrypting the current integer ID (primary key) of an object to get a new, random-looking ID.

The resulting IDs are reversible so you can just decrypt them to get the original primary key back and then fetch the object from the database.

Since integer lookups are faster than varchar (string) lookups, you get better performance than with the techniques mentioned previously.

The encryption algorithms used are very fast and lightweight. Their performance cost is negligible, far smaller than that of updating a row in the database.

Pros:

  • Faster than other solutions: encrypted ids are reversible, so no database query is needed to generate them or to check their uniqueness.
  • No need to save in the database.
  • Certain implementations and algorithms can generate short and unpredictable ids which are suitable for order ids (and practical for printing on receipt/invoice).

Cons:

  • A bit complex to get started with

My opinion

I prefer the Encryption technique because it offers the best performance.

You can decrypt and encrypt the IDs on the fly and there's no overhead of database querying to ensure uniqueness.

I recommend these two options to generate IDs:

  1. Optimus ID: When you need integer only ids.
    This is based on Knuth's Multiplicative hashing. You can find its implementation for almost every language. Here are some links for PHP, Python, JavaScript.
  2. Hashids: When you need alphanumeric ids.
    It's a very popular id obfuscation library. Their Official website has links for implementations in various languages.
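
To show the idea behind the integer-only option, here is a minimal Python sketch of reversible multiplicative hashing. It is not the code of any particular library: the constants and the `encode`/`decode` names are illustrative assumptions, and in practice you would pick your own secret values.

```python
# Illustrative constants (assumptions, not values prescribed by any library).
MAX_INT = 2**31 - 1                    # keep ids within 31 bits
PRIME = 1580030173                     # any odd multiplier is invertible modulo 2**31
INVERSE = pow(PRIME, -1, MAX_INT + 1)  # modular inverse (needs Python 3.8+)
XOR_KEY = 1163945558                   # secret mask below 2**31 for extra scrambling


def encode(db_id: int) -> int:
    """Turn a sequential primary key into a scattered public id."""
    return ((db_id * PRIME) & MAX_INT) ^ XOR_KEY


def decode(public_id: int) -> int:
    """Recover the original primary key from a public id."""
    return ((public_id ^ XOR_KEY) * INVERSE) & MAX_INT


for db_id in (1, 2, 3):
    public_id = encode(db_id)
    print(db_id, "->", public_id, "->", decode(public_id))
```

Because multiplying by an odd number is a one-to-one mapping modulo 2**31, two different primary keys never produce the same public id, and nothing extra has to be stored. Note that this scrambles ids rather than providing cryptographic secrecy.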

That's it

If you know of any other approaches or libraries for obfuscating database ids, please comment about them; I'll update this post with your suggestions.