TIL: behavior of Postgres ON CONFLICT and RETURNING

· 747 words · 4 minute read

TL;DR: Using ON CONFLICT DO NOTHING and RETURNING in the same SQL statement is probably wrong.

I was working on a side project written using Phoenix, where I needed the get-or-create functionality. As an example, suppose I have a table with the following columns:

  1. ID: The primary key, which is a UUID. The ID is auto-generated by Ecto. Hence, from the perspective of Postgres, this value is provided by the application.
  2. lookup_key: A string column with a uniqueness constraint. This is generated by the business logic of the application.
  3. Other columns: Not relevant here.

What I wanted is that when a record is added to the table and there is a constraint violation due to lookup_key, return what was in the DB without touching it; otherwise, insert that record as usual. This was my attempt:

def create_my_object(attrs) do
  cs = changeset(%MyObject{}, attrs)
  Repo.insert(cs, on_conflict: :nothing, conflict_target: :lookup_key, returning: true)
end

defp changeset(my_object, attrs) do
  my_object
  |> cast(attrs, [:lookup_key, :other, :columns])
  |> unique_constraint(:lookup_key)
end

The changeset bit is mostly to tell Ecto what are the columns that we want to update (and in a real application, also about data validations). The interesting stuff are in the Repo.insert call:

  • on_conflict: This instructs the DB on what to do when there is a uniqueness constraint violation (due to the column specified by conflict_target). :nothing means “ignores the error in case of conflicts”. The other options are :raise (which raises an error when there is a violation) or some kind of replacements, which describes how the entry in the DB should be updated. The latter is usually used to implement the upsert pattern.
  • returning: The list of columns that we want to DB to return after the insertion. If set to true, all columns are requested.

To me, in plain English, the Repo.insert call reads like:

Insert the given changeset, if there is a conflict on the lookup_key, do nothing, and return what is in the DB.

I also wrote some integration tests to validate this:

test "create_my_object should not create new instances when there is a conflict" do
  attrs = %{lookup_key: "my-lookup-key"}

  # This should get created:
  assert {:ok, obj1} = create_my_object(attrs)

  # This should not create a new entry. Instead, it should return what is essentially `obj1`.
  assert {:ok, obj2} = create_my_object(attrs)

  assert obj1.id == obj2.id
end

But the final assertion failed – the IDs do not match. I checked that obj1.id is in the DB while obj2.id is not. I also checked that no new entries are created due to the second call to create_my_object. Hence, what we have is partially working, except that it is not returning what is in the DB. Instead, it is returning what is generated by Ecto.

First, I checked that the generated SQL statement is as we expect, which is indeed the case:

INSERT INTO "my_objects"
("lookup_key","other","columns","inserted_at","updated_at","id")
VALUES ($1,$2,$3,$4,$5,$6)
ON CONFLICT ("lookup_key") DO NOTHING
RETURNING "lookup_key","other","columns","inserted_at","updated_at","id"

Next, I tried a few variations of on_conflict. For example, on_conflict: {:replace, [:updated_at]}. This sets the updated_at in the DB to that of the record that we were trying to insert. The generated SQL is mostly the same as above, except that DO NOTHING was replaced with DO UPDATE SET "updated_at" = EXCLUDED."updated_at". The test also passed, but the updated_at is changed.

After some experimentation, I found that the RETURNING bit kicks in when it is not DO NOTHING, for example, even when the update does not result in any logical changes to the record, the entry does get returned.

Looking at the documentation for INSERT, it becomes clear:

The optional RETURNING clause causes INSERT to compute and return value(s) based on each row actually inserted (or updated, if an ON CONFLICT DO UPDATE clause was used). This is primarily useful for obtaining values that were supplied by defaults, such as a serial sequence number. However, any expression using the table’s columns is allowed. The syntax of the RETURNING list is identical to that of the output list of SELECT. Only rows that were successfully inserted or updated will be returned. For example, if a row was locked but not updated because an ON CONFLICT DO UPDATE … WHERE clause condition was not satisfied, the row will not be returned.

So where do we go from here? There are two solutions:

  1. Do the {:replace, [:updated_at]} trick: it might pollute the data a little bit if we need the data in updated_at to be semantically sound.
  2. After each insertion, read from the DB again using the lookup_key.