
One of the most useful lessons I ever learned about designing database schemas was the utter uselessness of indexing and searching on datetime fields. Since virtually every value is going to be different, indexing and searching on that field is (almost) no better than having no index on the datetime field at all.

It was a revelation to me when I decided to experiment with having an indexed date-ONLY field and using that to search instead. It improved performance by almost two orders of magnitude. In hindsight, this should have been totally obvious if I had stopped to think about how indexing works. But, as is stated in this article, we have to keep relearning the same lessons. Maybe I should write an article about how useless it is to index a datetime field...

EDIT: This was something I ran into about 10 years ago. It is possible there was something else going on at the time that I didn't know about that caused the issue I was seeing. This is an anecdote from my past self. I have not had to use this technique since then, and we were using an on-prem server. It's possible that the rest of the table was not designed well and the index I was trying to use was already inefficient for the resources that we had at the time.



This is a very strange, and incorrect, conclusion you have come to. It doesn't matter that datetime fields are unique, as most of the time you are not searching for a particular datetime but for a range, i.e. "show me all rows created between date X and Y". In that case, the ordering of the index makes that query efficient.

Furthermore, your query will often have an "ORDER BY dateTimeCol" clause on it (or, more commonly, ORDER BY dateTimeCol DESC). If your index is created correctly, the database can return rows with a quick index scan instead of needing a sort step in the execution plan.
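The point above can be sketched with SQLite's EXPLAIN QUERY PLAN. The table and index names here are made up for illustration; the idea is that a single B-tree index on the datetime column serves both the range filter and the descending ORDER BY, so no separate sort appears in the plan.

```python
import sqlite3

# Hypothetical events table with an index on its datetime column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("CREATE INDEX idx_created ON events (created_at)")
conn.executemany(
    "INSERT INTO events (created_at) VALUES (?)",
    [(f"2024-01-{d:02d} 12:00:00",) for d in range(1, 29)],
)

# Range filter plus ORDER BY ... DESC on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM events "
    "WHERE created_at BETWEEN '2024-01-10' AND '2024-01-20' "
    "ORDER BY created_at DESC"
).fetchall()
detail = " ".join(row[3] for row in plan)
print(detail)  # the planner searches the index; no temp sort step shows up
```

Running this shows the index being used for the search, and no "USE TEMP B-TREE FOR ORDER BY" step, because the index can simply be scanned backwards.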


This entirely depends on what kind of index is used. A sorted index, such as a B-tree or B+ tree (used in many SQL databases), allows fast point and range lookups in a continuous value space. A typical inverted index or hash-based index only allows point lookups of specific values in a discrete value space.
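A minimal illustration of that distinction, not tied to any particular database: a hash structure answers only "is this exact value present?", while a sorted structure (the idea underlying a B-tree) answers range questions via binary search.

```python
import bisect

# Sorted timestamps stand in for a B-tree's ordered keys.
timestamps = sorted(["2024-01-03", "2024-01-07", "2024-01-12", "2024-01-20"])

# A hash-based structure supports only point lookups:
hash_index = set(timestamps)
assert "2024-01-07" in hash_index

# Sorted order supports range lookups: everything in [lo, hi).
lo, hi = "2024-01-05", "2024-01-15"
start = bisect.bisect_left(timestamps, lo)
end = bisect.bisect_left(timestamps, hi)
print(timestamps[start:end])  # ['2024-01-07', '2024-01-12']
```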


I was just going to make this comment.

Postgres as far as I know uses B-tree by default.

You can also choose a descending sort order for the index, I think, so "most recent" queries become more efficient.

Multi-column indexes also work: if you are just searching on the first column, Postgres can still use a multi-column index.


It was a sorted BTREE index in MySQL 5.x. I agree that it's supposed to be fast, but it just wasn't for some reason.


Are you sure it was actually using the index you expected? There are subtleties in index field order that can prevent the query optimizer from using an index that you might think it should be using.

One common misstep is having a table with columns like (id, user, date), with an index on (user, id) and on (date, id), then issuing a query like "... WHERE user = 1 and date > ...". There is no optimal index for that query, so the optimizer will have to guess which one is better, or try to intersect them. In this example, it might use only the (user, id) index, and scan the date for all records with that user. A better index for this query would be (user, date, id).
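The composite-index point can be demonstrated with SQLite (the table, column, and index names are hypothetical): with an index on (user, date), an equality on user plus a range on date is satisfied by a single index search.

```python
import sqlite3

# Hypothetical table matching the (id, user, date) example above,
# with the recommended index shape: equality column first, range column second.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, user INTEGER, date TEXT)")
conn.execute("CREATE INDEX idx_user_date ON t (user, date)")
conn.executemany(
    "INSERT INTO t (user, date) VALUES (?, ?)",
    [(u, f"2024-01-{d:02d}") for u in range(5) for d in range(1, 11)],
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM t WHERE user = 1 AND date > '2024-01-05'"
).fetchall()
detail = plan[0][3]
print(detail)  # a SEARCH using idx_user_date, not a full-table SCAN
```

With the index order reversed, or with two single-column indexes as in the misstep described above, the planner would have to fall back to scanning and filtering.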


One of the more bizarre things we'd found in MySQL 5.something was that accidentally creating two identical indexes significantly slowed down queries that used them.

I wouldn't be surprised if you hit some sort of similar strange bug.


> Since virtually every value is going to be different, indexing and searching on that field is (almost) no better than having no index on the datetime field at all.

Do you mean exact equality is rarely what you want because most of the values are different?

Or are you talking about the negative effect on index size of having so many distinct values?

I think the latter point could be quite database-dependent, e.g. B-tree de-duplication support was only added in Postgres 13. However, you could shave off quite a bit just from the fact that storing a date requires less space in the index than a datetime.


B-Tree indexes perform just fine on datetime fields. Were you using hash or inverted indexes maybe?


You’re maybe thinking of compression? A primary id is unique too, but an indexed search is still O(log n) compared to a full table scan at O(n).


There are more index types than just hashes (which do indeed only support equality).

E.g. B-trees allow greater-than/less-than comparisons.

postgres supports these and many more: https://www.postgresql.org/docs/9.5/indexes-types.html


I was using a BTREE index and we were doing greater/less than queries. It was not a hash index.


Did you also learn that certain indexes allow for range queries, and modern databases are quite efficient at those?

And yes, please write an article, it would be quite interesting. With data and scripts to reproduce, of course.


Your result is surprising: I suspect your wide table with an index wasn't in cache, and your timestamp-only table was.

The whole point is that indexes (can) prevent full table scans, which are expensive. This is true even if the column(s) you're indexing have high cardinality: but it relies on performant table or index joins (which, if your rdbms is configured with sufficient memory, should be the case).


I have no idea how you came to this conclusion, but indices on datetime fields are absolutely required to bring the seek time down from O(n) to O(log n), plus the length of the range. Massive parts of the businesses I've worked at would be simply impossible without this.

The cardinality of the index has other ramifications that aren't generally important here.


If you were doing something like this, then your story makes sense:

`WHERE start_date > x AND end_date < x`

You can't index two ranges at once in a B-tree!
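A quick SQLite sketch of that limitation (names are made up): even with a composite index covering both columns, the planner can only seek on the first range; the second becomes a row-by-row filter.

```python
import sqlite3

# Hypothetical table with two date columns and a composite index on both.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, start_date TEXT, end_date TEXT)"
)
conn.execute("CREATE INDEX idx_dates ON t (start_date, end_date)")
conn.executemany(
    "INSERT INTO t (start_date, end_date) VALUES (?, ?)",
    [(f"2024-0{m}-01", f"2024-0{m}-28") for m in range(1, 7)],
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM t "
    "WHERE start_date > '2024-01-01' AND end_date < '2024-06-01'"
).fetchall()
detail = plan[0][3]
print(detail)  # only the start_date range appears in the index search
```

The plan's search constraints mention only the first range column; the end_date condition cannot participate in the B-tree seek.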



