A precision of 100 ns is not bad, just inflexible; I didn't mean to imply otherwise. The epoch is weird, though. Here's a thought experiment: imagine that a hundred programmers are each asked to pick an epoch for a timestamp, and will be paid $1000 for each other programmer who chooses the same epoch, but they can't talk with each other to coordinate. Which would you pick? I would pick the Unix epoch, because I think others would do the same. Anything else is, in a certain important sense, weird.
That doesn't make the epoch weird, just different. The Unix epoch is itself weird. I personally like it mostly for being a neat little historical artifact, but it's not otherwise interesting. What epoch would your hypothetical programmers settle on if they all had their minds wiped of any existing epochs?
I bet you'd see a lot of Jan 1 1900, 2000, or year 1. Very few would pick 1970.
Other fun non-Jan-1 ideas might be December 26, 1791 (Charles Babbage's birthday) or February 5, 1944 (the date Colossus came online).
It makes a great deal of sense when thinking about the use case: recording a monotonic time in a computing environment with limited resources that was extremely unlikely to ever see files from before midnight on 1 January 1970.
With a great number of existing routines that work with those values, and a good number of existing files and storage applications based on the simple version, it makes even more sense to extend it in an unsigned way with a larger bitfield for any timestamp. Though, as recently came up, leap seconds and synchronization could use a few bits to describe which timestamp is in use.
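To put rough numbers on that headroom (my own back-of-the-envelope sketch, not something from the parent comment): a signed 32-bit count of seconds from 1970 runs out in early 2038, while an unsigned 64-bit field, even at 100 ns resolution, covers tens of thousands of years.

    from datetime import datetime, timedelta, timezone

    UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

    # Signed 32-bit seconds since 1970 overflow in early 2038.
    print(UNIX_EPOCH + timedelta(seconds=2**31 - 1))   # 2038-01-19 03:14:07+00:00

    # An unsigned 64-bit count of 100 ns ticks from the same epoch
    # still covers on the order of 58,000 years.
    ticks_per_year = 365.25 * 24 * 3600 * 10_000_000
    print(2**64 / ticks_per_year)                      # roughly 58,000 years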
1970 makes sense if you’re picking an epoch and you are in the 70s - everything will happen “after” your start date, but picking the actual current date doesn’t seem natural enough.
It’s like picking 2020 as the beginning of an epoch now (of course we’d probably fall back to 2000 as that’s a big round number).
> Here's a thought experiment [...] I would pick the Unix epoch, because I think others would do the same.
Ah yes, nobody will ever need to represent a date/time before 1970-01-01.
As for your thought experiment: recently I had to pick a "NULL/no/unknown date" value to put into a non-nullable database column and I picked 1900-01-01. Unix epoch crossed my mind only briefly and I discarded it immediately because it's too arbitrary and, more importantly, too near valid dates that the user might use. (Use-case: materials from historical archives being ingested.)
So my opinion is that the RFC authors should be commended for having the foresight to choose an epoch that's far more widely applicable and far less arbitrary (the first day of the calendar system that most of the world is using today) than the Unix epoch.
> Ah yes, nobody will ever need to represent a date/time before 1970-01-01.
More like: the few people interested in doing this can use UUIDv8 (unspecified time format). There are many, many reasons to store times before 1970 in computer systems, but I'm not sure any apply to these UUID formats.
tl;dr of the "Background" section of the RFC: they're designing it so that two UUIDs with fresh timestamps will have values near each other in the database keyspace, which in many cases improves efficiency. If your timestamp isn't freshly generated, I don't know why you'd embed it in a UUID with these formats.
It's arguably not good practice to extract the timestamp from the UUID at all. Instead, I might just treat them, once generated, as opaque approximately-sorted bytes. It's clearer to have a separate field for timestamps of particular importance. Though I might not feel too religious about that if tight on storage space.
> If your timestamp isn't freshly generated, I don't know why you'd embed it in a UUID with these formats.
Ingesting past/archival data. As to why use these formats: to still benefit from their sorting properties. As to why use UUID at all: because it's a format understood by a wide variety of software.
> It's arguably not a good practice to extract the timestamp from the UUID at all.
Ya, I'd do it only if I trust the source (i.e., a closed system).
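If you do trust the source and want the timestamp back out, a minimal sketch (my own, assuming the RFC's UUIDv7 layout where the top 48 bits are a big-endian Unix timestamp in milliseconds) looks like this:

    import uuid
    from datetime import datetime, timezone

    def uuid7_timestamp(u: uuid.UUID) -> datetime:
        # Assumes the UUIDv7 layout: top 48 bits = Unix time in milliseconds.
        # Only sensible in a closed system where you trust the generator.
        if u.version != 7:
            raise ValueError("not a UUIDv7")
        unix_ms = u.int >> 80   # drop the 80 bits below the timestamp
        return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)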
> Ingesting past/archival data. As to why use these formats: to still benefit from their sorting properties. As to why use UUID at all: because it's a format understood by a wide variety of software.
Their argument about the value of sorting is quite particular: put records roughly in ascending order by creation time so that creating a bunch of records touches (e.g.) fewer B-tree nodes. The simplest way to achieve that is to just use the current time. If you have some other timestamp at hand and happen to be inserting in ascending order by it, you still get no advantage by this argument over just using the current timestamp:
> However some properties of [RFC4122] UUIDs are not well suited to this task. First, most of the existing UUID versions such as UUIDv4 have poor database index locality. Meaning new values created in succession are not close to each other in the index and thus require inserts to be performed at random locations. The negative performance effects of which on common structures used for this (B-tree and its variants) can be dramatic. As such newly inserted values SHOULD be time-ordered to address this.
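For a concrete picture of what the RFC is after, here's a minimal hand-rolled sketch of a UUIDv7-style value (my own illustration, assuming the layout in the spec: 48-bit Unix-ms timestamp, 4 version bits, 12 random bits, 2 variant bits, 62 random bits):

    import os
    import time
    import uuid

    def uuid7_now() -> uuid.UUID:
        # Timestamp goes in the most significant 48 bits, so values
        # generated close together in time sort close together too.
        unix_ms = time.time_ns() // 1_000_000
        rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF           # 12 bits
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 bits
        value = (unix_ms & ((1 << 48) - 1)) << 80   # 48-bit Unix ms timestamp
        value |= 0x7 << 76                          # version 7
        value |= rand_a << 64
        value |= 0b10 << 62                         # RFC 4122 variant
        value |= rand_b
        return uuid.UUID(int=value)

Two values generated in quick succession share their leading bytes, so a B-tree index inserts them next to each other; that's the locality point in the quote. If the embedded timestamp is an old archival date rather than "now", that benefit only materializes if you also happen to insert in that same order.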
My gut tells me that you should rarely if ever stick past timestamps in UUID primary keys. One reason why: can it ever change? Maybe your artifact dating technique was wrong, and you want to update the dates of a bunch of artifacts. But now you have to change your primary key. You probably don't want that.
I wouldn't say it makes "little sense". Just like we can talk about the year 776 BC even though no one at the time called it that, we can extend the Gregorian calendar backwards to dates when it wasn't used anywhere. The Wikipedia article on the proleptic Gregorian calendar lists some use cases. [1]
And in any case, 15 October 1582 isn't some hard cutoff point where we can stop worrying about calendar conversions. Only four countries adopted the Gregorian calendar on that day, and even in Europe there are several countries that only switched in the 20th century. If a piece of software needs to support historical dates that go anywhere near 1582, it needs to be aware of the existence of different calendars.
> Talking about dates before this epoch makes little sense (any date before the Gregorian epoch will not resemble the same stellar/planetary constellation as a date after the Gregorian epoch).
That is not the only sense in which we care about dates. For example, we might want to talk about which of two events came first, and by how much. Historians have lots of uses for dates before the introduction of the Gregorian calendar.
Historically, calendars arose from observing the movement of planets and stars. Now:
> we might want to talk about which of two events came first, and by how much
Neither of which makes sense without a common reference point (the start of the Gregorian calendar being one such possible reference point). Trivial example: 1582-10-04 (Julian calendar) and 1582-10-15 (Gregorian calendar) are exactly one day apart, i.e., different reference points.
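To make that concrete: Python's datetime module uses the proleptic Gregorian calendar, so it happily does arithmetic across the 1582 switch without knowing anything about it (my illustration):

    from datetime import date

    # Python dates are proleptic Gregorian: the Gregorian rules are
    # extended backwards, and the 1582 calendar switch doesn't exist.
    julian_last = date(1582, 10, 4)       # last Julian date before the switch
    gregorian_first = date(1582, 10, 15)  # first Gregorian date

    print(gregorian_first - julian_last)  # 11 days, 0:00:00

Historically only one day elapsed between those two labels; the 11-day result is what you get when you interpret a Julian date against a Gregorian reference point.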
> Historians have lots of uses for dates before the introduction of the Gregorian calendar.
Yes, and I bet they're hyper-aware of the calendar system(s) they're using. The rest of us mortals can just use the Gregorian calendar :)
Exactly as used by Windows to store file times. And the epoch is the start of the modern calendar: https://en.wikipedia.org/wiki/Gregorian_calendar
Talking about dates before this epoch makes little sense (any date before the Gregorian epoch will not resemble the same stellar/planetary constellation as a date after the Gregorian epoch).
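For anyone curious how this epoch relates to Unix time in practice, here's a small sketch (my own, assuming the UUIDv1-style convention of counting 100 ns ticks from 1582-10-15 00:00:00 UTC):

    from datetime import datetime, timezone

    GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)
    UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

    # Number of 100 ns ticks between the two epochs (proleptic Gregorian):
    # 122_192_928_000_000_000
    OFFSET_100NS = int((UNIX_EPOCH - GREGORIAN_EPOCH).total_seconds()) * 10_000_000

    def gregorian_ticks_to_unix_seconds(ticks: int) -> float:
        # Convert 100 ns ticks since 1582-10-15 to Unix seconds.
        return (ticks - OFFSET_100NS) / 10_000_000

    def unix_seconds_to_gregorian_ticks(seconds: float) -> int:
        # Convert Unix seconds to 100 ns ticks since 1582-10-15.
        return int(seconds * 10_000_000) + OFFSET_100NS

(Windows FILETIME uses the same 100 ns tick size but counts from 1601-01-01, so its offset constant is different.)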