Hacker News
Ruby 1.9 doesn't implement ISO 8601 properly (tommorris.org)
47 points by gourlaysama on Feb 24, 2013 | 48 comments


The article is making two somewhat unrelated complaints:

1. Date.iso8601 does not correctly parse some ISO 8601 formats like ordinal dates or month-only dates.

2. The Date class does not support year-only or year/month granularity.

In response to the first, the documentation for Date.iso8601 (http://ruby-doc.org/stdlib-1.9.3/libdoc/date/rdoc/Date.html#...) says:

"Creates a new Date object by parsing from a string according to some typical ISO 8601 formats."

Not "all ISO 8601 formats," "some typical ISO 8601 formats." While supporting more formats is obviously desirable, the method never claimed to be a complete ISO 8601 implementation. It would be nicer if it were, but would also take more work and it appears that 100% spec support was neither prioritized nor promised.

I believe the second complaint is less justified. Asking a single class (Date) to support multiple granularities opens up a lot of semantic conundrums that have no obvious resolution. What should this return?

   Date.iso8601("2012-01") < Date.iso8601("2012-01-05")
If one is truly month granularity and one is day granularity, then the two aren't directly comparable. It would make more sense to me to have the month granularity representation be a separate class altogether (YearMonth?) with easy and well-defined conversions between the two.
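
A minimal sketch of what such a separate class could look like. `YearMonth` and all of its methods are hypothetical names for illustration, not part of Ruby's standard library. The point is that comparing a month with a day raises an error until the caller picks an explicit conversion.

```ruby
require 'date'

# Hypothetical month-granularity value class (not in Ruby's stdlib).
class YearMonth
  include Comparable
  attr_reader :year, :month

  def initialize(year, month)
    raise ArgumentError, "invalid month: #{month}" unless (1..12).cover?(month)
    @year = year
    @month = month
  end

  # Accept only the ISO 8601 extended "YYYY-MM" form.
  def self.parse(str)
    m = /\A(\d{4})-(\d{2})\z/.match(str)
    raise ArgumentError, "not a YYYY-MM string: #{str.inspect}" unless m
    new(m[1].to_i, m[2].to_i)
  end

  # Explicit, lossy conversion from a day-granularity Date.
  def self.from_date(date)
    new(date.year, date.month)
  end

  # Ordered only against other YearMonth values; nil makes
  # Comparable raise ArgumentError for anything else.
  def <=>(other)
    return nil unless other.is_a?(YearMonth)
    [year, month] <=> [other.year, other.month]
  end

  def first_day
    Date.new(year, month, 1)
  end

  def last_day
    Date.new(year, month, -1) # -1 means the last day of the month
  end

  # True when the given Date falls inside this month.
  def cover?(date)
    date.year == year && date.month == month
  end

  def to_s
    format("%04d-%02d", year, month)
  end
end

ym  = YearMonth.parse("2012-01")
day = Date.new(2012, 1, 5)

puts ym.cover?(day)                  # => true
puts ym.first_day < day              # => true, after choosing a conversion
puts ym == YearMonth.from_date(day)  # => true
# ym < day would raise ArgumentError: the granularities differ.
```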


If you support only a subset of the standard, or if you extend the standard, that may still be OK (not without problems, but still OK in some situations).

This case seems like an outright violation, however. If the standard says that ####-## is a year-month date with month granularity, and you interpret it as an ordinal date, then that's a clear violation. It means that others communicating with you won't just get an error, they will get the wrong answer, which is much worse.


So it's not even a subset of ISO 8601, it's just the umpteenth ad-hoc date parsing method with semantics defined as "portion of a 3k line C file with no comments": https://github.com/ruby/ruby/blob/7ea675732ac1dac72f07756498...

PS I did my undergrad thesis on Ruby; I love the language. But now that I have a job where people read my code more often than it's written, I understand why things like clear semantics matter.

And why method names should be precise since they lead developers to assume behavior. Didn't your English teacher tell you a poem's title means something? So do method names. "iso8601" implies some meaningful relationship to ISO 8601, but the only relationship here is that the coders of this method were thinking hard about ISO 8601 when they wrote it.


Sure, it's a tough problem. The resolution to my complaint is a proper ISO 8601 implementation for Ruby.

Which I'll probably have to write. ;)


ISO 8601 is a large standard that predates the internet (as we know it). It seems like an attempt to formalize all the many different ways people expressed dates (on physical paper in addition to computer systems), along with some ways no one would ever use.

I think you’ll find a lot of APIs (or formats) claiming to support an “ISO 8601 date” when in fact they mean a date of the form “YYYY-MM-DD” — it’s a common misunderstanding, but I think better that, than actually allowing the full range of dates allowed by the actual ISO 8601 standard :)


Perhaps a useful compromise would be to implement W3C's ISO-8601 profile http://www.w3.org/TR/NOTE-datetime


My God. The W3C actually made something simpler. Drastically simpler; simple enough to write down on one page. Mind you, it's just a note that someone else submitted, and the W3C hasn't formally discussed it, let alone blessed it -- but even so, it's heartening.


Oh sure, I know the full extent of the madness. At a previous job I was working with XML Schema durations to represent energy emissions data. It gives me bad memories.


Maybe a Date should be a Range of Time? Even a specific date is a 24h interval.


The problem with making all dates into ranges is that there would be no total order. To effectively have a total order, you would need to say "give me the range of time at the smallest possible granularity", which is really just asking for a point in time. If the granularity becomes smaller later, it might break the application if you aren't careful. So, I don't think it's a good idea to give up on points in time.

However, I strongly agree that ranges of time are a very useful concept, and I spent a lot of time implementing Range Types[1] in postgresql for that reason. I think Ruby should have both ranges of time and points of time (and ideally work with the SQL counterparts seamlessly).

[1] http://www.postgresql.org/docs/9.2/static/rangetypes.html


Does a point in time ever really exist? I think it's all ranges, just with smaller granularities, until you get into quantum mechanics. So, your point-in-time type is actually a range-with-granularity-X type, which should be able to inter-operate with range types of different granularities in a well-defined way.

I've recently done a lot of work using time ranges (a scheduling app) in C#, using the TimePeriod library available via NuGet. Very helpful library, but it would have been even better if DB2 (which I have to use quite often) supported a range type like you've implemented for PostgreSQL.


"Does a point in time ever really exist?"

I agree, but that gets a little too philosophical. Does an integer exist? What about characters: is my "a" really the same as your "a"?

The way I see it, there are a lot of practical reasons to look at time as a total order for some things, even if ranges are available. Programming languages and databases should model reality, but only in the sense that it's relevant to what you are trying to do. If you are writing a scheduling app, then ranges of time are very important, and should be their own data type. If you just want to tell your customers when they bought some widget, a point in time makes more sense.


Of course an integer exists; it's a mathematical zero-dimensional point. Time is not a pure mathematical construct like that; it's continuous (at non-quantum scales, at least). Characters are integral, non-continuous values too... there's nothing between "a" and "b".

If you define the total order of a set of time ranges as an ordering on their beginnings, then you have the ordering you're talking about. If you compare time ranges that have different granularities the results might not make a ton of sense depending on what you're trying to do, but they'll be consistent. Eg: 2012 < 2012-05 < 2012-05-03 < 2012-05-03T15:00:00. Is the year 2012 "less than" May 3, 2012? Interpreted as a point in time, yes, but interpreted as time periods no, not really, because they overlap. But 2012 might be a perfectly valid value for when a customer bought some widget; if you ask them they may not remember the date any more specifically than that, and if you're deciding whether or not to process a warranty claim just knowing the year might be sufficient.
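
A rough sketch of that ordering, assuming each value is modelled as an inclusive Ruby `Range` of `Date` (a representation chosen here for illustration, not something the thread specifies). Sorting on the beginnings reproduces the `2012 < 2012-05 < 2012-05-03` sequence, while a separate overlap check shows why "less than" is a stretch for periods.

```ruby
require 'date'

# Each ISO 8601 value as an inclusive range of days.
year  = Date.new(2012, 1, 1)..Date.new(2012, 12, 31)  # "2012"
month = Date.new(2012, 5, 1)..Date.new(2012, 5, 31)   # "2012-05"
day   = Date.new(2012, 5, 3)..Date.new(2012, 5, 3)    # "2012-05-03"

# Order the ranges by their beginnings only.
sorted = [day, year, month].sort_by(&:begin)
puts sorted == [year, month, day]  # => true

# Two inclusive ranges overlap when each starts before the other ends.
def overlap?(a, b)
  a.begin <= b.end && b.begin <= a.end
end

puts overlap?(year, day)  # => true, so "2012 < 2012-05-03" only holds for the start points
```

Ranges that share a beginning (for example "2012" and "2012-01") tie under this ordering, so a tie-breaker such as sorting by `[range.begin, range.end]` would be needed to make it total.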


"If you define the total order of a set of time ranges as an ordering on their beginnings"

What is the data type of the beginning of a range of time, if not a point in time? I'm having trouble even defining ranges without points.

I'd be interested to hear your suggestions in more detail. There's a temporal data LinkedIn group that you are welcome to join (I haven't been as active lately, but I'll try to get back into it). It would be a better forum to discuss matters like this.


Of course you can compare them (4 states: before, after, same, overlapping). You cannot put them into an order. That's something different.


There are different ways to overlap, too, so I don't think it's correct to say that there are only 4 states.
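
For reference, Allen's interval algebra distinguishes thirteen relations between two intervals. Below is a small sketch that classifies them, assuming half-open intervals written as Ruby ranges with an exclusive end and a begin strictly before the end; the `relation` helper is a hypothetical name.

```ruby
# Classify how interval a relates to interval b (both half-open, begin < end).
def relation(a, b)
  return :equal        if a.begin == b.begin && a.end == b.end
  return :before       if a.end < b.begin
  return :meets        if a.end == b.begin
  return :after        if b.end < a.begin
  return :met_by       if b.end == a.begin
  return :starts       if a.begin == b.begin && a.end < b.end
  return :started_by   if a.begin == b.begin && a.end > b.end
  return :finishes     if a.end == b.end && a.begin > b.begin
  return :finished_by  if a.end == b.end && a.begin < b.begin
  return :during       if a.begin > b.begin && a.end < b.end
  return :contains     if a.begin < b.begin && a.end > b.end
  # Only partial overlaps remain at this point.
  a.begin < b.begin ? :overlaps : :overlapped_by
end

puts relation(1...5, 3...8)   # => overlaps
puts relation(3...4, 1...8)   # => during
puts relation(1...3, 3...8)   # => meets
puts relation(1...8, 1...8)   # => equal
```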


Actually the article seems to be a bit confused:

> If you aren’t doing ordinal dates, you aren’t doing ISO 8601.

It seems ruby1.9 does implement ordinal dates just fine:

    irb(main):002:0> Date.iso8601("2012-012")
    => #<Date: 2012-01-12 ((2455939j,0s,0n),+0s,2299161j)>
    irb(main):005:0> Date.iso8601("2012-366")
    => #<Date: 2012-12-31 ((2456293j,0s,0n),+0s,2299161j)>
What it does not implement is the "YYYY-MM" format:

    irb(main):010:0> Date.iso8601("2012-12")
    => #<Date: 2012-01-12 ((2455939j,0s,0n),+0s,2299161j)>

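One way to sidestep the ambiguity is to state the intended format explicitly instead of letting `Date.iso8601` guess. A short sketch using `Date.ordinal` and `Date.strptime` from the standard `date` library; the input strings are only examples:

```ruby
require 'date'

# Ordinal date: year plus day-of-year.
a = Date.ordinal(2012, 12)
b = Date.strptime("2012-012", "%Y-%j")

# Year-month: the missing day defaults to the first of the month.
c = Date.strptime("2012-12", "%Y-%m")

puts a  # 2012-01-12
puts b  # 2012-01-12
puts c  # 2012-12-01
```

Note that the year-month result is still a day-granularity `Date`; the month granularity itself is lost.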

Yeah, I wrote it in a bit of a hurry on the train. Will fix.


Maybe he should subclass it and make it strict, renaming it the RFC3339 method:

http://tools.ietf.org/html/rfc3339

Basically, RFC3339 is ISO8601 cut down to the most straightforward implementation (YYYY-MM-DDTHH:MM:SS), without the "Ordinal Date" variant that is at issue here.
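
Ruby's `date` library already ships `Date.rfc3339` and `DateTime.rfc3339`. Note that an RFC 3339 timestamp also carries a time offset (`Z` or `+hh:mm`/`-hh:mm`) after the seconds. A small sketch of using the built-in parser as a strict check; the `rfc3339?` helper is a hypothetical name:

```ruby
require 'date'

dt = DateTime.rfc3339("2012-12-31T23:59:59Z")
puts dt.year, dt.month, dt.day

# Hypothetical helper: true only for full RFC 3339 timestamps.
def rfc3339?(str)
  DateTime.rfc3339(str)
  true
rescue ArgumentError
  false
end

puts rfc3339?("2012-12-31T23:59:59+01:00")  # => true
puts rfc3339?("2012-366")                   # => false (ordinal date)
puts rfc3339?("2012-12")                    # => false (year-month)
```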


I think it would make the point much clearer if the author of that article would actually explain a use case for this (apparently incorrectly handled) syntax which I have never ever come in contact with :).


1) It's in the standard. If you call a function iso8601 you better follow that standard. Or document where the function deviates from it.

2) Just because you didn't come in contact with ordinal dates doesn't mean others don't need it.

3) The article has a link to https://en.wikipedia.org/wiki/Ordinal_date. The wiki page doesn't have any specific use cases but just use your imagination.


The use case is that various embedded, industrial, and enterprise systems return and expect dates in one of those two formats as part of their API, either because they want to represent the idea of a "month" or because they care only about days and not about weekdays and months. (Implementing a whole calendar in some random embedded or industrial system is unnecessary busywork that costs not only effort but also code size.)


Well, the point is that they shouldn't call the method iso8601 if it doesn't implement the whole ISO 8601 standard.


That's a bold statement. I don't think I've ever seen a program / library which implements any standard completely and without issues. It's not uncommon to see a list of things that are not done, incomplete, or just called out as wrong and rejected from the implementation.

It's not perfect, but no implementation will ever be imho.


Not implementing part of a standard is different from implementing it differently than the standard specifies.

YYYY-MM is year-month, not year-day365. It's not very reasonable to interpret yyyy-mm as year-day365 in a function called iso8601 when the standard says that's not how to interpret it.


I think it's OK to be missing a few aspects of the standard. But in that case, it should still parse correctly and throw a "not supported" exception for unsupported features.

Parsing it incorrectly is bad news.
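
A sketch of that behaviour, assuming only the extended (hyphenated) date forms matter. `UnsupportedISO8601Format` and `strict_iso8601_date` are hypothetical names, not anything in Ruby's standard library. The shape of the string is recognised first, and reduced-precision forms are rejected rather than reinterpreted.

```ruby
require 'date'

# Hypothetical error for forms that are valid ISO 8601 but unsupported here.
class UnsupportedISO8601Format < ArgumentError; end

def strict_iso8601_date(str)
  case str
  when /\A\d{4}-\d{2}-\d{2}\z/   # calendar date, YYYY-MM-DD
    Date.strptime(str, "%Y-%m-%d")
  when /\A\d{4}-\d{3}\z/         # ordinal date, YYYY-DDD
    Date.strptime(str, "%Y-%j")
  when /\A\d{4}-W\d{2}-\d\z/     # week date, YYYY-Www-D
    Date.strptime(str, "%G-W%V-%u")
  when /\A\d{4}-\d{2}\z/, /\A\d{4}\z/, /\A\d{4}-W\d{2}\z/
    # Recognised, but not representable as a single day.
    raise UnsupportedISO8601Format,
          "reduced-precision value #{str.inspect} is not a single Date"
  else
    raise ArgumentError, "not a recognized ISO 8601 date: #{str.inspect}"
  end
end

puts strict_iso8601_date("2012-012")    # 2012-01-12
puts strict_iso8601_date("2012-W01-1")  # 2012-01-02

begin
  strict_iso8601_date("2012-12")
rescue UnsupportedISO8601Format => e
  puts e.message
end
```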


Ever seen a method called "validateEmailAddress" or something similar? Take any given implementation of that and you can predict with almost certainty that it does not validate all possible valid email addresses (which also has a spec that even came along after the internet was conceived).


So do tons of other libraries and PL standard libraries. jodatime in Java, moment.js, JS' Date object, python's datetime stdlib module, python-dateutil etc. ISO-8601 is actually quite a difficult standard to program for. 9 out of 10 times, these libraries either don't support just year, year-month, or both. 9 out of 10 times, when a datetime object is formatted to a string, you see inconsistencies about that dangling Z marker at the end.

Tom Morris is right, they all suck when it comes to ISO-8601 support.


I wonder how Perl's DateTime::Format::ISO8601 [1] holds up...

[1] https://metacpan.org/module/DateTime::Format::ISO8601


Seems to be fine:

  $ re.pl

  >> use aliased 'DateTime::Format::ISO8601';

  >> ISO8601->parse_datetime("2012-012");
  2012-01-12T00:00:00

  >> ISO8601->parse_datetime("2012-366");
  2012-12-31T00:00:00

  >> ISO8601->parse_datetime("2012-12");
  2012-12-01T00:00:00

  >> ISO8601->parse_datetime("2012");
  2012-01-01T00:00:00
And if it isn't then there is also Date::ISO8601 [1] which is written by Zefram [2] who is renowned for being a Date/Time nut [3] :)

1 - https://metacpan.org/module/Date::ISO8601

2 - https://metacpan.org/author/ZEFRAM

3 - https://metacpan.org/module/Date::Darian::Mars


Not only that: ISO 8601 also allows you to specify just the year, but Date.iso8601("2012") raises an ArgumentError ("invalid date"). Nor does it support "week dates": Date.iso8601("2012-W01") raises an ArgumentError too.
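
For what it's worth, week dates can be built and formatted with other parts of the standard `date` library, even though the year-week string above is rejected. A short sketch using `Date.commercial` and the `cwyear`/`cweek`/`cwday` accessors:

```ruby
require 'date'

# Monday (day 1) of ISO week 1 of 2012.
monday = Date.commercial(2012, 1, 1)
puts monday  # 2012-01-02

# Format an ordinary date as an ISO week date.
d = Date.new(2012, 12, 31)
week_date = format("%04d-W%02d-%d", d.cwyear, d.cweek, d.cwday)
puts week_date  # 2013-W01-1, the week-based year differs from the calendar year
```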

I don't know the standard, but either Wikipedia is wrong or the Date implementation is sorely lacking.


It may well be lacking in completeness, but I bet it covers over 99% of actual real-world ISO8601 dates. When was the last time you ran across a "week date"?


Very common in Europe, where businesses often operate relative to weeks; we frequently use week numbers for that reason (an abomination in my opinion). Ruby has decent but not great support for ISO week numbers.


Yep, an example: I edit OpenStreetMap. The opening_hours property allows one to represent a wide range of opening hours. https://wiki.openstreetmap.org/wiki/Key:opening_hours

A lot of public parks in London are open at different times during summer and winter.

Here's an example I added personally, the churchyard of St Anne's Church in Soho. http://www.openstreetmap.org/browse/way/40879988

"week 1-13 Mo-Su 10:00-16:00; week 14-43 10:00-18:00; week 44-52 10:00-16:00"


Yep. There are a lot more problems than that.

I've been reading the source code of one ISO 8601 implementation in JavaScript and writing my own in Java. Looks like I'll have to fix the Ruby one too.

Given that it's used by, oh, HTML and XML, not properly implementing ISO 8601 is really no big deal.</sarcasm>


Was it just me or was the wording in the article slightly confusing? The second paragraph switches from talking about the standard to talking about the Ruby implementation without saying so in either case and with no transition.

If I understand correctly, the complaint is that: ISO 8601 defines ####-## to be a year and month only, and it should have the granularity of a month; but Ruby treats it as an ordinal date.

It seems like it could be said much more clearly with an example:

    irb(main):011:0> Date.iso8601("2012-12")
    => #<Date: 2012-01-12 ((2455939j,0s,0n),+0s,2299161j)>
It's treating ####-## as an ordinal date, when the standard says that it's a month-granularity date.


Maybe it's rfc3339 and not iso8601? I.e., the method has been named wrong?


It is. Luckily enough that was fixed in rails recently: https://github.com/rails/rails/commit/31f807c7aaaf12c16ea157...


While we're on this subject, does anyone know of a decent C or C++ date-time library? Something like Joda Time. I know Boost has a couple of options but they look a bit less capable.


Replying to my own post, in case someone stumbles on this while searching: ICU. It's rather baroque, but seems to do the right thing. I recommend writing a C++11 wrapper for it (I might actually release this at some point in the future).


Well done, you found an issue. The next step is to report it and, if possible, submit a patch: http://bugs.ruby-lang.org/projects/ruby/wiki/HowtoReport


This isn't exactly a bug; it's an API-design disagreement.


In any case, if he wants to champion the change he will have more success discussing it over there. It's not that hard, but yeah, it's easier to moan in a blog post than to actually fix the issue.


I'm actually working with other data-folk trying to document ISO 8601 compliance across a wide range of date-time libraries and document it publicly, probably on the microformats.org wiki or W3C wiki.


> Ruby 1.9 doesn't [do such and such] properly

Expect to see more of these headlines now that Ruby 2.0.0 is out, as Rubyists start nudging users over by painting v1.9 as defective. Good way to avoid the Python 3 and Perl 6 conundrums I suppose!


Why not just file a bug report?


That would make too much sense! Whining about it as though the world owes you something is the way to go.


I'm very sorry that I posted an entry on my personal blog.



