Extending the range of time

Until Qt 6.2, QDateTime's ability to take time-zone adjustments – both seasonal daylight-saving time and occasional changes (on the whims of politicians) to a zone's standard offset from UTC – into account was limited to the years 1970 through 2037, with some kludges in place to extrapolate beyond 2037. As the default assignee for time-related bugs, I'd long wanted to fix this, to use such information as we do have available. So – finally, once Qt 6 had been released – I made that change and discovered what it broke.

I was, of course, expecting things to break: the restriction wouldn't have been there if there weren't problems with accessing time-zone data outside that range and, in any case, extending the range meant some code would be exercised in ways it never had been before. So I was ready for breakage and fixed various things before unleashing the change on unsuspecting colleagues and users. None the less, of course, there were subtler problems that eluded my testing and ended up surfacing as bugs once this change was released.

I discuss all of the following in terms of local time (using Qt::LocalTime as the spec parameter to QDateTime's constructor; this is the default spec), as that's more commonly used, but some of it applies also to times specified with respect to a particular time-zone (passing a QTimeZone parameter to the constructor, instead of optional spec and offsetSeconds). In particular, as some local time code paths now exercise time-zone code paths as fall-backs, bugs in the time-zone code have also shown up (and I've fixed them).

Why was it limited?

Before I embark on describing the consequences of the change, I'd better explain why the range used to be limited.

There's a type called time_t used by system APIs for mapping between local time and UTC (Coordinated Universal Time, the modern replacement for Greenwich Mean Time); it counts the number of seconds since the start of 1970 in UTC. There's another type, called struct tm, that describes a broken-down time, with various fields for the year, month, day, hour, minute and second, along with a flag to indicate whether the daylight-saving status of the time is known and, if so, whether it's in daylight-saving time or in standard time. There are various standard functions to convert between these two representations of a point in time. While time_t is always a UTC time, a struct tm can represent either local time or UTC.
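
As a minimal sketch of those two representations and the standard functions that convert between them (the helper names utcFields and viaLocalFields are made up for illustration and are not part of Qt):

```cpp
#include <cassert>
#include <ctime>

// Break a UTC time_t down into calendar fields.
// std::gmtime() returns a pointer to shared static storage, so copy the result.
std::tm utcFields(std::time_t seconds)
{
    const std::tm *broken = std::gmtime(&seconds);
    assert(broken != nullptr); // null means the time could not be represented
    return *broken;
}

// Go from time_t to local-time fields and back again.
// std::localtime() fills in tm_isdst, which tells std::mktime() how to
// resolve a wall-clock time that happens twice when clocks go back.
std::time_t viaLocalFields(std::time_t seconds)
{
    const std::tm *broken = std::localtime(&seconds);
    assert(broken != nullptr);
    std::tm local = *broken;
    return std::mktime(&local);
}
```

Note that tm_year counts years since 1900 and tm_mon counts months from zero, so utcFields(0) reports tm_year as 70 and tm_mon as 0 for the start of 1970.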

Back in olden times, when time_t was a 32-bit signed integer, it could represent any value from −2^31 to 2^31−1. It counts seconds. Now, 2^31 seconds is 68 years, 18 days, 3 hours, 14 minutes and 8 seconds. So the old 32-bit time_t could only represent date-times in the range from 1901-12-13 20:45:52 UTC to 2038-01-19 03:14:07 UTC (in the only sane time format). Thus, back when time_t was a 32-bit type, system APIs couldn't tell us about local time's offset from UTC (much) beyond the end of 2037.
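
The arithmetic behind those figures can be checked with a short sketch. The names here (Duration, splitSeconds, daysInYears) are invented for the purpose and the code is only a worked calculation, not anything Qt does:

```cpp
#include <cassert>
#include <cstdint>

struct Duration { std::int64_t days; int hours; int minutes; int seconds; };

// Split a non-negative count of seconds into whole days and a time of day.
constexpr Duration splitSeconds(std::int64_t total)
{
    return { total / 86400,
             static_cast<int>(total % 86400 / 3600),
             static_cast<int>(total % 3600 / 60),
             static_cast<int>(total % 60) };
}

constexpr bool isLeapYear(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}

// Days in the calendar years from firstYear up to, but not including, endYear.
constexpr std::int64_t daysInYears(int firstYear, int endYear)
{
    std::int64_t days = 0;
    for (int year = firstYear; year < endYear; ++year)
        days += isLeapYear(year) ? 366 : 365;
    return days;
}

constexpr std::int64_t twoTo31 = std::int64_t(1) << 31; // 2147483648
constexpr Duration span = splitSeconds(twoTo31);
// 1970 through 2037 is 68 whole years; what is left over lands in January 2038.
constexpr std::int64_t daysInto2038 = span.days - daysInYears(1970, 2038);

static_assert(span.days == 24855 && daysInto2038 == 18, "68 years and 18 days");
static_assert(span.hours == 3 && span.minutes == 14 && span.seconds == 8, "03:14:08");
```

Eighteen whole days after the start of 2038 is 19 January, and the last representable second is one short of 03:14:08 on that day, which is where the 2038-01-19 03:14:07 limit comes from.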

These days, most computer systems support 64-bit integral types and an obvious solution to the so-called 2038 problem was for them to use such a type as time_t, thereby extending its range. Every major operating system has by now made that change, so the only thing holding Qt back from reaching beyond it was simply that the code assumed it was a limitation and didn't call the system functions for times beyond the end of 2037. Instead, it tried to extrapolate what would happen beyond that date by looking at the time-zone rules in 2037. That's in principle a reasonable thing to do (still), given that no time-zone has yet committed to changing its daylight-saving rules after 2037; indeed, the politicians responsible for such changes seldom give those of us who maintain software even one year's advance notice of such changes, much less fifteen or more. I'll come back later to how that extrapolation didn't quite work robustly.

In any case, the relevant system functions' specifications don't require them to work for times before the start of 1970: indeed, some of them are specified to return a time_t value of −1 as their way of reporting an error, which makes it tricky to represent the last second of 1969. Most Unix implementations of these functions do, in fact, cope with times before 1970, but Microsoft has always held to the 1970 start, so we couldn't have cross-platform support for correct handling of local time's offset from UTC before 1970.
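
One common workaround for the ambiguous −1 is to plant a sentinel in a field that mktime() is specified to fill in on success, then check whether it was overwritten. The sketch below shows the general technique under two assumptions: that the implementation leaves the struct untouched when it fails, and that C++17 is available for std::optional. It is not a copy of Qt's code, and localToTimeT and makeFields are names invented here:

```cpp
#include <cassert>
#include <ctime>
#include <optional>

// Build a struct tm from ordinary calendar values (month 1-12, full year).
std::tm makeFields(int year, int month, int day, int hour, int minute, int second)
{
    std::tm fields{};
    fields.tm_year = year - 1900; // tm_year counts from 1900
    fields.tm_mon = month - 1;    // tm_mon counts from 0
    fields.tm_mday = day;
    fields.tm_hour = hour;
    fields.tm_min = minute;
    fields.tm_sec = second;
    return fields;
}

// Convert local-time fields to time_t, telling a genuine -1 apart from an error.
std::optional<std::time_t> localToTimeT(std::tm local)
{
    local.tm_isdst = -1; // let the system work out the daylight-saving status
    local.tm_yday = -1;  // sentinel: a successful mktime() sets this to 0..365
    const std::time_t result = std::mktime(&local);
    if (result == static_cast<std::time_t>(-1) && local.tm_yday < 0)
        return std::nullopt; // assumed: the struct is left alone on failure
    return result;
}
```

With this, a result of −1 accompanied by a sensible tm_yday is taken to be the last second of 1969 in UTC rather than a failure.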

The problem with extrapolating

As mentioned above, the prior code extrapolated forward beyond 2037; and lifting the 1970 boundary was going to require doing similar, at least on Microsoft, into the past.

The code that extrapolated forward was just using 2037's data: it took the same time and date within the year and found what offset 2037 was using on that date. Unfortunately, there's a problem with that: a given date's day of the week shifts from one year to the next and many zones' daylight-saving rules put the transitions on Sundays (for example, the last Sundays of March and October). That means that a given date near the transition may be before or after it, depending on what day of the week it falls on.
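
To make that concrete, here is a small sketch that finds the last Sunday of a 31-day month in different years. It uses Sakamoto's day-of-week method on the proleptic Gregorian calendar; the function names are made up for this illustration:

```cpp
#include <cassert>

// Day of the week for a proleptic Gregorian date, 0 = Sunday ... 6 = Saturday
// (Sakamoto's method); month is 1-based.
int dayOfWeek(int year, int month, int day)
{
    assert(year > 0 && month >= 1 && month <= 12);
    static const int monthOffset[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };
    if (month < 3)
        --year; // January and February count as part of the previous year
    return (year + year / 4 - year / 100 + year / 400
            + monthOffset[month - 1] + day) % 7;
}

// Day of the month of the last Sunday in a 31-day month such as March or October.
int lastSundayOf31DayMonth(int year, int month)
{
    return 31 - dayOfWeek(year, month, 31);
}

// Is the given day of March on or after that year's last Sunday of March?
bool onOrAfterLastSundayOfMarch(int year, int day)
{
    return day >= lastSundayOf31DayMonth(year, 3);
}
```

The last Sunday of March falls on the 29th in 2037 but on the 25th in 2040, so 27 March is before the transition in 2037 and after it in 2040: looking up 2037's offset for that date gives the wrong answer for 2040.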

So using the same date in 2037 was wrong for at least some dates. Likewise, using a date in 1970 wouldn't be a robust way to determine the time-zone's state on the corresponding date in earlier years (even if the rule in effect in 1970 did date back all the way to the year in question).

Fortunately, the day of the week for any given date can be determined entirely from the day of the week of the first day of its year and how many days later it is, which at most depends on whether it's in a leap year. It is easy enough to construct simple tables mapping day of the week to a year that starts with that day of the week, one for leap years and one for ordinary years. We can then use those to select a year in which (assuming the same rule is in effect) the system functions will give the right offset to apply for a date-time in a year outside their supported range.

The problem with that, in turn, is that the range of leap years needed to find one starting on each day of the week is rather long – at least 24 years between the first and last in the table – which increases the risk that the zone had a change of rules in that interval.
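
A rough sketch of such tables follows, built by scanning a supported range of years and recording, for each weekday, the first year of the right kind that starts on it. The 1970 to 2037 default range, the choice of the earliest matching year and all the names are assumptions made for illustration; Qt's actual tables may well differ:

```cpp
#include <algorithm>
#include <array>
#include <cassert>

constexpr bool isLeapYear(int year)
{
    return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
}

// Weekday of 1 January in the proleptic Gregorian calendar, 0 = Sunday.
int weekdayOfJan1(int year)
{
    assert(year > 1);
    const int prior = year - 1;
    return (prior + prior / 4 - prior / 100 + prior / 400 + 1) % 7;
}

// For each weekday, the first year in [first, last] that starts on it and has
// the requested leap status; 0 marks a weekday with no match in the range.
std::array<int, 7> proxyTable(bool leap, int first, int last)
{
    std::array<int, 7> table{};
    for (int year = first; year <= last; ++year) {
        if (isLeapYear(year) != leap)
            continue;
        int &slot = table[weekdayOfJan1(year)];
        if (slot == 0)
            slot = year;
    }
    return table;
}

// Years between the earliest and latest entries of a table.
int tableSpan(const std::array<int, 7> &table)
{
    const auto ends = std::minmax_element(table.begin(), table.end());
    return *ends.second - *ends.first;
}

// A year inside the supported range with the same calendar layout as year.
int proxyYear(int year, int first = 1970, int last = 2037)
{
    if (year >= first && year <= last)
        return year;
    const int proxy = proxyTable(isLeapYear(year), first, last)[weekdayOfJan1(year)];
    assert(proxy != 0); // the range must be long enough to cover every weekday
    return proxy;
}
```

With these choices the ordinary-year table runs from 1970 to 1978, while the leap-year table runs from 1972 to 1996: successive leap years shift the starting weekday by five days, so it takes seven of them, 24 years apart end to end, to cover the whole week.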

Further complications

Fortunately I haven't yet had to cope with this. In principle, a 64-bit time_t reaches about 292 gigayears, or about 21 times the current age of the universe, to either side of 1970. Since struct tm represents the year with a 32-bit int field, it can't represent the whole of that range. So even with a 64-bit time_t, the relevant system functions can only represent times within a little over 2 gigayears either side of 1900 – the offset that's applied to that year field. (At the start of this range, the oxygen produced by cyanobacteria was causing iron dissolved in Earth's oceans to settle out as iron oxide, almost wiping out life, which was still all single-cell.) That would be plenty wide enough for QDateTime, though – it represents time by a 64-bit count of milliseconds since the start of 1970, so only reaches 292 megayears either side of that moment (a range that starts before the rise of the first dinosaurs), less than a seventh of the range struct tm can represent with its 32-bit year field.
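
Those ranges are easy to check with a few constants. This is only a back-of-envelope sketch: it assumes a mean Gregorian year of 365.2425 days, a 32-bit int for tm_year and an age of roughly 13.8 gigayears for the universe, and the constant names are invented here:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

constexpr double secondsPerYear = 365.2425 * 86400; // mean Gregorian year
constexpr double int64Max =
    static_cast<double>(std::numeric_limits<std::int64_t>::max());

// Reach of a 64-bit count of seconds (time_t), in years either side of 1970.
constexpr double timeTReachYears = int64Max / secondsPerYear;
// Reach of a 64-bit count of milliseconds (QDateTime), in years either side of 1970.
constexpr double msecsReachYears = int64Max / 1000.0 / secondsPerYear;
// Reach of a 32-bit tm_year field, in years either side of 1900.
constexpr double tmYearReachYears =
    static_cast<double>(std::numeric_limits<std::int32_t>::max());

// Assumed figure, not taken from any API: the universe is about 13.8 gigayears old.
constexpr double universeAgeYears = 13.8e9;

static_assert(msecsReachYears / tmYearReachYears < 1.0 / 7,
              "QDateTime's range is under a seventh of what tm_year can express");
```

The first constant comes out at about 2.92e11 years, the second at about 2.92e8 and the third at about 2.15e9, matching the figures quoted above.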

So system types can now represent the full range of QDateTime values and we could naïvely hope our problems would be solved. However, as we've already seen, operating systems don't necessarily arrange for relevant system functions to cover the full range of their types. Just as Microsoft cuts off at the start of 1970 (when time_t is zero), Apple cuts off at the start of 1900 (when the year field of struct tm is zero); and Microsoft also has a cut-off in year 3000, beyond which its system functions decline to provide useful results.

So it remains necessary to extrapolate outside the system-supported range. In the case of Apple, this is fairly straightforward, as I happen to know that no zone actually used daylight-saving time until 1908, so we can just use the standard time offset at some point between 1900 and 1908, trusting that it was in force for all prior times. (That isn't strictly accurate, as most places used local solar mean time until the arrival of the railways, which is what prompted the invention of time zones in the first place, but it's good enough for most practical purposes and certainly the best we can hope to do under the circumstances.)

In the case of Microsoft, the situation is trickier. As well as the system functions excluding times before 1970, the system only comes with time-zone information reaching back a few years, so what it'll tell you for the offsets even in the 1970s isn't reliable – it's extrapolated backwards from the rule in effect at some later date, regardless of the historical reality. None the less, when that's the data we have, the best we can do is use it and, likewise, extrapolate backwards from it to all of history before 1970. We also have to do the same for the future beyond 3000, but at least that forward extrapolation is reasonable (as noted above).

One other thing we can do, aside from extrapolating backwards based on what the system functions tell us, is to use QTimeZone. As long as this is able to discover and represent the system time-zone, we can bypass the system time_t / struct tm functions. That saves us the need to extrapolate in most cases. Then again, when QTimeZone is obliged to use Microsoft's system APIs for time-zone information (it prefers the ICU library when available), it's subject to the same limitation on past zone information as the time_t functions. (If Microsoft's data claims a recurrent daylight-saving rule dating back forever, we cut it off at 1900, using its standard time before then, because no zone used DST before 1908.) We do have plans (see QTBUG-68812) to add a new C++20 time-zone backend, based on <tz.h>, that'll give us consistent cross-platform time-zone information – and let me delete a lot of code – but Qt 6 currently only relies on C++17 features.

Expanding the range

So there was a lot of preparation to be done, before the old limitation to the years from 1970 through 2037 could be lifted, but once we understood the problems it was time to set about fixing them. The 2037 boundary was the one we had the most pressing need to lift, since that's closer to the present than 1970 and getting closer with every passing year: that was QTBUG-73225 and, all things considered, it was the easier boundary to fix, since all major platforms already do in fact have time_t functions that work beyond that boundary.

Somewhat more challenging was QTBUG-80421, the lifting of the 1970 restriction on Windows (and, in the process, the previously unnoticed 1900 boundary on Apple). For this I had to extrapolate available time-zone information to before 1970. When the system time-zone information was available via QTimeZone, this was easy enough; but the fall-back from that required extrapolation from the time_t functions. As noted above, this is tricky and involves some risk for leap years. Fortunately, we only have to exercise it when we can't represent the system zone as a QTimeZone. On the bright side, removing the logic to detect whether to take time-zone information into account did make the code simpler, allowing some clean-ups.

Those both made it into 6.2 in good time for its Feature Freeze (after all, they were significant behaviour changes), along with fixes for the first few bugs they exposed. Fortunately, we have quite robust testing of the date-time and time-zone code these days, which found most of the problems nice and quickly (and we now have more testing, for the bugs found later). None the less, we've had a few bugs reported by those who use Qt and I can guess we'll see one or two more. The QTimeZone code is now being exercised a lot more than it was before, due to use as a fall-back, which has been where most of the long-hidden bugs have been lurking. There have also been some bugs with arithmetic overflow, since we do have test-cases close to the boundaries of the range of values QDateTime can represent. Some bugs remain (of course) – QML's Date type has issues on some platforms, getting the wrong time-zone offset for local time, for example – which shall get attention as time permits.

The net upshot of it all is that both QDateTime and QTimeZone are now more robust – and ready for the Epochalypse.

