Wednesday, July 18, 2007

Interpreting 2 digit years

As a good practice, always use 4 digits to specify the year. But many applications have a front end that accepts a 2-digit year as input from the user, or an integration with a third party that requires us to parse 2-digit years.

We all understand the ambiguity in parsing 2-digit years. The logic you need depends on the business requirement. For example, if you are accepting a birth year, it cannot be later than the current year, so it can be interpreted within the 100 years ending today.
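
The birth-year rule can be sketched in a few lines of Java. This is only an illustration: `BirthYearResolver` and `expand` are hypothetical names, and the sketch assumes the acceptable window is the 100 years ending with the current year.

```java
import java.time.Year;

// Hypothetical helper for illustration; not part of any library.
final class BirthYearResolver {

    // Expands a two-digit birth year to the latest matching year that is
    // not after currentYear, so the result lies in [currentYear - 99, currentYear].
    static int expand(int twoDigitYear, int currentYear) {
        if (twoDigitYear < 0 || twoDigitYear > 99) {
            throw new IllegalArgumentException("expected 0..99, got " + twoDigitYear);
        }
        // Try the current century first, then fall back one century.
        int candidate = (currentYear / 100) * 100 + twoDigitYear;
        return candidate > currentYear ? candidate - 100 : candidate;
    }

    public static void main(String[] args) {
        int now = Year.now().getValue();
        System.out.println(expand(7, now));
    }
}
```

With the current year taken as 2007, `expand(7, 2007)` gives 2007 and `expand(8, 2007)` gives 1908, because 2008 would be in the future.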

So how do different APIs handle it?

The SimpleDateFormat class uses the following logic:
"For parsing with the abbreviated year pattern ("y" or "yy"), SimpleDateFormat must interpret the abbreviated year relative to some century. It does this by adjusting dates to be within 80 years before and 20 years after the time the SimpleDateFormat instance is created."

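The default window moves with the time the formatter is created, so results can change from year to year. SimpleDateFormat also offers `set2DigitYearStart(Date)` to pin the start of the 100-year window. The sketch below assumes a "MM/dd/yy" input format and UTC; the class name `TwoDigitYearWindow` and the method `parseYear` are made up for the example.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for illustration.
final class TwoDigitYearWindow {

    // Parses "MM/dd/yy" and returns the 4-digit year, mapping two-digit years
    // into the 100-year window that starts on 1 January of windowStartYear (UTC).
    static int parseYear(String text, int windowStartYear) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat format = new SimpleDateFormat("MM/dd/yy", Locale.US);
        format.setTimeZone(utc);
        format.setLenient(false);

        // Pin the start of the window instead of relying on "now minus 80 years".
        Calendar start = new GregorianCalendar(utc);
        start.clear();
        start.set(windowStartYear, Calendar.JANUARY, 1);
        format.set2DigitYearStart(start.getTime());

        try {
            Calendar parsed = new GregorianCalendar(utc);
            parsed.setTime(format.parse(text));
            return parsed.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a MM/dd/yy date: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseYear("07/18/49", 1950)); // 2049
        System.out.println(parseYear("07/18/50", 1950)); // 1950
    }
}
```

With the window starting in 1950, two-digit years 50 to 99 map to 1950-1999 and 00 to 49 map to 2000-2049, regardless of when the code runs.
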
The Joda Time API takes a different approach:
"What is particularly interesting about this format is the two digit year. Since the interpretation of a two digit year is ambiguous, the appendTwoDigitYear takes an extra parameter that defines the 100 year range of the two digits, by specifying the mid point of the range. In this example the range will be (1956 - 50) = 1906, to (1956 + 49) = 2005. Thus 04 will be 2004 but 07 will be 1907. This kind of conversion is not possible with ordinary format strings, highlighting the power of the Joda time formatting architecture"

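The range arithmetic in the quotation above can be checked with plain Java. This is a sketch only: `PivotYear.expand` is a hypothetical helper that reproduces the quoted range (pivot - 50 to pivot + 49) and does not call Joda Time itself.

```java
// Hypothetical helper for illustration; it mirrors the pivot arithmetic
// described in the quotation and is not the Joda Time implementation.
final class PivotYear {

    // Maps a two-digit year into the range [pivot - 50, pivot + 49].
    static int expand(int twoDigitYear, int pivot) {
        if (twoDigitYear < 0 || twoDigitYear > 99) {
            throw new IllegalArgumentException("expected 0..99, got " + twoDigitYear);
        }
        int low = pivot - 50;
        // Step forward from the low end of the range to the first year
        // whose last two digits match the input.
        return low + Math.floorMod(twoDigitYear - low, 100);
    }

    public static void main(String[] args) {
        System.out.println(expand(4, 1956)); // 2004
        System.out.println(expand(7, 1956)); // 1907
    }
}
```

With a pivot of 1956 the range is 1906 to 2005, so 04 becomes 2004 and 07 becomes 1907, matching the example in the quotation.
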
The default setting on Windows 2000 machines:
"Under Windows 98 or Windows 2000, two-digit years for the Year argument are interpreted based on user-defined computer settings. The default settings are that values from 0 through 29 are interpreted as the years 2000–2029, and values from 30 through 99 are interpreted as the years 1930–1999. For all other Year arguments, use a four-digit year; for example, 1924."

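A fixed cutoff like this is the simplest rule to write down. The sketch below is an illustration with hypothetical names (`CutoffYear`, `expand`); it assumes only the 1900s and 2000s are of interest and takes the cutoff as a parameter, 30 for the Windows default quoted above.

```java
// Hypothetical helper for illustration.
final class CutoffYear {

    // Values below the cutoff are read as 20xx, the rest as 19xx.
    static int expand(int twoDigitYear, int cutoff) {
        if (twoDigitYear < 0 || twoDigitYear > 99) {
            throw new IllegalArgumentException("expected 0..99, got " + twoDigitYear);
        }
        return twoDigitYear < cutoff ? 2000 + twoDigitYear : 1900 + twoDigitYear;
    }

    public static void main(String[] args) {
        System.out.println(expand(29, 30)); // 2029
        System.out.println(expand(30, 30)); // 1930
    }
}
```

The same function with a cutoff of 50 gives 49 as 2049 and 50 as 1950, which is the Internet mail rule quoted next.
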
A parser that follows the Internet mail standard (RFC 2822) documents the following:
"Two digit years are treated as valid in the loose translation and are translated up to a 19xx or 20xx figure. By default, following the specification of RFC2822, if the year is greater than '49', it's treated as being in the 20th century (19xx). If lower, or equal, then the 21st (20xx). That is, 50 becomes 1950 while 49 is 2049."