Strategy with dates inside imported data (from text files) ?

The question is simple and as usual, the answer can be a bit complex.

If in your software you have to import text files from elsewhere, how do you deal with the embedded Dates (if you have some) ?

We all know how easy errors comes into our developments, how easy imported data can have errors… So, as usual, we have to put strategies to correct as much as possible bad data.

I think at… SQLDate (only):
Discrepencies in Date Fields Delimiters (mix of different delimiters in the same date / document),
Having the Year Field defined with two AND four digits (I have examples in wikipedia of list with both…)
Not knowing (without getting an eye on the list and searching if I can discover) what field # describe a Day number (in one or 2 digits) or a Month number (in one or 2 digits).

And in this Internet age with data exchange from / to anywhere, the complexity rise to the top !

If we only take 3 dates definitions, we have people who are in 5777, some others in 2017 and some others in 1438.
And I will not talk about India where the existing official calendars vary depending on the region you are living.
No judgement here, just facts (and I hope they are exact).

I nearly forgot: I’ve made some successive quest about this question in many days, but either I was unable to type the correct sentence to get an answer or there is no answer (everyone code his/her solution in her/his corner).

What is your experience ?

Doesn’t this help:

ParseDate is very limited. You are almost always better off coding your own routine.

For SQL dates the answer is simple at least for the classic date class.

I wrote an extended ParseDate routine a few years back that will accept almost any reasonable date format and figure out what the proper SQLDate or SQLDateTime value should be… It is not perfect (and I doubt such a routine exists)… but it does way better than ParseDate can do. I have used it in many projects including CPI - Custom Property Inspector that I am working on now

You could also check out the ParseDateMBS function in MBS Plugins.
It allows you to specify the format of the dates and parse them.

[quote=357738:@Christian Schmitz]You could also check out the ParseDateMBS function in MBS Plugins.
It allows you to specify the format of the dates and parse them.[/quote]
But the idea would be that you don’t KNOW the format of the incoming data…

Wouldn’t it be good in that case to try a few formats and check if you get valid data for one of them?

The difficulty with ‘guessing’ the format and running through a whole bunch is the UK/US issue:

UK : dd/mm/yyyy US : mm/dd/yyyy
Receive a date like “12/10/2017” will pass both the US and the UK tests but give a completely wrong answer (12 October versus 10 December).

The only way I have found to work this issue is to insist to the user that the format is “X” and that THEY must ensure that the sent data is formatted correctly (it is their data, after all). Whilst this get some initial resistance the users generally accept the issue. In all other cases I have found with importing financial data (QIF, OFX etc.) is to check the initiator (which bank or credit card company) and have switches inside the code that recognises the formats used by them.

When I am talking with a new organisation it is one of the first questions that I ask them - what is your date format standard?

I have to say that every time I have written code that attempts to guess which format is used I am wrong! Ask the question and code according to the answer.

Well, you can run through a lot of dates and see if one of the mm or dd fields has more than 12.
Than you may know it.

But for edge cases you will need to ask user.

Thanks for your inputs.

The strategy is:

  1. Check the two dates fields Delimiters and make a user report if there are differents in the same document.

  2. Check the date length (# of digits + Delimiters) to determine if the year is 2 or 4 digits. USer report too

  3. Check field 1, 2 and 3 contents:
    a. if the field have a value higher that 12 it may be a day definition; if that same field is > 31, this is a year field
    b. if the field have a value below 13: this may be a month definition
    c. if the field have value is higher that 31 this may be the year definition; of course 4 digits here means a year definition field.

Of course, all of there in a loop that check the whole column.

A check for a little bit more than 16000 Row runs in less a-of a second (say 1 second).

I actually only coded parts 1 and 2.

Entry # 2 may be pointless…

If a report is issued to the user, it will contains a warning about the dates validity thrugh the document.
Past a certain point, all we can do is to put the task in the user’s hands. *

Why not using ParseDate ? Who says the Dates in the imported text file is in the application running locales (have the locales Delimiters) - as someone already states.

  • Of course, in the case of wikipedia list of data with wrong dates formats (/ and - mixed Delimiters, mixed # of 2 and 4 digits, etc.), if one cares, he better report the error to wikipedia.

That’s why in the rest of the world, the ‘911’ is called ‘119’ .

Joke aside, it’s a shame that we do not use the same phone # to call the urgency (911 or whatever #, but only one number for all countries).
What will happens if you need to call the 911 and this is an unknow to you number in the country you are currently visiting ?

I think it is the 18 in France (unsure… 17 is for the Police; other numbers may exists).

Well, the standard number is 112 in all of Europe and a couple of more countries.
911 is only used in a few countries.

SO when the US switches to metric system, they can also switch to 112 :slight_smile:

[quote=357886:@Christian Schmitz]Well, the standard number is 112 in all of Europe and a couple of more countries.
911 is only used in a few countries.[/quote]
and here I thought it was 999 in Europe

THAT will never happen… unfortunatly we have 250 years of doing it “our way” that you can’t get millions of people to change

Bob Dylan: “The Times They Are A Changin’”.

That said, I certainy will ever see that future.

Population (2017 estimate): 325,365,189; comparethat to the whole humanity. When the benefit goes to masses, we have to believe in changes.

As of 2017, the US is part of a group of 7 very influencial countries not officially using the International System (the metric system was the French ancestor of SI), including Myanmar, Liberia, Palau, the Marshall Islands, the Federated States of Micronesia, and Samoa. Of that group, Myanmar, Liberia and Samoa are essentially metric, despite not having official regulation.

The funny thing is that circa 1965 -1970, the USA was planning metrification, ahead of many countries now completely metric. The plan was scrapped under the Reagan administration.

and so glad they did…

Not seeing those other 6 as " very influencial countries" … most people don’t even know they exist

I suggest we all transfer our emergency numbers worldwide to ‘505’ since it looks closest to the international distress call ‘S.O.S.’.