Date Parsing Ahhhhhhhhh!!!!

I loathe dates. Right, my app loads text files/xml files etc and I extract the dates. Dates could be in any format dd/mm/yy mm/dd/yy dd mmmmm yy etc you get the picture. I can’t use the ParseDate function as the files may be international and date format may be US, UK etc etc. All the dates in a single file will be of one format so I don’t mind having an intermediate window where the user specifies their date format. I would prefer it to be autonomous but I guess with dates there is too much variation. Any ideas/advice?

By the way what I want to end up with is either a Date object or a string in the format YYYYMMDD.

Reply with full commented code. Ha ha, joke but please any help appreciated I only have a small amount of hair left to pull out.

I have a similar situation but all are distinguishable. I simply try format 1, see if it matches the data, format 2, etc… You, however, I think are stuck because you have dd/mm/yy vs. mm/dd/yy, i.e. 01/02/03 could be Feb 1, 2003 or Jan 1, 2001. You have no idea w/o asking the user what’s up.

here is what I use… trys PARSEDATE … if that fails it begins “guessing”

Hopefully I included all the support routines… if not, let me know and I will post them

FUNCTION IsValidDate(byref in_date as string,fmt as integer,include_zero_day as boolean=false) as boolean
  Dim theDate As New date
  Dim i As Integer
  Dim m As String
  Dim x As Integer
  Dim y As Integer
  Dim t As String
  Dim ok As Boolean
  Dim mth As Integer
  Dim dy As Integer
  Dim yr As Integer
  Dim s As String
  Dim v(-1) As String
  If fmt<0 Then fmt=optionDATEFMT
  s=Trim(in_date)
  
  If s<>"" Then
    ok=ParseDate(s,thedate)
    If ok And theDate.year<100 Then
      yr=FIX_YEAR(Val(Right(theDate.Shortdate,4)))
      s=Left(theDate.ShortDate,6)+Format(yr,"0000")
      ok=ParseDate(s,theDate)
    End If
    //
    If Not ok Then ' lets force things a bit
      t=cleanup_slashes(s)
      For i=1 To 12
        m=Mid(MonthNames,(i-1)*3+1,3)
        x=InStr(t,m)
        If x>0 Then
          // if full month name .. blank out all but 1st 3 char.
          dy=0
          For y=x+3 To t.Len
            If InStr("/0123456789",Mid(t,y,1))>0 Then
              dy=y
              Exit For
            End If
          Next y
          If dy>x+3 Then t=Left(t,x+3)+Mid(t,dy)
          //
          If x>1 Then
            t=cleanup_slashes(Left(t,x-1)+"/"+Mid(t,x,3)+"/"+Mid(t,x+3))
            x=InStr(t,m)
            
            y=InStr(x,t,"/")
            mth=i
            yr=0
            dy=Val(ReplaceAll(Left(t,x-1),"/",""))
            If y>0 Then
              yr=Val(ReplaceAll(Mid(t,y),"/",""))
            End If
            If dy>31 Or yr=0 Then
              y=dy
              dy=yr
              yr=y
            End If
            t=Str(mth)+"/"
            If dy>0 Then t=t+Str(dy)+"/"
            t=t+Str(fix_year(yr))
          Else
            t=cleanup_slashes(Str(i)+"/"+Mid(t,4))
            x=5
          End If
          Exit For
        End If
      Next i
      //
      For i=1 To Len(t)
        m=Mid(t,i,1)
        If Not IsNumeric(m) And m<>"/" Then t=ReplaceAll(t,m,"/")
      Next i
      t=cleanup_slashes(t)
      
      v=Split(t,"/")
      If v.ubound>0 Then
        For i=v.Ubound DownTo 0
          If v(i)="" Then v.remove i
        Next i
        '
        ' if v(0)>31 then assume that YEAR was put first
        If Val(v(0))>31 Then
          v.append v(0)
          v.remove 0
          t=v(0)+"/"+v(1)
          If v.Ubound=2 Then t=t+"/"+v(2)
        End If
        //
        yr=fix_year(Val(v(v.Ubound)))
        v(v.Ubound)=Str(yr)
        If v.ubound=2 And Val(v(1))=0 Then v.remove 1 ' 00 was day value
        If v.Ubound=1 Then 'missing day
          t=v(0)+"/1/"+v(1)
        End If
        ok=ParseDate(t,thedate)
      End If
    End If
    //
    If ok Then
      thedate.year=FIX_YEAR(thedate.year)
      s=formatdate(theDate,fmt,(v.ubound=1))
    End If
  End If
  If ok Then in_date=s
  Return ok
FUNCTION FIX_YEAR(yr as integer) as integer
  If yr<100 Then
    If yr<50 Then
      yr=2000+yr
    Else
      yr=1900+yr
    End If
  End If
  Return yr
END FUNCTION
FUNCTION cleanup_slashes(t as string,include_colon as boolean=true) as string
    Dim s As String=" -.\\:,;+"
  Dim i As Integer
  For i=1 To Len(s)
    If (Mid(s,i,1)=":" Or Mid(s,i,1)=" ") And Not include_colon Then Continue
    t=ReplaceAll(t,Mid(s,i,1),"/")
  Next i
  While InStr(t,"//")>0
    t=ReplaceAll(t,"//","/")
  Wend
  While Right(t,1)="/"
    t=Left(t,Len(t)-1)
    If t="" Then Exit While
  Wend
  If Right(t,2)="/a" Or Right(t,2)="/p" Then t=t+"m"
  If Right(t,3)="/am" Or Right(t,3)="/pm" Then t=Left(t,Len(t)-3)+Right(t,2)
  Return t
END FUNCTION
FUNCTION format_date(dt as date,fmt as integer,force_zero_day as boolean=false) as string
  Dim xday As Integer
  xday=dt.day
  If force_zero_day Then xday=0
  Return INPUT_DATE(Format(dt.Year,"0000")+Format(dt.Month,"00")+Format(xDay,"00"),fmt)
  //return Format(dt.Month,"00")+"/"+Format(dt.Day,"00")+"/"+Format(dt.Year,"0000")
END FUNCTION

Yes, thats what i’m thinking. My main issue is that one of my file formats I am supporting is QIF which does not have a specified date format, the other is CSV which again could be any format of date. I don’t mind asking the user but what I have done so far is asked them to type in the format dd/mm/yy or dd/mm/yyyy etc etc. As a developer we know the difference between yy and yyyy but will the end user. Also what if they write the format as dd.mm.yy but the delimiter in the file is / . There are so many iterations.

I have also read that QIF can contain month words 12 July 2013. I’ve not seen this yet but apparently so. Ok I could use IsNumeric to test but just another iteration. Then what if my user is french for example 12 Juillet 2013. Do banks in other countries format their dates according to their language?

The code I just posted will handle almost ANY input date format… almost

Thanks a Million Dave. I’ll give it a whirl and see. I’ll let you know how I get on…

oh yeah… and I believe it parses TIME too :slight_smile:

Dave, I can’t see where fmt is used?

almost the very last time

s=formatdate(theDate,fmt,(v.ubound=1))

default it to 1

It is supposed to choose what the return format is…

It takes Any Date in, and will reformat the input value into any other format

I use it in an app where all the dates should display the same, but it allows the user to input the date anyway they see fit

sure wish you could edit posts…

Looks like FORMATDATE is what doesn’t use it…

It may have been code I promoted from an even older project, and removed that piece

@Dave, how does your parsing tell the difference between 01/02/03 and 02/01/03 ? i.e mm/dd/yy vs. dd/mm/yy ?

Absolutely spot on Dave. Got it working. Have you any formats you know it fails on? I’ll put it through its paces to check. I couldn’t find MonthNames variable but defined it as String = “Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec”, is that ok?

Yes Jeremy, your right. I guess this is where I need to ask the user the format but then it will defeat the object of all the above code.

Dave?

remove the commas
“JanFebMarAprMayJunJulAugSepOctNovDec”

and for 01/02/03 vs 02/01/03

if it finds two itmes that could be day or month, it assumes the American way… Month is the first one, day is the second.
year is always last, unless it is obvious a value is a year “2003” vs “03”

So 01/02/03 is Jan 02, 2003 and 02/01/03 would be Feb 01,2003

But 2003-Feb-01 works (or it should)…

I cut that out of a Personal Database Program (keep track of passwords, website, insurance, domains etc, etc etc) that I wrote a few years ago.

Toss in foreign month & day names and different orders and the fact that at least on OS X a person CAN define a separator that is several characters and dates become very hard.
WWDC 2013 has a session on dates - Solutions to Common Date and Time Challenges :stuck_out_tongue:

Reality is that you can guess - sometimes wrongly as 01/02/03 could be any of Jan 2, 2003, Feb 1, 2003, Mar 2, 2001 or any other combination you can think of as they could each be day month or year

Asking the end user is not a bad option

Where you have control write it in a very nice simple unambiguous form ISO 8991 is nice (YYYY-MM-DD HH:MM:SS GMTOFfset)

The code I posted should only have an issue if the entered date is ambigous (sic) and not in an “American” format (sorry, but I’m American, and write most software first for myself, and then for anyone who wants to purchase it).

It DOES deal with almost any separator (independant of what the OS was told), which is pretty much what the “CLEANUP_SLASHES” routine does…

But hey, I posted it as a “here is how I do it, perhaps you will learn something from it”… not as the end all do all :slight_smile:

Since the date format will be the same throughout a single document, how about keeping track of the max values in each position as you parse each date. If any are 4 digits (or if the value > 31) then assume that position is the year. Otherwise, use the last position (or the first depending on what format is most likely). Then, if either of the remaining positions > 12 then assume that is the day, otherwise assume whichever you feel is most likely to be the day. Then use this assumed format as the default you show the user for approval.

Mark, that is sort of what I do now, make some assumptions and then have a popup list of the possible formats their dates could be and then let them choose. Its a real issue as I need to assume that any user downloading a bank statement and opening it up won’t really have a clue what format the dates are in within thier downloaded statement. Its really frustrating. I now know why QIF statement format has been abandoned although a lot of UK banks still offer it for download. One of my banks only offer QIF, CSV both of which are a real pain to parse due to the varying formats and loose specifications.

“Where you have control write it in a very nice simple unambiguous form ISO 8991 is nice (YYYY-MM-DD HH:MM:SS GMTOFfset)”

No idea how fastener design got into this ::slight_smile: (The date format to which you refer is actually ISO 8601.)

[quote=14862:@J Andrew Lipscomb]“Where you have control write it in a very nice simple unambiguous form ISO 8991 is nice (YYYY-MM-DD HH:MM:SS GMTOFfset)”

No idea how fastener design got into this ::slight_smile: (The date format to which you refer is actually ISO 8601.)[/quote]

typing without going to look up the right name for it :stuck_out_tongue: