Automation and lack of info in the original data

When should we stop a data analysis when the user data lacks entries needed for the analysis?

I have a list in a CSV file that holds (among others) three columns:

a. Date Start
b. Date End
c. Number of “days” (it can be a number of days - Monday-Saturday - or a number of Sundays…).

I created a method that counts the number of rows between two “entries”… and places this value into the correct Cell…


[code]Start Date: 1962-03-11
End Date: 1962-06-10

# of Rows: // Is filled by the method[/code]

The list holds one Row for each entry to be tested (days of the week or Sundays).
The method scans the Rows (, Column) and, if an “End Date” equals the “current Date” (the date in the current Row), the number of Rows found is written to the # of Rows Cell.

I noticed (in Row 9!) that the program stops (I wrote it that way, it seems) if the end date is missing or incomplete or… And when I put in a correct - but imaginary * - date, the program continues the computation and reports the # of found days until it finds another glitch in the original data.

This is (more or less) OK for me as a developer, because the user has to provide a correct list of dates for the software to run correctly. BUT, as a user, it is a shame.

The question here is: “When should we stop the data analysis when the data lacks correct entries?”

Note: in my case (and I am quite certain that other users’ data can be in the same situation), the correct data is missing because… I do not have this data!

For better understanding, imagine a list of historic facts:

Name of the fact: birth, death, battle, war, reign / presidential run, etc.
Date of the fact: start-end dates (birth-death / war start-end, etc.)
Span of the fact: number of years for example

In some cases, the exact dates are either both unknown, or one of the two is unknown, undefined, vague…

Imagine a list of dated facts applied to the Bible (Torah). What are the birth / death dates of Moses (Moshe) [or David or…] (in a flow of well-known birth-death dates for historical people)?

You, as the developer, must decide the appropriate action to take. Any of these is acceptable:

  1. Skip the row / exclude it from the count
  2. Include the row in the count
  3. Stop counting
  4. Stop counting and report an error

In this particular case, option 1 might be best. Simply skip the row because it doesn’t have sufficient info to include it as being “between Start Date and End Date”. For other criteria, some other action may be most appropriate.
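As a rough illustration of the four options, here is a minimal Python sketch (Python is used only for illustration; it is not the language of the original program). It assumes each row has already been parsed into a date, with `None` standing in for a missing or incomplete entry. The names `OnMissing` and `count_rows` are hypothetical.

```python
from datetime import date
from enum import Enum


class OnMissing(Enum):
    """Hypothetical policy names for the four options listed above."""
    SKIP = 1             # option 1: leave the row out of the count
    INCLUDE = 2          # option 2: count the row anyway
    STOP = 3             # option 3: stop counting silently
    STOP_AND_REPORT = 4  # option 4: stop counting and raise an error


def count_rows(dates, start, end, on_missing=OnMissing.SKIP):
    """Count rows whose date lies between start and end (inclusive).

    A row whose date is None is treated as missing (an assumption of
    this sketch) and handled according to on_missing.
    """
    count = 0
    for index, current in enumerate(dates):
        if current is None:
            if on_missing is OnMissing.SKIP:
                continue
            if on_missing is OnMissing.INCLUDE:
                count += 1
                continue
            if on_missing is OnMissing.STOP:
                break
            # STOP_AND_REPORT: tell the caller which row (0-based) is bad.
            raise ValueError(f"row {index}: date is missing")
        if start <= current <= end:
            count += 1
    return count


rows = [date(1962, 3, 11), date(1962, 3, 18), None, date(1962, 6, 10)]
start, end = date(1962, 3, 11), date(1962, 6, 10)

skipped = count_rows(rows, start, end, OnMissing.SKIP)       # 3
included = count_rows(rows, start, end, OnMissing.INCLUDE)   # 4
stopped = count_rows(rows, start, end, OnMissing.STOP)       # 2
```

Making the policy a parameter keeps the decision visible in one place, so it can be changed per data set instead of being buried in the loop.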

Thank you Tim.

At first, the method chose option 3: stop counting.

Why did “reporting an error” not come to mind?

In the meantime, I thought of a case #5:
add a user field that defines the scan start row: the user will be able to skip the part that lacks vital data for the automated process (once the program reports an error: thank you Tim). In fact, this is just a matter of a variable loop start ( For LoopIdx = StartRowNbr to MaxRows).

So the appropriate actions can be:

  1. Skip the row / exclude it from the count
  2. Include the row in the count
  3. Stop counting
  4. Stop counting and report an error
  5. Let the user decide the Row start for the action process
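The user-chosen start row could look roughly like the Python sketch below (again only an illustration, not the original program). It assumes a missing entry is represented as `None` and that rows are numbered from 0, which may differ from the original loop. The name `scan` and its return shape are hypothetical.

```python
def scan(rows, start_row=0):
    """Count rows from start_row until the first missing entry (None).

    Returns (count, bad_row): bad_row is the 0-based index of the row
    that stopped the scan, or None if the scan reached the end.
    """
    count = 0
    for loop_idx in range(start_row, len(rows)):
        if rows[loop_idx] is None:
            return count, loop_idx
        count += 1
    return count, None


rows = ["1962-03-11", "1962-03-18", None, "1962-06-10", "1962-06-17"]

first_pass = scan(rows)                # (2, 2): stopped at row 2
second_pass = scan(rows, start_row=3)  # (2, None): reached the end
```

Reporting the index of the bad row gives the user what they need to pick the next start row, in this case the row right after the one that was reported.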