Automation and lack of info in the original data

Emile_Schwarz · April 12, 2016, 9:01am

When should we stop a data analysis when the user data is missing entries that the analysis needs?

Example:
I have a list in a CSV file that holds (among others) three columns:

a. Date Start
b. Date End
c. Number of days (it can be a number of days - Monday-Saturday - or a number of Sundays).

I created a method that counts the number of rows between two entries and places this value into the correct Cell.

Example

[code]Start Date: 1962-03-11
End Date: 1962-06-10

# of Rows: // Is filled by the method[/code]

The list holds one Row for each entry to be tested (days of the week or Sundays).
The method scans the Rows and, if an End Date equals the current Date (the date in the current Row), the number of Rows found is written to the # of Rows Cell.

I noticed (in Row 9!) that the program stops (I wrote it that way, it seems) if the end date is missing or incomplete. And when I put in a correct (but imaginary) date, the program continues the computation and reports the # of days found until it hits another glitch in the original data.

This is (more or less) OK for me as a developer, because the user has to provide a correct list of dates for the software to run correctly, BUT, as a user, it is a shame.

The question here is: when should we stop the data analysis when the data lacks correct entries?

Note: in my case (and I am quite certain that other users' data can be in the same situation), the correct data is missing because I simply do not have it!

For better understanding, imagine a list of historic facts:

Name of the fact: birth, death, battle, war, reign / presidential run, etc.
Date of the fact: start-end dates (birth-death / war start-end, etc.)
Span of the fact: number of years for example
etc.

In some cases, the exact dates are either both unknown, or one of the two is unknown, undefined, or vague.

Imagine a dated fact list applied to the Bible (Torah). What are the birth / death dates of Moses (Moshe) [or David or] (in a flow of well-known birth-death dates for historical people)?

Tim_Hare · April 12, 2016, 9:50pm

You, as the developer, must decide the appropriate action to take. Any of these is acceptable:

1. Skip the row / exclude it from the count
2. Include the row in the count
3. Stop counting
4. Stop counting and report an error

In this particular case, option 1 might be best. Simply skip the row because it doesn’t have sufficient info to include it as being “between Start Date and End Date”. For other criteria, some other action may be most appropriate.
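As a rough illustration only, the four actions could be sketched as a policy switch. This is Python rather than the original program's language, and every name in it (OnBadRow, parse_iso, count_rows) is hypothetical. It assumes each row is a (start, end) pair of ISO-style date strings like the example in the first post, and that a "bad" row is one whose end date is missing or cannot be parsed.

```python
from datetime import date
from enum import Enum


class OnBadRow(Enum):
    SKIP = 1        # option 1: exclude the row from the count
    INCLUDE = 2     # option 2: count the row anyway
    STOP = 3        # option 3: stop counting silently
    STOP_ERROR = 4  # option 4: stop counting and report an error


def parse_iso(text):
    """Return a date for 'YYYY-MM-DD' text, or None if missing or malformed."""
    try:
        return date.fromisoformat(text.strip())
    except (ValueError, AttributeError):
        return None


def count_rows(rows, policy=OnBadRow.SKIP):
    """Count rows, applying `policy` when a row's end date is unusable.

    Returns (count, error_message_or_None).
    """
    count = 0
    for index, (_start_text, end_text) in enumerate(rows, start=1):
        if parse_iso(end_text) is None:
            if policy is OnBadRow.SKIP:
                continue
            if policy is OnBadRow.STOP:
                break
            if policy is OnBadRow.STOP_ERROR:
                return count, f"Row {index}: missing or invalid end date"
            # OnBadRow.INCLUDE falls through and counts the row
        count += 1
    return count, None


rows = [
    ("1962-03-11", "1962-06-10"),
    ("1962-06-11", ""),            # end date missing
    ("1962-07-01", "1962-09-30"),
]
print(count_rows(rows, OnBadRow.SKIP))
print(count_rows(rows, OnBadRow.STOP_ERROR))
```

Returning the error text alongside the partial count keeps the choice of what to show the user out of the counting loop.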

Emile_Schwarz · April 13, 2016, 7:04am

Thank you Tim.

At first, the method chose item 3: stop counting.

Why did reporting an error not come to mind?

In the meantime, I thought of a case #5:
add a user field that defines the scan start row, so that the user can skip the part that lacks vital data for the automated process (once the program has reported an error: thank you Tim). In fact, this is just a matter of a variable loop start (For LoopIdx = StartRowNbr To MaxRows).
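A minimal sketch of that idea, written in Python purely for illustration (scan_from and is_valid_date are made-up names, and the rows are assumed to be (start, end) pairs of ISO-style date strings): the scan begins at a user-chosen 1-based row, stops at the first row whose end date is unusable, and reports that row number so the user knows where to restart.

```python
from datetime import date


def is_valid_date(text):
    """True if text is a complete 'YYYY-MM-DD' date."""
    try:
        date.fromisoformat(text)
        return True
    except (TypeError, ValueError):
        return False


def scan_from(rows, start_row=1):
    """Count rows from the 1-based start_row up to the first bad end date.

    Returns (count, bad_row_number_or_None).
    """
    count = 0
    for row_number in range(start_row, len(rows) + 1):
        _start_text, end_text = rows[row_number - 1]
        if not is_valid_date(end_text):
            return count, row_number  # report where the scan stopped
        count += 1
    return count, None


rows = [
    ("1962-03-11", "1962-06-10"),
    ("1962-06-11", "1962-"),       # incomplete end date
    ("1962-07-01", "1962-09-30"),
    ("1962-10-01", "1962-12-31"),
]
print(scan_from(rows))      # stops at the incomplete row
print(scan_from(rows, 3))   # user restarts after the bad row
```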

So the appropriate actions can be:

1. Skip the row / exclude it from the count
2. Include the row in the count
3. Stop counting
4. Stop counting and report an error
5. Let the user decide the start Row for the process