I have an interesting issue: two file formats that are pretty similar and quite complex. The only difference between them is that one file has extra rows compared to the other, so it makes sense to read both with one method.
Our existing code uses an enum to document the row numbers for each of the elements of interest. Obviously, with two different file types, the enum's values will differ depending on the file type. That led me to think about passing the enum into the method. However, there doesn’t seem to be a way of doing that. Note that I mean passing the whole enum, not a member of it. The following code would allow me to pass a value of an enum into a method:
Function Process(row as MyEnum)
  ' Code here
End Function
' Used like this:
But it doesn’t allow me to pass the whole enum, which would look something like this:
' This doesn't work
Function Process(AnEnum as Enum)
' And neither does this
Function Process(AnEnum as Enumeration)
The closest idea I’ve had is to create a class with two subclasses, each containing a version of the enum.
Protected Class SecondFileRows Inherits RowDescriber
  Enum, Name = RowNumbers, Flags = &h0
    RowX = 5
    RowY = 11
  End Enum
End Class
Protected Class FirstFileRows Inherits RowDescriber
  Enum, Name = RowNumbers, Flags = &h0
    RowX = 1
    RowY = 10
  End Enum
End Class
Function Process(EnumClass as RowDescriber)
  If row = EnumClass.RowNumbers.RowX Then
    ' Do what is required
  End If
End Function
' Called something like this:
If filetype = 1 Then
  Process( New FirstFileRows )
Else
  Process( New SecondFileRows )
End If
I still think there are issues with this approach: the base class doesn’t know that the enum exists, so the Process code would likely complain about the references to RowNumbers.
Any pointers would be welcome. How can I choose, in code, which enum a method uses?
Closest I’ve come is to have a single class containing properties RowX and RowY and set them in the constructor. Alternatively, create a structure to house the properties and again assign them in the constructor. The structure at least allows me to have groups of properties such as:
etc. It’s an improvement over my previous thought of having computed properties that would decide on the fly. The trouble with those is that the decision has to be made each and every time one of them is accessed, whereas assigning the values in the constructor requires only a single If to create the whole set.
I think you’ll have to fake the enums using classes and pass an instance around. It might be best to use methods to get the values rather than properties, as you could then make both classes implement the same interface.
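A minimal sketch of that suggestion, not a definitive implementation. The interface name RowDescriber is borrowed from the base class in the question, and the row values come from the two enums there. The method names RowX and RowY and the extra row parameter on Process are assumptions for illustration, and since Xojo defines classes and interfaces in the IDE, the Class/Interface layout below is only a textual representation:

```xojo
Interface RowDescriber
  Function RowX() As Integer
  Function RowY() As Integer
End Interface

Class FirstFileRows
Implements RowDescriber
  Function RowX() As Integer
    Return 1
  End Function

  Function RowY() As Integer
    Return 10
  End Function
End Class

Class SecondFileRows
Implements RowDescriber
  Function RowX() As Integer
    Return 5
  End Function

  Function RowY() As Integer
    Return 11
  End Function
End Class

' Process depends only on the interface, so either class can be passed in.
' The row parameter is an assumption; the question does not show where row comes from.
Sub Process(rows As RowDescriber, row As Integer)
  If row = rows.RowX Then
    ' Do what is required for RowX
  ElseIf row = rows.RowY Then
    ' Do what is required for RowY
  End If
End Sub
```

Every comparison here goes through a method call, which is the cost the rest of the thread is concerned with.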
Hmm… whilst it would work, the file could have 1,000,000 rows in it, and all those function calls will take a lot of time. My structure method is at least only a memory access, with no stack operations involved.
The processing code is something like this:
For each Row in File
Select case ctype( Row, Class.MyEnum )
There are a great number of RowX type options and many many loops.
Basically, all I’m after is a group of constants (or several groups) that will be as easy to read as the above code, and that can be switched efficiently based on the type of file being processed.
Structure RowNums
  RowX as Integer
  RowY as Integer
End Structure
Public Property RowNumbers As RowNums
Public Sub Constructor(Type as integer)
  If Type = 1 Then
    RowNumbers.RowX = 1
    RowNumbers.RowY = 2
  Else
    RowNumbers.RowX = 2
    RowNumbers.RowY = 10
  End If
End Sub
Now my code would look like this:
For each Row in File
Select case Row
RowNumbers is a fake enum (structure) with RowX and RowY as elements.
Yes, I know. But each time I call your method GetRowX I will incur a function call. Even if I cache all of these parameters in local variables, you have added 80-odd local variables to the memory footprint, plus the time taken to perform that action. There isn’t just one file to be read. It could be hundreds of them.
Using my method I have a single class that can be instantiated once for each file type at the start of the application. Then as the code is working to process the file I pass the appropriate instance into the function for each file it processes.
The method looks exactly the same on the inside. The enums simply become configurable at startup. There is a single place (the constructor) where you can see all the enums and the values assigned to them, rather than having to look in 80-odd methods to ensure you have the correct values for each element.
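Roughly, as a sketch only: RowConfig is a hypothetical name for the class that holds the RowNumbers structure and the constructor described earlier in the thread, File is assumed to be an array of row numbers, and the variable names are made up for illustration.

```xojo
' Built once at application start-up, one instance per file type.
' RowConfig is a hypothetical class name; the thread never names the class.
Dim rowsForType1 As New RowConfig(1)
Dim rowsForType2 As New RowConfig(2)

' Per file: pick the matching instance and pass it to the reader.
If filetype = 1 Then
  Process(rowsForType1)
Else
  Process(rowsForType2)
End If

' Inside the reader the Case labels read like enum members,
' but each one is a plain structure member access, not a method call.
Sub Process(rows As RowConfig)
  For Each Row As Integer In File ' File: assumed array of row numbers
    Select Case Row
    Case rows.RowNumbers.RowX
      ' Handle RowX
    Case rows.RowNumbers.RowY
      ' Handle RowY
    End Select
  Next
End Sub
```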
Not quite. You suggested copying them into local variables, which would have to happen at the start of each file. My method doesn’t require the local variables at all.
What I am saying is that the code using enums looks identical to the code using my method. It is clean and has no penalty for lookups. There is nothing to return. I define some structures and simply fill them in the constructor. In that constructor every variable is configured. I’m only using structures as I have 5 enums in the first place. Each structure is effectively an enum replacement, but configurable at runtime.
I didn’t have to create 160 methods to return my variables (80 parameters, two file types).
Speed and memory utilisation really matter to this application. It runs thousands of patients’ worth of data up to 1,000,000 times for each of up to 10,000 configurations. Each of these runs can take 5-10 seconds. So each patient could be processed 10,000,000,000 times, and there could be 100,000 patients. The core of the application is written in C++ for speed, and Xojo is the interface. Oh, yes, and the user could have many of these datasets to process.
I imagine a structure would be quicker than calling methods for every row you iterate. However, if you are accessing the structure members for every row you iterate you might still get a boost from using local variables.
A quick test of 100,000,000 iterations told me the following:
32 Bit App
• Accessing 5 class methods = 35.48 seconds
• Accessing 5 structure members = 9.07 seconds
• Accessing 5 local variables = 8.78 seconds
64 Bit App
• Accessing 5 class methods = 22.19 seconds
• Accessing 5 structure members = 4.13 seconds
• Accessing 5 local variables = 3.69 seconds
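For anyone wanting to try a comparison like this themselves, here is a rough sketch of how two of the cases could be timed. This is not the code that produced the numbers above: it reads two members rather than five, leaves out the class-method case, and uses a hypothetical RowConfig class that exposes the RowNumbers structure described earlier in the thread.

```xojo
' cfg is an instance of the hypothetical RowConfig class.
Dim cfg As New RowConfig(1)
Dim total As Integer
Dim started As Double

' 1. Read the structure members on every iteration.
started = System.Microseconds
For i As Integer = 1 To 100000000
  total = total + cfg.RowNumbers.RowX + cfg.RowNumbers.RowY
Next
Dim structSeconds As Double = (System.Microseconds - started) / 1000000

' 2. Copy the members into locals once, then read only the locals.
Dim rowX As Integer = cfg.RowNumbers.RowX
Dim rowY As Integer = cfg.RowNumbers.RowY
total = 0
started = System.Microseconds
For i As Integer = 1 To 100000000
  total = total + rowX + rowY
Next
Dim localSeconds As Double = (System.Microseconds - started) / 1000000

System.DebugLog("Structure members: " + Str(structSeconds) + " s")
System.DebugLog("Local variables: " + Str(localSeconds) + " s")
```

Timings from the debugger will differ from a built app, so a comparison like this is best run in a compiled build.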
If you are processing that amount of data, are you running multiple instances to process more than one patient at a time?
Yes, the code is able to run up to 124 threads at once. The source data is in an XLSM workbook and is transferred into a database; each thread then works on its own DB file. Everything is then brought together at the end.