I’m refactoring a project that takes in a huge number of lines of text (possibly millions) and uses NthField to get 3 out of 7 fields (separated by “|”). On a few hundred lines of input, the cost is not noticeable. However, in real-world tests, the profiler identifies the method containing the loop that reads and processes the lines as the culprit in a serious slowdown (I read 1000 lines at a gulp). Before I spend a day refactoring my NthField logic, has anyone compared splitting the fields into an array and then accessing specific members of the array against using NthField?
Yes, Split is significantly faster. Like, blow-you-away type faster.
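Every time you call NthField, it has to scan from the start of the string to find the field you asked for, so three calls mean three scans of the same line. By splitting the line into an array, you scan it once and in effect build an index that you can then read from directly.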
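To make the difference concrete, here is a rough sketch in Python rather than Xojo. The `nth_field` helper is a hypothetical stand-in that only imitates the 1-based, scan-from-the-start behaviour described above; it is not the real NthField implementation. The field positions 2, 5 and 7 are arbitrary, since the thread doesn't say which three fields are needed. It shows the shape of the work, not real timings.

```python
def nth_field(line: str, sep: str, n: int) -> str:
    """Hypothetical stand-in for NthField: 1-based, and every call
    walks the string from the beginning to reach field n."""
    start = 0
    for _ in range(n - 1):
        pos = line.find(sep, start)
        if pos == -1:
            return ""  # fewer than n fields on this line
        start = pos + len(sep)
    end = line.find(sep, start)
    return line[start:] if end == -1 else line[start:end]


def pick_with_nth_field(line: str) -> tuple:
    # Three calls, three separate scans from the start of the line.
    return (
        nth_field(line, "|", 2),
        nth_field(line, "|", 5),
        nth_field(line, "|", 7),
    )


def pick_with_split(line: str) -> tuple:
    # One scan builds the array; after that each field is a direct lookup.
    fields = line.split("|")
    return fields[1], fields[4], fields[6]  # 0-based: fields 2, 5 and 7


sample = "a|b|c|d|e|f|g"  # made-up 7-field line
print(pick_with_nth_field(sample))
print(pick_with_split(sample))
```

Both functions return the same three values. The split version just does the walking once per line instead of once per field.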
By using the split option, I can get the fields explicitly. And that makes the rest of the parse/match logic much more straightforward (and removes two inner loops!).
To report back on this, my per-loop time went from an average of 430ms per iteration to 11ms per iteration. Multiply that 419ms saving by 1.5 million iterations and you can see why I am excited by the refactor.