I have a large (1 TB) CSV file that I need to convert to SQLite. When I try to use a Text Stream to do the conversion, it works for a while, then dies with a buffer error after converting about 3 GB of data. I am reading one line, then saving that line to the SQLite database, in a loop.
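Roughly, the current loop looks like this (a Python sketch just to show the shape; my real code uses a Text Stream, and the file name, table name, and three-column layout here are made up):

```python
import csv
import sqlite3

conn = sqlite3.connect("output.db")
conn.execute("CREATE TABLE IF NOT EXISTS rows (col1 TEXT, col2 TEXT, col3 TEXT)")

with open("huge.csv", newline="") as f:
    reader = csv.reader(f)
    for record in reader:
        # Insert one line at a time; assumes each line has at least three fields.
        conn.execute("INSERT INTO rows VALUES (?, ?, ?)", record[:3])

conn.commit()
conn.close()
```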
So my thinking on how to move forward is threefold.
A) I have not yet tried using a Binary Stream to look for the end-of-line markers myself, converting the lines a chunk at a time. My idea is that a Binary Stream may be less taxing on the system. Please feel free to tell me if I am wrong.
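Roughly what I have in mind for A (a Python sketch of reading binary chunks and splitting on newlines myself; the 1 MB chunk size and the UTF-8 assumption are guesses on my part):

```python
CHUNK = 1024 * 1024  # read 1 MB of raw bytes at a time

def lines_from_binary(path, chunk_size=CHUNK):
    """Yield complete lines from a file opened in binary mode."""
    remainder = b""
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            block = remainder + block
            pieces = block.split(b"\n")
            remainder = pieces.pop()  # last piece may be a partial line
            for piece in pieces:
                yield piece.rstrip(b"\r").decode("utf-8")
    if remainder:
        # File did not end with a newline; emit the final partial line.
        yield remainder.rstrip(b"\r").decode("utf-8")

# Usage: for row in lines_from_binary("huge.csv"): ...
```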
B) The second approach is to read only so many lines at a time, then take a break with a timer, so that only so many lines are read in one go. Since each line read is added to the SQLite database, I could also close the database after a chunk of reads, which may help somehow.
Looping through opening the database, reading so many lines from the CSV file, writing them to the database, closing the database, and then repeating may help. All the while, I would need to keep the CSV file open to hold the position of the read pointer.
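Something like this sketch for B (Python again, just to show the shape; the batch size, table layout, and the naive comma split are placeholders):

```python
import sqlite3

BATCH = 100_000  # commit and reopen every 100k lines (arbitrary guess)

def import_in_batches(csv_path, db_path):
    done = 0
    with open(csv_path, newline="") as f:          # CSV stays open, keeping the read position
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS rows (col1 TEXT, col2 TEXT, col3 TEXT)")
        for line in f:
            # Naive split; a real CSV parser is needed for quoted fields.
            fields = line.rstrip("\n").split(",")
            conn.execute("INSERT INTO rows VALUES (?, ?, ?)", fields[:3])
            done += 1
            if done % BATCH == 0:
                conn.commit()
                conn.close()                        # release whatever the connection holds
                conn = sqlite3.connect(db_path)     # reopen; the CSV position is preserved
        conn.commit()
        conn.close()
```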
C) The last option would be to open the input file, open a new output file, write a GB of data, close the output file, open a new output file, and continue. Basically, split the CSV file into 1 GB chunks, then do the import one file at a time. The theory here is that keeping the SQLite database open through all these writes fills a buffer used for rolling back transactions, and that is what is killing the process. So splitting the file would solve this.
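And a sketch of the splitter for C (Python; the ~1 GB target and the output file names are placeholders):

```python
TARGET = 1024 ** 3  # aim for roughly 1 GB per piece

def split_csv(path):
    part, written, out = 0, 0, None
    with open(path, "rb") as src:
        for line in src:                 # iterating line by line keeps lines intact
            if out is None or written >= TARGET:
                if out:
                    out.close()
                part += 1
                out = open(f"chunk_{part:04d}.csv", "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out:
        out.close()

split_csv("huge.csv")
```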
Normally I don't need to mess with data files of this size, so this is a little new to me. Any suggestions would be great. It's been decades since I needed to think about memory limits in an application, but this large file is blowing up my usual approach.
Thanks for your help