External Text Files Versus Embedded Text Files: Updating Data & Hiding Data

Currently my program opens 5 text files kept in the same folder as the application and places the data from the files into variables. The three smaller text files (37KB to 3.8MB) pertain to a web site, and their content may change slowly over time as the web site adds new categories (a few times a year). The two largest files (24MB & 15MB) have content that will not change unless I happen to find errors in the data, which is unlikely.

I am torn between leaving the data in the existing files and embedding them in the project. I really don’t like the idea of using the external files, as users could alter one of the files and leave the program without any data to work with. The text files also give competitors easier access to the data, so they could use it for their own version of the program (there are currently no competitors).

On the other hand, if I embed all the files, users may be required to make a larger download as opposed to just updating a text file that has changed. The program is less than 6MB, but embedding the data would increase its size to approximately 44MB; however, that’s likely not as much of a problem as it was years ago with limited dial-up bandwidth and speeds.

Assuming I embed the data, what would be the best way to at least partially hide it? The program itself would likely be donation freeware, so we are not talking about trying to hide a serial number. Would RC4 be a good option, as in this post:


Assuming I use text files, what would be a good cross-platform way to hide the data in the text files? For now the program will only be for the Mac Intel platform and will likely remain for in-house use during a six-month testing period.

I’ve done something similar with a project. The way I did it (and I’m not saying it’s the best way, it’s just the way I did it) was to store all the data as XML. I had three XML files: the original that shipped with the program, a second one that included whatever additions/changes the user made, and a third that was dynamically loaded from my website with whatever new data or corrections I made available.

On program launch I first loaded the main file into my data structure, then the update file (which possibly corrected errors in the first file). The final one was the user file, which might contain changes the user had made to the original data. This worked very well in my testing (though the product hasn’t shipped yet) and allowed both me and the user to update the data, with the user’s version taking priority because user changes are loaded last and thus overwrite the existing data.
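The load order described above can be sketched roughly as follows. This is only an illustration in Python, not the code from my project: the XML layout (`<item id="..."><value>...</value></item>`) and the function names `parse_records` and `load_layers` are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_records(xml_text):
    """Return {id: value} for every <item> element (hypothetical schema)."""
    root = ET.fromstring(xml_text)
    return {
        item.get("id"): item.findtext("value", default="")
        for item in root.iter("item")
    }


def load_layers(*layers):
    """Merge XML layers in order; later layers overwrite earlier ones.

    Pass the layers as: main (shipped), update (from the website), user.
    A layer that is missing can be passed as None and is skipped.
    """
    merged = {}
    for xml_text in layers:
        if xml_text:
            merged.update(parse_records(xml_text))
    return merged


main_xml = "<items><item id='a'><value>1</value></item><item id='b'><value>2</value></item></items>"
update_xml = "<items><item id='b'><value>20</value></item></items>"
user_xml = "<items><item id='a'><value>100</value></item><item id='c'><value>3</value></item></items>"

data = load_layers(main_xml, update_xml, user_xml)
```

Because the user layer is applied last, its entries win over both the shipped data and the downloaded corrections.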

In my case only the main file was embedded inside the application; the other two were stored externally in the Application Data folder (so the user wouldn’t see and mess with them).
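For anyone wondering where such a folder lives, here is a rough Python sketch that resolves a per-user data folder on each platform. The function name `app_data_dir` and the app name "MyApp" are placeholders, and the Linux branch assumes the usual XDG convention.

```python
import os
import sys
from pathlib import Path


def app_data_dir(app_name, platform=sys.platform, home=None, env=None):
    """Return a per-user data folder for app_name (the folder is not created)."""
    home = Path(home) if home is not None else Path.home()
    env = os.environ if env is None else env
    if platform == "darwin":
        # macOS keeps per-user application data under Application Support
        base = home / "Library" / "Application Support"
    elif platform.startswith("win"):
        # Windows exposes the roaming profile folder through %APPDATA%
        base = Path(env.get("APPDATA", home / "AppData" / "Roaming"))
    else:
        # Assumption: follow the XDG convention on Linux and similar systems
        base = Path(env.get("XDG_DATA_HOME", home / ".local" / "share"))
    return base / app_name


update_file = app_data_dir("MyApp") / "update.xml"
```

The update and user files would then be read from and written to that folder rather than the application folder.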

I didn’t encrypt the data, but I certainly could have. That would be my advice if you don’t want competitors messing with it. Note that text files stored in your application are easily accessible to anyone who simply opens up the bundle.
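Since RC4 came up in the question, here is a minimal sketch of it in plain Python, just to show how little code the obfuscation step needs. The key and sample data are placeholders. Keep in mind that RC4 is no longer considered secure and that any key shipped inside the app can be dug out, so this only keeps casual eyes off the data.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher; the same call both encrypts and decrypts."""
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling: permute the state table using the key
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]

    # Keystream generation, XORed with the data byte by byte
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


key = b"placeholder-key"  # hypothetical; a real app would embed its own
plain = "category,item\nbooks,novel\n".encode("utf-8")
hidden = rc4(key, plain)      # what would be written to disk
restored = rc4(key, hidden)   # what the program reads back at launch
```

The pure-Python loop is fine for the small files but would be slow on the 24MB one; it is meant to illustrate the idea, not to be dropped in as-is.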

But while they are directly visible in the bundle on OS X, DO NOT ALTER THEM (other than via the IDE and a recompile). Otherwise you run the risk of messing with the CRC and causing the app not to work at all.