A client of mine is starting with a small database project and the project along with the data will span over many years and require a very large amount of data space (terrabytes). What is an industrial-strength database that is reliable and has long-term support and reliability with Xojo in Windows? All users will be on a Windows 7 or Windows 8 Operating System for the foreseeable future.
My initial thought is to use MySQL, and to create individual databases because I will require speed and the project can be separated into many divisions. Each division will have a database with many columns (up to about 50), and each column will have at least 5 million data points. Searching, creating reports, data calculations, backups, and other sorting/filtering duties could cause the user an extended wait time, which is unacceptable.
Company Divisions (about 1000) * Database (50 columns x 5 million data points) = A lot of Data
Access to the database(s) are through VPN’s (intranet) which have reliable connections and is the business standard for this company. Exposing a user interface to the internet is not an option.
I am just curious as to what is a good direction to have as far as a database backend to limit issues and ensure scalability. What are some of the lessons learned that fellow Xojo programmers can share? I have worked on many smaller projects where SQLite has worked well, and the many data points with this project could cause instability.
Thanks for your help and guidance.
Eugene, most large ERP systems with multi-gigabyte to terabyte-sized databases run atop of one of the major commercial RDBMS. IBM DB2, Oracle or MS SQL Server. Traditionnaly, DB2 and Oracle have had the lion’s share, but in recent years, MS SQL has gained popularity. PostgreSQL is probably the best candidate among free RDBMS. Definitaly, mySQL would not be my first choice, as there is an ample body of evidence that it does not scale very well. (that said, SAP AG developed MAXDB from mySQL. MAXDB can support a number of SAP products) There are other good RDBMS systems. the choice is yours.
Regardless of the RDBMS you decide to go with, you will need to consider the design very carefully. Normalization rules don’t matter that much with small projects, but will make a difference when you scale up. Then, with such large databases, one must consider availability. The infrastructure must be able to manage the large transactional volume that comes with typical large data volumes.
I wouls select a solid RDBMS that is well supported by your development tools, that has good third management party tools and that will have a well documented scale-up roadmap, including distribution of the DB over multiple servers. That will save you or your client a lot of headaches going forward.
My suggestion would be SQLServer. You can start out with the SQLServer Express edition which is identical to the full version in most ways except for some limitations on CPU and memory usage. If the client outgrows it, they’ll probably be ready to start paying for it anyway. In my previous life, I had a client whose business ran on the lite version for years before we had to upgrade them.
I’d be using one of the major vendors DB’s - Sybase, Oracle, Enterprise DB (Postgresql with real support etc), or MS SQL server.
Splitting each company division into it’s own DB may actually NOT be advantageous - so be careful about that.
While you would be able to deal with each division in its own DB cross division queries & such may become problematic.
Been there tried that for the financial reporting while at a large pipeline firm in Calgary
I would also agree with the SQL Server suggestion. (tried to remain neutral before) It is my current preferred RDBMS on Windows. There are many excellent third party management tools, for example to help you synchronise schemas or data between databases (suppose you have a DEV, QA and Prod environments). I will not recommend any here, but Google will find them for you.
If you go the MS SQL route, as a personal preference, I would remain with 2008 R2, because you can use Access 2010 as a data and table stucture management tool, more convenient to use than MSSMS in my opinion. You will still need MSSMS for some tasks. Access 2013 lost the ability to work as a front end for MS SQL. And Access 2010 cannot edit SQL server newer than 2008 R2 (but it can display the content and structure of a table, or the code of a stored procedure).
Greg, Norman, and Louis,
Thanks for your suggestions. Initially, was leaning towards the Postgre SQL option because of the initial cost, and after reading the posts it looks like MS SQL is likely going to be the route. Yes, if the database is getting that large, a modest fee for keeping all of the data would/should be small in the large project scope.
I’m going to have to dig into editors a little more Louis. You make several good points about the table management tools - time to dust off a few books
Norman, your right about the many databases that can and probably exist and the lack of communication between all of them. My guess is that there will likely be some scope-creep and once the initial database program is complete, there will usually be suggestions for many modifications.
Greg’s idea is good that the database will start off smaller with free SQLServer Express and then has the ability to grow - which could be an AFE for another project.
Thanks gentlemen - great feedback! Now, where is that ‘answered’ button?