Sync Methodology

I have been pondering this for a while and not sure how to approach this.

Lets say I have two computers… each has the same app… and an associated database.
Information is entered, deleted, changed on either or both computers.

Exactly what process would be used to determine what transactions need to occur so that both computers end up “in synch”,
I have seen apps that adverstise “iCloud” or “Dropbox” synching…

Synchronization is a science in itself. It’s incredibly difficult to do and there are multiple ways to approach it. The variables of how your application functions and what the data represents is very important factors to finding the ideal scenario that will work for you.

It IS complicated that is for sure. The addition of records is not so much an issue, it is the edits and deletes. At some point you have to decide what to do when two or more users edit the same record and there is a conflict. Time and date stamping the edits would seem a necessity but that still doesn’t make the logic any easier.

I would say the challenges in this were one of the reasons for the birth of cloud computing.

Is there any way to use a single remote or cloud based database?

This particular app is designed to hold personal information in a secure manner… Password, Credit Card Information, Software and Domain registrations, medical records and home inventory etc… etc… There are 22 tables of “data”, and 10 tables of support informataion such as screen layouts, reference values etc.

If installed on a LAN, yes it can use a central database… located on any device on the LAN.

What I’m trying to come up with is a way to allow the user to have the app and copy of the database on say a Laptop… and be able to transfer the changes to the master database when the laptop is connected back to the LAN again.

The records are timestamped already as to when they are created and/or updated, so it would be possible to determine a record changed by comparing the record id and timestamp… but that would only work for records that existed when the laptop left the LAN. Records added to either the Master or Remote copies would have the same record ids but may be totally unrelated otherwise…

I may just have to rely on the user to know what to transfer… The app has the ability to export a single record in XML that can then be imported to another copy of the DB.

Its not a show-stopper feature…but it would be nice to have.

Basically this is a “merge”

  1. data that has the same keys compare the timestamps and the newer one overwrites the older in either direction
  2. data that is only on one machine (say the laptop) which you should be able to tell then gets pushed to the other machine(s)
  3. lather rinse & repeat for EVERY other machine checking for data it has that no others have

As long as keys are guaranteed to be unique across app instances then you can do this
Uniquing will be the challenge so you can tell; what data is on one machine & not the other when you may edit on both

Had the same issue back in the early 90’s before the net as we know it today. This is how we resolved it.

  1. Each record in the database maintained a modify time stamp and a create time stamp.
  2. At some point the two systems would connect (back then via dial up). System 1 would send all data that was created or modified since the last sync. System 2 (the receiving side) would import/update the data with the following rules. If a record was created and did not exist in System 2 then it would be added. If the record already existed and the modified timestamp was later then the modify time stamp on System 2 the the record would be updated else it would be ignored.

The process would then reverse it self.
This process worked well and there were only minor issue. Such as when a record was modified on both systems. You then were faced with the chicken or egg scenario. In our case we opted to go with the most current record.

That sounds like it would work… EXCEPT
for records created brand new in each DB copy… you must have a Primary Key to determine that a record from Database A is the counter-part record to a specific record in Database B… and Create/Modify Timestamps are no where near good enough for that. And if you added a new record to database A, and a new record to database B, but both had TOTALLY different data… you “can’t” know.

Example

Both Databases start out like this

1 FRED 5am
2 GEORGE 9am
3 WILMA 10am

where the # is the PK, the name is the “data”, followed by a timestamp

Later both databases are independently changed

and Database A now looks like this

1 FRED 5am
2 GEORGE 9am
3 WILMA 10am
4 BARNEY 11am
5 BETTY 11:01am

and Database B looks like this

1 FRED 5am
2 GEORGE 9am
3 WILMA 10am
4 PEBBLES 11:00am
6 BETTY 11:01am
5 BARNEY 11:02am

TECHNICALLY from a data content point of view (PK and Timestamps don’t “count” as data)…
Database B only has ONE record out of synch… Pebbles

And that doesn’t even cover DELETED records (dang site would not let me edit … AGAIN)

Ok, see what you mean. Actually you have 2 records out sync. Pebbles and Barney. PK can not be used for syncing. After sync of Database B into A.

Database A will look like this.
1 FRED 5am
2 GEORGE 9am
3 WILMA 10am
4 BARNEY 11:02am
5 BETTY 11:01am
6 PEBBLES 11:00am

I view PK as internal to the table and only references the unique record reference internal to that table not another database.
Deleted records? Those you will have to keep track off in a table or some kind of log.

The situation I described earlier involved 4 databases. Database B,C and E would sync up with Database A. Once done Database A would reconnect with B,C, and E and sync up.

More stuff. In my example the BARNEY record was modified/update and PEBBLES was added to Database A. I used the name as the key but of course that would not be correct.

UUIDs will, for the most part, solve the unique key issue. You still have the problem of applying business rules to records modified at both locations. The rules need to be defined and conflicts resolve accordingly. If you have minimal conflicts then you can flag records and have a human perform a review and make the final decision. It really depends on the business and the immediacy of the data.

There are two ways to merge data:

I’d call them Static and History-based:

  • Static: You compare the two end products, by their individual items, and then try to decide which ones rule over the others.
  • History-based: You keep a record of every change. Then, when merging the two, you apply each change in sequence, thereby ideally also noticing if both sides attempt to change the same data from the source origin (i.e.: if A changes an item, then B reads the changed item and changes it again, it’s safe. But if both A and B change the same item, then there’s a conflict). This is how modern version source control systems work (such as git).

Thanks for all the information folks…
I have decided that while it might be a “nice to have”… for the price this app will be selling for and the amount of work synching is going to be, and the fact that it would most likely be used by a very small % of clients [most will use it on a single computer, or within a LAN)… it it not worth my effort. Effort that can be invested on more useful “bells and whistles”.

[quote=41110:@Dave S]That sounds like it would work… EXCEPT
for records created brand new in each DB copy…
[/quote]

UUID’s which should be unique to the computer / db that they were created in

Not worth the effort… I would have to retro-actively alter 22 tables per database per client… but thanks anyways.

You can always add a UUID sync key. It’s not necessarily required that the primary key match.

i have done i application that sync from 4 location by internet but is the same DataBase it only sync the payment and Balance of customer.