How to compare a table from two different databases?

Tim_Seyfarth · December 12, 2022, 7:18pm

Hello all,

I have two databases that are used by two different apps. One is the master the other needs to get updates from the master. Both tables are identical.

Currently I am looping through the results which takes about 10-12 seconds to complete. Can anyone suggest some SQL code to do this on the database itself without looping? Or a better/faster method of doing the comparison?

I am using PostgreSQL, Xojo 2022 R3.2 targets are Windows and Raspberry Pi. However the databases all reside on the same computer.

Thank you!
Tim

Louis_D · December 12, 2022, 7:34pm

This may not be what you are looking for, but this is certainly how I prefer to compare tables between databases:

Many database tools have the ability to compare tables between databases, and sometimes, have data transfer and data synchronizing tools also. With PostgreSQL, I use Navicat on a Windows computer. It does have the feature. A number of other tools that I tried and since abandoned, have the feature also. The host system does not really matter, so long as you configured postgresql to allow remote access and opened the firewall on the hosts. In the spirit of full disclosure, Navicat is not free and certainly not cheap. But for me, it is worth the money. I don’t upgrade with every new version. As I said, it is not cheap.

Tim_Seyfarth · December 12, 2022, 7:37pm

Thanks Louis.

I need to do it programatically within Xojo…

Tim

Ian_Kennedy · December 12, 2022, 7:49pm

You can do it with a single SQL statement:

INSERT INTO table1 (column11, column12, ...)
SELECT column21, column22, ...
FROM table2
[WHERE condition];

If the two tables are exactly identical in every way:

INSERT INTO table1 SELECT * FROM table2;

If you have to do it with a loop, make a prepared statement and use it in the loop. It will be dramatically faster than issuing the same SQL over and over again with new data. See examples:

https://documentation.xojo.com/api/databases/postgresqlpreparedstatement.html#postgresqlpreparedstatement

Tim_Seyfarth · December 12, 2022, 8:03pm

Hi Ian,

Part of the problem is I need to know which records are different. Updating the slave table is only part of what I need to do.

Is there a command that will provide that?

Thanks,
Tim

Ian_Kennedy · December 12, 2022, 8:19pm

Something along the lines of

theSQL = "select from table2 left join table1 where table2.field1 <> table1.field1 or table2.field2 <> table1.field2 or table2.field3 <> table1.field3"

Should identify the changed records.

Then you can then loop through them and process as required.

Var oInsert as PostgreSQLPreparedStatement
Var oUpdate as PostgreSQLPreparedStatement
Var oQuery as PostgreSQLPreparedStatement

oInsert = SQLitePreparedStatement(db.Prepare("Insert into table1 values (?,?,?)"))
oUpdate = SQLitePreparedStatement(db.Prepare("update table1 set field1=?, field2=? where field3=?"))

// Search for records
While not table2->eof // loop though records

   If UpdateRequred then // If we need an update
      oUpdate.SQLExecute( value1, value2, value3 )

   else if InsertRequired then // If we need to insert
      oInsert.SQLExecute( value1, value2, value3 )

   end if

   // Move to next record
wend

Ian_Kennedy · December 12, 2022, 8:21pm

Do you have some code you are using now, it would make it easier to suggest improvements.

Jeff_Tullin · December 12, 2022, 9:06pm

Do you need to make the slave table identical to the master?

if so, empty the slave and copy all records from the master

Do you need to make only records they have in common change in slave to match master?

if so, create a temporary table at the slave end, copy the records which exist in both tables, then delete them from the slave table and insert from the temp table.

If you truly need to know which records exist in common but are different, you probably need to assemble a query like this one:

select a.*, count(1) as diffrecordcount
 from master a, slave b  where a.ID = b.ID and  
(
coalesce(a.fieldname1,'X')  <> coalesce(b.fieldname1,'X')
or
coalesce(a.fieldname2,'X')  <> coalesce(b.fieldname2,'X')
)
;

I used coalesce here … some dbs use ISNULL or similar.

The comparison must handle null values and provide a default fallback for the field, because comparing NULL with a value doesn’t work. I tend to use a value that wouldnt occur for real.
Like ZZZbananaZZ for a text field…

Tim_Hare · December 12, 2022, 9:36pm

I assume you meant
coalesce(a.fieldname1,'X') <> coalesce(b.fieldname1,'X')

Jeff_Tullin · December 12, 2022, 9:47pm

You’re right, I did. Thanks. I’ve changed it now.

Thom_McGrath · December 13, 2022, 8:14am

If the two should always be in sync, I recommend setting up replication and let PG handle the details. However, I’m assuming your needs are a little more complex.

Option A, add a generated column that prepares a hash of the important columns. A stored generated column will update this hash whenever the record is written. You can compare this hash between the two tables to determine if a row does not match. Can you join between the two tables? It’s not clear to me how you’re currently comparing. This option doesn’t really change much, just makes it more convenient to compare the rows rather than doing lots of column comparisons.

Option B, just insert everything and use an ON CONFLICT clause with the primary key. Something like INSERT INTO table (column) VALUES (value) ON CONFLICT (primary_key) DO UPDATE SET column = EXCLUDED.column WHERE column IS DISTINCT FROM EXCLUDED.column; should work, though I might have made a mistake since I’m typing this off the top of my head. I don’t really like this option though, because it requires a failure first then an update, which is slower than just avoiding the insert in the first place.