Help on SQL Statement

I am looking for some help constructing a SQL statement. I have been scratching my head for a while now and have been unable to solve the problem, so I am hoping that one of you can help me, please.

The target DB is SQLite.

Simplified versions of the tables:

[code]
Table1
mKey INTEGER NOT NULL PRIMARY KEY,
aid Text,
fid Text,
amt Currency

Table2
mKey INTEGER NOT NULL PRIMARY KEY,
aid Text,
fid Text,
amt Currency
[/code]

As you can see, the table structures are identical; the only difference is in the mKey values.

I wish to construct a query that will delete any item in Table2 that has the same ‘aid’ and ‘fid’ as in Table1.

I have tried:

[code]delete from Table2 where (select Table1.aid || Table1.fid from Table1) = (Table2.aid || Table2.fid);[/code]
and a number of other SQL statements, but to no avail.

Any advice?

Simon.

[code]delete from Table2 where aid || fid in
(select aid || fid from Table1);[/code]

?

[quote=248674:@Jeff Tullin]delete from Table2 where aid || fid in
(select aid || fid from Table1) ;

?[/quote]
Don’t think so.

“As in table1”, not “in table1”.

[code]delete from Table2
where EXISTS( SELECT 8 FROM Table1 b WHERE Table2.aid=b.aid AND Table2.fid=b.fid )[/code]

[quote=248674:@Jeff Tullin]delete from Table2 where aid || fid in
(select aid || fid from Table1) ;
[/quote]

This works like a dream, and it is extremely fast too. On the real tables it took less than a second to delete about 350,000 records.

[quote=248683:@Dave S]delete from Table2 where EXISTS( SELECT 8 FROM Table1 b WHERE Table2.aid=b.aid AND Table2.fid=b.fid )[/quote]

This is really slow. It works fine, but the time was about 18 seconds.

Thank you both. I was on the right track but it was the keyword ‘IN’ that I missed.

Simon.
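
For what it’s worth, the 18 seconds is almost certainly the correlated EXISTS re-scanning Table1 once for every row of Table2; nothing in the simplified schema indexes aid and fid, so each probe is a full table scan. If you ever need the EXISTS form, an index on the lookup columns should close most of that gap (a sketch, assuming no such index already exists):

[code]CREATE INDEX IF NOT EXISTS idx_table1_aid_fid ON Table1 (aid, fid);[/code]

The IN version is quick even without it because SQLite can materialise the subquery result into a transient index once and then just probe it.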

I’m sure you have your reasons, but I’m curious: why can’t you use one table instead of two, since they are identical?

This is a financial app that loads OFX, CSV and QIF files downloaded from a bank or credit card web site. The app parses these files and each transaction should be added to the database. However, some items within the file(s) will have appeared in a previous download.

My app currently takes each ‘new’ item, checks whether it already exists and, if not, adds it to the transactions table. The app holds every transaction since 2003 (13 years’ worth) and the table is now huge! The checking takes a few seconds to complete, and I was looking for a faster way to do this than one transaction at a time.

So I came up with the idea of dynamically creating a lookalike table and storing all the imported items directly in there. Then I needed to delete those transactions that were already in the database, and I was sure that could be done with an SQL statement. Following the deletions I could then import the data from the temporary table straight into the standard table.

After the help I received here I set up a test where I copied ALL of the data from the main table into the temporary table, interspersed with some ‘new’ data. Then I ran the suggested SQL, followed by an SQL statement to add the left-over transactions to the main table. I did this with a script and the whole process ran in under a second! In fact, it was so fast that I thought it had failed!
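
In outline the whole thing looks something like this (table names simplified; Transactions stands in for my real table, which has more columns):

[code]-- Stage the parsed download in a work table shaped like the main one
CREATE TEMP TABLE Import AS SELECT * FROM Transactions WHERE 0;

-- (bulk-insert the parsed OFX/CSV/QIF rows into Import here)

-- Throw away anything that is already in the main table
DELETE FROM Import WHERE aid || fid IN (SELECT aid || fid FROM Transactions);

-- Move the genuinely new rows across, then clean up
INSERT INTO Transactions (aid, fid, amt) SELECT aid, fid, amt FROM Import;
DROP TABLE Import;[/code]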

Sorry about the long explanation, but I wondered if anybody would ask that question, so I thought it worthwhile giving my reasons!

Simon.

The standard way to avoid those duplicates would be to create a unique index (which can span multiple columns). Then just do the inserts and deal with errors gracefully.
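
In SQLite that could look something like this (a sketch against the simplified schema above; the values are made up):

[code]-- One multi-column unique index enforces the aid+fid pair at the engine level
CREATE UNIQUE INDEX IF NOT EXISTS idx_aid_fid ON Table1 (aid, fid);

-- OR IGNORE silently skips any row that would violate the index,
-- so previously imported transactions drop out with no per-row check in code
INSERT OR IGNORE INTO Table1 (aid, fid, amt)
VALUES ('61100464530636', '2016022000001', 24.99);[/code]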

Be careful when comparing concatenated keys: ‘12’ || ‘3’ = ‘1’ || ‘23’.

Extremely good point.

In this situation the OP should be fine, as the text appears to be transactional, e.g. ‘Walmart Texas’ || ‘2435.50’.

But in a situation where the fields are of the same type (e.g. A||BC = AB||C), it is advisable to concatenate with an uncommon value that clearly breaks up the two parts.

e.g.

[code]delete from Table2 where aid ||'^^^'|| fid in (select aid ||'^^^'|| fid from Table1);[/code]

[quote]In this situation the OP should be fine, as the text appears to be transactional, e.g. ‘Walmart Texas’ || ‘2435.50’.[/quote]

You are making a dangerous assumption:

'Walmart Store #124' || '35.50'
'Walmart Store #1' || '2435.50'

Both produce 'Walmart Store #12435.50'.

The concatenation of the two fields makes the result unique.

The aid is actually the account id, which is the bank sort code plus the account number; the fid is a transaction number for the account based upon the date, i.e. 201602200001. When concatenated these give a unique ‘key’. So none of the above will apply.

This is exactly what the system currently does. However, there are over 300,000 records, and checking each new entry against them has to be done in code, one record at a time. Whether I check for the existence of the record or check for a key violation, it takes the same time to process all the new keys. With the new method the whole update takes way less than a second and gives the same result.

Simon.

…which is the great thing about a forum.
Together we form a hive mind… many hands and all that…
Now the OP knows to go check, and is alerted to a potential issue. Hooray!

edit: has checked… :slight_smile:

That’s my point… No, it does not. It MIGHT in this particular situation, but it is a dangerous concept to get into the habit of relying on.

'Walmart Store #124' || '35.50'
'Walmart Store #1' || '2435.50'

Two different stores, two different amounts, two IDENTICAL keys

Using indexes, and not concatenating values, is the only fast/safe way to do it…
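
For example, a recent enough SQLite (3.15 or later, which added row values) can compare the pair directly, with no concatenation at all:

[code]-- No separator, no collision risk; an index on Table1 (aid, fid) keeps it fast
DELETE FROM Table2 WHERE (aid, fid) IN (SELECT aid, fid FROM Table1);[/code]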

[quote=248971:@Simon Berridge]The concatenation of the two fields makes the result unique.

The aid is actually the account id, which is the bank sort code plus the account number; the fid is a transaction number for the account based upon the date, i.e. 201602200001. When concatenated these give a unique ‘key’. So none of the above will apply.[/quote]

I think part of the point of the above is that it will cost you next to nothing to make SURE that no combination of account code + transaction ID EVER results in a concatenation that could be mistaken for something else.
If the transaction id is shorter or longer then you could have an issue.
Or if it rolls over.
Or you get an account code that is shorter or longer.
If account codes & transaction ids are always numeric and your separator is non-numeric, then it is very easy to avoid any potential issue.

I used to write payroll software, and there were occasionally issues like this with the payroll processor we used; there were always very odd workarounds.
One was putting A’s in as the leading characters of a numeric field for destination account numbers that were very short: zero-fills resulted in the transaction being rejected, but with A’s as the lead-in, the processor knew to strip them off and not send them to the destination bank.
Tons of fun.
That was a VERY long time ago, though.

[quote=248973:@Dave S]That’s my point… No, it does not.

'Walmart Store #124' || '35.50'
'Walmart Store #1' || '2435.50'

Two different stores, two different amounts, two IDENTICAL keys

Using indexes, and not concatenating values, is the only fast/safe way to do it…[/quote]
Dave

The statement that you highlighted was taken out of context. In my real app the concatenation does give a unique key. In all the years that I have been running this particular software solution there has been no duplication.

A bank sort code (in the UK) has the form 99-99-99. This six-digit code identifies a bank branch, and there is no duplication of this number.

The next part of the account id is the bank account number, which has the form 99999999. An account number might in theory be duplicated (the banks advise me that this is not so, and I have not come across it). So the account id becomes 99999999999999, which will be a unique number for your bank account.

The FITID is in two parts. The first part is the transaction date in the form YYYYMMDD (20160220). The second part is a numeric count starting at 1 and incremented by one for each transaction against your account on that particular day. So, if you have 5 transactions appearing on 20 February then you will get 2016022000001 through 2016022000005. The next day the first transaction will become 2016022100001.

When these two parts are concatenated then you will get a unique reference like 611004645306362016022000001.

I fully understand where you are coming from, and adding something like ‘^^^’ is a safeguard that is worth having. This will then give the key 61100464530636^^^2016022000001.

In my testing, though, there is not a single duplicate in over 350,000 records.

This is an interesting topic, though, and I appreciate all your inputs to the debate.

Simon.