SQL question

Hi all,

I have two tables PROTEINS and PEPTIDES

Each Protein in PROTEINS gets chopped into PEPTIDES

PEPTIDES has Parent_Protein_ID as foreign key so that I know where each peptide derives from.

Each peptide has properties like mass, charge, etc

Now I’m looking for all Proteins which have ONLY peptides below a certain value of mass.

How do I achieve that?

TIA

Markus

Can you try an Inner Join?

Select p.Protein_Id From PROTEINS As p INNER JOIN PEPTIDES As pt On p.ProteinID = pt.Parent_ProteinID AND pt.Mass < ?????

INNER JOIN will only return rows where the JOIN condition is met.

Don’t know what DB platform your on but, hopefully, this will get you going.

But wouldn’t that also get proteins which have peptides with masses both below and above the threshold?

For example

Protein_1 is chopped into Peptide_1a with a mass of 50, Peptide_1b with a mass of 80, and Peptide_1c with a mass of 90.

Protein_2 is chopped into Peptide_2a with a mass of 70 and Peptide_2b with a mass of 120.

My problem is to find those proteins where all the peptides with the same Protein_ID have masses below the threshold, let’s say 100.

So I would want Protein_1 but not Protein_2.

(The reason for doing this is that we can’t detect peptides below a certain mass so all proteins which are chopped into only small peptides can’t be detected by the given method. Chopping them up in a different way might help though; but the question remains: how many of the proteins can we not detect within a given experiment).

If I’ve read your requirements right. No

In your example

Protein_2 in the Peptides table has 2 rows . One row (mass = 70) meets your threshold and the other (mass = 100) doesn’t. The inner join criteria will only be met by the first row (mass = 70). So only that row is returned by the Inner Join, the other row doesn’t get returned as it fails the Inner Join.

damn - can’t delete my post!!! Just realised I have misread your requirements!!!

Think your going to need a correlated subquery to do this. What DB platform you on.

You have to reload the page before you can edit.

I think I got it (or something like it):

SELECT ProteinID
FROM tbl_Peptides
WHERE ProteinID NOT IN(
SELECT ProteinID
FROM tbl_Peptides
WHERE mass < 100 )

And I’m using SQLite

Thanks Patrick!

I believe you will have the same protein showing more than once. In this case SELECT DISTINCT…

Writing the same thing :slight_smile: Though SELECT DISTINCT may not help either…

True. Thanks Rick!

[quote=105457:@Markus Winter]I think I got it (or something like it):

SELECT ProteinID
FROM tbl_Peptides
WHERE ProteinID NOT IN(
SELECT ProteinID
FROM tbl_Peptides
WHERE mass < 100 )

And I’m using SQLite[/quote]
Just to clarify, given your example criteria above, you did mean to write WHERE mass >= 100 as you wanted Proteins where all their Peptides had mass less than 100?

How come people are posting 2 seconds before me today!!!

SELECT DISTINCT ProteinID
FROM tbl_Peptides
WHERE mass >= 100 )

Yup. Thanks!

Because I’ve moved to New Zealand I now live in the future!

[quote=105486:@Rick Araujo]SELECT DISTINCT ProteinID
FROM tbl_Peptides
WHERE mass >= 100 )[/quote]
I’m pretty sure that doesn’t compile :wink:

Also it gets any protein that has ANY peptide in the detectable mass range

Well… For me it’s the optimized version of:

SELECT DISTINCT ProteinID
FROM tbl_Peptides
WHERE ProteinID NOT IN(
SELECT ProteinID
FROM tbl_Peptides
WHERE mass < 100 )

Let me try to understand. The proteins you want, each one, must have ALL the peptides with mass >=100? Or ANY peptides with mass >= 100?

You can’t detect peptides with a mass below 100.

Which proteins can you not detect because ALL their peptides have a mass below 100.

So no, it is not an optimised version as it gives a very different answer: which proteins have ANY peptide in the detectable mass range.

Notice the difference between ALL and ANY.