Connecting two tables - What is the best approach?

Kayla_Gordon · July 6, 2015, 12:00pm

Hey everyone,

I’m trying to connect two tables, but I’m not sure which method to choose. Here is my situation:

Step 1) Starting out, the user inputs information such as first and last name, and an ID number. They click the Save button and that information is stored in the database. This info is saved in Table 1 and displayed in Listbox1.

Step 2) Now the user has the option to add additional information for the person they just added in Step 1. By selecting that person’s name, the Add Address button will become enabled and that person’s address can be added and saved into the database. This is saved into Table2 and displayed in Listbox2 (below Listbox 1).

My Goal: After the user has completed Step 1 and Step 2, I want them to be able to browse the list of people (in List box1), click a person’s name, and then their address(es) will populate in Listbox2, below. Only the addresses belonging to the name selected will appear. I’m hoping to tie the person to the address, which is why selecting a person’s name from Listbox1 is mandatory for adding addresses.

I’ve considered the INNER JOIN command, but in my mind that means combining the information from Table1 and Table 2 to make a new table, Table 3. I just don’t know if that is the best approach for what I have in mind.

I have PKs set for both tables.

Does anyone have any advice to offer?

Giulio_Mastrosanti · July 6, 2015, 12:11pm

HI,
you have a table, let’s call it person, that has an ID field, unique, autoincrement
you have a second table, let’s call it address, that will have too its ID field, unique auto increment, but will have also a person_id number, containing the ID of the person it refers to.
so, when you you put the persons in listbox1, you will display their info on the table itself, but you will put somewhere ( i.e. on the rowtag ), the id of the person.
when you click on a person row, you must simply take the id of the person row and perform a simple querY:
select * from address where person_id = id_of_the_person
no need for a JOIN in this case, the JOIN will help you when you will need to have a list of both persons and addresses data on a unique query

Louis_D · July 6, 2015, 12:34pm

[quote]you have a table, let’s call it person, that has an ID field, unique, autoincrement
you have a second table, let’s call it address, that will have too its ID field, unique auto increment, but will have also a person_id number, containing the ID of the person it refers to.[/quote]

This approach is correct. Now, depending on the DB engine, I may opt for a slightly different approach. Table 1 (Person) would have an Id field set to autoincrement. Table 2 (address) however, would not have its Id field set to autoincrement, but instead woud use the same id number as the Id in table 1. There is little advantage in having a unique id for table 2 that is independent of table 1, since the data in table 2 is necessarily related to table 1. I think that this approach is best for the OP’s stated requirements.

In an other scenario, let’s take a sales order. We would have the order header (client number and name, ship-to number, document currency, document type, etc.) and then item lines (item line number, material number, quantity, net item price, etc.) The header table would have an Id field (the order number). The items table would combine the order number (order Id field) and the item number (line number in the order, say line Id) as the unique index. In table 2 neither the order number nor the line number are by themselves unique. However, the combination of the two fields yields a unique index. Querying Table 2 with the order number would bring out all the item lines for the order.

A select statement returns a recordset that in a sense is a table, with all the fields keyed to the index of the main table and to values of the inner join. That “table” only exists in the context of the recordset. (now, there would be ways to make queries that create tables, but I assume that this is not your intent). In effect, depending on the DB engine, it is possible to issue a select command on the results of an other select query.

For the stated requirement, an inner join is only useful if you wish to query the whole data set (table 1 and 2) at once, and then display specific fields from the recordset upon request. In some cases, this is the better approach and in some cases not. It depends on whether you need to minimize the number of connexions or minimize overall traffic (the first approach uses only one connexion to get all of the data, and the second approach queries only necessary data, but may require multiple connexions)

DaveS · July 6, 2015, 2:56pm

First off. I would highly suggest reading up on SQL… (all of you actually)

Some of the information I see presented here is “dangerous” to a newbie as it is incomplete. Especially if TableA and TableB share ANY fields with the same field name (which is a high probability in a properly designed database).

The most simple SQL statement is a SIMPLE join… the use of INNER or OUTER is not necessary. It is a good practice to ALWAY alias your tables, as this shortens the SQL (in some cases), but (in my opinion) ALWAYS makes it easier to read, not to mention easier to change a table name (change the SELECT statement without having to touch the WHERE)

Now to provide a specific SQL statement to answer this post is not possible without knowing the structure of TableA and TableB so I will provide a “template”

SQL=" _
SELECT a.field1,a.field2,a.field3
   FROM first_table A, 
second_table B
WHERE a.field1=b.field1"

//  Execute the SQL (assumes active database connection, recordset object etc)

And in regards to “always createing table3” that is only if you have a need to do so. Normally you would create a recordset (which “could” be construed as an “in-memory” table

Also, performance may become an issue if either tableA or tableB is large, fragmented, or remote. Indexes MAY be required for example

Kayla_Gordon · July 6, 2015, 3:09pm

Wow! Thanks for all the information! I’m going to play some more with it later tonight and see what happens.

Paul_Raulerson1 · July 6, 2015, 6:23pm

I agree with being careful here, but I probably would not do a join either, since this is always a one to many relationship. One name may have many addresses. Simple select from table two would do it for now.

That’s a pretty good start towards a real contact system by the way! Most people don’t immediately pick up on a single contact having multiple addresses - or rather, they do not account for the possibility of a contact having more than a home and office address.

I would definitely recommend a unique key on the addresses table though, perhaps a key combination of key from table one plus an Address Type, plus a sequence. So John Doe could have something like this:

00010001HOME
00010002OFFICENY
00010003OFFICELA
-   -   ----------
|   |   +------> Office Identifier
|   +-----> Address Sequence ID
+-----> Person ID

Don’t forget that each address may have one or more telephone numbers associated with it. Something like the above makes it trivially easy to select the addresses by the with using a LIKE statement or a wildcard.

Select * from Table2 where key like “0001%”

That kind of thing.

-Paul

Mark_Walsh · July 6, 2015, 6:38pm

What happens when your Person ID spills over 9999? How will you tell that the fifth digit of 10001XXXX is the end of the Person ID ‘10001’ and not the beginning of the Address ID for person ‘1000’?

DaveS · July 6, 2015, 6:49pm

And who is to say that a one-many relationship is a bad thing… Using a loop over Table1 and using data from it to issue a SELECT from Table 2 does nothing but waste time and resources and is a very very bad approach (especially if one or both tables is remote)

An INDEX provides much more performance value (if applied properly) than building a “fake” key.
Plus you can use as many fields to define a key as you want, no need to create another field that has to be maintiained that in fact duplicates actual data elements… might as well have a flat file or an array at that point.

Louis_D · July 6, 2015, 7:03pm

I am not very familiar with SQLite, but I would never do that on MSSQL. Fake keys are a bad idea in my book. Here is an example of what I am talking about:
fieldname type allow nulls
FactId int Unchecked
ItemId int Unchecked
Itemtype nvarchar(5) Checked
ItemNum int Checked

In this table, the key is made of FactId and ItemId. (There are many other fields in the table, I did not see value in pasting all of them here) This does not show that PK_TFACTIEEM, the primary key for the table, is actually the two fields, but that is how the table is keyed. Neither field is an autoincrement in this table. The transaction manages itemId’s when creating an invoice. FactId is the invoice number and it is autoincremented in the header table. The Id number is recovered once the header is created, and reused here. The combination of invoice Id and Item Id is unique in the table.

I do not query this table through a join with the invoice header table, nor do I loop through the items. I often refer to invoice headers alone, not always together with the items. A simple select * from TFACTITEM where FACTID = 123 will return all the items for invoice 123. Which, by the way, is always done through stored procedures. I only call the procedure and pass appropriate parameters.

I will definitely learn something by reading a good SQL book, but this approach has performed flawlessly. I use joins to create views where more complex data sets are are involved.

Paul_Raulerson1 · July 6, 2015, 7:09pm

You use a larger key? Seriously, I just didn’t feel like aligning a lot of zeros in the example above… but I expect people would use small integr, integer, int16, int32, int64, fullword, or whatever is the appropriate sized key available datatype for their particular situation. Whatever their actual needs are.

But even taking the example literally, a small contact system is kind of unlikely to have 10,000 contacts in it though, or more than 9,999 addresses for each contact.

As for spillover - normally those would be three separate fields combined in a single key. I would keep them contiguous but I would expect that “spillover” would not be any kind of an issue in a database.

create table TABLE2 (
PERSON_ID INTEGER NOT NULL,
ADDR_SEQ INTEGER NOT NULL.
ADDR_NAME CHAR(4) NOT NULL,
ADDR_TYPE CHAR(4) NOT NULL,
ADDR1 CHAR(33),
ADDR2 CHAR(33),
PRIMARY KEY (PERSON_ID, ADDR_SEQ, ADDR_NAME)
);

Something like that above, depending upon the SQL variant.

-Paul

Paul_Raulerson1 · July 6, 2015, 7:23pm

[quote=198855:@Dave S]And who is to say that a one-many relationship is a bad thing… Using a loop over Table1 and using data from it to issue a SELECT from Table 2 does nothing but waste time and resources and is a very very bad approach (especially if one or both tables is remote)

An INDEX provides much more performance value (if applied properly) than building a “fake” key.
Plus you can use as many fields to define a key as you want, no need to create another field that has to be maintiained that in fact duplicates actual data elements… might as well have a flat file or an array at that point.[/quote]

I didn’t say one to many relationships were bad. Just in this case…
Who said she was doing a loop? Sounds to me like she is populating from TABLE1 and then selecting from TABLE2 as a user event. Dumb to duplicate all that information by selecting in a join, when all you need are small parts of TABLE2 at any one time. in this case, short key reads (which is what SQLLight would actually do…) are quite a useful and efficient method.
What do you mean “fake” key? Every part of that key has meaning and is unique in Table2. A foreign key, an index, and a name or type. That is far better than the alternative I think.
I am at a loss to understand why you think a Primary key is not an index? Oh, admittedly, in Oracle or some other databases, it could be defined as a constraint, but it would still be an index of course.
Might want to use other indexes in there, such as type or location, which again might be composite keys. It all depends upon how the normalized data is used, and how far it has to be un-normalized to be efficient.

In this case, contact systems are the just about the classic classroom example. I use it myself in some classes I occasionally teach. it is a perfect way to get students, or anyone just learning about such a system to understand and effectively use normalization and good key/index design.

Paul_Raulerson1 · July 6, 2015, 7:25pm

P.S. Hey Kayla - just keep on doing what you are doing. Database design, like code design, is a subject people just love to argue about. You are doing quite well in your adventure! Congrats!

-Paul

DaveS · July 6, 2015, 7:25pm

You two examples are not even related…
In the first you propose a “fake” key made up of concatentated data elements (which would have a problem if the range of any one element exceeded the pre-defined value)
The second (and more correct) example, proposed a multi-element primary key… which leaves all the “maintenance” to the database engine…

And for sake of argument… I would NOT create a contact system this way…
The address table should contain an AddressID (unique), and the demographics of the location (street, city etc)… and should contain ZERO information about any residents or tenants of that address.

The “people” table should contain all the information about each person (name, gender etc), but ZERO information about where they live

Then a 3rd Table should contain TWO Fields… PersonID that matches the people table, and AddressID that matches the Address Table.

A given person can live (work etc) at multiple addresses, and each address could have multiple people living (working) there.

This could in fact be a many-many relationship, not a one-many as proposed above.

Sorry Kayla, if we have gone too far afield for you at this point, but these are important design considerations

DaveS · July 6, 2015, 7:29pm

[quote=198866:@Paul Raulerson]1. I didn’t say one to many relationships were bad. Just in this case…

Who said she was doing a loop? the loop was mentioned in a response not from Kayla
What do you mean “fake” key? a “fake” key is any value the developer makes up from existing data, instead of letting the database engine to its job (ie. each data element is represented once per record)
I am at a loss to understand why you think a Primary key is not an index? Primary Key is “an index” only in regards to a single table query (true I use Oracle 99% of the time, but do an EXPLAIN PLAN sometime and you will see the difference)
.[/quote]

Mark_Walsh · July 6, 2015, 7:29pm

Either way, searching on your concatenated key…

Select * from Table2 where key like "0001%"

…is still going to be problem when (not if) you spill over your original # of digits.

Your comment was that it was trivial to do a search with a wildcard, but IMO, it’s even easier to do a search on the distinct key:

Select * from Table2 where PersonID = "0001"

or better yet

Select * from Table2 where PersonID = 1

Since that uses an integer key, instead of a text key, which is much more efficient to index/search.

Paul_Raulerson1 · July 6, 2015, 7:44pm

[quote=198870:@Dave S]You two examples are not even related…
In the first you propose a “fake” key made up of concatentated data elements (which would have a problem if the range of any one element exceeded the pre-defined value)
The second (and more correct) example, proposed a multi-element primary key… which leaves all the “maintenance” to the database engine…[/quote]

I can see where you might have decided to read it that way, but
(1) They are the same example
(2) There is no range problem when implemented as separate data elements
(3) Even in VSAM there would be no range problem when implemented as separate elements

[quote]And for sake of argument… I would NOT create a contact system this way…
The address table should contain an AddressID (unique), and the demographics of the location (street, city etc)… and should contain ZERO information about any residents or tenants of that address.

The “people” table should contain all the information about each person (name, gender etc), but ZERO information about where they live

Then a 3rd Table should contain TWO Fields… PersonID that matches the people table, and AddressID that matches the Address Table. [/quote]

Which, if you look at it is the same as I suggested.

The PERSON_ID in TABLE2 is a FK from TABLE 1. It has to be there to do what Kayla described as “tie” the tables together.

The ADDR_SEQ field does not necessarily have to be there, but it is nice when you have contacts that have more than one address. Usually you find the need to list out all the addresses for a contact, sooner or later.

The ADDR_NAME or ADDR_TYPE field is rather important, as addresses need to be identified to humans some way. As in OFFICE, HOME, SITE1, or whatever makes sense in the field. The other fields in address need to be normalized.

TABLE3 should probably be phone numbers or something like that.

Not purely in a contact system. Each person would have a unique contact ID, even if they use the same office address. Besides, even at the same office, it is not unusual for two contacts to have different addresses and phone numbers.

It is possible to design the system so that a single addresses are linked to more than one contact to avoid duplication, to do that you would need a cross reference table that does have a many to many relation, but you would not do that from the Contact Master table. More commonly done in vendor systems anyway.

Giulio_Mastrosanti · July 6, 2015, 7:48pm

Dear Kayla,
that’s exactly what you need,
a bunch of TEACHERS disagreeing each other …

my last suggestion is:

buy a good SQL book
put in front of your computer
spend a good amount of sleepless nights reading and testing yourself examples, first directly on a SQL client, and then on a Xojo application

I’m sure in a couple of months you’ll come back teaching us

and remember: blood, tears and code!!!

Giulio

DaveS · July 6, 2015, 7:49pm

well maybe there should be no blood, but double up on the tears

Giulio_Mastrosanti · July 6, 2015, 7:54pm

blood, tears and code was on a gadget shirt of Code Warrior development suite, a loooong time ago

Paul_Raulerson1 · July 6, 2015, 7:55pm

[quote=198873:@Mark Walsh]Either way, searching on your concatenated key…

Select * from Table2 where key like "0001%"

…is still going to be problem when (not if) you spill over your original # of digits. [/quote]

No, there would be no spillover of digits. There would be an exception if any field ever exceeded the max value for the field. That would be true even in a VSAM system.

This spill over of digits is a red herring, caused mostly because I am too lazy to type all those zeros to show an integer.

[quote] Your comment was that it was trivial to do a search with a wildcard, but IMO, it’s even easier to do a search on the distinct key:

Select * from Table2 where PersonID = "0001"

or better yet

Select * from Table2 where PersonID = 1

Since that uses an integer key, instead of a text key, which is much more efficient to index/search.[/quote]

That would work. Except potentially ->decreases<- the efficiency of the query.

It is very disputable that an integer key is more efficient to index or search than a text key, though I grant you, it was a few decades ago. It would be interesting to see time trials of that in SQLight, as I only have DB2 and Oracle data on that.