Store array in column or is there a better way?

Hi All-

So the project is to build a database of people and to store information about how those people are related. So for example, maybe the database has the following entries: John, Michael, Kate - and let us assume that they are all siblings. So under John’s entry I would want to list “Kate” as a contact, and “Sister” as the relationship type, “Michael” as a contact and “Brother” as the relationship type. I would also want to be able to list “Coworker” as a relationship type and the name of the company as an additional entry.

Each entry to the database is a person with up to N-1 connections, where N is the number of people in the database. For each connection we will store the connection’s name, connection type, and connection company. My question is how I should set up the DB, given that each entry will have an unknown number of connections. I would also store some other information for the person, such as email, phone, etc.

Any ideas?

You’ll have a table of persons and a table of connections. The connections table will have a foreign key for person_id, a foreign key for relatedto_id and a relationship. Note that each relationship implies two connection records, one from persona to personb and the reverse from personb to persona. You can use triggers to maintain the extra record. (Adding persona->personb would automatically add a record for personb->persona, and deleting personb->persona would automatically delete persona->personb.)

two tables
Person - with their name, gender, etc AND a unique ID
Relationships - with two ID’s (personID1 personID2 from the person table) and maybe a tag stating the relationship

so in Person you might have
John male 1
Michael male 2
Kate female 3

in relationship you might have
1 3 Sister
1 2 Brother

To extend this since each person might have 0 or more “contact info records” you could add a “Contact Info table”
Since they are unlikely to EVER be shared (i.e. two people with the same contact info record) I would not add a “relationship” kind of table. Something like ContactInfo which has personID, contact type, contactDetails
So to have John with an email of “john@company.com”, and michaels home phone as (123)456-7890

ContactInfo
1 email john@company.com
2 home (123)456-7890

ok so (and i’m new to databases) I would create one table with “CREATE TABLE People (name varchar, phone varchar, email varchar)” but then for the Connections table I dont know how many columns to specify because as I add more people, the potential number of connections increases. Would I destroy and rebuild the Connections table when I add new entries (new people) to the people table?

O I think I see … Two unique IDs

I would NOT put the email IN the person table.
What if a person doesn’t have one ?
You have a column thats unused.
Or, if, like me, they have 8 or 9 ?

And for the connections table it is 1 row PER connection
Literally each row is “this person is related to that person by ….”
More connections = more rows

EDIT : there IS one trick to the “relationships” table.
There may be an implied “directionality”.
For a lot of relationships it won’t matter - but for many it will.
For instance say the relationship is John IS Michaels boss.
Or that Kate is Johns Sister.

So if you put
1 2 Boss (id 1 = John and ID 2 = Michael) then you have to read that from left to right to get the correct relationship

But for a relationship like John & Michael are brothers
1 2 brother
and
2 1 brother
are equivalent

But
1 3 Sister (john is mates sister … whatttt ???)
Is NOT the same as
3 1 sister (kate is johns sister … correct)

So it might be more useful to think of the relationship rows as “person1 is person2’s ”

Norman’s approach is more flexible. 3 tables: person, relationship, contactInfo.

So table 1 holds a unique ID in column 1, person’s first name in column 2, person’s last name in column 3, etc. - only for those attributes that apply to everyone - one column for each attribute.

Table 2 holds a NON-unique ID in column 1, where the ID# relates to the (unique) person/ID from table 1, in column 2 there is an “information type” variable such as phone/email/reg mail, and column 3 has the column2 type information for person in column1.

Table 3 is for connections and will have a non-unique ID in column 1 related to person 1, non-unique ID in column 2 related to person 2, and column 3 will specify how person 1 is related to person 2 (not necessarily how person 2 is related to person 1, due to potential directionality).

Thanks alot guys. This makes sense to me. Let me know if I have specified anything incorrectly. Again, thank you very much.

p.s. - I wonder if it is advantageous to have a 4th column in table 3 that specifies how person 2 is related to person 1, so column3 is going on direction and column 4 is going the reverse direction - as opposed to creating a new entry …

I don’t think I would as it make logic weird in your app.
Sometimes you would want the value in column 3th and sometimes the 4th.

It’s much simpler to put in 2 entries
For instance if john works for Michael, and Kate works for Michael as well (Michael is both their boss’)

in relationship you might have
2 3 Boss
2 1 Boss

Now to find everyone who “works for” Michael you can simply query “person1 = 2”

But if you add a fourth column it will, in many cases, be completely redundant
in relationship you might have
2 3 Boss “Works For”
2 1 Boss “Works For”

You could look for

  1. everyone that has person1 = 2 and the first relationship is “Boss” (michael is someones boss)
  2. everyone that has person1 = 2 and the first relationship is “Works For” (this person works for michael)

That michael is Kate’s boss definitely implies that kate works for Michael - its redundant.
And in the case of “sister” you have to be VERY careful about which “direction” you read the relationship

With a db one thing you WANT to remove / reduce is redundancy