Quick addition to the db normalisation concept:
A pure normalised database will not store the same unrelated data twice. For example, if you had a table with a users score for each room solved, you could also store the users password and address in the same table. This is fine. But it's not normalised, which means it will take a huge amount of space. Each new score would store the user details each time. You would have:
User | Score | Password | Adress
Syntax | 123 | easytoremember | Some street
Syntax | 56 | easytoremember | Some street
Other | 741 | easytoremember | Some street
Normalising this would be recognising you have unrelated data (usually identified by repetion). For example, the above table may be called "
Scores"
. If there was to be a password and/or address in there it should be directly linked to the score itself. In this case it's not, hence this table lends itself to be normalised. The important concept behind normalisation is that every row has to be uniquely identifiable. In the case above, I could have 2 scores in different rooms with 123 move score. Adding an ID such as level and room would solve this (as long as only 1 demo can be stored per room).
User | Score | Password | Adress | Room
Syntax | 123 | easytoremember | Some street | JtRH (1N, 1W)
Syntax | 56 | easytoremember | Some street | Machina (17S, 2W)
Other | 741 | easytoremember | Some street | Ninja (1N, 166E)
Each row is now identifiable. In order to normalise a table you'll need an *external* identifier. In this case, you could use a combination of all columns (any fewer an you may get identical matches). So you add a *primary key*. This is something which uniquely identifies each row. As you see above, a row can only be identified by considering the entire row.
So we and an id. This will be unique for each row:
ID |User | Score | Password | Adress | Room
1 |Syntax | 123 | easytoremember | Some street | JtRH (1N, 1W)
2 |Syntax | 56 | easytoremember | Some street | Machina (17S, 2W)
3 |Other | 741 | nicetoremember | Some street | Ninja (1N, 166E)
Now we have a distinct way of referencing each row, without considering the whole row. In my example this is not necessary to do, but it's a good guideline as you'll now see. How do we avoid this password/address duplication?
Create a new User table, again with it's own identical identifier:
ID | Name | Password | Address
1 | Syntax | easytoremember | Some street
2 | Other | nicetoremember | Some street
And now, can identify each user uniquely - this format could even allow 2 players with the same usernames if we wanted to. In this case, for example, we have 2 users with the same address as they have separate ids.
One more step to normalisiation. We'll change the first table to this:
So we and an id. This will be unique for each row:
ID |User | Score | UserID | Room
1 |Syntax | 123 | 1 | JtRH (1N, 1W)
2 |Syntax | 56 | 1 | Machina (17S, 2W)
3 |Other | 741 | 2 | Ninja (1N, 166E)
This representation of date is now a lot more compact. A user could have 5000 high scores yet only once would the user details be stored, and you'd only have 5000 UserIDs stored instead which is a lot of space gained.
In this sort of example, the IDs are called "
primary keys"
as these are the unique identifiers. It is possible, and sometimes better to have cross-column primary keys, but a good rule of thumb is to always assign a primary key in case you later want to reference that table.
The referencing column (UserID in this case) is called the "
foreign key"
.
Be careful not to over-normalise (you could for example store every possible address in the world in a seperate table and use an ID as above to reference it). DB design is the art of understanding the data, above all, and normalising redundancy without compromising efficiency and storage.
[Last edited by Syntax at 12-11-2006 12:24 AM]