mySQL table(s) structure

GJSNeptune · Apr 13, 2008

On my site I want to have a box showing my most recently uploaded content. It'll be from multiple categories, like blog, poetry, fiction, nonfiction, and possibly others. It's a basic list of the titles of that content, and next to each title is the category each one is under.

At first I did this using one table for all my writing to go into. That way it'd be easy; just sort the whole table by date and I'd get my most recent activity. I've now run into a problem, but perhaps it's just my table structure, which is basically:

id
type
date
title
text

This works fine, except pieces aren't unique to their category. In other words, my second blog entry might have the id of 7, instead of a unique blog id of 2. I thought about adding fields to the table, such as blog_id, poetry_id, fiction_id, etc., but I'm thinking there's an easier way.

Is there a way to select from multiple tables my most recent activity? I'd have to somehow analyze each table (four or more) and determine the most recent (let's say 7) pieces inserted into the database. Is this possible? Would it be taxing on the database and therefore quite slow or inefficient?

ameoba · Apr 13, 2008

could do something like

Code:

select post_type, id, date, title from (
select 'blog', blog_id, date, title from blog_posts
union
select 'poetry', poem_id, date, title from poetry_posts
union
select 'fiction', fiction_id, date, title from fiction_posts
) order by date
limit 10

if you have different posts segregated into different tables.

Another thing to do would be to put all posts in one table and then use another table to categorize them. I'd probably go this way.

GJSNeptune · Apr 13, 2008

Hmm. How would I use another table to categorize them?

mikeblas · Apr 13, 2008

I'm pretty lost because of the way you describe your model. I guess you've got blog entries, which turn out to be in different categories. Is that true?

I would've thought that the ID field, then, would be a globally unique number for all the blog entries, and that the type would be something that showed the category.

By the way, when building a database, I think it's a great idea to name a column the same way you name the entity in real life; why not call the "type" column the "category" column, as you do in your written description? What's the name of the table we're talking about here? "Pieces"? I'll assume that's correct.

If I have dcoded your intent correctly, you might have these sample rows. (Some columns are missing to save me some time, since I'm a volunteer.)

Code:

PIECES
======
id,category,text
1,poetry,some poem
2,poetry,some other poem
3,fiction,a fiction story
4,poetry,more poetry
5,fiction,another fiction story

You then say "this works fine, except pieces aren't unique to their category". I can't figure out what this means; are you saying that you want to have a single piece be in more than one category?

If so, then the only realistic alternative is to have a table that relates a particular piece to the different categories you want. The same pieces above would be modelled as follows, but let's further say #3 is both fiction and poetry.

Code:

PIECES
======
id,text
1,some poem
2,some other poem
3,a fiction story
4,more poetry
5,another fiction story


PIECESTOCATEGORIES
==================
id,category
1,poetry
2,poetry
3,fiction
3,poetry
4,poetry
5,fiction

Note how we use a different table, and Piece #3 ends up getting two entries there; one for fiction, and one for poetry. This is how we manage multiple categories for the same piece -- PIECE itself no longer has a category column.

Your most recent activity would be a join between the two tables, grouping by each column and going for the max. (Or, just ordering by the most activity date column in the PIECES table.)

Again, if you can provide a bit more definition of what it is you'd like to model, I'm sure we can help you find a solution.

GJSNeptune · Apr 14, 2008

Sorry. I'll try to clarify.

On my site I'll have blog entries, poetry, fiction, nonfiction, and possibly other types of writing pieces. I have one table for all of these types of writing:

id
type
date
title
text

The 'type' column is a more specific identifier so I can sort the table by the type of piece. A blog entry is type bl, a poetry piece is po, fiction is fi, etc.

The problem I'm running into is that pieces don't have successive IDs under their type. My second blog entry might have an ID of 3 in the table, whereas I'd like it to also have a blog_id of 2. That way the URL would be blog.php?entry=2 instead of entry=3. I don't want there to be gaps.

So I thought of two possible ways (and maybe three now) to go about this. Either I keep the one table and have blog_id, poetry_id, fiction_id, etc. columns, or give each type of piece its own table like I had initially. And I guess I could also keep the one table and use another table to categorize the information, although that sounds like the most complicated solution.

BGM · Apr 14, 2008

your getting messed up with your identity seed..

sql server does not just randomly assign unique ids. i am going to take a stab in the dark and suggest that you have deleted a row for testing or something and then added a new one back in and are wondering why its been given the next id and not the id you just deleted?

try truncating the table and starting again, or if you have loads of data in there (or more likely) referential constraints, do a delete from and reseed it

Code:

delete from t2
alter table t2 auto_increment = 0;

p.s. that is untested, i work with MSSQL mostly

Edit: Reading your post again i dont think your going to be able to do what you want... it seems that you want the urls to be blog.php?entry=2 and then poetry.php?entry=2 ??

This is never going to happen if you store them in the same table, unless you do some funky sql to renumber them when you read out of the table... but i see little point in this.

Why not just completely randomise your id column, and that way you wont notice that they are out of order!

GJSNeptune · Apr 14, 2008

I'm not following. I haven't been deleting anything. I'm working with a new WAMP install, so MySQL is a clean slate.

Correct me if I'm wrong, but I can see it working with one table if I have other columns such as blog_id, poetry_id, etc. When I SELECT and sort the information, I can manipulate the specific IDs, and when I insert a piece I can specify which type_id to use, so for a blog entry, only the blog_id receives an auto-incremented value. I could probably get this working, but I wanted to see if there was an easier, less cumbersome solution.

mikeblas · Apr 14, 2008

GJSNeptune said:
Sorry. I'll try to clarify.

On my site I'll have blog entries, poetry, fiction, nonfiction, and possibly other types of writing pieces. I have one table for all of these types of writing:

id
type
date
title
text

The 'type' column is a more specific identifier so I can sort the table by the type of piece. A blog entry is type bl, a poetry piece is po, fiction is fi, etc.

I guess this is where you're losing me. Not to bust balls -- but if you're not using terminology we can both understand, it's hard to help.

I would expect that the ID is the specific identifier for this table and that the type is an attribute of a particular entry, not a part of its identifying characteristic. The ID alone could be a number, and a different number for each entry that's functionally independent of the type. That's the most unique way to identify the pieces you're writing--each one gets its own unique number, period. Type is actually less specific, since there are multiple pieces of each type. If I'm thinking of ID=3, then I know there's either one or none with that ID number. If I'm looking for ID="Poetry", then there could be one, none, or hundreds.

GJSNeptune said:
The problem I'm running into is that pieces don't have successive IDs under their type. My second blog entry might have an ID of 3 in the table, whereas I'd like it to also have a blog_id of 2. That way the URL would be blog.php?entry=2 instead of entry=3. I don't want there to be gaps.

I see. Why is that? Why do you not want the ID alone to be unique? What you want is doable, but is neither easy nor efficient. At that point, it seems like you must have a very good reason in order to continue pursuing a solution that's more difficult than it needs to be.

Using two columns to uniquely identify an item is called a "composite key". Composite keys can be necessary, but in this case I can't understand why you think they are. Composite keys are slower than a single column key because the key is wider, harder to compare, and takes more space.

GJSNeptune said:
So I thought of two possible ways (and maybe three now) to go about this. Either I keep the one table and have blog_id, poetry_id, fiction_id, etc. columns, or give each type of piece its own table like I had initially. And I guess I could also keep the one table and use another table to categorize the information, although that sounds like the most complicated solution.

Actualy, using more tables is the more complicated solution. You'll have overap, you'll need to keep reference tables and so on. For example, how would you know that { 3, "Poetry" } is a valid entry, but { 3, "Fiction" } is not? Searching around in multiple tables is more expensive and more complicated than looking in a single table.

If you want to pursue your goal of having a composite key, then the issue is finding the next available ID number given the type you want to create when you're adding a new piece to the table.

GJSNeptune · Apr 14, 2008

Then let's say my new table would be:

id (INT, auto-increment (could I still use this to sort for most recently submitted pieces?)
blog_id
poetry_id
fiction_id
nonfiction_id
etc.

type (probably don't need this anymore, as I can sort by the specific ID?)
date
title
text
etc.

Is there a limit to how many fields can be primary keys?

mikeblas · Apr 14, 2008

What you've shown is a single new table, is it? What is it that you'll store in those fields? Can you show some sample data, as I've done in my previous posts?

Given, for example, a row with ID = 5 and Type = "Fiction", what would be in the blog_id, poetry_id, fiction_id, and nonfiction_id columns?

Your table can only have one primary key. That one primary key can be a single columns or can span multiple columns. If there are multiple columns, then uniqueness is determined across the whole tuple. For example, {3, "Poetry"} is unique to {4, "Poetry"} and to {3, "Fiction"}.

GJSNeptune · Apr 14, 2008

This is what I want to do, and hopefully I explain it clearly enough so that you can suggest the best way to do it or how to structure my table(s).

A user clicks on my Blog link, blog.php. The main column shows my most recent blog entry, and the right column contains a few boxes with additional information. The first box is a paginated list of blog entries for navigation. Each entry's link passes the variable through the URL, so if the user is viewing the most recent entry, each title/link (descending) in the box should have an ID one less than the previous one. I would prefer to keep these IDs in succession, as in 10, 9, 8, 7, etc. instead of being staggered, like 10, 7, 6, 3, 1.

This would require each blog entry to receive an ID unique to the Blog type regardless of any other identifiers. The same would go for the other types of pieces.

The second box lists my most recently submitted pieces regardless of type, and next to each title is what type it is.

mikeblas · Apr 14, 2008

I suggested the best approach for the problem that I understand you to have in Post #4 in this thread.

As far as I can tell, the only difference between what I understand your problem to be in that post and the issue you're trying to solve is the inscrutable requirement that you have sequential ID numbers for each group of items. I've asked why you need this, but you've ignored my question. Beyond my question are lots of others: for example, if you require that the numbers are sequential, what happens when you want to delete one of the existing items? Will you renumber all of the items with a higher ID?

As I've pointed out, the requirement that each group have their own sequential IDs makes your solution a bit more difficult because you'll have to find the next available ID within the category, as well as carray around multiple columns in the keys. If you can relax this requirement, then you just use an identity column for the ID and carry only the ID around as the key.

GJSNeptune · Apr 14, 2008

I wouldn't be deleting any entries. Only editing. The reason I want sequential IDs is because I'm very particular I guess.

GJSNeptune · Apr 14, 2008

Additionally, without sequential IDs unique to the writing type, a user could manually type a URL and pull up a piece of the wrong category. Let's say the user types blog.php?entry=4, where the piece corresponding to ID 4 is actually a poetry piece. It's not the fourth blog entry, but the fourth piece I inserted.

mikeblas · Apr 14, 2008

GJSNeptune said:
Additionally, without sequential IDs unique to the writing type, a user could manually type a URL and pull up a piece of the wrong category. Let's say the user types blog.php?entry=4, where the piece corresponding to ID 4 is actually a poetry piece. It's not the fourth blog entry, but the fourth piece I inserted.

Or, it just notices that there's no blog entry with ID #4, and returns an error.

ameoba · Apr 14, 2008

If a user is URL hacking - they deserve what they get.

GJSNeptune · Apr 14, 2008

It's not hacking. It could be an honest mistake.

mike, I'm at work, but when I get home I'll look into your Post #4 tips. It sounds pretty complicated though.

generelz · Apr 14, 2008

I will echo what the people above are saying - the requirement that the ids in the URLs be sequential for the reasons you've given is not a very compelling argument, especially in the face of the facts Mikeblas has given that it makes your database model much harder to maintain.

It seems like the reason you want to do this is simply to make the URLs easy to generate. You can do this just as easily with a database query and have your cake and eat it too:

Code:

recent_entry_ids = mysql_query("select id from Posts where type = ? order by posted_date desc limit 5");

for (id in recent_entry_ids)
  generate_entry_url(id);

What's the harm in doing this? By ensuring you have an index on posted_date and re-use the same connection you've already obtained earlier in the data fetch process, this will be very performant.

I think the argument about URL hacking is spurious for many reasons - when the user steps out of the bounds of your application (by not following a given link for example), they are entering uncharted territory. I think you simply have the idea in your head that the URLs should be sequential ids after seeing some sites where it might work that way - without thinking it through too much.

mikeblas · Apr 14, 2008

It's not that complicated; it's just more complicated than it needs to be. You'll end up with this table--again, with some columns missing for brevity:

Code:

PIECES
======
id,category,text
1,poetry,some poem
2,poetry,some other poem
1,fiction,a fiction story
3,poetry,more poetry
2,fiction,another fiction story

Let's say, now, that you want to add another piece, and it's "Fiction". You'll have to figure out what the next available key is.

The simple way to do that is kind of broken. It would be to select the MAX(ID) from the table over the Type = "fiction" rows:

Code:

SELECT MAX(ID)
  FROM PIECES
 WHERE TYPE = 'Fiction'

Add 1 to that max, and you have your next key value. The reasons this approach are broken are kind of subtle. One reason is that this requires you to traverse to the end of the index, find the number you want, and then return it. This has the side effect of temporarily locking the whole table, which hurts concurrency.

It also has the problem of not being transactional; you'll have to get this MAX, then insert your new row with the new value, all in the same transaction. Otherwise, waht happens if some other user gets the next key value at the same time? They'll also see the same number you did, and they'll try to insert with that value.

Even if you're the only writer to the table, you should stil be at least a bit concerned with the locking of the table to compute the max value; it can limit the scalaiblity of your site by locking the table when others are trying to read it; to assure that doesn't happen, you end up doing a bunch of work to make sure you're using the right locking mode, don't have an indext type which interferes with that mode, and so on.

Another approach would be to keep a small table indicating the next key. You'll have some contention here, but at least its away from the data table you're working. But you still have a transactionality problem; what if you increment this item, then the INSERT fails for another reason? You'll end up with an unused ID if you don't, which is something that you insist you don't want.

These issues aren't horrible, but they're a lot more complicated than the automatic identity column you've declined using. Perhaps avoiding these problems is why you were interested in using separate tables for each of the different entities. That's aviaable solution, as long as you're sure you odn't need to do anything across all entities with any efficiency; otherwise, you're going to be doing multiple hits and churning. Again, not a big deal, depending on how often it needs to happen, but it's a concern for me because it's pretty clear you haven't fully expressed the complete design or intention of the system in question.

GJSNeptune · Apr 14, 2008

generelz said:
I will echo what the people above are saying - the requirement that the ids in the URLs be sequential for the reasons you've given is not a very compelling argument, especially in the face of the facts Mikeblas has given that it makes your database model much harder to maintain.

It seems like the reason you want to do this is simply to make the URLs easy to generate. You can do this just as easily with a database query and have your cake and eat it too:

Code:

recent_entry_ids = mysql_query("select id from Posts where type = ? order by posted_date desc limit 5"); for (id in recent_entry_ids) generate_entry_url(id);

What's the harm in doing this? By ensuring you have an index on posted_date and re-use the same connection you've already obtained earlier in the data fetch process, this will be very performant.

I think the argument about URL hacking is spurious for many reasons - when the user steps out of the bounds of your application (by not following a given link for example), they are entering uncharted territory. I think you simply have the idea in your head that the URLs should be sequential ids after seeing some sites where it might work that way - without thinking it through too much.

That code looks like it pulls recent entries from a certain type. I need it to pull recent entries regardless of type.

(Duh) on not thinking it through too much. Hence my thread, right? It can be done and I want to do it. Please spare me the backhanded jabs to make yourself feel superior. I'm trying to learn.

mikeblas · Apr 14, 2008

GJSNeptune said:
That code looks like it pulls recent entries from a certain type. I need it to pull recent entries regardless of type.

As I explain above, it gets the next key for the given type that you'd like to add to the table. You need that code because you can't simply get a sequential number over all items -- you need one per type.

GJSNeptune said:
(Duh) on not thinking it through too much. Hence my thread, right? It can be done and I want to do it. Please spare me the backhanded jabs to make yourself feel superior. I'm trying to learn.

Sorry to have offended you. I've patiently tried to answer your questions even though you've responded to only a few of mine, and earnestly done my best to help you. It's remarkable to me that you can't see that, though not surprising because you're not reading through my posts and asking questions like the one above when the answer has already been provided to you.

Good luck with your project!

GJSNeptune · Apr 14, 2008

No, you've helped me a lot. That part was directed at generelz. Sorry I didn't quote. I didn't think you'd sneak a post in.

GJSNeptune · Apr 14, 2008

I guess I didn't think it'd be so much work. I guess I'll give each type its own table and maybe try harder to display recently submitted pieces better. I certainly don't want the table to lock up.

GJSNeptune · Apr 15, 2008

ameoba said:
could do something like

Code:

select post_type, id, date, title from ( select 'blog', blog_id, date, title from blog_posts union select 'poetry', poem_id, date, title from poetry_posts union select 'fiction', fiction_id, date, title from fiction_posts ) order by date limit 10

if you have different posts segregated into different tables.

Another thing to do would be to put all posts in one table and then use another table to categorize them. I'd probably go this way.

Wow. I tried the UNION function and I managed to get it to work. Was a lot easier than I thought it'd be. Thanks!

mySQL table(s) structure

[H]F Junkie

Supreme [H]ardness

[H]F Junkie

[H]ard|DCer of the Month - May 2006

[H]F Junkie

Limp Gawd

[H]F Junkie

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]F Junkie

[H]ard|DCer of the Month - May 2006

Supreme [H]ardness

[H]F Junkie

Limp Gawd

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]ard|DCer of the Month - May 2006

[H]F Junkie

[H]F Junkie

[H]F Junkie