Feasible to create a table per entry? (in order to emulate a 2D table)

dalearyous

hard to explain, but i basically need to create a 2D table. right now, the only way i can think to do it is to create a table for each entry that is submitted into the main table. problem is, doing so would create thousands of small tables. i don't really care, but i am questioning whether this is really the optimal solution.

am i missing something completely obvious?
 
Usually when I need to store that kind of data I just encode it into a BLOB and store that in a column, with additional indexing columns as needed. But you need to tell us more about the application.
 
i want to store the distance between 2 points for a lot of points

What RDBMS? What sorta query volume, number of users, expected rate of feature growth, etc.? You could store the points, create spatial indexes, and calculate the distance in your query:

http://technet.microsoft.com/en-us/library/bb964712.aspx
http://technet.microsoft.com/en-us/library/bb895270.aspx
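Very roughly, that approach could look something like this (just a sketch, assuming SQL Server's geometry type; the table, column, index, and parameter names are made up, and STDistance gives straight-line distance):

Code:
-- Store each point once as a geometry value, e.g. geometry::Point(@x, @y, 0).
CREATE TABLE Points (
    ID  INT IDENTITY PRIMARY KEY,
    Pos GEOMETRY NOT NULL
);

-- Geometry spatial indexes need a bounding box; adjust it to your coordinate range.
CREATE SPATIAL INDEX IX_Points_Pos ON Points (Pos)
    WITH (BOUNDING_BOX = (0, 0, 10000, 10000));

-- Straight-line distance between two stored points, computed at query time.
SELECT a.Pos.STDistance(b.Pos) AS Dist
FROM Points AS a, Points AS b
WHERE a.ID = @id1 AND b.ID = @id2;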

Or perhaps fact/dimension tables will be more scalable.

We definitely need a lot more data to help you make a good choice, but my gut feeling is that creating a new table for each record might not be optimal in many of the respects mentioned above.
 
the table could potentially grow to hundreds of thousands of points ... users probably a couple hundred ... it will grow quickly at first and then pretty much stop.

sorry i am being so vague, i didn't know quite what to ask.

*edit, your links are extremely helpful and it seems the second one is almost exactly what i want ... this is very new, will have to do some reading/researching/testing but seems to be going in the right direction
 
when i said distance between two points, i meant that i will be running a function, calculating the distance, and i just need to store it.
 
hundreds of thousands of rows is no big deal for competent databases.

We need more info on your problem.
 
i need to be able to pass two points (x & y coordinates) into the database and have it find me the distance between the two (and no, it's not a straight line)

so, normally i would do this:
3 columns: ID, xCoord, yCoord

BUT, if i give you two points (so any 2 rows in the entire table) i need to check distance between the two.

crude example:
basically from point 1 to point 2 is 5 (whatever) and so on
[attached image: tableexample.png, a grid showing the pairwise distances between the points]
 
Why? Why not just store X and Y and do the calculations at runtime (or querytime)?
 
Maybe I'm missing something, but I don't see what you'd do with thousands of tables here... Wouldn't the simplest structure just be one table with (ID1, ID2, Dist) for each pair?

Why? Why not just store X and Y and do the calculations at runtime (or querytime)?
Yeah, unless your distance function is slower than a database lookup, you're probably better off calculating it on the fly.
 
because the calculation is the most expensive part ... i need to limit how many times i calculate distance

"Maybe I'm missing something, but I don't see what you'd do with thousands of tables here... Wouldn't the simplest structure just be one table with (ID1, ID2, Dist) for each pair?" - i am not seeing this?
 
Is calculating the distance really a bottleneck? Have you profiled the performance? Having a "2D" table means that your database size increases with the square of the number of points, which sounds utterly unsustainable and considerably more expensive.
 
i am not seeing this?
You mean you don't understand what I'm talking about?

Say point1, point2 etc. are the primary keys (the "ID" column) in your coordinate table. Then your distance table, for the example you gave, would look like this:

(ID1, ID2, Dist) =
(point1, point1, 0),
(point1, point2, 5),
(point1, point3, 7),
(point1, point4, 9),
(point1, point5, 11),
(point2, point1, 5),
(point2, point2, 0),
...
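In SQL that could be as simple as something like this (just a sketch; the names are made up):

Code:
-- One row per ordered pair of points; Dist is whatever your function returns.
CREATE TABLE Distances (
    ID1  INT NOT NULL,    -- references the coordinate table's ID
    ID2  INT NOT NULL,
    Dist FLOAT NOT NULL,
    PRIMARY KEY (ID1, ID2)
);

-- Looking up a stored distance is then a single indexed read.
SELECT Dist FROM Distances WHERE ID1 = @point1 AND ID2 = @point2;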

because the calculation is the most expensive part ... i need to limit how many times i calculate distance
It may be the most expensive part, but that doesn't mean this will make it faster. A disk access is an eternity from a CPU's point of view. Just make sure it's actually helping before you commit to anything.
 
As everyone keeps mentioning, more specific details will be helpful.

Do the data points change over time? How often is the same distance between points calculated? Storing the data in a caching server would result in faster reads than storing it in SQL server, but (since it's a caching server), the storage is not durable. Depending on data set size and how static your data points are, storing it in your application's memory could be even faster. Does the underlying data (and computed distance between points) need to be stored reliably prior to presenting results to the user/consumer?

There are hundreds of business requirements (that none of us know about) that could change any of the advice provided in this thread...

For example, you could say:
Hypothetical Quote from dalearyous said:
I need to store the data points at every road intersection and destination for a delivery company. The data points will be stored with some associated reference information like house number or intersection. This data rarely changes; I will be computing the fastest route between two locations using this data, but I need to guarantee that full details for every computed route is stored to two database servers in two separate data centers before providing the route to a customer.

If that was your use case, then we could make some great recommendations backed with real world examples.

sorry i am being so vague, i didn't know quite what to ask.

No need to ask more questions; just explain what you're trying to do so that we can help you solve the problem. Be as specific, clear, and concise as possible, and try to include every requirement. Currently, we have no idea why you are considering creating a new table for every data point entry; with more details, the reasoning behind it (and a more appropriate solution) might become apparent.
 
No need to ask more questions; just explain what you're trying to do so that we can help you solve the problem. Be as specific, clear, and concise as possible, and try to include every requirement. Currently, we have no idea why you are considering creating a new table for every data point entry; with more details, the reasoning behind it (and a more appropriate solution) might become apparent.
This emphasizes the key point -- We don't know why you've already settled on the approach you are questioning. We don't have the back-story, the other options you've considered and thrown out, the special considerations that are being factored, etc.

We also don't know how this data would be used, or what actual examples there would be. To me, this sounds like a delivery/logistics application. So let's assume that X and Y are addresses. Would two sets of X and Y be, respectively, the origin and the final destination? Or would each set of X and Y be a hop on the trip, culminating in the final trip path? Or does each X and Y pair represent a unique location? This isn't completely apparent.
 
As others have pointed out, if you wanted to store this information in a database, you would need one table with a list of all your points. It would also have an ID column, which is simply a unique integer for each row (point) in the table. For example, { ID, X, Y, ... } (plus whatever other information defines your point beyond just X and Y).

Then you would need one more table to store your distances. The distances table would have 3 columns, {first_id, second_id, distance}, storing the unique IDs (from the points table) of the two points and the distance between them.

I'm not going to tell you that calculating distances on the CPU is faster than using a database, because I have no idea what you are doing. However, I will say that you should make sure it is worth the time it takes to read the value from disk. Unless the computation is somewhat complicated, it is probably not a good idea to store it in a database. Depending on your problem, you may be better off keeping a large lookup table in RAM, caching the most frequently requested results, and computing the ones you don't already know.

As others have mentioned, the number of rows in the distances table will grow in proportion to the square of the number of points. More specifically, it will be equal to the number of possible combinations, which is exactly 1/2 * N * (N - 1), where N is the number of points.
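If the distance really is the same in both directions, one way to keep the table at that 1/2 * N * (N - 1) size is to store each pair only once, in a canonical order. A rough sketch (assuming symmetric distances; column names as above):

Code:
-- Store each unordered pair exactly once by requiring first_id < second_id.
CREATE TABLE distances (
    first_id  INT   NOT NULL,
    second_id INT   NOT NULL,
    distance  FLOAT NOT NULL,
    PRIMARY KEY (first_id, second_id),
    CHECK (first_id < second_id)
);

-- Normalize the pair order when looking a distance up.
SELECT distance
FROM distances
WHERE first_id  = CASE WHEN @a < @b THEN @a ELSE @b END
  AND second_id = CASE WHEN @a < @b THEN @b ELSE @a END;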
 
You don't need the ID column, necessarily. You don't need another table for the distances unless there is somehow more than one distance per location pair. A table storing { X, Y, Distance } is fine, with a primary key over { X, Y }.

The estimate of 1/2 * N * (N - 1) only holds if the distance between two points is the same in both directions.
 
Assuming you actually need to store the distances (which I reject) you just need two tables....
TBL_Points
TBL_Distances

TBL_Points has the ID, X_Coord, Y_Coord
TBL_Distances has ID1, ID2, Distance

This is still way more than you need... why do you need to store a linear transformation of your data?
Why not just store the points and cross-join it with a formula in your query?
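Roughly like this (a sketch using the column names above; it assumes the distance is a simple formula of the coordinates, e.g. straight-line, which may not fit the OP's case):

Code:
-- Compute every pairwise straight-line distance at query time; nothing is stored.
SELECT p1.ID AS ID1,
       p2.ID AS ID2,
       SQRT(SQUARE(p1.X_Coord - p2.X_Coord) +
            SQUARE(p1.Y_Coord - p2.Y_Coord)) AS Distance
FROM TBL_Points AS p1
CROSS JOIN TBL_Points AS p2
WHERE p1.ID <> p2.ID;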
 
You don't need the ID column, necessarily. You don't need another table for the distances unless there is somehow more than one distance per location pair. A table storing { X, Y, Distance } is fine, with a primary key over { X, Y }.

The estimate of 1/2 * N * (N - 1) only holds if the distance between two points is the same in both directions.

I have to presume Displacement(a,b) is the same as Displacement(b,a)

Storing the distance is silly, as it's nothing more than a transformation of your point locations.
In that case it should just be a table of points (described by their locations).

This problem sounds like homework to me, and it sounds like OP doesn't really understand how databases work :(
 
The distance may or may not be the same in both directions. It's not a presumption I'd be willing to make without an assurance that it was true. Only the OP has that information.

The distance is a "transformation" of the point locations, but it may or may not be a trivial transform. Not storing it means that it can't be used to filter queries. Failing to store the value means it has to be computed for each row examined, and that's really expensive.

Let's say the table has a list of locations in the US. We have an algorithm to compute the actual driving distance between two locations (on roads, not just great-circle distance or some other trivial transformation). Maybe there's 16 million locations in the US, so the pair table would have on the order of 256 trillion rows, or half that if distance is symmetric. Since routed driving directions are usually directional, we probably have the full 256 trillion.

Say I start at my house (one of the 16 million locations) and want to find everything within 30 driving miles. To do this, I have to consider the 16 million other locations and re-calculate the route for each one. That's expensive for the CPU, and can't be helped by any indexing.

If I have stored the distance, then I can read the 16 million other locations and find the ones that have a stored distance of less than 30 miles. That's very fast -- doesn't require any CPU. And I can help the query along with an index.
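In rough terms (a sketch; the index and parameter names are made up, and the columns follow the TBL_Distances layout mentioned earlier in the thread):

Code:
-- With the distance materialized and indexed, "what can I reach within 30 miles?"
-- becomes an index seek instead of 16 million route computations.
CREATE INDEX IX_Distances_Origin_Dist
    ON TBL_Distances (ID1, Distance) INCLUDE (ID2);

SELECT ID2, Distance
FROM TBL_Distances
WHERE ID1 = @myHouse
  AND Distance < 30;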

I'm not sure it's homework, but it certainly doesn't sound like the OP knows what he wants or even understands the problem he's trying to solve.
 
The distance may or may not be the same in both directions. It's not a presumption I'd be willing to make without an assurance that it was true. Only the OP has that information.

Scrolling up (I should have RTFT), the OP indicates that the distance is not a straight line, so I'm inclined to agree with you that we shouldn't make that assumption.

Even so, it should still only require two tables at most.
 
lol i am really glad i got the "this sounds like homework so i don't want to help you or you should know more" guy to respond ... Zangmonkey, you were helpful. the solution was easy and obvious and i was overthinking it. i am limited in how many distances i can calculate per day, which is why i want to save them so i don't have to make the same calculation twice.

if the distance was a straight line then yes, i would never store it and would just make the calculation in the query. anyway, thank you, Zangmonkey!
 

<3

Good luck.
Remember to support StackOverflow
 