Most Advanced SQL book

stevedave

Supreme [H]ardness
Joined
Mar 6, 2007
Messages
5,737
I'd consider myself an advanced / expert in SQL but i want to verify this. Is there a SQL resource that only covers the most advanced topics?

Ii think in order to advance I'm going to have to branch out and incorporate programming elements into my sql work but before I start delving into that I'd like to make sure I have mastered SQL.
 
I'd consider myself an advanced / expert in SQL but i want to verify this. Is there a SQL resource that only covers the most advanced topics?

Ii think in order to advance I'm going to have to branch out and incorporate programming elements into my sql work but before I start delving into that I'd like to make sure I have mastered SQL.

What's an outer join? Provide a few use cases. What are the IO subsystem performance characteristics of this type of join?

For your self-test, don't use google. Since this is a forum, there's no need to post the answer since good definitions can be found trivially with 5 seconds of searching. But if you can't immediately start talking (accurately) about questions like that, I wouldn't consider you an "expert in SQL."

Speaking of expert, what do you mean by "expert in SQL?" DBA tasks? OLAP? OLTP? Data modeling? Performance tuning/scalability? Data reliability? Creating high performance, reliable, accurate, inexpensive solutions for business problems?
 
What's an outer join? Provide a few use cases. What are the IO subsystem performance characteristics of this type of join?

For your self-test, don't use google. Since this is a forum, there's no need to post the answer since good definitions can be found trivially with 5 seconds of searching. But if you can't immediately start talking (accurately) about questions like that, I wouldn't consider you an "expert in SQL."

Speaking of expert, what do you mean by "expert in SQL?" DBA tasks? OLAP? OLTP? Data modeling? Performance tuning/scalability? Data reliability? Creating high performance, reliable, accurate, inexpensive solutions for business problems?

knew both answers...though I have never once needed a right outer join and maybe once for a left outer join.

Data analyst/business analyst/reporting analyst level expert. a master of joins, sub selects, temp tables, triggers, case, if , converts, cast, sp, SSIS/DTS/OLAP, Data reliability and validation.

I never really delved into inexpensive solutions because I'm a corporate whore and always will be so I just pop out a server or vm depending on the use.

I could use some more knowledge in performance tuning as all I have done in that area is write code one way than another way and see which is faster example OR vs IN which as it turns out they are the same.

Know basic DBA stuff but don't really want to get into that area. I installed 3 or 4 SQL server environments setup users, backups, roles, imports, exports ect.... nothing too heavy I could probably walk on as a Junior SQL Server DBA nothing higher.

Never did much data modeling I typically tap into existing data from closed sources and importing to SQL server for manipulation.

I'm just wondering if there is a level deeper than I am currently at. The few books I have looked at didn't really cover anything I didn't already know...Its that you don't know what you don't know type of thing.
 
knew both answers...though I have never once needed a right outer join and maybe once for a left outer join.

Data analyst/business analyst/reporting analyst level expert. a master of joins, sub selects, temp tables, triggers, case, if , converts, cast, sp, SSIS/DTS/OLAP, Data reliability and validation.

I never really delved into inexpensive solutions because I'm a corporate whore and always will be so I just pop out a server or vm depending on the use.

I could use some more knowledge in performance tuning as all I have done in that area is write code one way than another way and see which is faster example OR vs IN which as it turns out they are the same.

Know basic DBA stuff but don't really want to get into that area. I installed 3 or 4 SQL server environments setup users, backups, roles, imports, exports ect.... nothing too heavy I could probably walk on as a Junior SQL Server DBA nothing higher.

Never did much data modeling I typically tap into existing data from closed sources and importing to SQL server for manipulation.

I'm just wondering if there is a level deeper than I am currently at. The few books I have looked at didn't really cover anything I didn't already know...Its that you don't know what you don't know type of thing.

Your value to many businesses is limited by your decisions about what aspects of (data) engineering you're willing to invest in.

You've never used left or right outer joins, so I have no confidence that your understanding of the IO characteristics of either is accurate. Without having used either, it's also unlikely that you would be able to accurately discuss their roles in software engineering. This is also surprising, given your initial assertions.

Your comment about comparing OR vs IN was very encouraging to me! This shows that you are curious, and explored the characteristics of how one of your tools (the query optimizer) is interpreting your request. In an interview situation, we would (hopefully) be able to spend 10 minutes discussing the tools you used to analyze some test queries, and then probe the edges of that knowledge - under what conditions would the two syntaxes result in differing execution plan? I'd encourage you to perform many more tests like this! As you are examining execution plans, you should what each operator is doing, what the QO is optimizing for, how it relates to CPU, memory, IO subsystem, and network capabilities, etc.

Given your lack of experience with performance tuning, data modeling, database administration, I regret to inform you that you have not yet reached "SQL Expert" or even advanced.


As for your question, I'd start with some engineering texts:

Mike's essay for programming students is good. Post #2 in that thread has some great software engineering texts you should consider reading.

Read these:
MSDN - Queries
MSDN - Indexes

Eventually, SQL For Smarties will be useful.

But really, at this point, you need to find a project you can complete on your own. It should be data intensive since that's what you're interested in - but it is the process of hitting roadblocks, and solving them yourself that will be invaluable. Next, build a load test framework, so you can start tuning the server and learn how to handle the common bottlenecks you'll encounter. If your project can be used by real users, that's even better. Start taking their feedback, iterating, and improving!
 
As an Amazon Associate, HardForum may earn from qualifying purchases.
Your value to many businesses is limited by your decisions about what aspects of (data) engineering you're willing to invest in.

You've never used left or right outer joins, so I have no confidence that your understanding of the IO characteristics of either is accurate. Without having used either, it's also unlikely that you would be able to accurately discuss their roles in software engineering. This is also surprising, given your initial assertions.

Your comment about comparing OR vs IN was very encouraging to me! This shows that you are curious, and explored the characteristics of how one of your tools (the query optimizer) is interpreting your request. In an interview situation, we would (hopefully) be able to spend 10 minutes discussing the tools you used to analyze some test queries, and then probe the edges of that knowledge - under what conditions would the two syntaxes result in differing execution plan? I'd encourage you to perform many more tests like this! As you are examining execution plans, you should what each operator is doing, what the QO is optimizing for, how it relates to CPU, memory, IO subsystem, and network capabilities, etc.

Given your lack of experience with performance tuning, data modeling, database administration, I regret to inform you that you have not yet reached "SQL Expert" or even advanced.


As for your question, I'd start with some engineering texts:

Mike's essay for programming students is good. Post #2 in that thread has some great software engineering texts you should consider reading.

Read these:
MSDN - Queries
MSDN - Indexes

Eventually, SQL For Smarties will be useful.

But really, at this point, you need to find a project you can complete on your own. It should be data intensive since that's what you're interested in - but it is the process of hitting roadblocks, and solving them yourself that will be invaluable. Next, build a load test framework, so you can start tuning the server and learn how to handle the common bottlenecks you'll encounter. If your project can be used by real users, that's even better. Start taking their feedback, iterating, and improving!

Never 'need' and never 'used' are two different things. I have used plenty of outer joins...but I have yet to 'need' one in a business situation. Had do some IO analysis when attempting to move into a VM environment setting up permon and sitecope counters to measure primarly IOPs but also everything else. Used query analyzer to test indexed and non index and inner join vs left join. but I'd hardly call that performance tuning more performance observing so the tuning part I hope sql for smarties covers.....my vm didn't make the cut. Bad san.

Also I have a degree in computer information systems and deliberately went into sql to avoid programming(been doing sql work for about 6 years now). After taking a programming job right out of college I realized it wasn't for me when I can do sql work for equal/more pay at much less effort.

Your Queries and indexes links were great examples of what I already know....but I will grab that SQL for Smarties.

In your opinion will I eventually need to incorporate programming elements into my SQL knowledge? I know a couple of people who do. All their sql code is embedded in C# but they are web developers and I am not.
 
Last edited:
As an Amazon Associate, HardForum may earn from qualifying purchases.
I'll start by saying I am no expert in SQL. I've been a DBA/BI and ETL developer/junior programmer for about 5 years now and the more I learn the more I realize how little I truly know.

I would suggest you join some communities and start writing/blogging. This will give you feedback very quickly about how much you know. Even if you have a deep understanding of writing TSQL, by putting your thoughts out for others to view you will get feedback on other ways to approach an issue. I have no doubt you are proficient in writing TSQL and understanding the impact of how you structure your queries but it sounds like you could learn more about how the SQL Server engine works. For example, why does SQL use a seek instead of a scan and what performance impact does that have on your query/IO/memory? How does auto-growth on your log file affect performance and what considerations should be taken in your queries to avoid affecting transaction log backups due to the internal structure of the log file? Knowing the internals of the engine will make you better at writing efficient TSQL because nothing in SQL works functions in a vacuum.

I would also suggest you read up on data modeling if you are most concerned about being a high level analyst. I would suggest you start by reading Kimball's books. You should also be looking forward at big data and learning Hadoop, blah blah blah.

As to your question about programming it depends on what skills you want to possess and what jobs you want to be able to fill. I vote you should learn other languages and "best" practices so you have a better understanding of how they interact and access your databases. That will make you much more effective at communicating with your programming staff and being a better team member.
 
As an Amazon Associate, HardForum may earn from qualifying purchases.
Expanding your mastery of SQL should include a deep understanding on query plans, what sorts of scans are preferable for certain data sets (hash, index, clustered etc), process ordering, paging, replication practices and sharding among other things.

I don't have a book recommendation for you, but there is a DBA stackexchange... go there and have a goal of 5k rep to get you started. The best way to hone mastery is to teach peers.
 
Yes, there is.

Have you ever processed resumes/applications where somebody rates themselves a "10/10" on something and they actually are?

I used to work for a company that had this process pre-interview with a self-assessment.
If somebody marked themselves a 10/10 on something I pretty much always tossed it.
 
Programming SQL server 2008 (ms press), is a pretty good book, definitely beyond the basics.

Also I recommend looking at some nosql alternatives to widen your knowledge.
 
How does auto-growth on your log file affect performance and what considerations should be taken in your queries to avoid affecting transaction log backups due to the internal structure of the log file? Knowing the internals of the engine will make you better at writing efficient TSQL because nothing in SQL works functions in a vacuum.
Kimball's books. You should also be looking forward at big data and learning Hadoop, blah blah blah.

As to your question about programming it depends on what skills you want to possess and what jobs you want to be able to fill. I vote you should learn other languages and "best" practices so you have a better understanding of how they interact and access your databases. That will make you much more effective at communicating with your programming staff and being a better team member.


Yeah this kind of stuff would be nice to learn. Just got my director to buy this and the SQL for smarties books.

For now I will avoid re-re-learning c#. I know the basics of programming but not doing it every day keeps me in a constant state of rust...plus I don't enjoy it. I'd rather focus on becoming a SQL expert since I enjoy it.

Thanks for the book suggestions.
 
As an Amazon Associate, HardForum may earn from qualifying purchases.
Found an example on where I'd like some advanced sql server knowledge.

A while back I replicated our environment into a VM for fun.
Generally the VM performs around 20% better on our Stored Procedures and they both use the same execution plan.

Except for 1 SP which takes 12 seconds on the physical box and 120 second on the vm. They run completely different Execution plans. The Physical does parallelism and the VM doesn't. I've done a few small things like adjusting the cost of parallelism or max ram and nothing changes.

How would I find the reasoning behind the execution plan? I can easily see what happened by looking at the execution plan....but I can't seem to find the WHY.
 
I don't know if anyone's mentioned it, but branch out into other DBMS systems like Cassandra, MongoDB, RavenDB and Redis. It's a different paradigm and it's not just the simplified syntax, but also possibilities. No longer do you have to setup huge IBM DB2 data warehouses when you can use NoSQL.
 
Have you ever processed resumes/applications where somebody rates themselves a "10/10" on something and they actually are?

I used to work for a company that had this process pre-interview with a self-assessment.
If somebody marked themselves a 10/10 on something I pretty much always tossed it.

Yes and know: I certainly know and have worked with people who are legitimately 10/10, but I haven't seen their resumes -- I didn't interview or hire them.

I'd tend to agree that candidates can over-estimate their own skills, but unless your application process makes it clear what the numeric score really means, you might end up throwing out some that are good enough for you even if they're not the 10/10 you think they're asking.
 
Found an example on where I'd like some advanced sql server knowledge.

A while back I replicated our environment into a VM for fun.
Generally the VM performs around 20% better on our Stored Procedures and they both use the same execution plan.

Except for 1 SP which takes 12 seconds on the physical box and 120 second on the vm. They run completely different Execution plans. The Physical does parallelism and the VM doesn't. I've done a few small things like adjusting the cost of parallelism or max ram and nothing changes.

How would I find the reasoning behind the execution plan? I can easily see what happened by looking at the execution plan....but I can't seem to find the WHY.

Go grab yourself a book an execution plans. I can't tell you why it's behaving that way because I'm not that skilled but my guess would be something related to the stats on the tables. You replicated the environment but once the DB is not longer under the same load differences start forming. Are you performing the same maintenance on your new environment as your old (updating stats and indexes)? Have you dropped your SP cache so that a new plan is created?
 
Found an example on where I'd like some advanced sql server knowledge.

A while back I replicated our environment into a VM for fun.
Generally the VM performs around 20% better on our Stored Procedures and they both use the same execution plan.

Except for 1 SP which takes 12 seconds on the physical box and 120 second on the vm. They run completely different Execution plans. The Physical does parallelism and the VM doesn't. I've done a few small things like adjusting the cost of parallelism or max ram and nothing changes.

How would I find the reasoning behind the execution plan? I can easily see what happened by looking at the execution plan....but I can't seem to find the WHY.

This just showed up in my inbox SQL Server: Troubleshooting Query Plan Quality Issues, haven't had a chance to view it. They have a 200 minute free trial.
 
Found an example on where I'd like some advanced sql server knowledge.

A while back I replicated our environment into a VM for fun.
Generally the VM performs around 20% better on our Stored Procedures and they both use the same execution plan.

Except for 1 SP which takes 12 seconds on the physical box and 120 second on the vm. They run completely different Execution plans. The Physical does parallelism and the VM doesn't. I've done a few small things like adjusting the cost of parallelism or max ram and nothing changes.

How would I find the reasoning behind the execution plan? I can easily see what happened by looking at the execution plan....but I can't seem to find the WHY.


The VM likely outperforms on non-parallel versions of the same query plan because it has loaded more information into memory on the virtual host.

VMs (typically) are limited in their host awareness and the threading capabilities of the host are "none of the VM's business". In fact, many threading processes are limited in the VM by the host, so it's completely reasonable that a threading execution plan may not have been selected.
 
The VM likely outperforms on non-parallel versions of the same query plan because it has loaded more information into memory on the virtual host.

VMs (typically) are limited in their host awareness and the threading capabilities of the host are "none of the VM's business". In fact, many threading processes are limited in the VM by the host, so it's completely reasonable that a threading execution plan may not have been selected.

that is possible is know very little about VM's. The physical server has 36 gigs of ram 30 dedicated to SQL server and the VM has 8 with 6.8 available to sql server.....Its hard to imagine the VM outperforming the Physical in anything ram related.

The real question is; Is there a way to find out? Some type of log that explains SQL servers decision making process?
 
Last edited:
The real question is; Is there a way to find out? Some type of log that explains SQL servers decision making process?
Given that this is a shared environment with asynchronous contentions for resources, why not run Performance Monitor first (on both the VM and the host) to identify the kind of bottleneck? This could even be run in parallel with your systems/network admins to do a full trace analysis.
 
The real question is; Is there a way to find out? Some type of log that explains SQL servers decision making process?

No. For a trivial query, the QO might only evaluate a couple of different access paths to the data. For a more involved query, there might be dozens of thousands of different plans generated and evaluations completed. Such a log wouldn't be really readable, and would take a great deal of background knowledge to understand.

Through experience, you can learn to read execution plans and reverse engineer the attributes that cause the QO to select a particular plan. You can augment your experience by reading up on query plans and their meaning in any number of books. The keys are understanding that there are multiple access paths to data, multiple operators with different run time characteristics, and different overall strategies to generating a query plan. Once you understand these, knowing which one the QO might choose can become more intuitive.

You're saying that your physical server chooses a parallel plan and your VM doesn't choose a parallel plan. There can be several different reasons for that choice. One is that the VM doesn't have as many spindles for the data at its source. Another is that the VM is running a different edition and bitness of SQL Server. Another is that the VM's instance of SQL Server is configured to not enable parallel plans, or doesn't see as many (or enough) available processors. Parallel plans can take far more memory than single plans, and if the QO believes the memory grant won't be available, it might not issue a parallel plan -- so the difference in available memory in each instance might be an issue.

Almost all of the input the QO considers when building a plan is visible to you. If you're comparing plan generation from one machine to another and find a difference, you should start figuring out why by comparing that input from one machine to another.
 
Last edited:
Back
Top