SQL Server has some really nice features. It’s best isn’t
even really a part of the core database; it’s the Management Studio used to
access it. If MS Sql is the only thing you’ve ever used, you’re spoiled. If you’ve
never used MSSQL, you don’t know what you’re missing. It really does provide
great tools for scripting data constructs, exporting and importing data, database
profiling, etc.
But it always annoys me what is missing from MS tools and
how we’re told we don’t really want it anyway. In SQL, it’s the “group by”.
Suppose you have a set of sales records in your database. You want to find any
that don’t have master account records (because something failed) and add the
account record, so you won’t have dangling transactions. (Looking at the
dangling transactions, dangling participles seem harmless.)
Now, I know… I know. You should have a database constraint
that creates a foreign key between the sales table and the accounts table so
this can never happen. But, the world should be at peace too. You can’t always
control those things.
So you sit down to write the query: “select account_number
from sales where [it doesn’t exist in accounts] [then insert it]“. And you
realize that there can be multiple sales on that account, but account number is
unique in the accounts table. So all you want to do is de-dupe a little.
So far this isn’t too bad. Until you realize that “Customer_Name”
is in both sales and account. From a pure normalization perspective, maybe that
doesn’t make sense, but it actually does make some sense from a practical view, since not all the sales on the account
may be done by the same person. And the world still isn’t at peace, so you
shrug it off and move on.
But how do you do it? You need one of the names from sales to
put into the account. Frankly, which one doesn’t really matter. So you try the
obvious:
Select account_number, Customer_Name from sales group by
account_number.
This works in MySql. It gives one row per account number
with a somewhat randomly chosen name. In SQL Server, it throws a compile error
and complains that the Customer_Name needs to be part of the group by.
Easy to fix.
Select account_number, Customer_Name from sales group by
account_number, Customer_name, right?
No. Now it will give you duplicate account numbers, if
there are multiples sales with different names.
After a week of cussing, you end up coming up with a
solution, but it’s ugly. It will involve inner selects, MAX/MINs (which won’t
work the way you’d think because you’ll have to be concerned with NULLS and the
like) or getting the row id with a “Partition Over”. It’s all ugly. It gets
worse when you have a combined key, so “account_number” alone isn’t unique, but
it’s account_number + store_id + district_code + random_silly_grouping_value.
If you ask MS (or one of their fanbois) they'll tell you that you don't really want to do that. I love it when they say that. They'll say "you don't want that because you can't guarantee which Customer_Name you're getting." When you say you don't care, they'll tell you that the database is mis-designed then. OK... and the world is not at peace. I usually want to give them my boss' number and have them explain how he doesn't really want what he wants, and if he does, he needs to shell out a few million $$ to re-design the data.
If you ask MS (or one of their fanbois) they'll tell you that you don't really want to do that. I love it when they say that. They'll say "you don't want that because you can't guarantee which Customer_Name you're getting." When you say you don't care, they'll tell you that the database is mis-designed then. OK... and the world is not at peace. I usually want to give them my boss' number and have them explain how he doesn't really want what he wants, and if he does, he needs to shell out a few million $$ to re-design the data.
It’s the kind of thing that drives me nuts about Microsoft.
Doing simple things is so easy. Doing complex things is easy too. But varying off the path – easy or
hard – is so painful.
If you don't believe me, try this: Go to Management Studio and right click on a database and run the wizard to script all the objects in the database. Now, try to do the same thing from a command line, so you can actually.... you know, schedule it. Turns out to be impossible (without rolling your own tool).
Getting up at 3am each night to script things is on the path. Automating it is off.
And the world is not at peace.
Getting up at 3am each night to script things is on the path. Automating it is off.
And the world is not at peace.
--kevin
No comments:
Post a Comment