Friday, July 27, 2012

MS Sql and my pain.



SQL Server has some really nice features. It’s best isn’t even really a part of the core database; it’s the Management Studio used to access it. If MS Sql is the only thing you’ve ever used, you’re spoiled. If you’ve never used MSSQL, you don’t know what you’re missing. It really does provide great tools for scripting data constructs, exporting and importing data, database profiling, etc.

But it always annoys me what is missing from MS tools and how we’re told we don’t really want it anyway. In SQL, it’s the “group by”. Suppose you have a set of sales records in your database. You want to find any that don’t have master account records (because something failed) and add the account record, so you won’t have dangling transactions. (Looking at the dangling transactions, dangling participles seem harmless.)

Now, I know… I know. You should have a database constraint that creates a foreign key between the sales table and the accounts table so this can never happen. But, the world should be at peace too. You can’t always control those things.

So you sit down to write the query: “select account_number from sales where [it doesn’t exist in accounts] [then insert it]“. And you realize that there can be multiple sales on that account, but account number is unique in the accounts table. So all you want to do is de-dupe a little.

So far this isn’t too bad. Until you realize that “Customer_Name” is in both sales and account. From a pure normalization perspective, maybe that doesn’t make sense, but it actually does make some sense from a practical  view, since not all the sales on the account may be done by the same person. And the world still isn’t at peace, so you shrug it off and move on.

But how do you do it? You need one of the names from sales to put into the account. Frankly, which one doesn’t really matter. So you try the obvious:

Select account_number, Customer_Name from sales group by account_number.

This works in MySql. It gives one row per account number with a somewhat randomly chosen name. In SQL Server, it throws a compile error and complains that the Customer_Name needs to be part of the group by.

Easy to fix.
Select account_number, Customer_Name from sales group by account_number, Customer_name, right?

No. Now it will give you duplicate account numbers, if there are multiples sales with different names.

After a week of cussing, you end up coming up with a solution, but it’s ugly. It will involve inner selects, MAX/MINs (which won’t work the way you’d think because you’ll have to be concerned with NULLS and the like) or getting the row id with a “Partition Over”. It’s all ugly. It gets worse when you have a combined key, so “account_number” alone isn’t unique, but it’s account_number + store_id + district_code + random_silly_grouping_value.

If you ask MS (or one of their fanbois) they'll tell you that you don't really want to do that. I love it when they say that. They'll say "you don't want that because you can't guarantee which Customer_Name you're getting." When you say you don't care, they'll tell you that the database is mis-designed then. OK... and the world is not at peace. I usually want to give them my boss' number and have them explain how he doesn't really want what he wants, and if he does, he needs to shell out a few million $$ to re-design the data.

It’s the kind of thing that drives me nuts about Microsoft. Doing simple things is so easy. Doing complex things is  easy too. But varying off the path – easy or hard – is so painful.

If you don't believe me, try this: Go to Management Studio and right click on a database and run the wizard to script all the objects in the database. Now, try to do the same thing from a command line, so you can actually.... you know, schedule it. Turns out to be impossible (without rolling your own tool).
Getting up at 3am each night to script things is on the path. Automating it is off.

And the world is not at peace.
--kevin

Wednesday, July 18, 2012

Microsoft and the courts

According to the news from Reuters, the EU is investigating whether Microsoft is in violation of anti-trust rules for blocking non-MS browsers from having fill access to the Windows8 API.

http://in.reuters.com/article/2012/07/18/eu-microsoft-browser-idINL6E8IIALJ20120718

 Here’s a prediction.
Microsoft will say, basically, “It doesn’t matter, we’re not a monopoly anymore.”

To prove that, they’re argue that Windows8 is not a PC operating system, but an OS for PC’s, phones, Xboxes, notepads and greeting cards that play music when you open them. Since, they will say, Microsoft is getting walloped in non-PC devices, they don’t count as a monopoly. It’s analogous, they will say, to saying that Toyota has a monopoly on cars because it sells more hybrids. The PC, they will argue, is only one part of the target market for Windows8, in the same way that the Prius is only one target for windshields.

 Microsoft has a lot of reasons to push Windows8. They seriously have fallen behind in their business strategy. But one of the key reasons, I’m guessing, is that by creating a “one OS to rule them all” platform, they create a legal hole to crawl into.

 For years, they’ve been a monopoly (as defined by law, anyway) and they’ve made a zillion dollars. But they’ve also paid billions in fines for breaking anti-trust laws. If they can convince the courts that Windows8 combines markets the way Toyota combines car styles, they can still keep their current markets and profits without the annoying billions of dollars in fines they face all the time. To them, it must seem like a huge win.

Have your cake + eat it = no more anti-trust.

Personally, I think their argument is actually decent and the courts may go for it. I also think we’ll see more restrictions and modifications to Windows8 to make it better fit the model and support the legal argument. I think that’s actually their hidden reason for forcing PC users to use the Metro start screen. And to that end, ultra-books are a Godsend for Redmond. With them, they get to launch the same ARM-based Windows8 they plan to put on notepad and say “see? A laptop is just a glorified notepad. Since we’re not a monopoly in notepads, we’re not one in notebooks either.”

Will be interesting following what happens in the European courts.

 --kevin

Wednesday, July 11, 2012

Distinct on custom objects in datatables and lists

I use this blog a lot to make notes for myself so that I don’t … umm… what’s the word?

Anyway, here’s some of the latest notes that caused me a research headache. Maybe it will save you a headache, or maybe I’ll remember to look here the next time my headache starts.

Suppose you have a data table in C#.
Let’s say it has a date time and a custom object (type = “myclass”).
The “myclass” thing is simple – it’s just a class with one property “v” which is a DateTime (I’ll plug that in later) and the code to set up and load the table is here:

DataTable dt = new DataTable("sometable"); 
DataColumn dc = new DataColumn("someDate", typeof(DateTime));
 dt.Columns.Add(dc);
 dt.Columns.Add(new DataColumn("someObj", typeof(myclass))); 
List l = new List(); for (int i=0;i<10; i++) 
    DataRow dr = dt.NewRow();
    dr[0] = DateTime.Now; 
    myclass m = new myclass();
    m.v = DateTime.Now.ToString();
    dr[1] = m;
    dt.Rows.Add(dr); l.Add(m); 

So we end up with a table with two columns: one is a DateTime and the other is a myclass. Conceptually, the DateTime and the myclass objects are about the same thing.
You head shouldn’t hurt yet. If it does, maybe you should stop now. But get the aspirin ready:
var n = (from p in dt.AsEnumerable() select p["someDate"]).Distinct();
This is good. Walk through a foreach on n and it gives one instance for every unique date. In theory, each value in the table is distinct because it’s a DateTime and the loop iteration takes a fraction of a second. In practice, the code is so fast, that the runtime engine eats the fraction of a second, and you’ll get one unique entry in n, so your foreach will have one loop iteration. My head is still pain free. Until I do this:
var n = (from p in dt.AsEnumerable() select p["someObj"]).Distinct();
Now the “Distinct” acts on the custom objects and you get 10 items in n. The headache is small, though, because I get that .Net doesn’t know how to compare my custom objects. So it can’t distinct them. MSDN says that myclass needs to implement IEquatable. There are some examples out there that seem to work.
For simplicity, let me the load 10 custom objects into a List, rather than a data table, and I’ll modify myclass to implement IEquatable.
public class myclass: IEquatable

public object v;
public override string ToString()
{
return v.ToString();
}
public override int GetHashCode()
{
return v.GetHashCode();
}
public bool Equals(myclass other)
{
return true;
}
}

I already snuck the List into the code above, so I can use it (cuz I’m sneaky like that).

So I can do:
var lc = (from pp in l
select pp).Distinct();

And this is *supposed* to call the “GetHashCode()” on the custom objects to do the distinct…. But it doesn’t. If you loop through lc, it will give you 10 iterations.

The good news is that I can add a IEqualityComparer to the mix:
class TheComparer:IEqualityComparer
{
public bool Equals(myclass x, myclass y)
{
return true;
}
public int GetHashCode(myclass m)
{
return m.v.GetHashCode();
}
}

Now if I set the EqalityComparer to the Distinct() on the list, it seems to work: TheComparer c = new TheComparer();

var lc = (from pp in l
select pp).Distinct(c);

Yay! So I should be able to apply the same EqualityComarer to the data table’s Distinct() and it should work, right? var mmm = (from p in dt.AsEnumerable() select p["someObj"] ).Distinct(c);

Nope. Not only doesn’t this work, it gives a *compile error*. This is where I got into some really serious aspirin. Turns out that this statement returns a result of IEnumerable, while the previous one returns IEnumerable. They’re both generics that implement IEnumerable, so they support the same methods, right? Not exactly.

The EqualityComparer has to be type cast. If you look at the definition, it is bound to a type. So, even though a DateTime is an object, creating an EqualityComparer with an object type GenericComparer:IEqualityComparer And then trying to use it on a Distinct() for a List will throw a compiler error.

Took me around a day to figure all this out and the solution. The solution is to type cast the Linq:

var mm = ((IEnumerable)(from p in dt.AsEnumerable()
select p["someObj"] as myclass)).Distinct(c);

Note that the only real difference between this and the code above is that I typecast IEnumberable with my custom object. But now, I don’t get a compiler error. Also note that this will not compile unless I also typecast the select.
*NOW* the Distinct will correctly accept and use the EqualityComparer and it can call the Equatable methods on the incoming objects to do the compare.

So far, this is the only way I’ve found to do a Distinct() on a DataTable with custom objects. Microsoft’s internal DataTable (non-Linq) methods don’t like custom objects at all. In fact, they pretty much ignore them.

Next time I’ll probably have a section on how to use custom object in data tables and bind to databound grids dynamically.

--kevin