Wednesday, July 11, 2012

Distinct on custom objects in datatables and lists

I use this blog a lot to make notes for myself so that I don’t … umm… what’s the word?

Anyway, here’s some of the latest notes that caused me a research headache. Maybe it will save you a headache, or maybe I’ll remember to look here the next time my headache starts.

Suppose you have a data table in C#.
Let’s say it has a date time and a custom object (type = “myclass”).
The “myclass” thing is simple – it’s just a class with one member, “v”, that holds a date-time value (I’ll plug that in later). The code to set up and load the table is here:

DataTable dt = new DataTable("sometable");
DataColumn dc = new DataColumn("someDate", typeof(DateTime));
dt.Columns.Add(dc);
dt.Columns.Add(new DataColumn("someObj", typeof(myclass)));
// Keep the same objects in a plain list too (used further down).
List<myclass> l = new List<myclass>();
for (int i = 0; i < 10; i++)
{
    DataRow dr = dt.NewRow();
    dr[0] = DateTime.Now;
    myclass m = new myclass();
    m.v = DateTime.Now.ToString();
    dr[1] = m;
    dt.Rows.Add(dr);
    l.Add(m);
}

So we end up with a table with two columns: one is a DateTime and the other is a myclass. Conceptually, the DateTime and the myclass objects are about the same thing.
Your head shouldn’t hurt yet. If it does, maybe you should stop now. But get the aspirin ready:
var n = (from p in dt.AsEnumerable() select p["someDate"]).Distinct();
This is good. Walk through a foreach on n and it gives one instance for every unique date. In theory, each value in the table could be distinct, because it’s a DateTime and each loop iteration takes a fraction of a second. In practice, the loop runs faster than the system clock ticks over, so the rows usually all get the same value and you’ll get one unique entry in n: your foreach will have one loop iteration. My head is still pain free. Until I do this:
var n = (from p in dt.AsEnumerable() select p["someObj"]).Distinct();
Now the “Distinct” acts on the custom objects and you get 10 items in n. The headache is small, though, because I get that .NET doesn’t know how to compare my custom objects, so it falls back to comparing references and every object looks different. MSDN says that myclass needs to implement IEquatable<myclass>. There are some examples out there that seem to work.
For simplicity, let me load the 10 custom objects into a List<myclass>, rather than a data table, and I’ll modify myclass to implement IEquatable<myclass>.
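Here’s a rough sketch of that default behavior. The names Plain and ReferenceEqualityDemo are made up for illustration; the point is only that a class with no equality overrides is compared by reference, so objects holding the same value still count as different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a custom class with no equality overrides.
public class Plain
{
    public object v;
}

public static class ReferenceEqualityDemo
{
    // Builds `count` objects that all hold the same value and returns
    // how many of them survive a parameterless Distinct().
    public static int CountDistinct(int count)
    {
        var items = new List<Plain>();
        for (int i = 0; i < count; i++)
        {
            items.Add(new Plain { v = "same value" });
        }
        // No comparer supplied, so the default (reference) equality is used.
        return items.Distinct().Count();
    }

    public static void Main()
    {
        // Each object is a separate instance, so nothing is dropped as a duplicate.
        Console.WriteLine(CountDistinct(10));
    }
}
```

Calling CountDistinct(10) gives back 10, which matches the 10 items seen with the data table.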
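public class myclass : IEquatable<myclass>
{
    public object v;
    public override string ToString()
    {
        return v.ToString();
    }
    public override int GetHashCode()
    {
        return v.GetHashCode();
    }
    // Two myclass objects are equal when their values are equal.
    public bool Equals(myclass other)
    {
        return other != null && object.Equals(v, other.v);
    }
}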

I already snuck the List into the code above, so I can use it (cuz I’m sneaky like that).

So I can do:
var lc = (from pp in l
select pp).Distinct();

And this is *supposed* to call the “GetHashCode()” on the custom objects to do the distinct…. But it doesn’t. If you loop through lc, it will give you 10 iterations.
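If a parameterless Distinct() still hands back every item, two things are worth checking: that the type argument on IEquatable<T> matches the element type of the list, and that Equals(object) is overridden alongside GetHashCode(). The sketch below is only an illustration under those assumptions (Stamp and EquatableDemo are hypothetical names, not the class from this post); with all three pieces in place, the default comparer does collapse equal values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class: equality is defined by the wrapped value.
public class Stamp : IEquatable<Stamp>
{
    public object v;

    // Typed equality, used by the default comparer for Stamp.
    public bool Equals(Stamp other)
    {
        return other != null && object.Equals(v, other.v);
    }

    // Untyped equality, used when the objects are handled as plain object.
    public override bool Equals(object obj)
    {
        return Equals(obj as Stamp);
    }

    // Equal objects must produce equal hash codes.
    public override int GetHashCode()
    {
        return v == null ? 0 : v.GetHashCode();
    }
}

public static class EquatableDemo
{
    // Wraps each string in a Stamp and counts what Distinct() keeps.
    public static int CountDistinct(params string[] values)
    {
        var items = new List<Stamp>();
        foreach (string s in values)
        {
            items.Add(new Stamp { v = s });
        }
        return items.Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct("a", "a", "b"));
    }
}
```

Here CountDistinct("a", "a", "b") returns 2, because the two "a" objects compare as equal.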

The good news is that I can add an IEqualityComparer<myclass> to the mix:
class TheComparer : IEqualityComparer<myclass>
{
    // Equal when the wrapped values are equal.
    public bool Equals(myclass x, myclass y)
    {
        return object.Equals(x.v, y.v);
    }
    public int GetHashCode(myclass m)
    {
        return m.v.GetHashCode();
    }
}

Now if I pass the EqualityComparer to the Distinct() on the list, it seems to work:
TheComparer c = new TheComparer();

var lc = (from pp in l
select pp).Distinct(c);

Yay! So I should be able to apply the same EqualityComparer to the data table’s Distinct() and it should work, right?
var mmm = (from p in dt.AsEnumerable() select p["someObj"]).Distinct(c);

Nope. Not only doesn’t this work, it gives a *compile error*. This is where I got into some really serious aspirin. Turns out that this statement returns a result of IEnumerable<object> (the DataRow indexer hands back a plain object), while the previous one returns IEnumerable<myclass>. They’re both generics that implement IEnumerable, so they support the same methods, right? Not exactly.

The EqualityComparer has to match the element type. If you look at the definition, it is bound to a type. So, even though a DateTime is an object, creating an EqualityComparer with an object type (GenericComparer : IEqualityComparer<object>) and then trying to use it on a Distinct() for a List<DateTime> will give a compiler error.
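The flip side of that type binding: a comparer declared for object lines up with a sequence of object. This is a sketch of an alternative, not the fix used in this post, and Item, ObjectComparer and ObjectComparerDemo are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical custom class holding one value.
public class Item
{
    public object v;
}

// Comparer bound to object, so it matches IEnumerable<object>.
public class ObjectComparer : IEqualityComparer<object>
{
    // "new" because this has the same signature as the static object.Equals(object, object).
    public new bool Equals(object x, object y)
    {
        var a = x as Item;
        var b = y as Item;
        // Anything that is not an Item is compared by reference.
        if (a == null || b == null) return object.ReferenceEquals(x, y);
        return object.Equals(a.v, b.v);
    }

    public int GetHashCode(object obj)
    {
        var item = obj as Item;
        if (item == null || item.v == null) return 0;
        return item.v.GetHashCode();
    }
}

public static class ObjectComparerDemo
{
    // Stores `count` equal Items as plain objects and counts what Distinct keeps.
    public static int CountDistinct(int count)
    {
        var boxed = new List<object>();
        for (int i = 0; i < count; i++)
        {
            boxed.Add(new Item { v = "same value" });
        }
        return boxed.Distinct(new ObjectComparer()).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct(10));
    }
}
```

The trade-off is that the comparer has to do its own type checks at run time, which is why a comparer typed to the class is usually the tidier choice.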

Took me around a day to figure all this out and find the solution. The solution is to type cast the Linq:

var mm = ((IEnumerable<myclass>)(from p in dt.AsEnumerable()
select p["someObj"] as myclass)).Distinct(c);

Note that the only real difference between this and the code above is that I typecast to IEnumerable<myclass>, with my custom object as the type. But now, I don’t get a compiler error. Also note that the select has to be typecast as well; otherwise the items coming out of the query are still plain objects.
*NOW* the Distinct will correctly accept and use the EqualityComparer, and it can call the comparer’s Equals and GetHashCode methods on the incoming objects to do the compare.
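Pulled together, the whole thing can be sketched as one small program. The names (Item, ItemComparer, TableDistinctDemo) are hypothetical, and it assumes a project where DataTable.AsEnumerable() is available (on older .NET Framework projects that means a reference to System.Data.DataSetExtensions). It relies on the cast inside the select to give the query the right element type.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

// Hypothetical custom class stored in a DataTable column.
public class Item
{
    public object v;
}

// Comparer typed to Item, so it matches a sequence of Item.
public class ItemComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (object.ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return object.Equals(x.v, y.v);
    }

    public int GetHashCode(Item m)
    {
        return (m == null || m.v == null) ? 0 : m.v.GetHashCode();
    }
}

public static class TableDistinctDemo
{
    // Loads one Item per value into a DataTable, then counts the distinct Items.
    public static int CountDistinct(params string[] values)
    {
        var dt = new DataTable("sometable");
        dt.Columns.Add(new DataColumn("someObj", typeof(Item)));
        foreach (string s in values)
        {
            DataRow dr = dt.NewRow();
            dr["someObj"] = new Item { v = s };
            dt.Rows.Add(dr);
        }

        // The "as Item" cast turns the object cells into a sequence of Item.
        var typed = from p in dt.AsEnumerable()
                    select p["someObj"] as Item;

        return typed.Distinct(new ItemComparer()).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountDistinct("a", "a", "b"));
    }
}
```

With the values "a", "a", "b" the count comes back as 2.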

So far, this is the only way I’ve found to do a Distinct() on a DataTable with custom objects. Microsoft’s internal DataTable (non-Linq) methods don’t like custom objects at all. In fact, they pretty much ignore them.

Next time I’ll probably have a section on how to use custom objects in data tables and bind them to databound grids dynamically.

--kevin
