Kevin's geek blog: April 2012

My blog updates have been sparse lately. And by sparse, I mean non-existent. I mean rarer than a Google+ post.
There have been a lot of reasons for this – it’s been a rough winter and a busy spring so far.

But I’ve stacked up a lot of geek things to blog. I’ve got a little C# stuff, some SQL tricks, and a bit of general industry and gaming thoughts (for example, how Apple is now becoming more evil than Microsoft). For now, let me start with the C# stuff.

Anonymous types. My guess is you either
a) have no clue what they are,
b) think they’re the most evil thing since 8am meetings and are probably responsible for global warming or
c) love them so much you’re thinking about rewriting every line of code you’ve ever written (even things for companies you don’t work for anymore) to change all variables to anonymous types.

I’ve been a little surprised how little understanding people have of them. I blame Microsoft (as I often do). Why, oh why, did they select the “var” keyword for the implicitly typed variables that hold anonymous types? I’ve had to explain to even senior technical people that they are *not* “variant” data types – like the kind you’d find in JavaScript, for example. If you write
var x = "";
you *cannot* ever do
x = new DateTime();

the compiler will stop you.

Variables declared with “var” are typed at *compile* time, not run time. As I had to explain to someone just this week, there is no binary difference in the assembly between saying:
string x = "";
and
var x = "";
If you look at the MSIL, they are exactly the same.
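
If you want to convince a skeptical coworker without opening up the MSIL, a quick sketch like this one makes the point. The class and method names (VarDemo, LocalTypes) are made up for illustration; the only thing it relies on is that the compiler infers the type from the initializer.

```csharp
using System;

public static class VarDemo
{
    // Returns the runtime types of an explicitly typed local and a "var" local.
    public static Type[] LocalTypes()
    {
        string explicitText = "";
        var inferredText = "";   // the compiler infers string from the initializer

        // inferredText = new DateTime();   // would not compile: DateTime is not a string

        return new[] { explicitText.GetType(), inferredText.GetType() };
    }

    public static void Main()
    {
        Type[] types = LocalTypes();
        Console.WriteLine(types[0] == types[1]);   // both are System.String
    }
}
```

Un-comment the DateTime line and the build fails, which is exactly what a “variant” type would never do.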

So, as someone asked me this week: why use them?
Primarily, of course, they’re used for LINQ. Because the shape of the data you get back from a LINQ query is often defined by the query itself rather than by a class you wrote ahead of time, you can’t very well use named data types. This is where it gets cool though.

If you think about what LINQ does, it returns a set of data. The shape of that set isn’t declared anywhere in your code ahead of time (as mentioned), but the compiler can work it out at compile time from the query’s select clause. No reflection is involved: the compiler infers the type of each element coming out of the query and then creates a class to contain it. The compiler does the work of creating the class definition and setting up a read-only property (a get with no set) for each data element in the set.
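
Here is a minimal sketch of that idea, assuming nothing more than an array of strings as the data source. The names (LinqProjectionDemo, Describe) are mine, not anything from the framework. The select projects each line into an anonymous type, and the compiler knows the type of every property on it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqProjectionDemo
{
    // Projects each line into an anonymous type, then flattens to strings so the
    // result can leave the method (an anonymous type cannot be named in a signature).
    public static List<string> Describe(string[] lines)
    {
        var query = lines.Select((text, index) => new { LineNumber = index + 1, LineText = text });

        // Statically typed: item.LineNumber is an int, item.LineText is a string.
        return query
            .Select(item => item.LineNumber + ":" + item.LineText.ToUpperInvariant())
            .ToList();
    }

    public static void Main()
    {
        foreach (string s in Describe(new[] { "its line one", "its line two" }))
        {
            Console.WriteLine(s);
        }
    }
}
```

Try calling item.LineNumber.ToUpperInvariant() and the compiler complains, because it knows LineNumber is an int.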

The resulting class is “hidden” in that it’s not generated until compile time and in that it’s not named. Of course, names don’t really mean anything to the compiler anyway, so it doesn’t care, but that makes it complicated to actually refer to the specific class in code.
Enter the badly named “var”.

Here’s the cool thing. Since LINQ is requiring the compiler to generate a class (really, it could *almost* use a struct, but it doesn’t for another reason which I’ll mention later), this means that the compiler now has the ability to generate class code. This means you can use that outside of LINQ queries.

Why would I care? (I mean besides my own mental issues?)

Suppose you have a simple method which, say, reads the lines from a file and stores some info about them. For the sake of simplicity, let’s say you want to know the original line number and the actual text from the input file. You’ll be re-ordering the lines later, so you can’t just use a simple count; you have to persist the original.

You *could* create a class like this:
class LinesAndCounts
{
    // Backing field for the original line number.
    int _lineNum = 0;
    public int LineNumber
    {
        get
        {
            return _lineNum;
        }
        set
        {
            _lineNum = value;
        }
    }
…
…
…

And so forth.
It’s a lot of typing. And the class is visible to the whole namespace or assembly, not just the method. What if you just want to read a file, make some modifications to it, and re-write it with the original line numbers? That shouldn’t require a class that gets scoped to the whole world, right?

What anonymous types let you do is this:
var x = new { LineNumber = 0, LineText = "abc" };

The compiler will automagically generate code more or less the same as what I started to write above (before I got lazy), except that the generated properties are read-only. Frankly, I think that is so beautiful, it brings tears to my eyes (*sniff*,*sniff*). In one line of code, I’ve created a name-less class and assigned values that I can use later. Moreover, because the class has no name I can type, it is effectively confined to the method/scope-block that created it. Once execution leaves the method, “x” goes out of scope and the garbage collector takes the object back to the city dump. (The generated class itself stays in the compiled assembly, but no other code can refer to it by name.)
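
A small sketch of what you get for free with that one line. The names (AnonymousTypeDemo, Show, ValueEquality) are just for this example. Two things worth noticing: the generated properties can only be read, never assigned, and the compiler also generates Equals and ToString that work on the property values.

```csharp
using System;

public static class AnonymousTypeDemo
{
    // Builds the anonymous object from the text and returns its generated ToString().
    public static string Show()
    {
        var x = new { LineNumber = 0, LineText = "abc" };

        // x.LineNumber = 5;   // would not compile: the generated properties are read-only

        return x.ToString();
    }

    // Two separate instances with equal property values compare equal.
    public static bool ValueEquality()
    {
        var a = new { LineNumber = 1, LineText = "abc" };
        var b = new { LineNumber = 1, LineText = "abc" };
        return a.Equals(b) && !object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        Console.WriteLine(Show());          // { LineNumber = 0, LineText = abc }
        Console.WriteLine(ValueEquality()); // True
    }
}
```

The read-only part is why, when you need to change a value, you build a new anonymous object instead of modifying the old one.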

So let me take a contrived, concrete example. Suppose I want to read all the lines of an input file. Let’s take a simple one:
its line one
its line two

And I want to print all these lines with their corresponding line numbers reversed. So the “its line two” would have line number 1, and “its line one” would have line number 2. And then I’m going to add a header line to throw this number scheme off.
OK, this is an overly simple example, and there are lots of efficient ways to do this using the file system objects and such. But let’s pretend that the operations are more complex than they really are.

I could create a class (call it MyClass) that would have 3 properties: the original line number, the text of the line, and the new line number. Then, I could add a method to this class that would transform the line numbers by taking the original line number and subtracting it from the total number of lines like

int Transform(int origLineNumber, int totalLines)
{
    return totalLines - origLineNumber;
}

I could create a List to hold the objects, and (for the sake of the demo) step through this original list to move the items to a new List with the re-numbered lines. Pretty inefficient, but demo-worthy and useful sometimes.
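
For comparison, the named-class route might look roughly like this. It is only a sketch of the approach described above: MyClass and Transform come from the text, while NamedClassDemo, Renumber, and the header text handling are my own assumptions for the sake of a runnable example.

```csharp
using System;
using System.Collections.Generic;

public class MyClass
{
    public int LineNumber { get; set; }
    public string LineText { get; set; }
    public int NewLineNumber { get; set; }

    // Reverses the numbering by subtracting from the total number of lines.
    public static int Transform(int origLineNumber, int totalLines)
    {
        return totalLines - origLineNumber;
    }
}

public static class NamedClassDemo
{
    public static List<MyClass> Renumber(string[] lines)
    {
        // The header takes line number 0; the file lines start at 1.
        var original = new List<MyClass>();
        original.Add(new MyClass { LineNumber = 0, LineText = "this is the new file" });
        for (int i = 0; i < lines.Length; i++)
        {
            original.Add(new MyClass { LineNumber = i + 1, LineText = lines[i] });
        }

        // Step through the original list and copy each item into a new list, renumbered.
        var renumbered = new List<MyClass>();
        foreach (MyClass item in original)
        {
            renumbered.Add(new MyClass
            {
                LineNumber = item.LineNumber,
                LineText = item.LineText,
                NewLineNumber = MyClass.Transform(item.LineNumber, original.Count)
            });
        }
        return renumbered;
    }

    public static void Main()
    {
        foreach (MyClass item in Renumber(new[] { "its line one", "its line two" }))
        {
            Console.WriteLine(item.LineNumber + ":" + item.NewLineNumber + ":" + item.LineText);
        }
    }
}
```

It works, but that is a whole public class hanging around just to shuffle a few numbers.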

Doing that with anonymous types, though, I don’t need to create any of the classes. The code is here:

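using System;
using System.Linq;

public class AnnonExample
{
    public void messWithFile()
    {
        int lineNumber = 0;

        // Header line; it also fixes the anonymous type used for the list elements.
        var firstLine = new { LineNumber = lineNumber, LineText = "this is the new file", NewLineNumber = 0 };
        var lineList = (new[] { firstLine }).ToList();
        lineNumber++;

        // Same property names, types, and order, so the compiler reuses the same anonymous type.
        foreach (string line in System.IO.File.ReadAllLines("fileStuff.txt"))
        {
            var theLine = new { LineNumber = lineNumber, LineText = line, NewLineNumber = 0 };
            lineList.Add(theLine);
            lineNumber++;
        }

        // Anonymous-type properties are read-only, so Transform builds a new object with the reversed number.
        for (int i = 0; i < lineList.Count; i++)
        {
            var l = lineList.ElementAt(i);
            var p2 = Transform(l, (p) => new { LineNumber = p.LineNumber, LineText = p.LineText, NewLineNumber = ((lineList.Count) - (p.LineNumber)) });
            Console.WriteLine(p2.LineNumber + ":" + p2.NewLineNumber + ":" + p2.LineText);
        }
    }

    // Generic helper: applies the supplied function to one element of any type T.
    static T Transform<T>(T element, Func<T, T> transformFunc)
    {
        return transformFunc(element);
    }
}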

Note that the classes are all anonymous and loaded into lists and manipulated later. The “Transform” is accepting a lambda expression to do the dirty work, so it could be reused generically.

Under the covers, the compiler is using the same anonymous class for each of the lines. It generates it once, then re-uses it. And it enforces type-checking on it – try to assign theLine = "x" at the bottom of the first loop and you’ll get a compile error. The generic list is the same way. You can’t fill it half-way with the lines above, then fill the rest with strings. It’ll bark.
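
You can check the re-use claim yourself with a sketch along these lines (the class and method names are invented for the example). As I understand the language rules, two anonymous objects in the same assembly share a type only when their property names, types, and order all match, so swapping the order of the properties gives you a different class.

```csharp
using System;

public static class AnonymousTypeIdentityDemo
{
    // Same property names, types, and order: the compiler reuses one generated class.
    public static bool SameShapeSharesType()
    {
        var first = new { LineNumber = 1, LineText = "its line one", NewLineNumber = 0 };
        var second = new { LineNumber = 2, LineText = "its line two", NewLineNumber = 0 };
        return first.GetType() == second.GetType();
    }

    // Same properties in a different order: a different generated class.
    public static bool ReorderedShapeSharesType()
    {
        var ordered = new { LineNumber = 1, LineText = "its line one" };
        var reordered = new { LineText = "its line one", LineNumber = 1 };
        return ordered.GetType() == reordered.GetType();
    }

    public static void Main()
    {
        Console.WriteLine(SameShapeSharesType());      // True
        Console.WriteLine(ReorderedShapeSharesType()); // False
    }
}
```

That is also why lineList.Add(theLine) compiles in the example above: theLine and firstLine have exactly the same shape.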

My jury is still out on anonymous types. I can see their uses. But I do think they have the tendency to reduce readability if they’re overused. Still, they can be incredibly useful for the quickie class that you need just to hang onto for one iteration of a loop so that you can modify something, grab the results, then throw it away. In the example above, I could’ve done the same thing with parallel arrays, but I think this is much less messy. It isn’t always obvious how parallel arrays are connected. Here, it’s very straightforward.

In any case, it’s very cool stuff.

--kevin


Friday, April 13, 2012

anonymous types & C#
