When Linq and ref Parameters Meet

Posted on Tuesday, April 01, 2008 at 7:17 AM
It's fair to say that I'm a pretty heavy user of Linq. You'll find uses of it scattered across my code, from the obvious (using DLinq to query a database) to the slightly more exotic (writing queries over collections obtained from classes in System.Reflection). Linq often allows you to express a problem very neatly, resulting in compact, readable code. It also factors out the application of operations and leaves you to worry about the operations themselves, likely decreasing bugs.

Fixing A Bug

Today I ran across some code that took a parameter, then used it in a Linq query. Omitting the clutter, it looked something like this:
public void Lookup(string URL, ref int ID, ref int Status)
{
    // ...stuff...
    var Result = from D in DB.Datas
                 where D.URL == URL
                 select D;
    // ...more stuff...
}
It then set the couple of ref parameters based upon the results of the query (they really should have been out parameters, and I'll likely do some further refactoring later). However, the code calling this also appeared to be assuming that the URL parameter would be updated to exactly match the entry in the database. This mattered because the "where" clause would actually match even if the letter case was different, and the calling code wanted to know when that had happened. Clearly, this code couldn't be updating URL; it wasn't a ref parameter. Making it a ref parameter here and in the calling code was the obvious fix, so I did that. The code now looked more like this:
public void Lookup(ref string URL, ref int ID, ref int Status)
{
    // ...stuff...
    var Result = from D in DB.Datas
                 where D.URL == URL
                 select D;
    // ...more stuff, including updating URL...
}
And then I hit compile.

The Compile Error

Making the above change gave me the following compile error.
Cannot use ref or out parameter 'URL' inside an anonymous
method, lambda expression, or query expression
Or, re-phrased to fit this specific problem, "you can't use ref parameters in a Linq query". My first thought was, "huh, what the...". My second was, "OK, we can easily work around that." But of course, the main question that lingered after the initial "just do something that works so I can fix the bug" was, "but why can't I?"

The Workaround

The workaround is straightforward enough: you just make a local variable and assign the ref parameter to it.
public void Lookup(ref string URL, ref int ID, ref int Status)
{
    // ...stuff...
    var Lookup_URL = URL;
    var Result = from D in DB.Datas
                 where D.URL == Lookup_URL
                 select D;
    // ...more stuff, including updating URL...
}
This compiles and works just fine. So if that's all you were looking for, you have an answer. But now let's dig into the "why".
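To make the workaround concrete, here is a minimal, self-contained sketch of the whole round trip. It is not the original code: the Data class, the in-memory Datas list and the LookupSketch name are hypothetical stand-ins so it runs without a database, and the case-insensitive string comparison is my assumption, there only to imitate the way the database matched regardless of case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of the table queried in the post.
class Data
{
    public string URL;
    public int ID;
    public int Status;
}

static class LookupSketch
{
    // In-memory stand-in for DB.Datas (assumption: lets the sketch run without a database).
    static readonly List<Data> Datas = new List<Data>
    {
        new Data { URL = "http://Example.com/Page", ID = 7, Status = 1 }
    };

    public static void Lookup(ref string URL, ref int ID, ref int Status)
    {
        // Copy the ref parameter into a local; the query captures the local instead.
        var Lookup_URL = URL;
        var Result = from D in Datas
                     where string.Equals(D.URL, Lookup_URL,
                                         StringComparison.OrdinalIgnoreCase)
                     select D;

        // Write back through the ref parameters outside the query.
        var Match = Result.FirstOrDefault();
        if (Match != null)
        {
            URL = Match.URL;       // the caller now sees the stored casing
            ID = Match.ID;
            Status = Match.Status;
        }
    }
}
```

Calling LookupSketch.Lookup with URL set to "http://example.com/page" leaves URL as "http://Example.com/Page", ID as 7 and Status as 1. The ref parameters are only ever touched outside the query, which is what keeps the compiler happy.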

Introducing ILDASM

ILDASM is short for Intermediate Language Disassembler. A .Net executable contains a sequence of low-level instructions encoded in a binary format that we call bytecode, as well as tables of metadata about the classes and methods it contains. ILDASM turns that bytecode into IL, or Intermediate Language: a human readable representation of the binary data.

To get us started, let's just look at disassembling the Hello World program. First, we compile the following C# into an EXE file:
using System;
class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello, world!");
    }
}
And then we use ILDASM to get the Intermediate Language. Stripping away some sections we need not care about, we get:
.class private auto ansi beforefieldinit Program
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       13 (0xd)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldstr      "Hello, world!"
    IL_0006:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000b:  nop
    IL_000c:  ret
  } // end of method Program::Main
  .method public hidebysig specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Program::.ctor
} // end of class Program
You'll see that there is a class declaration and two method declarations. The first is the Main method that we wrote. We can see that it loads the string "Hello, world!" onto the stack, then calls the System.Console class' WriteLine method. Note that it finds its argument on the stack. The second method is actually a constructor and was generated for us; all it does is call our superclass' constructor.

As an aside for those of you wondering, "why the two nop (no operation) instructions?", the reason is that I built this in debug mode. The compiler then emits nop instructions at places where you might want to set a breakpoint but which would otherwise have no instruction of their own, such as the opening brace of a method, so that the debugger has an instruction to stop on.

The .Net CLR Doesn't Do Anonymous Methods

Now we'll write an example with an anonymous method. This code simply creates a list of integer values, then removes all of those that are less than or equal to five. The anonymous method, written with the compact lambda syntax, is the argument passed to RemoveAll. It takes a single parameter, x, written to the left of the "=>", then checks whether it is less than or equal to 5, producing a boolean result.
using System;
using System.Collections.Generic;
class Program
{
    static void Main(string[] args)
    {
        var TestList = new List<int>() { 1, 3, 8, 9, 10 };
        TestList.RemoveAll(x => x <= 5);
        foreach (var Val in TestList)
            Console.WriteLine(Val.ToString());
    }
}
If you run ILDASM on this program, you get quite a lot of output. I'll spare you the full listing (but feel free to try it) and just show you the parts that matter. The biggest thing you'll notice when going through the listing is that we have a new method.
.method private hidebysig static bool '<Main>b__1'(int32 x)
  cil managed
{
  .custom instance void [mscorlib]System.Runtime.CompilerServices.
    CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  // Code size       12 (0xc)
  .maxstack  2
  .locals init ([0] bool CS$1$0000)
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.5
  IL_0002:  cgt
  IL_0004:  ldc.i4.0
  IL_0005:  ceq
  IL_0007:  stloc.0
  IL_0008:  br.s       IL_000a
  IL_000a:  ldloc.0
  IL_000b:  ret
} // end of method Program::'<Main>b__1'
This is actually our anonymous method, but it's not so anonymous once compiled. Its name is "<Main>b__1" - a name that we could never write in C# since it contains angle brackets, and thus has no chance of conflicting with anything else in our program. What's notable is that this is a method of its own, with only the name to hint to us what method it was originally contained within.

There are other interesting things in the IL that we could spend a while looking at, including the fact that it creates a delegate for this method and caches it in a private field to aid performance. However, we'll put that aside for now and look at the next issue.
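If reading IL isn't your idea of fun, the same idea can be written back out as ordinary C#. This is only a rough, hand-written approximation of what the compiler produces for a lambda that captures nothing: the names NonCapturingSketch, CachedPredicate and LessOrEqualFive are mine (the real generated names are not valid C# identifiers), and the exact caching code the compiler emits may differ.

```csharp
using System;
using System.Collections.Generic;

static class NonCapturingSketch
{
    // Stand-in for the compiler-generated private field that caches the delegate.
    static Predicate<int> CachedPredicate;

    // Stand-in for the generated '<Main>b__1' method: the body of the lambda.
    static bool LessOrEqualFive(int x)
    {
        return x <= 5;
    }

    public static List<int> Run()
    {
        var TestList = new List<int>() { 1, 3, 8, 9, 10 };

        // Create the delegate on first use only, then reuse it on later calls.
        if (CachedPredicate == null)
            CachedPredicate = new Predicate<int>(LessOrEqualFive);

        TestList.RemoveAll(CachedPredicate);
        return TestList;
    }
}
```

Calling NonCapturingSketch.Run() returns a list containing 8, 9 and 10, the same values the lambda version prints. Because the method needs nothing from Main, a plain static method plus one cached delegate is enough.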

The .Net CLR Doesn't Do Lexical Scope

In the .Net CLR, methods can't see each other's local variables (unless we call one and pass it as a "ref"). This is normally the case in C# too, until we start writing anonymous methods. Then we can write anonymous methods that refer to variables in the enclosing method. Let's change our program to exhibit this.
using System;
using System.Collections.Generic;
class Program
{
    static void Main(string[] args)
    {
        var TestList = new List<int>() { 1, 3, 8, 9, 10 };
        int Maximum = 5;
        TestList.RemoveAll(x => x <= Maximum);
        foreach (var Val in TestList)
            Console.WriteLine(Val.ToString());
    }
}
This produces the same results as the previous version of the program. Now let's look at the disassembly. The first surprise we get is to discover that our Program class now actually contains a new, nested class!
  .class auto ansi sealed nested private beforefieldinit
    '<>c__DisplayClass2'
         extends [mscorlib]System.Object
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.
      CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    .field public int32 Maximum
    .method public hidebysig specialname rtspecialname
            instance void  .ctor() cil managed
    {
      // Code size       7 (0x7)
      .maxstack  8
      IL_0000:  ldarg.0
      IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
      IL_0006:  ret
    } // end of method '<>c__DisplayClass2'::.ctor
    .method public hidebysig instance bool
            '<Main>b__1'(int32 x) cil managed
    {
      // Code size       17 (0x11)
      .maxstack  2
      .locals init ([0] bool CS$1$0000)
      IL_0000:  ldarg.1
      IL_0001:  ldarg.0
      IL_0002:  ldfld      int32 Program/'<>c__DisplayClass2'::Maximum
      IL_0007:  cgt
      IL_0009:  ldc.i4.0
      IL_000a:  ceq
      IL_000c:  stloc.0
      IL_000d:  br.s       IL_000f
      IL_000f:  ldloc.0
      IL_0010:  ret
    } // end of method '<>c__DisplayClass2'::'<Main>b__1'
  } // end of class '<>c__DisplayClass2'
Because methods cannot see each other's locals, the compiler has to resort to using fields to exchange information between them. So how is this class used inside our Main method?

First, it stores an instance of this class as a local variable of Main, instantiating it right at the start of the method. Yes, that means that you have an instance of the nested class created per call to the method! This is needed to get the correct semantics in multi-threaded code.

Whenever we reference the Maximum variable in our method, even outside of the lambda expression, it actually gets compiled down to a store field or load field instruction. This means that it's not, at the bytecode level, a local variable any more, which means it will be more expensive to access at runtime.
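Written back out as C#, the transformation described above looks roughly like the sketch below. Again, this is a hand-written approximation rather than decompiler output: CapturingSketch, DisplayClass, Body and Locals are names I made up to stand in for the generated ones.

```csharp
using System;
using System.Collections.Generic;

static class CapturingSketch
{
    // Stand-in for the generated nested class '<>c__DisplayClass2'.
    sealed class DisplayClass
    {
        // The captured local now lives here as a field.
        public int Maximum;

        // Stand-in for '<Main>b__1': an instance method, so it can read the field.
        public bool Body(int x)
        {
            return x <= Maximum;
        }
    }

    public static List<int> Run()
    {
        // One instance is created per call of the enclosing method.
        var Locals = new DisplayClass();

        var TestList = new List<int>() { 1, 3, 8, 9, 10 };
        Locals.Maximum = 5;   // "int Maximum = 5;" has become a field store

        TestList.RemoveAll(new Predicate<int>(Locals.Body));
        return TestList;
    }
}
```

CapturingSketch.Run() returns 8, 9 and 10, just like the lambda version. Note that nothing named Maximum remains as a local in Run; every use of it goes through the Locals object.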

Code That Uses ref

So what does code that uses ref look like at an Intermediate Language level? Once again, we'll write a small test program:
using System;
class Program
{
    static void Main(string[] args)
    {
        var Test = 2;
        DoubleMe(ref Test);
        Console.WriteLine(Test);
    }
    static void DoubleMe(ref int Val)
    {
        Val *= 2;
    }
}
This program produces 4. Next up, we'll disassemble it:
.class private auto ansi beforefieldinit Program
       extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       19 (0x13)
    .maxstack  1
    .locals init ([0] int32 Test)
    IL_0000:  nop
    IL_0001:  ldc.i4.2
    IL_0002:  stloc.0
    IL_0003:  ldloca.s   Test
    IL_0005:  call       void Program::DoubleMe(int32&)
    IL_000a:  nop
    IL_000b:  ldloc.0
    IL_000c:  call       void [mscorlib]System.Console::WriteLine(int32)
    IL_0011:  nop
    IL_0012:  ret
  } // end of method Program::Main
  .method private hidebysig static void  DoubleMe(int32& Val) cil managed
  {
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  nop
    IL_0001:  ldarg.0
    IL_0002:  dup
    IL_0003:  ldind.i4
    IL_0004:  ldc.i4.2
    IL_0005:  mul
    IL_0006:  stind.i4
    IL_0007:  ret
  } // end of method Program::DoubleMe
  .method public hidebysig specialname rtspecialname
          instance void  .ctor() cil managed
  {
    // Code size       7 (0x7)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  } // end of method Program::.ctor
} // end of class Program
The interesting lines are the ldloca.s in Main and the ldind.i4 and stind.i4 in DoubleMe. The first shows us using the ldloca instruction rather than ldloc, which loads the address of the local variable onto the stack rather than its value. Then, inside DoubleMe, the code uses ldind and stind to load and store the value indirectly, that is, through the address.

So could they have made it work?

If all the IL has made you feel woozy, it's about to get worse. Now that we've seen all the previous examples, let's try hand-editing some IL to see if we can actually get ref parameters to work in lambdas and, if not, work out exactly why we can't. The first step is to write a program that is as close to what we want as we can get C# to compile, since editing IL is a pain.
using System;
using System.Collections.Generic;
class Program
{
    static void Main(string[] args)
    {
        var TestList = new List<int>() { 1, 3, 8, 9, 10 };
        int MaxVal = 5;
        RemoveVal(TestList, MaxVal);
        foreach (var Val in TestList)
            Console.WriteLine(Val.ToString());
    }
    static void RemoveVal(List<int> TheList, int Maximum)
    {
        TheList.RemoveAll(x => x <= Maximum);
    }
}
Next we can use ILDASM to get the IL code. Now let's start making some small changes. First, we'll change the signature of RemoveVal to take an int32&, and change the call to it so that it loads the address of the MaxVal local and passes that instead of the value itself. Then inside RemoveVal, where we previously just loaded the parameter and assigned it to the Maximum field in the generated nested class, we'll add an extra instruction to load the value indirectly through the address.
    IL_0007:  ldarg.1
    IL_00071: ldind.i4
    IL_0008:  stfld      int32 Program/'<>c__DisplayClass2'::Maximum
Using ILASM, we can re-assemble this to an executable file, and discover that the program still works. However, that's not really got us any further than a temporary local would have in C#. What we actually want is to store the indirect reference in a field. Can we have a field of type int32&, though? To find out, let's add one to the generated nested class:
    .field public int32& MaximumTest
If we try and assemble this, it produces an EXE file. It all looks good until we try and run it, at which point the CLR crashes, leaving the following message behind:
Unhandled Exception: System.Runtime.InteropServices.COMException (0x801312E4):
Field of ByRef type. (Exception from HRESULT: 0x801312E4)
   at Program.RemoveVal(List`1 TheList, Int32& Maximum)
   at Program.Main(String[] args)
Why, though? Well, the reason is quite simply that, in general, it is unsafe to allow fields of ByRef type. This is because a ByRef is essentially a pointer into a stack frame, which is where local variables are stored. If you store a pointer into it, and that pointer exists beyond the stack frame (that is, until after we return from the method that declared the local we passed a reference to), you are pointing into unknown and very likely invalid memory. That would compromise the type-safety of the VM. I'd run across this very issue before in a different situation, so it was interesting to see it coming up as the answer to the question we set out to answer.
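The lifetime problem is easy to see from the C# side. In this small sketch (EscapeSketch and MakeCounter are hypothetical names of mine), the lambda is returned to the caller and keeps being used after the method that declared Count has returned. That is safe only because Count has been moved into a heap object; a reference into the dead stack frame would not be. The commented-out overload shows the variant the compiler rejects.

```csharp
using System;

static class EscapeSketch
{
    // Returns a delegate that still uses Count after MakeCounter has returned.
    // This works because the compiler hoists Count into a field of a heap object.
    public static Func<int> MakeCounter()
    {
        int Count = 0;
        return () => ++Count;
    }

    // The ref version does not compile: a lambda cannot capture a ref parameter.
    // If it did, the returned delegate could outlive the variable Count refers to.
    //
    // public static Func<int> MakeCounter(ref int Count)
    // {
    //     return () => ++Count;
    // }
}
```

Calling the delegate returned by MakeCounter() twice gives 1 and then 2, and a second call to MakeCounter() starts again from 1, since each call gets its own hoisted Count.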

Summary

So, to summarize all of this drawn-out analysis, the reason we can't use ref and out parameters in an anonymous method (and thus a lambda expression or a Linq query) is as follows. Anonymous methods compile down to separate methods. If they access parameters or local variables of the enclosing method, those have to be put in a field so that they can be shared with the method implementing the lambda expression. To ensure the type safety of the CLR, we cannot store a ByRef value (what we get when we use ref and out) in a field.

I suspect that fixing this well would require adding some extra support to the .Net Common Language Runtime itself, which is relatively unlikely to happen; this happened between .Net 1.x and .Net 2.x and I suspect they won't be in a hurry to do that again (the underlying VM between .Net 2.x and .Net 3.x is actually the same, it's just some extra libraries). You could do it by making stack frames garbage-collectible so it's OK for them to live beyond an invocation (but that probably hurts performance a lot), or you could do it by adding support for lexical scopes and closures, so the compiler doesn't have to fake them using fields. The Parrot VM, which I work on, does the second of these.

Anyway, I hope this analysis has given you more of a sense of the Real Answer, rather than the compiler's simple, "you can't".
Tags: .NET, C# 3.0, Linq

Comments
Thank you! - Posted on Monday, April 14, 2008 at 6:17 PM by Peter
Thank you for your work analyzing this. I've just hit the problem and the "you just can't do it" answer never satisfies me :) My intuition told me it was something along these lines, but you've managed to clear the fog.
Quick Q - Posted on Monday, April 14, 2008 at 6:53 PM by Peter
So, in a lambda expression, as long as I only use the variables that are passed in through the expression I will avoid creating an extra class, right?
Example: (p1, p2, p3) => { Method1(p1); Method2(p2); Method3(p3); }
Right - Posted on Tuesday, April 15, 2008 at 4:36 AM by pheaven
Yes, in that example you showed, no extra class will be generated. It's only if you're using variables from an outer lexical (that is, static) scope.