LINQ

Summary

Introduction

Accessing and integrating information that is not natively defined using object-oriented (OO) technology is currently more complex than it should be. The two most common sources of non-OO information are relational databases and XML. Rather than adding features specific to relational or XML data, LINQ adds general-purpose query facilities to the .NET Framework that apply to all sources of information. This facility is called .NET Language-Integrated Query (LINQ).

.NET Language-Integrated Query defines a set of general-purpose standard query operators that allow traversal, filter, and projection operations to be expressed in a direct yet declarative way in any .NET-based programming language. The .NET standard query operators allow queries to be applied to any IEnumerable<T>-based information source.

The .NET standard query operators are defined as extension methods in the type System.Linq.Enumerable. When examining the standard query operators, you'll notice that all but a few of them are defined in terms of the IEnumerable<T> interface. This means that every IEnumerable<T>-compatible information source gets the standard query operators simply by adding the following using directive in C#:

using System.Linq;         // makes query operators visible
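
As a minimal sketch of that claim, the following applies Where and Select to a sequence produced by an iterator method rather than by an array or list. The names AnySourceDemo, Countdown, and EvenSquares are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;   // makes the standard query operators visible

public static class AnySourceDemo
{
    // Hypothetical information source: an iterator that counts down to 1.
    public static IEnumerable<int> Countdown(int start)
    {
        for (int i = start; i >= 1; i--)
            yield return i;
    }

    // The operators apply to the iterator exactly as they would to an array.
    public static int[] EvenSquares(int start)
    {
        return Countdown(start)
            .Where(n => n % 2 == 0)     // keep even numbers
            .Select(n => n * n)         // project each to its square
            .ToArray();
    }

    public static void Main()
    {
        foreach (int n in EvenSquares(6))
            Console.WriteLine(n);       // 36, 16, 4
    }
}
```

Nothing in Countdown mentions LINQ; returning IEnumerable<int> is enough for the operators to become available once System.Linq is in scope.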

Basic LINQ

A query expression operates on one or more information sources by applying one or more query operators from either the standard query operators or domain-specific operators. The following example illustrates basic LINQ functionality; its queries use two of the standard query operators, Where and Select:

public void Tests_Basics()

{           

    // Create an array of strings. Note the use of an array initializer

    string[] technologies = { "ADO.NET", "WCF", "WPF", "WWF", "LINQ" };

 

    /* LINQ - declarative query */

    // Initialize the local variable "query" with a query expression.

    IEnumerable<string> query1 = from tech in technologies where tech.StartsWith("W") select tech;

 

    // Iterate over results of query1

    foreach (string technology in query1) Trace.WriteLine(technology);

 

    /* LINQ - method-based query */

    // Query expressions are a convenient DECLARATIVE shorthand for code you could write manually.

    // The query expression above is semantically identical to the statement below. This form of

    // query is called a method-based query. The arguments to the Where and Select operators

    // are called lambda expressions, which are fragments of code much like anonymous methods

    IEnumerable<string> query2 = technologies.Where(tech => tech.StartsWith("W")).Select(tech => tech);

 

    // Iterate over results of query2

    foreach (string technology in query2) Trace.WriteLine(technology);

 

    /* Other approaches */

    Func<string, bool> filter = delegate(string s)

        {

            return s.StartsWith("W");

        };

 

    Func<string, string> select = delegate(string s)

        {

            return s;

        };

    IEnumerable<string> query3 = technologies.Where(filter).Select(select);

 

    // Iterate over results of query3

    foreach (string technology in query3) Trace.WriteLine(technology);

 

    /* OBJECT INITIALIZERS */

 

    // The following query creates a new Technology object for each value in the input sequence:

    IEnumerable<Technology> query4 = technologies.Select(tech => new Technology { Description = tech, ReleaseDate = DateTime.MinValue });

    foreach (Technology tech in query4) Trace.WriteLine("Desc: " + tech.Description + "/Release Date: " + tech.ReleaseDate);

}

 

// a helper class to illustrate how object initializers can be used to create new objects

internal class Technology

{

    private DateTime _dtRelease;

    private string _desc;

 

    public string Description

    {

        get { return _desc; }

        set { _desc = value; }

    }

    public DateTime ReleaseDate

    {

        get { return _dtRelease; }

        set { _dtRelease = value; }

    }       

}
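
For a query that combines filtering, ordering, and projection, the sketch below writes the same query once as a query expression and once as a method-based query. The class and method names are hypothetical; the data is the technologies array from the example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ThreeOperatorsDemo
{
    private static readonly string[] Technologies = { "ADO.NET", "WCF", "WPF", "WWF", "LINQ" };

    // Query-expression form: where, orderby, and select clauses.
    public static List<string> QuerySyntax()
    {
        IEnumerable<string> query = from tech in Technologies
                                    where tech.StartsWith("W")
                                    orderby tech descending
                                    select tech.ToLowerInvariant();
        return query.ToList();
    }

    // Method-based form: Where, OrderByDescending, and Select with lambdas.
    public static List<string> MethodSyntax()
    {
        return Technologies
            .Where(tech => tech.StartsWith("W"))
            .OrderByDescending(tech => tech)
            .Select(tech => tech.ToLowerInvariant())
            .ToList();
    }

    public static void Main()
    {
        foreach (string s in QuerySyntax())
            Console.WriteLine(s);       // wwf, wpf, wcf
    }
}
```

Both methods return the same sequence, which is one way to check that a query expression and its method-based form are equivalent.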

Expression Trees

Expression trees are efficient in-memory data representations of lambda expressions and make the structure of the expression transparent and explicit. The namespace System.Linq.Expressions defines a distinguished generic type, Expression<T>, which indicates that an expression tree is desired for a given lambda expression rather than a traditional IL-based method body.
 

Whether the compiler emits executable IL or an expression tree is determined by how the lambda expression is used. When a lambda expression is assigned to a variable, field, or parameter whose type is a delegate, the compiler emits IL that is identical to that of an anonymous method. When a lambda expression is assigned to a variable, field, or parameter whose type is Expression<T> for some delegate type T, the compiler emits an expression tree instead. The following example illustrates:

public void Test_ExpressionTrees()

{

    // Create an executable lambda expression

    System.Func<string, int> f = name => name.Length;

    int nLength = f("Yazan");           // returns 5

 

    // Create a non-executable lambda expression

    System.Linq.Expressions.Expression<Func<string, int>> e = name => name.Length;

    //int nLength2 = e("Yazan");          //  error CS0118: 'e' is a 'variable' but is used like a 'method'

 

    // Since 'e' is an expression tree, we can interact with it like any other data structure

    Trace.WriteLine(e.Body);            // name.Length

}
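
An expression tree can also be inspected node by node and, when needed, compiled into an executable delegate at run time. The sketch below assumes the same name => name.Length lambda; the class and member names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    // Because the target type is Expression<...>, the compiler emits a tree, not IL.
    public static readonly Expression<Func<string, int>> LengthOf = name => name.Length;

    // Compile() turns the tree into a delegate that can be invoked.
    public static int Evaluate(string input)
    {
        Func<string, int> compiled = LengthOf.Compile();
        return compiled(input);
    }

    public static void Main()
    {
        Console.WriteLine(LengthOf.Parameters[0].Name);    // name
        Console.WriteLine(LengthOf.Body.NodeType);         // MemberAccess

        MemberExpression member = (MemberExpression)LengthOf.Body;
        Console.WriteLine(member.Member.Name);             // Length

        Console.WriteLine(Evaluate("Yazan"));              // 5
    }
}
```

Compiling on every call, as Evaluate does here, is wasteful; code that invokes the delegate repeatedly would normally cache the result of Compile().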

Extension Methods

The standard query operators are defined as extension methods in the type System.Linq.Enumerable. Extension methods are given the lowest priority in terms of resolution and are only used if there is no suitable match on the target type and its base types. This allows user-defined types to provide their own query operators that take precedence over the standard operators. The following example illustrates:

public void Test_ExtensionMethods()

{

    // Create and initialize the object

    MySequence seq = new MySequence();

 

    // Iterate over the object (illustrates the implementation of an iterator with 'yield return')

    // Prints all numbers from 1 to 10

    foreach (int i in seq)

        Trace.WriteLine(i);

 

    // Now use our own implementation of the 'Where' standard query operator, and not the extension

    // method, as instance methods take precedence over extension methods

    // Prints all numbers from 6 to 10

    foreach ( int number in seq.Where(item => item > 5))

        Trace.WriteLine( number);

}

 

// Helper class that implements an iterator to show that standard-query operators

// are implemented as extension methods, and as such, have lower precedence than

// instance methods

internal class MySequence : IEnumerable<int>

{

    // Internal collection

    private List<int> _lstData = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };

 

    // Implement an iterator. Recall that an iterator allows you to support foreach

    // iteration without having to implement IEnumerable on the containing class.

    // The iterator must be called GetEnumerator if you want to use MySequence directly

    // in a foreach statement, and the iterator must use 'yield return' statement

    // to return each successive element:

    //  MySequence seq = new MySequence();

    //  foreach (int n in seq)              // Calls seq.GetEnumerator once, then MoveNext for each pass

    //      Trace.WriteLine(n);

    public IEnumerator<int> GetEnumerator()

    {           

        for (int i = 0; i < _lstData.Count; i++)

            yield return _lstData[i];

    }

 

    IEnumerator IEnumerable.GetEnumerator()

    {

        return this.GetEnumerator();

    }

 

    // Provide our own implementation of the Where standard query operator

    public IEnumerable<int> Where(Func<int, bool> selector)

    {

        for (int i = 0; i < _lstData.Count; i++)

        {

            // Return this item if it passes the selector function

            if (selector(_lstData[i] ))

                yield return _lstData[i];

        }

    }

}
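
Writing a new query operator follows the same pattern the standard operators use: a static method in a static class whose first parameter is marked with this. The operator below, EveryOther, is hypothetical and is not part of System.Linq; it is only a sketch of the mechanism.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Extension methods must be declared in a non-generic static class.
public static class SequenceExtensions
{
    // Hypothetical operator: yields every second element, starting with the first.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (T item in source)
        {
            if (take)
                yield return item;
            take = !take;
        }
    }
}

public static class EveryOtherDemo
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };

        // The custom operator composes with the standard ones.
        foreach (int n in numbers.EveryOther().Select(x => x * 10))
            Console.WriteLine(n);       // 10, 30, 50
    }
}
```

Because EveryOther is an extension method, a type that declares its own instance method with the same name and a compatible signature would take precedence over it.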

OfType LINQ Operator

The OfType operator is one of the few standard query operators that doesn't extend an IEnumerable<T>-based information source. OfType accepts not only IEnumerable<T>-based sources, but also sources that are written against the non-parameterized IEnumerable interface that was present in version 1.0 of the .NET Framework. Typically you use OfType to convert a collection that was written against the non-parameterized IEnumerable (such as ArrayList) to an IEnumerable<T>-based collection that can then be used with LINQ operators. The following example illustrates:

public void Test_OfType()

{

    // Say we have an 'old-style' collection from the old days of .NET 1.0

    ArrayList alDoesNotSupportLINQ = new ArrayList { 1, 2, 3, 4 };

 

    // We cannot use 'alDoesNotSupportLINQ' with LINQ operators. Use OfType operator to

    // convert 'alDoesNotSupportLINQ' to an IEnumerable<T>-based collection

    IEnumerable<int> alSupportsLINQ = alDoesNotSupportLINQ.OfType<int>();

 

    // The new collection now supports LINQ operators. The following prints

    // only even numbers, 2 and 4

    foreach (int i in alSupportsLINQ.Where(item => (item % 2) == 0))

        Trace.WriteLine(i);

 

    // The OfType operator is also useful for extracting a specific type from a heterogeneous array.

    // OfType simply omits members of the original sequence that are not compatible with the given

    // type argument. For example, the following OfType extracts only integers:

    object[] aHeterogeneous = { 1, "Two", 3.0, 4, "Five", 6.0 };

    IEnumerable<int> aIntegersOnly = aHeterogeneous.OfType<int>();

    foreach (int i in aIntegersOnly) Trace.WriteLine(i);            // Writes 1 and 4 only

}
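
The Cast operator also accepts the non-parameterized IEnumerable, but it behaves differently from OfType when it meets an incompatible element: OfType skips it, whereas Cast throws InvalidCastException. The sketch below contrasts the two; the class and method names are hypothetical.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class CastVersusOfTypeDemo
{
    // OfType<int> silently omits elements that are not ints.
    public static int CountWithOfType(IEnumerable source)
    {
        return source.OfType<int>().Count();
    }

    // Cast<int> throws as soon as enumeration reaches an incompatible element.
    public static bool CastThrows(IEnumerable source)
    {
        try
        {
            source.Cast<int>().ToList();    // ToList forces enumeration
            return false;
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }

    public static void Main()
    {
        ArrayList mixed = new ArrayList { 1, "Two", 3 };
        Console.WriteLine(CountWithOfType(mixed));  // 2
        Console.WriteLine(CastThrows(mixed));       // True
    }
}
```

Cast is the stricter choice when every element is expected to be of the target type, since a stray element surfaces as an error instead of disappearing.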

Deferred Query Evaluation

The standard query operators that return sequences of values are implemented as iterators, for example with the yield construct. An interesting benefit of this technique is that the query is not actually evaluated until it is iterated over. This deferred evaluation allows queries to be kept as IEnumerable<T>-based values that can be evaluated multiple times, each time yielding potentially different results.

To indicate that a cached copy of the results is needed, we can simply append a ToList or ToArray operator to the query like this:

IEnumerable<int> q = lstData.Where(item => item > 3).ToArray();

Both ToArray and ToList force immediate query evaluation. The same is true for the standard query operators that return singleton values (for example: First, ElementAt, Sum, Average, All, Any). The following example illustrates:

public void Test_DeferredQueryEvaluation()

{

    /* NON-CACHED QUERY */

 

    // Create and initialize a test list

    List<int> lstData = new List<int> { 1, 2, 3, 4, 5 };

 

    // Declare a query that selects all values greater than 3. Then iterate over the

    // results of the query. Note that output is 4 and 5

    IEnumerable<int> query1 = lstData.Where(item => item > 3);

    foreach (int n in query1) Trace.WriteLine(n);           // 4 and 5

 

    // Now append more values to lstData and iterate over query1 again. Note that this

    // iteration shows the new values, which confirms that the query is not actually

    // evaluated until it is iterated over.

    lstData.Add(6);

    lstData.Add(7);

    lstData.Add(8);

    foreach (int n in query1) Trace.WriteLine(n);           // 4,5,6,7,8

 

    /* CACHED QUERY */

    // Create and initialize a test list

    List<int> lstData2 = new List<int> { 1, 2, 3, 4, 5 };

 

    // Declare a query that selects all values greater than 3. Note the use of ToArray().

    // Then iterate over the results of the query. Note that output is 4 and 5

    IEnumerable<int> query2 = lstData2.Where(item => item > 3).ToArray();

    foreach (int n in query2) Trace.WriteLine(n);           // 4 and 5

 

    // Now append more values to lstData2 and iterate over query2 again. Note that this

    // iteration does not show the new values, which confirms that the query was

    // evaluated once and its results were cached.

    lstData2.Add(6);

    lstData2.Add(7);

    lstData2.Add(8);

    foreach (int n in query2) Trace.WriteLine(n);           // 4 and 5

}
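
To make the role of yield concrete, the sketch below defines a hypothetical Filter method that stands in for Where; it is not the actual System.Linq implementation. Counting predicate calls shows that declaring the query runs nothing and that the work happens only during enumeration.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredDemo
{
    // Hypothetical stand-in for Where, written as an iterator.
    // None of this body runs until the caller starts enumerating.
    public static IEnumerable<T> Filter<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
                yield return item;
        }
    }

    // Returns the predicate call count before and after enumerating the query.
    public static int[] Run()
    {
        int calls = 0;
        List<int> data = new List<int> { 1, 2, 3, 4, 5 };

        IEnumerable<int> query = Filter(data, n => { calls++; return n > 3; });
        int before = calls;             // the query is declared but not evaluated

        foreach (int n in query) { }    // enumeration drives the iterator
        int after = calls;              // the predicate ran once per element

        return new[] { before, after };
    }

    public static void Main()
    {
        int[] counts = Run();
        Console.WriteLine(counts[0]);   // 0
        Console.WriteLine(counts[1]);   // 5
    }
}
```

Enumerating query a second time would run the predicate five more times, which is why appending ToList or ToArray is useful when the results should be computed once.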

.NET Standard Query Operators

Sorting and Grouping

The evaluation of a query results in a sequence of values that are produced in some order that is intrinsic in the underlying information sources. To give explicit control over the order in which these values are produced, standard query operators are defined for controlling the order. The most basic of these operators is the OrderBy operator. The following example illustrates:
 

public void TestSortingAndGrouping()

{

    // Test collection

    string[] Alphabet = { "A", "B", "C", "D", "E", "F", };

 

    // General sort

    IOrderedEnumerable<string> order1 = Alphabet.OrderBy((string letter) => letter);

    WriteCollection(order1);                // A, B, C, D, E, F

 

    // Descending sort order

    IOrderedEnumerable<string> order2 = Alphabet.OrderByDescending((string letter) => letter);

    WriteCollection(order2);            // F, E, D, C, B, A

 

    // To allow multiple sort criteria, both OrderBy and OrderByDescending return

    // IOrderedEnumerable<T> rather than the generic IEnumerable<T>. Two operators are

    // defined only on IOrderedEnumerable<T>, namely ThenBy and ThenByDescending, which

    // apply an additional (subordinate) sort criterion. ThenBy/ThenByDescending

    // themselves return IOrderedEnumerable<T>, allowing any number of ThenBy/ThenByDescending

    // operators to be applied

    List<Person> lstPersons = new List<Person> {

                    new Person {ID=1, Name="A"},

                    new Person {ID=1, Name="B"},

                    new Person {ID=1, Name="C"},

                    new Person {ID=2, Name="A"},

                    new Person {ID=2, Name="B"},

                    new Person {ID=2, Name="C"}};

    IOrderedEnumerable<Person> order3 = lstPersons.OrderBy((Person person) => person.Name).ThenBy((Person person) => person.ID);

    WriteCollection(order3);            // A/1, A/2, B/1, B/2, C/1, C/2

 

    // LINQ also includes the GroupBy operator, which imposes a partitioning over a

    // sequence of values based on a key extraction function. The GroupBy operator

    // returns a sequence of IGrouping values, one for each distinct key value that

    // was encountered. An IGrouping is an IEnumerable that additionally contains the

    // key that was used to extract its contents. Note how the foreach statement

    // prints data in each group

    IEnumerable<IGrouping<int, Person>> group1 = lstPersons.GroupBy((Person person) => person.ID);

 

    // Output of the following foreach is:

    //  Group Key: 1

    //      Person: 1/A

    //      Person: 1/B

    //      Person: 1/C

    // Group Key: 2

    //      Person: 2/A

    //      Person: 2/B

    //      Person: 2/C

    foreach (IGrouping<int, Person> group in group1)

    {

        // Output information on the current group

        Trace.WriteLine( "Group Key: " + group.Key);

        foreach (Person p in group)

            Trace.WriteLine("\tPerson: " + p.ID + "/" + p.Name);

    }

}

 

// Helper method to write the contents of a given collection

private void WriteCollection<T>(IEnumerable<T> collection)

{

    bool bStart = true;

    foreach (T item in collection)

    {

        if (!bStart) Trace.Write( ", ");

        Trace.Write(item);

        bStart = false;

    }

    Trace.WriteLine(System.Environment.NewLine);

}

 

// Helper class to illustrate sorting/grouping/etc.

internal class Person

{

    // Data members

    private int _id;

    private string _name;

 

    // Properties

    public int ID

    {

        get { return _id; }

        set { _id = value; }

    }

 

    public string Name

    {

        get { return _name; }

        set {_name = value; }

    }

 

    public override string ToString()

    {

        return Name + "/" + ID;

    }

}
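
Sorting and grouping can also be written with the orderby and group ... by clauses of the query-expression syntax. The sketch below uses a hypothetical PersonRecord class as a stand-in for the Person helper so that it is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Person helper class.
public class PersonRecord
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class QuerySyntaxSortGroupDemo
{
    private static readonly List<PersonRecord> People = new List<PersonRecord>
    {
        new PersonRecord { ID = 2, Name = "B" },
        new PersonRecord { ID = 1, Name = "B" },
        new PersonRecord { ID = 2, Name = "A" },
        new PersonRecord { ID = 1, Name = "A" }
    };

    // Two orderby keys correspond to OrderBy(...).ThenByDescending(...).
    public static List<string> Sorted()
    {
        var query = from p in People
                    orderby p.Name, p.ID descending
                    select p.Name + "/" + p.ID;
        return query.ToList();
    }

    // group ... by corresponds to GroupBy(...); each group exposes its Key.
    public static Dictionary<int, int> CountPerId()
    {
        var groups = from p in People
                     group p by p.ID;
        return groups.ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        foreach (string s in Sorted())
            Console.WriteLine(s);       // A/2, A/1, B/2, B/1

        foreach (KeyValuePair<int, int> pair in CountPerId())
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```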

Aggregation

Aggregation operators are used to aggregate a sequence of values into a single value. The most general aggregation operator is Aggregate, which makes it simple to perform a calculation over a sequence of values. Aggregate works by calling the lambda expression once for each member of the underlying sequence. Each time Aggregate calls the lambda expression, it passes both the member from the sequence and an aggregated value (the initial value is the seed parameter to Aggregate). The result of the lambda expression replaces the previous aggregated value, and Aggregate returns the final result of the lambda expression.

In addition to the general purpose Aggregate operator, the standard query operators also include a general purpose Count operator and four numeric aggregation operators (Min, Max, Sum, and Average). The following example illustrates:

public void TestAggregating()

{

    int[] numbers =  {1,2,3,4,5,6,7};

    int nSum = numbers.Sum( );

    double dAvg = numbers.Average();

 

    // Use of Aggregate to simulate sum

    int nSum2 = numbers.Aggregate(0, (int sum, int nCurrent) => sum + nCurrent);

    int nSum3 = numbers.Aggregate(aggFunc);

}

 

private int aggFunc(int nSum, int nCurrent)

{

    Trace.WriteLine(nSum + "/" + nCurrent);

    return nSum + nCurrent;

}
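
The sketch below exercises two overloads of Aggregate: one with a seed, and one with a seed plus a result selector. It also records each accumulator/element pair to show how the running value is threaded through the calls. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateDemo
{
    // Seeded overload: the accumulator starts at 1 and is multiplied by each element.
    public static int Product(IEnumerable<int> values)
    {
        return values.Aggregate(1, (acc, n) => acc * n);
    }

    // Seed plus result selector: sum the values, then convert the final total.
    public static string SumAsText(IEnumerable<int> values)
    {
        return values.Aggregate(0, (acc, n) => acc + n, total => "Total: " + total);
    }

    // Records the accumulator and the current element seen by each lambda call.
    public static List<string> Steps(IEnumerable<int> values)
    {
        List<string> steps = new List<string>();
        values.Aggregate(0, (acc, n) =>
        {
            steps.Add(acc + "+" + n);
            return acc + n;
        });
        return steps;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };
        Console.WriteLine(Product(numbers));        // 24
        Console.WriteLine(SumAsText(numbers));      // Total: 10
        foreach (string step in Steps(numbers))
            Console.WriteLine(step);                // 0+1, 1+2, 3+3, 6+4
    }
}
```

The overload that takes no seed uses the first element of the sequence as the initial accumulator value, so its lambda is called one time fewer than the number of elements.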

Selecting Items

Selecting items within a collection is achieved via the Select operator which produces one value for each value in the source sequence. The following example illustrates:

public void TestSelect()

{

    // Test data

    string[] technologies = { "WCF", "WPF", "WWF", "LINQ"};

 

    // If the select function returns a value that is itself a sequence, it is up to

    // the consumer to traverse the sub-sequences manually. The statements output

    //  Current string: WCF

    //  W C F

    //

    //  Current string: WPF

    //  W P F

    //

    //  Current string: WWF

    //  W W F

    //

    //  Current string: LINQ

    //  L I N Q

    //

    IEnumerable<char[]> characters1 = technologies.Select((string input) => input.ToCharArray());

    foreach (char[] chars in characters1)

    {

        Trace.WriteLine("Current string: " + new string(chars));

        foreach (char c in chars)

            Trace.Write(c + " ");

        Trace.WriteLine(System.Environment.NewLine);

    }

 

    // The SelectMany operator works similarly to the Select operator. It differs in that the transform

    // function is expected to return a sequence that is then expanded by the SelectMany operator

    // The following statement outputs the following

    // W C F W P F W W F L I N Q

    IEnumerable<char> characters2 = technologies.SelectMany((string input) => input.ToCharArray());

    foreach (char c in characters2)

        Trace.Write(c + " ");

    Trace.WriteLine(System.Environment.NewLine);

}
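
In query-expression syntax, SelectMany corresponds to a second from clause. The sketch below shows that form, along with the SelectMany overload that takes a result selector so each flattened element can be paired with the source element it came from. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManyQueryDemo
{
    // Two from clauses flatten the words into one sequence of characters.
    public static string Letters(string[] words)
    {
        IEnumerable<char> query = from word in words
                                  from c in word    // a string is an IEnumerable<char>
                                  select c;
        return new string(query.ToArray());
    }

    // The result selector receives both the source word and each of its characters.
    public static List<string> Pairs(string[] words)
    {
        return words
            .SelectMany(word => word.ToCharArray(), (word, c) => word + ":" + c)
            .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(Letters(new[] { "WCF", "LINQ" }));    // WCFLINQ

        foreach (string pair in Pairs(new[] { "AB", "C" }))
            Console.WriteLine(pair);                            // AB:A, AB:B, C:C
    }
}
```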

Joining Results

The concept of Join refers to the operation of bringing the elements of a sequence together with the elements they "match up with" from another sequence. A more powerful cousin of Join is the GroupJoin operator. GroupJoin differs from Join in the way the result-shaping lambda expression is used: Instead of being invoked with each individual pair of outer and inner elements, it will be called only once for each outer element, with a sequence of all of the inner elements that match that outer element. The following example illustrates:

public void TestJoin()

{

    // JOIN

    {

        // Create two separate sequences

        Customer[] customers = { new Customer{ ID=1, Name="A"},

                            new Customer{ ID=2, Name="B"},

                            new Customer{ ID=3, Name="C"}

                           };

        Order[] orders = { new Order {CustomerID=1, Details="Details1" },

                       new Order {CustomerID=2, Details="Details2" },

                       new Order {CustomerID=3, Details="Details3" },

                    };

 

        // The join operator is best understood by using the declarative syntax first.

        // Note that the join operator performs an inner join: given sequences s1 and s2,

        // an inner join of s1 with s2 returns one result for each pair of elements from s1

        // and s2 whose keys match. In the code below, the 'customers' collection is referred

        // to as 'outer', while 'orders' is referred to as 'inner'

        // DECLARATIVE SYNTAX

        // Output:

        //  1/A/Details1

        //  2/B/Details2

        //  3/C/Details3

        var customerOrders1 = from c in customers

                              join o in orders on c.ID equals o.CustomerID

                              select (new { c.Name, c.ID, o.Details });

        foreach (var custorder in customerOrders1)

            Trace.WriteLine(custorder.ID + "/" + custorder.Name + "/" + custorder.Details);

 

 

        // METHOD CALL

        // Output:

        //  1/A/Details1

        //  2/B/Details2

        //  3/C/Details3

        var customerOrders2 = customers.Join(orders,

                                            (Customer c) => c.ID,

                                            (Order o) => o.CustomerID,

                                            (Customer c, Order o) => new { c.Name, c.ID, o.Details }

                                            );

        foreach (var custorder in customerOrders2)

            Trace.WriteLine(custorder.ID + "/" + custorder.Name + "/" + custorder.Details);

    }

 

    // GROUPJOIN

    {

        // The GroupJoin operator produces hierarchical data: outer elements paired with sequences

        // of matching inner elements. It has no direct equivalent in traditional relational

        // databases

 

        // Create two separate sequences. A customer may have many orders

        Customer[] customers = { new Customer{ ID=1, Name="A"},

                                new Customer{ ID=2, Name="B"}};

        Order[] orders = { new Order {CustomerID = 1, Details="Details1", OrderTotal=1.1},

                           new Order {CustomerID = 1, Details="Details2", OrderTotal=2.2},

                           new Order {CustomerID = 1, Details="Details3", OrderTotal=3.3 },

                           new Order {CustomerID = 2, Details="Details3", OrderTotal=4.4 },

                           new Order {CustomerID = 2, Details="Details3", OrderTotal=5.5 }

                        };

 

        // Use the join ... into clause to group each customer with his/her orders into co,

        // a sequence of the matching orders. The following effectively creates a collection

        // where each entry is an anonymous type containing the customer's ID and the customer's

        // collection of orders (identified by co)

        var customerToOrders = from c in customers

                               join o in orders on c.ID equals o.CustomerID into co

                               select (new { c.ID, Orders = co });

 

        // Print orders for each customer. vCustomerOrders is an anonymous type

        // containing an int property named ID and an enumerable property named Orders

        // The following prints:

        //  Orders for customer : 1

        //  CustomerID: 1 / Details: Details1 / OrderTotal: 1.1

        //  CustomerID: 1 / Details: Details2 / OrderTotal: 2.2

        //  CustomerID: 1 / Details: Details3 / OrderTotal: 3.3

        //  Orders for customer : 2

        //  CustomerID: 2 / Details: Details3 / OrderTotal: 4.4

        //  CustomerID: 2 / Details: Details3 / OrderTotal: 5.5

        foreach (var vCustomerOrders in customerToOrders)

        {

            Trace.WriteLine( "Orders for customer : " + vCustomerOrders.ID);

            foreach (Order o in vCustomerOrders.Orders)

                Trace.WriteLine(o.ToString());                   

        }

 

        // The following query is similar to the one above, except that it uses

        // each customer's Order collection to calculate the order total

        var customerToOrders2 = from c in customers

                               join o in orders on c.ID equals o.CustomerID into co

                               select new { c.Name, Total = co.Sum( (Order o) => o.OrderTotal) };

 

        // The following prints order totals for each customer:

        //  A/6.6

        //  B/9.9

        foreach (var v in customerToOrders2)

            Trace.WriteLine(v.Name + "/" + v.Total);

    }

}

 

// Helper classes to illustrate joining/group joining

internal class Customer

{

    // Data members

    private int _id;

    private string _name;       

 

    // Properties

    public int ID

    {

        get { return _id; }

        set { _id = value; }

    }

 

    public string Name

    {

        get { return _name; }

        set { _name = value; }

    }

 

    public override string ToString()

    {

        return "Name: " + Name + "/ID: " + ID;

    }

}

 

internal class Order

{

    // Data members

    private int _id;

    private string _details;

    private double _dOrderTotal;

 

    // Properties

    public int CustomerID

    {

        get { return _id; }

        set { _id = value; }

    }

    public string Details

    {

        get { return _details; }

        set { _details = value; }

    }

    public double OrderTotal

    {

        get { return _dOrderTotal; }

        set { _dOrderTotal = value; }

    }

 

    // override to get string representation of the object

    public override string ToString()

    {           

        return "CustomerID: " + CustomerID + " / Details: " + Details + " / OrderTotal: " + OrderTotal;

    }

}
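
Because Join performs an inner join, customers without orders are dropped from its results. A common way to keep them is to combine a group join with DefaultIfEmpty, which yields a left outer join. The sketch below uses anonymous types in place of the Customer and Order helper classes; the class name and the "(none)" placeholder are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LeftOuterJoinDemo
{
    public static List<string> Run()
    {
        var customers = new[]
        {
            new { ID = 1, Name = "A" },
            new { ID = 2, Name = "B" },
            new { ID = 3, Name = "C" }      // this customer has no orders
        };
        var orders = new[]
        {
            new { CustomerID = 1, Details = "Details1" },
            new { CustomerID = 1, Details = "Details2" },
            new { CustomerID = 2, Details = "Details3" }
        };

        // join ... into groups the matching orders per customer; DefaultIfEmpty
        // supplies a single null when a customer has no matching orders.
        var query = from c in customers
                    join o in orders on c.ID equals o.CustomerID into co
                    from match in co.DefaultIfEmpty()
                    select c.Name + "/" + (match == null ? "(none)" : match.Details);

        return query.ToList();
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);    // A/Details1, A/Details2, B/Details3, C/(none)
    }
}
```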

LINQ to SQL

Introduction

Modern programming languages define information in the form of objects. Relational databases use rows. Objects have unique identity as each instance is physically different from another. Rows are identified by primary key values. Objects have references that identify and link instances together. Rows are left intentionally distinct requiring related rows to be loosely tied together using foreign keys. Objects stand alone, existing as long as they are still referenced by another object. Rows exist as elements of tables, vanishing as soon as they are removed.

The best solutions so far have been elaborate database abstraction layers that transfer information between the application's domain-specific object models and the tabular representation of the database, reshaping and reformatting the data each way. Yet by obscuring the true data source, these solutions end up throwing away the most compelling feature of relational databases: the ability for the data to be queried.

LINQ to SQL is a language-agnostic runtime-engine for managing relational data as objects without losing the ability to query. It does this by translating language-integrated queries (i.e., queries written using a programming language such as C#) into SQL for execution by the database, and then translating the tabular results back into objects you define. Your application is then free to manipulate the objects while LINQ to SQL stays in the background tracking your changes automatically.
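
Translation of this kind is possible because a query over IQueryable<T> is captured as an expression tree instead of being compiled directly into executable code. The sketch below does not use LINQ to SQL at all; it assumes an in-memory array wrapped with AsQueryable, purely to show that the query arrives as inspectable data that a provider could walk and translate. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableCaptureDemo
{
    // Returns the name of the outermost operator recorded in the query's expression tree.
    public static string OutermostOperator()
    {
        IQueryable<int> source = new[] { 1, 2, 3, 4, 5 }.AsQueryable();
        IQueryable<int> query = source.Where(n => n > 3).OrderBy(n => n);

        // Nothing has been executed yet: the query is a tree of method calls
        // that a provider can examine before deciding how to run it.
        MethodCallExpression call = (MethodCallExpression)query.Expression;
        return call.Method.Name;
    }

    public static void Main()
    {
        Console.WriteLine(OutermostOperator());     // OrderBy
    }
}
```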

Note that LINQ to SQL does not actually execute queries; the relational database does. LINQ to SQL translates the queries you wrote into equivalent SQL queries and sends them to the server for processing. Because execution is deferred, LINQ to SQL is able to examine you