Sorting Kata - Merge Sort

In the setup we created a solution with two projects. The purpose of these two projects was for a Sorting Kata.

In Linq OrderBy we added a Bookshelf class to hold our sorting methods.

In List Sort we used the built-in sort method of a list. This required us to implement IComparable in the Book class.

In Bubble Sort we overloaded the < and > operators so that books could be sorted through a simple sort.

Here is our Book class.

BookSorting/Book.cs
using System;

namespace BookSorting
{
    public class Book : IComparable
    {
        public Author Author { get; set; }

        public string Title { get; set; }

        public override string ToString()
        {
            return string.Format(
                "{0}, {1}. {2}",
                Author.LastName,
                Author.FirstName,
                Title);
        }

        public int CompareTo(object obj)
        {
            // By convention put the nulls at the beginning of the sort order.
            if (obj == null)
            {
                return 1;
            }

            var comparedBook = obj as Book;

            if (comparedBook == null)
            {
                throw new ArgumentException("Could not sort the object because it is not a book.");
            }

            if (Author == null
                || (Author.LastName.Equals(comparedBook.Author.LastName, 
                    StringComparison.OrdinalIgnoreCase)

                    && Author.FirstName.Equals(comparedBook.Author.FirstName, 
                        StringComparison.OrdinalIgnoreCase)))
            {
                return string.Compare(Title, comparedBook.Title, StringComparison.OrdinalIgnoreCase);
            }

            if (Author.LastName.Equals(comparedBook.Author.LastName, StringComparison.OrdinalIgnoreCase))
            {
                return string.Compare(
                    Author.FirstName, comparedBook.Author.FirstName, StringComparison.OrdinalIgnoreCase);
            }

            return string.Compare(Author.LastName, 
                comparedBook.Author.LastName, 
                StringComparison.OrdinalIgnoreCase);
        }

        public static bool operator >(Book book, Book comparedBook)
        {
            return comparedBook.CompareTo(book) > 0;
        }

        public static bool operator <(Book book, Book comparedBook)
        {
            return comparedBook.CompareTo(book) < 0;
        }
    }
}

Now we are going to write our own Merge Sort method.

We write a failing test first.

BookSorting.Test/SortingTest.cs
using System.Collections.Generic;
using System.Linq;

using NUnit.Framework;

namespace BookSorting.Test
{
    [TestFixture]
    public class SortingTest
    {
        [Test]
        public void Authors_SortedByLastName()
        {
            var book1 = new Book
                {
                    Author = new Author
                        {
                            LastName = "Partnoy",
                            FirstName = "Frank",
                        },
                    Title = "Wait: The Art and Science of Delay",
                };

            var book2 = new Book
                {
                    Author = new Author
                        {
                            LastName = "Watt",
                            FirstName = "Andrew",
                        },
                    Title = "Beginning Regular Expressions",
                };

            var book3 = new Book
                {
                    Author = new Author
                        {
                            LastName = "Weinberg",
                            FirstName = "Steven",
                        },
                    Title = "Cosmology",
                };

            var books = new List<Book>
                {
                    book2,
                    book3,
                    book1,
                };

            var bookshelf = new Bookshelf();

            books = bookshelf. MergeSort(books);

            Assert.AreEqual(book1, books.First());
            Assert.AreEqual(book3, books.Last());
        }
    }
}

We use CTRL+. to add the MergeSort method that accepts the list of books. Our goal is for the method to return a sorted list. Then we can assert that the correct book appears first.

Running the test causes the following error:

1 test failed: System.NotImplementedException

Hurray, the test failed. We use the Red, Green, Refactor pattern from Kent Beck's book, Test Driven Design By Example.

The problem is, it failed for the wrong reason. We want to see a failure because the books are not sorted. Let's take a look at the Bookshelf class.

BookSorting/Bookshelf.cs
using System.Collections.Generic;

namespace BookSorting
{
    public class Bookshelf
    {
        public List<Book> MergeSort(List<Book> books)
        {
            throw new System.NotImplementedException();
        }
    }
}

Well, that explains our "NotImplementedException".

So, what is it going to take to get this test to fail for the right reason? Let's take a look at Uncle Bob's Transformation Priority Premise. The first transformation we should try is "nil". Can we get the test to fail by doing nothing? No, our code will not compile unless we return something. The next step is a constant. Let's just return the books list as is.

BookSorting/Bookshelf.cs
using System.Collections.Generic;

namespace BookSorting
{
    public class Bookshelf
    {
        public List<Book> MergeSort(List<Book> books)
        {
            return books;
        }
    }
}

1 test failed Expected: Partnoy, Frank. Wait: The Art and Science of Delay But was: Watt, Andrew. Beginning Regular Expressions

In Linq OrderBy we overrode the Book.ToString() to get the author and title.

We saw a failing test (Red), and we made sure the message was helpful. Now we can make it green.

The Merge Sort algorithm defined on Sorting-Algorithms.com is:

Merge Sort
# split in half
m = n / 2

# recursive sorts
sort a[1..m]
sort a[m+1..n]

# merge sorted sub-arrays using temp array
b = copy of a[1..m]
i = 1, j = m+1, k = 1
while i <= m and j <= n,
    a[k++] = (a[j] < b[i]) ? a[j++] : b[i++]
    -> invariant: a[1..k] in final position
while i <= m,
    a[k++] = b[i++]
    -> invariant: a[1..k] in final position

This is a big one. So far our sorts have dealt with taking a single book off the shelf and bubbling it up to the top, inserting it, or selecting it for placement. Up until now it would be (with a little practice) easy to understand what each algorithm is doing.

With merge sort I really had to do some investigation. I read other websites, and I even watched a great YouTube video on it. I finally understand what it is doing.

Another difficult thing we encounter is recursion. It's a good thing we have been doing our Recursion Kata.

Let's translate this into C#. Instead of b, i, k, and n, I'm going to give the variables longer names.

BookSorting/Bookshelf.cs
using System.Collections.Generic;

namespace BookSorting
{
    public class Bookshelf
    {
        public List<Book> MergeSort(List<Book> books)
        {
            var bookCount = books.Count;

            if ( bookCount < 2 )
            {
                return books;
            }

            var middle = bookCount / 2;

            var firstHalf = books.GetRange( 0, middle);
            var secondHalf = books.GetRange( middle, bookCount - middle );

            firstHalf = MergeSort( firstHalf );
            secondHalf = MergeSort( secondHalf );

            var firstHalfIndex = 0;
            var secondHalfIndex = 0;
            var finalIndex = 0;

            while ( firstHalfIndex < middle 
                && secondHalfIndex < bookCount - middle )
            {
                if ( firstHalf[firstHalfIndex] > 
                    secondHalf[secondHalfIndex] )
                {
                    books[finalIndex] = firstHalf[firstHalfIndex];
                    finalIndex++;
                    firstHalfIndex++;
                }
                else
                {
                    books[finalIndex] = secondHalf[secondHalfIndex];
                    finalIndex++;
                    secondHalfIndex++;
                }
            }

            while ( firstHalfIndex < middle )
            {
                books[finalIndex] = firstHalf[firstHalfIndex];
                finalIndex++;
                firstHalfIndex++;
            }

            while ( secondHalfIndex < bookCount - middle )
            {
                books[finalIndex] = secondHalf[secondHalfIndex];
                finalIndex++;
                secondHalfIndex++;
            }

            return books;
        }
    }
}

Variables

We used a lot of variables in this algorithm.

We stored the count of books in the bookCount variable. Each time we recursively call the method, the count gets smaller. Once it reaches less than two, we return. This if statement keeps from infinitely looping.

Next, we split the books into two piles. The index of the middle is stored in a variable. We will use this to divide the piles of books and go through each pile.

We have a firstHalf and a secondHalf of books. Each gets merge sorted until we are comparing only one book. Eventually, the reoccurrence stops and we continue below the recursion to actually compare books.

We initialize three index variables:

  1. firstHalfIndex keeps track of where we are in the first half of the books.
  2. secondHalfIndex keeps track of where we are in the second half of the books.
  3. finalIndex keeps us moving through each book in both stacks.

First Loop

In the first loop we look at every book from both piles and compare them to each other. If the first book from the first pile should come before the first book in the second pile, we place the first pile book into our final, sorted list. We increment the first and final indexes so that we keep moving through the pile.

If the first book from the first pile should come after the first book in the second pile, we place the second pile book into our final, sorted list. We increment the second and final indexes so that we keep moving through the pile.

Additional Loops

At the end of the first loop we might have some books left over in either the first or second pile. We put these left over books on the end of the final, sorted book list.

Finally, we return the sorted list.

When we run the test we see:

1 test passing

Conclusion

I recommend setting a breakpoint and watching the list of books change order.

Please contact me about this Kata. I would love to get your feedback.

Other sorting methods: