Article

This is in continuation to Overriding the Equals() and GetHashCode() methods in C# - Part 1

System.Object.GetHashCode() virtual method

This method returns a number for an object that assigned by the .NET CLR. The number known as hash code is of type int. It uniquely identifies the object within an application domain and does not change for the lifetime of the object.

The major benefit is that a string, or any other type, can be represented as a unique number. Another benefit is that you can place type of an object into a hash table collection. Let us discuss the Hashtable type first.

System.Collections.Hashtable class

You use this class when you want to store information indexed by keys. It lets you access information using a key, just like you index an array by an ordinal value. It is essentially used when you want to have a fast indexed search.

A Hashtable is actually a table with two columns. The first is the key and the second is its associated value. Both key and value are objects. You can use any kind of object as a value and as a key. It can be an int, string, an object of a user-defined type etc. The keys must be unique, you cannot have two pairs with the same key.

General meaning of Hashing: It is the transformation of a string of characters into a usually shorter fixed-length value. This value is called the hash value. This value represents the original string. It is useful in transforming a long, alphanumeric key. The hashing algorithm is called the hash function.

A hash table consists of a certain number of buckets. Each bucket stores an arbitrary number of key-value pairs. Each bucket has a number that identifies it, this is much like an index that identifies an array element.

Adding a key-value pair to a Hashtable object

Hashtable internally calls GetHashCode() on the provided key. The objects used as keys must provide the GetHashCode() method. This method returns a hash code, the contents of the key is used to determine the hash code. This returned hash code is further used to identify a bucket. A key-value pair is stored in that bucket.

Let's discuss how the process of translating a key into a bucket number is carried out in the Hash function of the Hashtable class. The hash function in Hashtable class is defined as follows:

Hash value (key) = [GetHashCode(key) + 1 + (((GetHashCode(key) >> 5) + 1) % (size - 1))] % size

where size is the total number of buckets in the hash table. GetHashCode(key) is a number returned by a call to GetHashCode() on a key object.

A series of arithmetic operations are applied to produce the final result in an acceptable range. The resulting hash value will be between 0 and size-1. It is because a % b returns the remainder of a/b and the value of a/b is always between 0 and b-1. Therefore the resulting hash value will always lie within the range of available buckets.

Example

Hashing algorithm for a string object might add up the Unicode values of each character in a string, followed by value%size to get a hash value between 0 and size-1 where size is the number of buckets in Hashtable.

Obtaining a value from the Hashtable

The GetHashCode() method is called on the key that we provide. The returned hash code is used to find the right bucket. The bucket is now looked for that key. The value associated with the key is retrieved. This search is typically very fast because only one particular bucket needs to be searched.

Example

Hashtable table = new Hashtable();

//Add elements
table.Add("Australia", "Dollar");
table.Add("India", "Rupees");
table.Add("Japan", "Yen");

//Get a collection of the Hashtable keys using the Keys property
ICollection keysCollection = table.Keys;

//Iterate through the collection of keys to obtain the values
foreach (string country in keysCollection)
System.Console.WriteLine("The currencies are: {0}", table[country]);

GetHashCode() Contd..

The GetHashCode() is actually intended to be overridden. Override it using a custom algorithm designed specifically for your application and it should return a unique object identifier.

Microsoft documentation is full of warnings about using this method without overriding it. The default implementation is not particularly useful for hashing. It does not guarantee uniqueness. The returned hash code must not be used as a unique object identifier for hashing purposes. It is possible for two different objects to return the same value. After an object is garbage collected, its unique number can be reused as a hash code for a new object.

Few tips when you override GetHashCode()

a. Since it returns a hash code for instances of your type, you should use at least one instance field.

b. It should be based on an immutable field. You should be able to return the same hash code even if any changes are made to the underlying object.

c. For best performance, you should generate a random distribution for all input.

d. Its execution should be fast enough. In turn, it significantly affects the performance of a hash table. When you perform add and retrieve operations to the Hashtable, it should take constant time to search an element.

e. It must not result in circular references. If the Class X's GetHashCode() calls the Class Y's, the Class Y's must not call the Class X's either directly or indirectly.

f. It should not throw exceptions.

g. If you override Equals(), you must also override GetHashCode(). You must make sure that the objects considered equal should return the same hash code. This is the requirement of the Hashtable class, otherwise Hashtable might not work properly.

Example

class Student {
    public override int GetHashCode() {
        return id;
    }
    ...
}

This is the easiest implementation that the GetHashCode() method can get i.e. simply return a value. It is based on the assumption that the id field is immutable.

The Student instances are considered equal if they have have the same id value (refer to the Student class code). The GetHashCode() ensures that the same hash code will be returned for equal instances because we are simply returning the id value.

Using a Student object as a key, if we want to add a key-value pair in a Hashtable, then the Hashtable will internally invoke GetHashCode() method of the Student class. The returned hash code will be used to find the bucket into which the key-value pair gets stored.

Example

Let's discuss how the String class implements GetHashCode(). It returns a unique hash code for unique strings. If two String objects represent the same string value, GetHashCode() method returns the same hash code for them.

It uses all the characters in a string to generate a randomly-distributed output. It returns a well-distributed range of values, even when the input is in certain ranges, for instance, an input string may contain only the first 128 ASCII characters. Note that a C# string can include any of the 65,535 Unicode characters.