Hash Functions/Tables

JC0724

Weaksauce
Joined
Oct 11, 2008
Messages
105
So I am trying to understand Hash Functions and Tables, I know there are different types like direct and Open addressing. What is the point of hash functions and tables, how do they work in the real world or what are they used for? Also a Hash function is supposed to be like a Dictionary?

I don't understand if the table is an Array and the Array slots have linked list in them or something like that? From what I have been reading there is a Universal Set, where it has some keys which picks the slots.

Is the Universal set an array with another array of keys in it of numbers?

Can you actually put a linked list into an Array slot?

Can somebody right some basic puesdo code to kinda show me how a basic hash function and Table work with the Universal set and Keys?
 
A Hash function takes any input and reduces it to a fixed (smaller) hashvalue in a repeatable way.

It could be complicated like MD5. It could be as simple as odd/even.

oddhash(x) = x % 2;

A keyspace of 2 is very small and will yield lots of collisions(items with the same hashvalue). Which is why the array in a hashtable has linked lists, to support the multiple values that can hash into the same spot in the array.

When looking for the key in the hashtable, you hash the key, find the slot in the array, then iterate through the items in the linked list to see if they match your key(not the hash, the whole linked list matches the hash). If there are many items in the linked list, the search is just as inefficient since you're just linearly searching a linked list.


Hashtables in programming are very useful, since they allow you to store key,value pairs, and retrieve them based on the key in a fairly fast fashion. The array/linked list part is all hidden from your usage, but the author of the hashtable library has to worry about it.

Practical implementations of hashtables have a concept of how full they are getting, and will automagically expand the internal array to reshuffle the items so that the linked lists never get very long. However there are security/performance implications if a bad person can cause input items that all hash to the exact same value, they can cause the performance to drop, since the code is reduced to a linear search through the linked list.
 
In a nutshell, a hash table is an unordered array where each element of the array has a mathematical relationship to the array index it resides. The benefit of using a hash table is the O(1) time complexity, which is as good as you can get with a data structure.
 
So I am trying to figure out how to create an dynamic array of linked list for my hash table. I am using a basic linked list below that I got out of one of my C++ books. Is this the correct way to do it? How do I initialize each pointer to NULL?

What I am confused about is most example I keep finding online about creating a Hash table, it seems like they are making the array out of the pointer class not the linked list class, this is very confusing to me.

Code:
//My example

#include <iostream>
using namespace std;

class IntNode {
    public:
        IntNode(){}
        IntNode(int theData, IntNode* theLink) : data(theData),link(theLink) {}
        IntNode* getLink() const {return link; }
        int getData() const {return data; }
        void setData(int theData) {data = theData; }
        void setLink(IntNode* pointer) {link = pointer; }

private:
    int data;
    IntNode *link;

};

typedef IntNode* IntNodePtr;

class LinkedList1 {
    public:
        LinkedList1() {head = NULL;}
        //void headInsert(IntNodePtr& head, int theData);
        void headInsert(int theData);
        void insert(IntNodePtr afterMe, int theData);
        void deleteHeadNode();
        void remove(int theData);
        void output();
    private:
        IntNodePtr head;
};



int main() {
    char ans;
    int tableSize=0;

    cout<<"Please enter in the table size: ";
    cin >> tableSize;

    LinkedList1* first;

    first = new LinkedList1[tableSize];




    do {
        first[0].headInsert(2);
        first[0].headInsert(3);
        first[0].headInsert(4);
        first[0].headInsert(5);
        first[0].headInsert(6);
        first[0].output();
        first[0].deleteHeadNode();
        cout<<endl;
        first[0].output();
        first[0].remove(3);
        cout<<endl;
        first[0].output();

        cout << "Would you like to start over again(y = yes/n = no)? ";
        cin >> ans;

    } while (ans != 'n');

    return 0;
}//end of main


void LinkedList1::headInsert(int theData) {
    head = new IntNode(theData, head);
}//end of headInsert

void LinkedList1::insert(IntNodePtr afterMe, int theData) {
    afterMe->setLink(new IntNode(theData, afterMe->getLink()));
}//end of insert

void LinkedList1::deleteHeadNode() {
    if (head != NULL) {
        head = head->getLink();
    }//end of if statement
    else {
        cout<<"You are at the first node already"<<endl;
    }//end of else
}//end of deleteHeadNode

void LinkedList1::remove(int theData) {
    IntNodePtr before,discard,test;
    before = head;
    discard = head;
    test = head;
    int count1 = 0;
    int count2 = 0;
    int count3 = 0;

    if (before == NULL) {
        cout<<"List is empty"<<endl;
    }
    else {
        while(discard->getData() != theData && discard->getLink() != NULL) {
            count1++;
            discard = discard->getLink();
        }//end of while loop

        int i = count1 - 1;

        while (count2 != i) {
            before = before->getLink();
            count2++;
        }//end of while loop

        while(test->getLink() != NULL) {
            test = test->getLink();
            count3++;
        }

        if(count3 == count1) {
            cout<<"Value is not in the list"<<endl;
        }
    }

    before->setLink(discard->getLink());
}//end of remove function

void LinkedList1::output() {
    IntNodePtr position = head;

    if(position == NULL) {
        cout<<"The list is empty"<<endl;
    }
    else {
        while(position != NULL) {
            cout<<"The values of the list are "<<position->getData()<<endl;
            position = position->getLink();
        }//end of while loop
    }
}//end of output list
 
Not sure what you are trying to do. You should really be using a library for any type of hash function. The usual method to use a hash as an index will be to utilize both an array and also a linked list. The array should be the maximum size of your data set. A function is used to make a hash value, something like a CRC32 on your data or likely other faster methods. This is then mod against array size to find where it inserts. If the slot is empty it is inserted, else it is a collision with a prior value; it goes into the end of the linked list chain.

As an example of the usefulness of hashes, say you have a database and you want to index 20 fields. You can create an index with the 20 fields or you can create a hash index in the record, the hash value generated from concatenating the 20 fields. Then when you search on the hash value it will group similar records together. This is not precise as there can be collisions, but as long as you know to look for them it works very well.
 
So when creating a linked list, I know there is a node class, and a linked list class.

Where creating a hash table is there a node class, linked list class and a Hash Table class?
 
Back
Top