Saturday, January 28, 2017

Hashing & Hashtable

HashTable:-

A hash table is faster than a tree for search and insertion (and often deletion as well).

The time complexity of these operations is constant, O(1) on average, no matter how many data items are stored.

Disadvantages of hash table:-

1) Hash tables are based on arrays. Arrays are difficult to expand in case we do not know the count of input items in advance.
2) Sorting the data in a hash table is not very convenient.

Basically, a hash table is a table where each value has a key, or we can say an id. This id can be treated as the address of that value in a given data structure such as an array.

So if we are using an array to create our hash table, the array index can be used as the id/key. We can then use a function/method to calculate the key for a given value to be stored in the array acting as the hash table.


Hashing:-

What prompted hashing:-

Many scenarios prompted hashing, but one example we can use is below:-

Suppose we want to create a dictionary of 50,000 words in computer memory. To do so, we can try the following:-

Approach 1 :-  (Not a good approach)

Create an array of length 50,000 and store one word in each cell. But the problem is that we will not know which word lives at which index, so searching for a word will become very slow.

Approach 2 :- (Not a good approach)

We know that we generally follow different flavors of character encoding standards like ASCII, UTF-8 etc. In these character encoding standards, we basically represent each character with a number. With this approach we can sum the numbers of all the characters of a word and use that sum as the index at which the word is stored. But that will create problems for anagram words (words made of the same letters, like "listen" and "silent"), as they will have the same index value when stored in the array.
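The collision problem with approach 2 can be seen in a small sketch (class and method names here are just illustrative):

```java
public class CharSumIndex {
    // Sum the character codes of a word (the naive index computation of approach 2)
    static int charSum(String word) {
        int sum = 0;
        for (char c : word.toCharArray()) {
            sum += c;  // each char's encoding value (e.g. ASCII)
        }
        return sum;
    }

    public static void main(String[] args) {
        // Anagrams contain the same letters, so their sums collide
        System.out.println(charSum("listen"));
        System.out.println(charSum("silent"));  // same value as above
    }
}
```

Both calls print the same number, so both words would fight for the same array cell.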

Approach 3 or Hashing :- (Good approach)

In hashing, we basically convert a number from a large range into a number in a small range. The smaller range corresponds to the index numbers of an array. The function that performs this operation of converting a large range into a small range is called a hash function.

We can achieve this by doing following:-

To squeeze a number from a large range into a small range, we can use the modulo operator (%), which finds the remainder when one number is divided by another. Let's understand this with an example:-

Large range :- 0 to 199
Small range :- 0 to 9

Now, size/length of small range is 10 (0,1,2,3,4,5,6,7,8,9)

Use the below formula to find the small number for a number from the large range, say 157:

small number = (large number) % (size/length of small range)
             = 157 % 10
             = 7



Now we can store the word corresponding to 157 at index 7 of an array with a size/limit of 10 (0 to 9).

Similarly, we can compute index values in a small-range array to store 50,000 dictionary words, which is a large range.

So, in this case, the above formula, small number = (large number) % (size/length of small range), can be called a hash function, as it computes an array index (the small number) from a given value.
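As a rough sketch of this hash function in Java (the table size of 10 matches the small range above; the names are illustrative):

```java
public class SimpleHash {
    static final int TABLE_SIZE = 10;  // small range: indices 0 to 9

    // Hash function: squeeze a number from a large range into 0..9
    static int hash(int largeNumber) {
        return largeNumber % TABLE_SIZE;
    }

    public static void main(String[] args) {
        // 157 from the large range 0..199 maps to index 7
        System.out.println(hash(157));  // prints 7
    }
}
```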

HashTable :- An array into which data is inserted using a hash function is called a hash table.


Thursday, January 26, 2017

Red Black Trees

Red-Black Trees:-

In continuation of binary search trees: searching, deleting and inserting are very quick in a BST, provided the BST is balanced or somewhat balanced. But a problem arises when the data to be inserted is sorted, either in ascending or descending form.

For example :- {3,5,7,8,9,12,28,30}  — asc sorted
               {30,28,12,9,8,7,5,3}  — desc sorted

When we try to create a BST out of sorted data, the tree becomes unbalanced and grows in one direction only: to the left if the data is desc sorted, or to the right if asc sorted. So it is safe to say that it effectively becomes a linked list, with all the issues that come with a linked list.

Searching, insertion & deletion become O(N) instead of O(log N).

So for a BST or BT (binary tree) to remain balanced, it is important that insertion happens in random order rather than sorted order. Though random insertion can also make a BST/BT unbalanced if a run of the input data is in sequential order — for example :- {6,4,7,5,2,8,9,10,11,34,23,55,43}, where the run 8,9,10,11 will make the tree lean to the right.

Now, to prevent a BST/BT from becoming unbalanced at insertion time, certain techniques can be applied during insert, and the red-black tree is one such technique.

An RB-tree is basically a set of rules which should be followed while performing insert operations on a BST/BT. These rules ensure that the tree remains balanced:

1) Every node is either red or black
2) The root node is always black
3) If a node is red, its children must be black
4) Every path from root to a leaf or to null child must contain same number of black nodes.
5) Every nil node is always black

Some light on the above 5 rules:-

 (a) A node being red/black is just a conceptual device for understanding the technique. To identify a node as red/black, we can add a boolean flag in the Node class and set it to true or false for red/black.
 (b) The “null child” referred to in point 4 is a potential child of a non-leaf node. For example, a potential left child of a non-leaf node that has only a right child, or the other way round.

Black Height :- The number of black nodes encountered while traversing from the root to a leaf. Reinforcing rule 4, the black height must be the same for every root-to-leaf path.
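The boolean flag mentioned in point (a) can be sketched as a minimal Node class (field and constant names here are illustrative, not a full RB-tree implementation):

```java
// A minimal red-black tree node: the boolean flag encodes the colour (rule a)
public class RBNode {
    static final boolean RED = true;
    static final boolean BLACK = false;

    int key;
    boolean color;       // true = red, false = black
    RBNode left, right;  // null children count as black (rule 5)

    RBNode(int key) {
        this.key = key;
        this.color = RED;  // new nodes are conventionally inserted as red
    }
}
```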


Duplicate Keys :- 

In case we come across a sequence like 50,50,50 while performing inserts, these duplicate records will be inserted in the following order:-

1) Insert the first node.
2) Insert the first duplicate as the right child of the first node.
3) Insert the second duplicate as the left child of the first node.




Tuesday, January 24, 2017

Time Complexity of Algorithms (Big O also covered)

Time complexity of an algorithm is the time taken by the algorithm to complete its process. Now we need a way (a metric) to define/express the time complexity of an algorithm. For example, to express the weight of something, we use Kg/Lbs. In the world of algorithms, we use Big O to express time complexity.

When we define time complexity for any algorithm, we take into account worst-case time complexity and not best-case time complexity. 

Worst-case time complexity :- Maximum time taken by an algorithm for any input size.

Best-case time complexity :- Minimum time taken by an algorithm for any input size.

How to calculate time complexity ?

Now we take into account that we will be using Big O notation as the metric for expressing time complexity.

Big O -- What is that ?

Big O for an algorithm describes how the algorithm responds (in time taken) to different sizes of input, or in other words, how much slower the algorithm gets as the input grows very large.

Note:-
An algorithm is =>

(a) Linear -- If it does something with every item in one dimension (a single loop over N items).
(b) Quadratic -- If it does something with every item in two dimensions (a loop nested inside another, each over N items).
(c) Logarithmic -- If it divides the items in half in every iteration.



Note:- We ignore constants in Big O

Lets jump on to coding examples to figure out Big O

Example 1 :-

public void printText(int N) {

    System.out.println("We are calculating big o");   // This line is also example 2 (O(1))

    for (int i = 0; i < N; i++) {
        System.out.println(i);
    }
}

In the above program, System.out.println("We are calculating big o"); is a statement whose cost is constant and will not change based on input size. So we can ignore this statement, and statements/variables like it, when determining Big O and treat them as constants.

The time complexity of the for loop will be linear, which means the time taken by the for loop is directly proportional to the size of the input, which is N in this case.

When we say linear, that means O(N), or Order of N. "Linear" refers to the graphical representation of O(N): plotting time against input size gives a straight line.

Example 2 :-

Consider the below example:-

public int printText(int N) {
    return N;
}

The time complexity, or Big O, for this program will be O(1). The reason is that irrespective of how large the input is, the time taken will always be the same.


O(1) is constant time which is also considered as the best-case scenario



Example 3 :-

Consider the below example:-

public void performOperation(int N) {
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            System.out.println("printing the value");
        }
    }
}


In the above case, there are two nested for loops, each iterating N times for the given input. So the time complexity in this case is quadratic: O(N^2) is the Big O.

Below is the comparison graph for the above three examples:-




Example 4 :-

Consider the below example:-

public void performDivideAndConquer(int[] list, int target) {
    int low = 0;
    int high = list.length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (target < list[mid])
            high = mid - 1;       // target is in the lower half
        else if (target > list[mid])
            low = mid + 1;        // target is in the upper half
        else
            break;                // target found at index mid
    }
}

The above algorithm breaks any given input into halves to search for a particular value, as we do in any divide-and-conquer type of algorithm. The time complexity of this algorithm is logarithmic: instead of growing by multiplying by 2 (exponential), it divides the input in half with each iteration.

Big O for this algorithm will be log2(N), or just O(log N).




Example 5 :-

Example 4 has a time complexity of log(N). What if the program in example 4 is itself repeated N times? In that case, the time complexity of such an algorithm will be N * log(N).

The running time consists of N loops (iterative or recursive) that are each logarithmic; thus the algorithm is a combination of linear and logarithmic.
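One sketch of this O(N log N) pattern is running the logarithmic binary search of example 4 once for each of N query values (the arrays and method names here are illustrative):

```java
public class NLogNDemo {
    // Standard binary search on a sorted array: O(log N) per call
    static int binarySearch(int[] list, int target) {
        int low = 0, high = list.length - 1;
        while (low <= high) {
            int mid = (low + high) / 2;
            if (target < list[mid])      high = mid - 1;
            else if (target > list[mid]) low = mid + 1;
            else                         return mid;  // found
        }
        return -1;  // not found
    }

    // Calling the O(log N) search N times gives O(N log N) overall
    static int countFound(int[] sorted, int[] queries) {
        int found = 0;
        for (int q : queries) {                  // N iterations...
            if (binarySearch(sorted, q) >= 0) {  // ...each O(log N)
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        int[] sorted = {3, 5, 7, 8, 9, 12, 28, 30};
        int[] queries = {5, 6, 30};
        System.out.println(countFound(sorted, queries));  // prints 2 (5 and 30 found, 6 not)
    }
}
```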

















Monday, January 23, 2017

Logarithms


In day-to-day life, we often say that something grows exponentially. Say we invest some money somewhere and it grows exponentially every year; growing exponentially can mean the money doubles, triples, etc.

For example, 2 raised to the 4th power equals 16. This is expressed by the exponential equation 2^4 = 16.

2*2*2*2 = 16

What if the opposite happens? The opposite of exponential growth is logarithmic growth.

In plain terms, if we keep multiplying something X times, that is exponential; if we keep dividing something X times, that is logarithmic.

2^4 = 16   ⇔   log2(16) = 4


Both equations describe the same relationship between the numbers 2, 4, and 16, where 2 is the base, 4 is the exponent, and 16 is the power.
The difference is that while the exponential form isolates the power, 16, the logarithmic form isolates the exponent, 4.
Here are more examples of equivalent logarithmic and exponential equations.

Logarithmic form        Exponential form
log2(8) = 3             2^3 = 8
log3(81) = 4            3^4 = 81
log5(25) = 2            5^2 = 25

Definition of a logarithm

Generalizing the examples above leads us to the formal definition of a logarithm.

logb(a) = c   ⇔   b^c = a

Both equations describe the same relationship between a, b, and c:
  • b is the base,
  • c is the exponent, and
  • a is the power.

For example: log2(16) = 4, because 2^4 = 16.
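Tying this back to algorithms: log2(N) is "how many times can N be halved before reaching 1", which is exactly what a divide-and-conquer loop does. A small sketch (method name is illustrative):

```java
public class LogDemo {
    // Count how many times n can be halved before reaching 1.
    // This count is floor(log2(n)) — the opposite of repeated doubling.
    static int halvings(int n) {
        int count = 0;
        while (n > 1) {
            n = n / 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(halvings(16));  // 16 -> 8 -> 4 -> 2 -> 1: prints 4, and log2(16) = 4
    }
}
```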