Searching Techniques
Files and Records: A record is a collection of related information and Collection of records is called a file.
Ordered and unordered
———————————–
Unordered– There is no relation between the records
Ordered –The keys are ordered in a specific fashion to enhance searching.
Keys: Specific field in a record to differentiate each record
Internal and External searches
1. Internal search –When the searching is performed the entire database or table is copied to the main memory.
2. External search– When only a part of database is copied to the main memory due to huge size of database.
Sequential search
———————————–
Simple and easiest method to implement.
The key field in each record is compared with the given key value starting from the first record. If there is a match the index of the key field is returned.
int seqSearch( int key, int array[ ], int n )
{
for ( int i=0; i<n; i++ )
if ( array[i] == key )
return i;
return -1;
}
Sequential search on a sorted array
——————————————————————-
Sequential search if applied on a sorted array reduces the number of comparisons if some modifications are made.
This is faster than the ordinary sequential searches.
Sequential search on a sorted array
—————————————————-
int seqSearchsorted( int key, int array[ ], int n )
{
for ( int i=0; i<n; i++ )
{
if ( array[i] == key )
return i;
else if( array[i] > key )
return -1;
}
}
Indexed sequential search
—————————————-
Procedure: The index table and the records must be sorted on the key. Not all keys are indexed in to the index table. The record table is divided equally and the key in the record, in the beginning of each division along with a pointer to that record is entered in to the index table.
int indseqSearch( int key, int indexsize, int indtable[ ], int *indpointer[ ] int array[ ] )
{
int llimit,hlimit;
for( int i=0; indtable[i] <= key && i<indexsize; i++);
llimit = (i==0) ? 0 : indpointer[i-1];
hlimit = (i==indexsize) ? n-1 : indpointer[i-1];
for( int llimit; j <= hlimit && array[j] != key; j++);
return ( ( j > hlimit ) ? -1 : j );
}
Advantages: Fastest search among all
Disadvantages: Uses an extra table to store indexes,which may consume spaces for large databases.
Binary search Technique
—————————————
Procedure: The given sorted records are divided into two halves. The key is first compared with the key field of the middle record. If a match is found, the key index is returned. If it does not match, then required key must be either in the lower or upper half. If the key is less than the key field of the middle record, the key is searched in the lower half otherwise it is checked in the upper half.
int binsearch(int records[], int key, int llimit, int hlimit)
{
int middle;
if( llimit > hlimit)
return (-1);
middle = (llimit + hlimit) / 2;
if( key == records[middle] )
return (middle);
else
{
if( key < records[middle] )
return (binsearch( records, key , llimit ,middle – 1) );
else
return (binsearch( records, key, middle + 1, hlimit) );
}
}
Advantages of Binary search: Most effective technique applied on sorted arrays.
Disadvantages: Cannot be implemented on unsorted array.
Hashing: Process to develop a hash function to determine where the records can be stored
Example : r=h(k)
h(k)- Hash function
k – Key
r – an offset when added to the starting address of the block of memory, where the records are to be stored, yields the particular record.
Properties of Hashing: Any hash function developed should return a unique value Must be able to minimize the number of collisions.
Hash collision: If the hash function returns the same value for two different records it is called hash collisions.
Mid square method: In this method, the key is squared first. Then required number of digits is chosen from the middle of the squared number. The number of digits to be chosen depends on the size of the record.
Resolving Hash Collisions
———————————-
Methods for minimizing the collisions
Open address method
Method of finding the next available space to store the record
Also known as rehashing
The two types of probing are
Linear Probing
Quadratic probing
Double Hashing: Solves the problem of double clustering
Two hashing methods used h1(k) and h2(k)-One is primary and the other is secondary.