# Measuring Efficiency

Why do we care about measuring algorithmic efficiency of searching and sorting?

* We want programs that run "fast"
* We want programs that run "efficiently"
* We want programs that run "optimally"

How do we measure the efficiency of our program?

We want programs that run "fast", but how should we measure this? One idea: use a stopwatch to see how long the program takes. Is this a good method? What is the stopwatch really measuring?

It measures how long this piece of code takes on this machine on this particular input.

We want an analysis that is machine (and input) independent because we want to evaluate our program's efficiency, not the machine's speed. A stopwatch does not let us draw general conclusions, because it does not tell us how fast the program runs on different inputs or machines.
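To make the stopwatch's limitation concrete, here is a minimal sketch that times the same code on two different inputs with `time.perf_counter`. The names `stopwatch` and `contains` are hypothetical stand-ins chosen for this illustration, and the list size is arbitrary.

```python
import time

def stopwatch(func, *args):
    """Call func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def contains(a_lst, item):
    """Stand-in workload: Python's built-in membership test on a list."""
    return item in a_lst

data = list(range(1_000_000))

# Same code, same machine, two different inputs.
found_front, t_front = stopwatch(contains, data, 0)        # item at the front
found_back, t_back = stopwatch(contains, data, 999_999)    # item at the back

print(f"front: {t_front:.6f}s  back: {t_back:.6f}s")
```

The printed times change from run to run and from machine to machine, so they describe this run on this input rather than the algorithm itself.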

## Searching

_Problem Definition:_ Given a list `a_lst` of length `n` and an `item`, is `item` present in `a_lst`?  
* If `item` is in `a_lst`, return `True` 
* else return `False`

### Searching in an Unsorted List

Let's start with an unsorted list.

In [None]:
def linear_search(a_lst, item):
    for el in a_lst:
        if item == el:
            return True
    return False

In [None]:
linear_search([12, 16, 23, 2, 7], 16)

In [None]:
linear_search([12, 16, 23, 2, 7], 13)

In [None]:
linear_search(['a', 'e', 'i', 'o', 'u'], 'u')

In [None]:
linear_search(['a', 'e', 'i', 'o', 'u'], 'a')

In [None]:
linear_search(['hello', 'world', 'silly'], 'hi')
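Instead of timing the search, we can count how many `==` comparisons it makes, which does not depend on the machine. The sketch below uses a hypothetical variant, `linear_search_count`, that returns the count alongside the answer.

```python
def linear_search_count(a_lst, item):
    """Like linear_search, but returns (found, comparisons), where
    comparisons is the number of == tests performed."""
    comparisons = 0
    for el in a_lst:
        comparisons += 1
        if item == el:
            return True, comparisons
    return False, comparisons

vowels = ['a', 'e', 'i', 'o', 'u']
print(linear_search_count(vowels, 'a'))  # best case: found at the first element
print(linear_search_count(vowels, 'u'))  # found at the last element
print(linear_search_count(vowels, 'z'))  # worst case: item is absent
```

When the item is absent, every one of the `n` elements is compared, so the worst case grows linearly with the length of the list.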

### Searching in a Sorted List

Can we do better if the given list `a_lst` is sorted?  Yes!  

Let's implement a binary search function below.

In [None]:
def binary_search(seq, item):
    """Assume seq is sorted. If item is 
    in seq, return True; else return False."""

    n = len(seq)

    # base case 1
    if n == 0:
        return False
    
    mid = n // 2
    mid_elem = seq[mid]

    # base case 2
    if item == mid_elem:
        return True
    
    # recurse on left
    elif item < mid_elem:
        left = seq[:mid]
        return binary_search(left, item)
        
    # recurse on right
    else:
        right = seq[mid+1:]
        return binary_search(right, item)

In [None]:
binary_search(['a', 'e', 'i', 'o', 'u'], 'a')

In [None]:
binary_search(['a', 'e', 'i', 'o', 'u'], 'b')

In [None]:
binary_search(sorted(['hello', 'world', 'silly']), 'hi')

In [None]:
binary_search(sorted(['hello', 'world', 'silly']), 'hello')

In [None]:
num_lst = sorted([23, 1, 2, 90, 0, 10, 12, 120, 45])

In [None]:
num_lst

In [None]:
binary_search(num_lst, 11)

In [None]:
binary_search(num_lst, 45)

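Although the above approach works, it is not actually O(log n)! The problem is that list slicing copies the elements it selects, so building a slice of length k is an O(k) operation. The first recursive call alone copies about n/2 elements, which makes the whole search O(n). To write a truly logarithmic binary search, we have to pass index values to the recursive calls rather than create list copies by slicing.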
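One way to see the cost of slicing is to count the elements that get copied. The sketch below is a hypothetical instrumented variant of the slicing search, `binary_search_copy_count`, that returns the total number of elements copied into slices along with the answer.

```python
def binary_search_copy_count(seq, item):
    """Slicing-based binary search on a sorted seq.
    Returns (found, copied), where copied is the total number of
    elements copied into slices across all recursive calls."""
    n = len(seq)
    if n == 0:
        return False, 0

    mid = n // 2
    mid_elem = seq[mid]
    if item == mid_elem:
        return True, 0

    # The slice copies every element in the chosen half.
    if item < mid_elem:
        half = seq[:mid]
    else:
        half = seq[mid + 1:]

    found, copied = binary_search_copy_count(half, item)
    return found, copied + len(half)

for n in (1_000, 1_000_000):
    found, copied = binary_search_copy_count(list(range(n)), -1)
    print(n, found, copied)
```

The number of copied elements grows in proportion to `n`, even though only about log n recursive calls are made.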

In [4]:
def binary_search_helper(seq, item, start, end):
    '''Recursive helper used by binary_search_improved.
    Searches seq[start..end] (both indices inclusive) for item.
    Makes O(log n) recursive calls in the worst case, 
    where n is the length of the sequence. 
    ''' 
        
    # base case 1
    if start > end:
        return False
    
    mid = (start + end) // 2 
    mid_elem = seq[mid]

    if item == mid_elem:
        return True
    
    # recurse on left
    elif item < mid_elem:
        return binary_search_helper(seq, item, start, mid-1)
        
    # recurse on right
    else:
        return binary_search_helper(seq, item, mid+1, end)

In [5]:
def binary_search_improved(seq, item):
    '''Given a sorted sequence of items that can be compared 
    with operators == and < and an item, returns True if
    item is in seq, else returns False.'''

    return binary_search_helper(seq, item, 0, len(seq)-1)

In [6]:
binary_search_improved(sorted(['hello', 'world', 'silly']), 'hi')

False

In [7]:
binary_search_improved(sorted(['hello', 'world', 'silly']), 'hello')

True
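The same index-based idea can also be written as a loop, which makes it easy to count how many times the search range is halved. This is a minimal sketch, not part of the notes above; the name `binary_search_steps` and the choice to return a step count are assumptions made for illustration.

```python
def binary_search_steps(seq, item):
    """Iterative binary search on a sorted seq using indices only.
    Returns (found, steps), where steps is the number of loop
    iterations, i.e. the number of middle elements examined."""
    start, end = 0, len(seq) - 1
    steps = 0
    while start <= end:
        steps += 1
        mid = (start + end) // 2
        mid_elem = seq[mid]
        if item == mid_elem:
            return True, steps
        elif item < mid_elem:
            end = mid - 1      # continue in the left half
        else:
            start = mid + 1    # continue in the right half
    return False, steps

words = sorted(['hello', 'world', 'silly'])
print(binary_search_steps(words, 'hi'))
print(binary_search_steps(words, 'hello'))

big = list(range(1_000_000))
print(binary_search_steps(big, -1))  # absent item in a list of a million elements
```

No slices are created, so each step does a constant amount of work, and the step count stays at or below about log2(n) + 1 even for the million-element list.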