In this lab you will gain experience with classic searching and sorting algorithms and compare how their execution times (as measured by the “wall-clock” running time) scale with growing input sizes. Note that this is different from the definition of efficiency we’ve focused on in class; we argued that efficiency should be measured by counting the number of operations that an algorithm takes as a function of its inputs, in the worst case. Hopefully, this lab will show us that our in-class definition of efficiency does actually capture something useful about algorithm running times!
Before you start this lab, review the binary search and selection sort algorithms from lecture.
Clone the lab resources from the GitLab repository, as usual:
cd cs134
git clone https://evolene.cs.williams.edu/cs134-labs/23xyz3/lab10.git
where your CS username replaces 23xyz3. The description of the files provided in the starter is available in README.md and discussed below.
We have provided you with four Python scripts: search.py, sort.py, compare.py and plot.py. These contain completed implementations of some algorithms (linear_search, binary_search_recursive and merge_sort), as well as functions to read data and plot the running times of various sorting and searching algorithms. You should review this code and understand how it works, but there is no need to change it!
Instead, your tasks are to complete binary_search_iterative and fix selection_sort. Each of these tasks is described in detail below.
Each student must complete and submit their own lab solution by committing and pushing to their own individual Git repository. However, you are allowed to discuss your code, your debugging strategies, and your findings with a labmate during your scheduled lab period. Additional guidelines for collaboration:
In lecture, we discussed a recursive implementation of the binary search algorithm. This implementation is included in search.py. Your task is to translate this recursive implementation to an iterative one. That is, complete binary_search_iterative in search.py so that it uses a while loop to perform binary search. The outline of the iterative solution has been provided, with five TODO#s highlighted for you in comments.
After you complete the TODOs in binary_search_iterative, run the provided test in the if __name__ == "__main__" block, and add a test to the if __name__ == "__main__" block that exercises behavior not already covered by the provided test. Of course, Q1 is not complete until your implementation passes both tests!
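As a reminder of the general pattern (this is a generic sketch with assumed names, not the starter file's exact function or its TODO structure), an iterative binary search over a sorted list maintains a shrinking low/high range inside a while loop:

```python
def binary_search_iterative_sketch(sorted_lst, item):
    """Return True if item is in sorted_lst, else False.

    Illustrative only: the lab's binary_search_iterative may use
    different names and a different return value.
    """
    low = 0
    high = len(sorted_lst) - 1
    while low <= high:
        mid = (low + high) // 2          # middle index of the current range
        if sorted_lst[mid] == item:
            return True                  # found the item
        elif sorted_lst[mid] < item:
            low = mid + 1                # discard the left half
        else:
            high = mid - 1               # discard the right half
    return False                         # range became empty: item absent
```

Each loop iteration halves the search range, which is where the O(log n) behavior comes from.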
In the file sort.py we have included a buggy implementation of the selection sort algorithm discussed in lecture. The function selection_sort takes as input a list my_lst and tries to sort it in place by mutating the list (in other words, the value of my_lst is changed, rather than returning a new copy of my_lst with the elements in sorted order, as is done by the sorted function). The function selection_sort attempts to use the selection sort algorithm, which is (correctly) described below:
For each index i from 0 to n-1 (where n is the length of the list), do the following:
- Find the smallest element between indices i and n-1 in the list (choosing exactly one of the smallest elements if there is a tie).
- Move this element to the ith position by swapping it with the current element at index i.

Thus, selection sort finds the smallest element in the entire list and swaps the item at index 0 with this item, then finds the second-smallest element (which is the smallest in the sublist my_lst[1:]) and swaps it with the item at index 1, and so on.
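The steps above can be sketched directly in code (an illustrative version with assumed names, not the starter code's implementation or its helper structure):

```python
def selection_sort_sketch(my_lst):
    """Sort my_lst in place using selection sort.

    Illustrative only: the lab's selection_sort uses a separate
    get_min_index helper, which this sketch inlines.
    """
    n = len(my_lst)
    for i in range(n):
        # Find the index of the smallest element between i and n-1.
        min_index = i
        for j in range(i + 1, n):
            if my_lst[j] < my_lst[min_index]:
                min_index = j
        # Swap the smallest remaining element into position i.
        my_lst[i], my_lst[min_index] = my_lst[min_index], my_lst[i]
```

Note that the inner loop scans the unsorted suffix on every pass, which is why selection sort takes O(n^2) steps.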
The function selection_sort in sort.py does not correctly sort the input. You may verify this by running the file as a script and seeing that the provided function merge_sort sorts correctly, but selection_sort does not. The bugs in the selection sort implementation are in the helper function get_min_index. Your task is to fix the helper function and make sure selection_sort works correctly.
Helpful hints. You may want to add some print statements to figure out what is going wrong. Unlike question 1, we have not highlighted the number of steps or position of bugs.
In class we discussed the Big Oh complexity of the following search and sort algorithms:
- linear_search: O(n)
- binary_search: O(log n)
- selection_sort: O(n^2)
- merge_sort: O(n log n)
While the Big Oh complexity is a pessimistic and approximate estimate of the total number of steps, it is a good predictor of how actual runtimes scale in practice. To see this, we will plot the wall-clock time taken by the above search and sort algorithms for varying values of n, and we will verify that the predicted Big Oh trends hold true.

To keep track of the number of seconds these functions take to search/sort and return their output, we will use Python's time module. The code that calls and plots the runtimes is already provided for you in plot.py.
Your task is to call the plotting functions in the if __name__ == "__main__": block in the file compare.py. In particular, you must do the following:
- Call the plot_search_times function from the plot module, providing arguments num_lst and item. This will generate a plot comparing the runtimes of three search algorithms: linear_search, binary_search and Python's built-in in operator. Observe whether the trends match the Big Oh predictions. What does the plot indicate about the complexity of Python's built-in in operator?
- Call the read_data function in compare.py to read in all the text from the book prideandprejudice.txt. We will sort this list of strings using three different sorting algorithms: merge_sort, selection_sort and Python's built-in .sort() method for lists. To compare their runtimes, call the plot_sort_times function on the word list returned by read_data. You may have to X out of the search plot to ensure that the sorting plot shows up. Observe whether the Big Oh complexity trends match the observed runtimes.

When you're finished, commit and push your work (the modified Python files) to the server as in previous labs. Good luck as you head into finals week!