In this lab you will gain experience with classic searching and sorting algorithms and compare how their execution times (as measured by the “wall-clock” running time) scale with growing input sizes. Note that this is different from the definition of efficiency we’ve focused on in class; we argued that efficiency should be measured by counting the number of operations that an algorithm takes as a function of its inputs, in the worst case. Hopefully, this lab will show us that our in-class definition of efficiency does actually capture something useful about algorithm running times!
Before you start this lab, review the binary search and selection sort algorithms from lecture.
Clone the lab resources from the GitLab repository, as usual:
cd cs134
git clone https://evolene.cs.williams.edu/cs134-labs/23xyz3/lab10.git
where your CS username replaces 23xyz3. The description of the files provided in the starter is available in README.md and discussed below.
We have provided you with four Python scripts: search.py, sort.py, compare.py and plot.py. These contain completed implementations of some algorithms (linear_search, binary_search_recursive and merge_sort), as well as functions to read data and plot the running times of various sorting and searching algorithms. You should review this code and understand how it works, but there is no need to change it!
Instead, your tasks are to complete binary_search_iterative and fix selection_sort. Each of these tasks is described in detail below.
Each student must complete and submit their own lab solution by committing and pushing to their own individual Git repository. However, you are allowed to discuss your code, your debugging strategies, and your findings with a labmate during your scheduled lab period. Additional guidelines for collaboration:
In lecture, we discussed a recursive implementation of the binary search algorithm. This implementation is included in search.py. Your task is to translate this recursive implementation to an iterative one. That is, complete binary_search_iterative in search.py so that it uses a while loop to perform binary search. The outline of the iterative solution has been provided, with five TODO#s highlighted for you in comments.
After you complete the TODOs in binary_search_iterative, run the provided test in the if __name__ == "__main__" block, and add a test to the if __name__ == "__main__" block that exercises behavior not already covered by the provided test. Of course, Q1 is not complete until your implementation passes both tests!
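As a reminder of the general pattern (this is a generic sketch with assumed names, not the starter file's exact function or its TODO structure), an iterative binary search over a sorted list maintains a shrinking low/high range inside a while loop:

```python
def binary_search_iterative_sketch(sorted_lst, item):
    """Return True if item is in sorted_lst, else False.

    Illustrative only: the lab's binary_search_iterative may use
    different names and a different return value.
    """
    low = 0
    high = len(sorted_lst) - 1
    while low <= high:
        mid = (low + high) // 2          # middle index of the current range
        if sorted_lst[mid] == item:
            return True                  # found the item
        elif sorted_lst[mid] < item:
            low = mid + 1                # discard the left half
        else:
            high = mid - 1               # discard the right half
    return False                         # range became empty: item absent
```

Each loop iteration halves the search range, which is where the O(log n) behavior comes from.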
In the file sort.py we have included a buggy implementation of the selection sort algorithm discussed in lecture. The function selection_sort takes as input a list my_lst and tries to sort it in place by mutating the list (in other words, the value of my_lst is changed, rather than returning a new copy of my_lst with the elements in sorted order, as is done by the sorted function). The function selection_sort attempts to use the selection sort algorithm, which is (correctly) described below:
For each index i from 0 to n-1 (where n is the length of the list), do the following:
- Find the smallest element between indices i and n-1 in the list (choosing exactly one of the smallest elements if there is a tie).
- Move this element to the ith position by swapping it with the current element at index i.

Thus, selection sort finds the smallest element in the entire list and swaps the item at index 0 with this item, then finds the second-smallest element (which is the smallest in the sublist my_lst[1:]) and swaps it with the item at index 1, and so on.
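The steps above can be sketched directly in code (an illustrative version with assumed names, not the starter code's implementation or its helper structure):

```python
def selection_sort_sketch(my_lst):
    """Sort my_lst in place using selection sort.

    Illustrative only: the lab's selection_sort uses a separate
    get_min_index helper, which this sketch inlines.
    """
    n = len(my_lst)
    for i in range(n):
        # Find the index of the smallest element between i and n-1.
        min_index = i
        for j in range(i + 1, n):
            if my_lst[j] < my_lst[min_index]:
                min_index = j
        # Swap the smallest remaining element into position i.
        my_lst[i], my_lst[min_index] = my_lst[min_index], my_lst[i]
```

Note that the inner loop scans the unsorted suffix on every pass, which is why selection sort takes O(n^2) steps.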
The function selection_sort in sort.py does not correctly sort the input. You may verify this by running the file as a script and seeing that the provided function merge_sort sorts correctly, but selection_sort does not. The bugs in the selection sort implementation are in the helper function get_min_index. Your task is to fix the helper function and make sure selection_sort works correctly.
Helpful hints. You may want to add some print statements to figure out what is going wrong. Unlike question 1, we have not highlighted the number of steps or position of bugs.
In class we discussed the Big Oh complexity of the following search and sort algorithms:
- linear_search: O(n)
- binary_search: O(log n)
- selection_sort: O(n^2)
- merge_sort: O(n log n)
While the Big Oh complexity is a pessimistic and approximate estimate of the total number of steps, it is a good predictor of how actual runtimes scale in practice. To see this, we will plot the wall-clock time taken by the above search and sort algorithms for varying values of n, and we will verify that the predicted Big Oh trends hold true.

To keep track of the number of seconds these functions take to search/sort and return their output, we will use Python's time module. The code that calls and plots the runtimes is already provided for you in plot.py.
Your task is to call the plotting functions in the if __name__ == "__main__": block in the file compare.py. In particular, you must do the following:
- Call the plot_search_times function from the plot module, providing arguments num_lst and item. This will generate a plot comparing the runtimes of three search algorithms: linear_search, binary_search and Python's built-in in operator. Observe whether the trends match the Big Oh predictions. What does the plot indicate about the complexity of Python's built-in in operator?
- Call the read_data function in compare.py to read in all the text from the book prideandprejudice.txt. We will sort this list of strings using three different sorting algorithms: merge_sort, selection_sort and Python's built-in .sort() method for lists. To compare their runtimes, call the plot_sort_times function on the word list returned by read_data. You may have to X out of the search plot to ensure that the sorting plot shows up. Observe whether the Big Oh complexity trends match the observed runtimes.

When you're finished, commit and push your work (the modified Python files) to the server as in previous labs. Good luck as you head into finals week!