Table Examples#

Note: This notebook is set up to give you practice with the material we’ve covered so far this semester. The solutions to the questions posed on this page are visible if you show the hidden cells, but we encourage you to try to solve them on your own. You may do so on paper or by opening this notebook. It has blank cells for you to write the solutions to each question from scratch.

from datascience import *
from cs104 import *
import numpy as np
%matplotlib inline

1. Group and Pivot Practice#

Question. Write a function volume that takes the radii for some spheres as an array and returns the sum of the volumes of those spheres. Test your solution with several check operations.

Recall that the volume of a sphere with radius \(r\) is: \( \frac{4}{3} \pi r^3 \). Thus, the volume for a sphere with radius 1 is approximately 4.1888, and the volume when the radius is 2 is 33.5103.

def volume(radii):
    return sum(4/3 * np.pi * radii ** 3)

check(volume(make_array(1)) == approx(4.1888))
check(volume(make_array(1,2)) == approx(4.1888 + 33.5103))

Here is a small table with marble data.

marbles = Table().with_columns('Color', np.random.choice(['Red', 'Yellow', 'Blue'], 10),
                               'Transparency', np.random.choice(["Opaque", "Transparent"], 10),
                               'Radius', np.random.choice(np.arange(1,5), 10))
marbles.show(4)
Color Transparency Radius
Red Opaque 1
Blue Opaque 4
Blue Transparent 1
Red Opaque 3

... (6 rows omitted)


Question. Create a new table that shows the total volume of the marbles of each color. The columns of the new table should be ‘Color’ and ‘Volume’.

volumes = marbles.group('Color', volume)
volumes = volumes.relabeled('Radius volume', 'Volume')
volumes = volumes.select('Color', 'Volume')
volumes
Color Volume
Blue 540.354
Red 263.894
Yellow 41.8879

Question. Create a new table that shows the total volume of the marbles of each combination of color and transparency. Opaque or Transparent. The columns of the new table should be ‘Color’ and ‘Opaque’, ‘Transparent’, and there should be one row for each color.

marbles.pivot('Transparency', 'Color', 'Radius', volume)
Color Opaque Transparent
Blue 268.083 272.271
Red 117.286 146.608
Yellow 4.18879 37.6991

2. Hopkins Trees#

hopkins_trees = Table.read_table("data/hopkins-trees.csv")
hopkins_trees = hopkins_trees.drop("genus", "species")
hopkins_trees.sample(4)
plot common name count
p0925 Oak, red 5
p0402 Maple, sugar 5
p1119 Shadbush 2
p1039 Elm, American 1

Question. Create a table showing the number of trees of each species in whole forest, sorted from most to least.

tree_counts = hopkins_trees.drop("plot").group("common name", sum).sort("count sum", descending=True)
tree_counts = tree_counts.relabeled('count sum', 'count')
tree_counts.show(4)
common name count
Beech, American 42922
Maple, striped 8939
Maple, red 5564
Maple, sugar 5193

... (73 rows omitted)


Question. Show the top ten tree species in a bar chart.

tree_counts.take(np.arange(0,10)).barh('common name', 'count')
../_images/13-table-examples_25_0.png

Question. Create a table with counts for each type of maple for each plot. You should have a column for each species of Maple tree, and a row for each plot.

maples = hopkins_trees.where('common name', are.containing('Maple'))
maples.pivot('common name', 'plot', 'count', sum)
plot Maple, Norway Maple, mountain Maple, red Maple, striped Maple, sugar
p00-1 0 0 8 28 12
p00-2 0 0 2 43 14
p0000 0 0 13 0 4
p0001 0 3 20 11 0
p0002 0 0 12 1 0
p0003 0 0 4 41 1
p0004 0 0 2 9 3
p0005 0 0 5 17 0
p0006 0 0 3 26 14
p0007 0 0 6 123 67

... (410 rows omitted)


Question. Make a histogram of the counts of red maples in each plot. Set bins to be np.arange(0,100,5) when creating the histogram.

red_maples = hopkins_trees.where("common name", are.equal_to("Maple, red"))
red_maples.show(3)
red_maples.hist('count', bins=np.arange(0,100,5))
plot common name count
p00-1 Maple, red 8
p00-2 Maple, red 2
p0000 Maple, red 13

... (348 rows omitted)

../_images/13-table-examples_31_1.png

Questions about the plot you’ve created:

  • What are the units here?

  • How many plots have fewer than 10 red maples?


Question. Write a function that will produce a histogram of the counts of any given species in each plot.

def plot_counts(common_name):
    counts = hopkins_trees.where("common name", are.equal_to(common_name)).sort("count", descending=True)
    counts.hist('count')
with Figure(1,3):
    plot_counts('Maple, red')
    plot_counts('Oak, red')
    plot_counts('Beech, American')
../_images/13-table-examples_37_0.png

Question. Write an interactive visualization that will show the histogram for the species selected from a popup menu.

all_names = hopkins_trees.sort('common name', distinct=True).column('common name')
interact(plot_counts, common_name=Choice(all_names))

3. Sky Scrapers#

Here’s a new dataset featuring skyscrapers in the US.

sky = Table.read_table('data/skyscrapers_v2.csv')
sky.show(5)
name material city height completed
One World Trade Center mixed/composite New York City 541.3 2014
Willis Tower steel Chicago 442.14 1974
432 Park Avenue concrete New York City 425.5 2015
Trump International Hotel & Tower concrete Chicago 423.22 2009
Empire State Building steel New York City 381 1931

... (1776 rows omitted)


Question. Add an age column that indicates the age of each skyscraper in years.

sky = sky.with_column('age', 2024 - sky.column('completed'))
sky.show(3)
name material city height completed age
One World Trade Center mixed/composite New York City 541.3 2014 10
Willis Tower steel Chicago 442.14 1974 50
432 Park Avenue concrete New York City 425.5 2015 9

... (1778 rows omitted)


Question. Create an overlaid histogram of building age for the different materials. What does it tell you about skyscrapers?

sky.hist('age', group='material')
../_images/13-table-examples_50_0.png

Question. For each city, what’s the tallest building for each material? Do it two ways: with group and with pivot.

tallest = sky.select('material', 'city', 'height').group(make_array('city', 'material'), max)
tallest.show(5)
city material height max
Atlanta concrete 264.25
Atlanta mixed/composite 311.8
Atlanta steel 169.47
Austin concrete 208.15
Austin steel 93.6

... (86 rows omitted)


sky_pivot = sky.pivot('material', 'city', 'height', max)
sky_pivot.show(5)
city concrete mixed/composite steel
Atlanta 264.25 311.8 169.47
Austin 208.15 0 93.6
Baltimore 161.24 0 155.15
Boston 121.92 139 240.79
Charlotte 265.48 239.7 179.23

... (30 rows omitted)


Question. For each city, what’s the height difference between the tallest steel building and the tallest concrete building?

sky_diff = sky_pivot.with_column(
    'difference', 
    abs(sky_pivot.column('steel') - sky_pivot.column('concrete'))
)
sky_diff.show(5)
city concrete mixed/composite steel difference
Atlanta 264.25 311.8 169.47 94.78
Austin 208.15 0 93.6 114.55
Baltimore 161.24 0 155.15 6.09001
Boston 121.92 139 240.79 118.87
Charlotte 265.48 239.7 179.23 86.25

... (30 rows omitted)


Question. Show a map with heights of tallest in each city.

# Load in some useful geo information
cities = Table().read_table('data/uscities.csv').where('population', are.above(300000))
cities = cities.select('city', 'lat', 'lng')
cities.show(5)
city lat lng
New York 40.6943 -73.9249
Los Angeles 34.1141 -118.407
Chicago 41.8375 -87.6866
Miami 25.784 -80.2101
Dallas 32.7935 -96.7667

... (141 rows omitted)

tallest = sky.select('material', 'city', 'height').group('city', max)
tallest = tallest.join('city', cities).select('lat', 'lng', 'city', 'height max')

points = tallest.with_columns('colors', 'blue',
                              'areas', tallest.column("height max"))
Circle.map_table(points).show()
Make this Notebook Trusted to load map: File -> Trust Notebook

(We haven’t done much with maps – you can use this as a reference for your own work, but we’d never ask you to create maps like this on, say, the midterm…)