Table Examples#

Note: This notebook is set up to give you practice with the material we’ve covered so far this semester. The solutions to the questions posed on this page are visible if you show the hidden cells, but we encourage you to try to solve them on your own. You may do so on paper or by opening this notebook. It has blank cells for you to write the solutions to each question from scratch.

from datascience import *
from cs104 import *
import numpy as np
%matplotlib inline

1. Group and Pivot Practice#

Question. Write a function volume that takes the radii for some spheres as an array and returns the sum of the volumes of those spheres. Test your solution with several check operations.

Recall that the volume of a sphere with radius \(r\) is: \( \frac{4}{3} \pi r^3 \). Thus, the volume for a sphere with radius 1 is approximately 4.1888, and the volume when the radius is 2 is 33.5103.

def volume(radii):
    return sum(4/3 * np.pi * radii ** 3)

check(volume(make_array(1)) == approx(4.1888))
check(volume(make_array(1,2)) == approx(4.1888 + 33.5103))

Here is a small table with marble data.

marbles = Table().with_columns('Color', np.random.choice(['Red', 'Yellow', 'Blue'], 10),
                               'Transparency', np.random.choice(["Opaque", "Transparent"], 10),
                               'Radius', np.random.choice(np.arange(1,5), 10))
marbles.show(4)

Color	Transparency	Radius
Red	Opaque	1
Blue	Opaque	4
Blue	Transparent	1
Red	Opaque	3

... (6 rows omitted)

Question. Create a new table that shows the total volume of the marbles of each color. The columns of the new table should be ‘Color’ and ‘Volume’.

volumes = marbles.group('Color', volume)
volumes = volumes.relabeled('Radius volume', 'Volume')
volumes = volumes.select('Color', 'Volume')
volumes

Color	Volume
Blue	540.354
Red	263.894
Yellow	41.8879

Question. Create a new table that shows the total volume of the marbles of each combination of color and transparency. Opaque or Transparent. The columns of the new table should be ‘Color’ and ‘Opaque’, ‘Transparent’, and there should be one row for each color.

marbles.pivot('Transparency', 'Color', 'Radius', volume)

Color	Opaque	Transparent
Blue	268.083	272.271
Red	117.286	146.608
Yellow	4.18879	37.6991

2. Hopkins Trees#

hopkins_trees = Table.read_table("data/hopkins-trees.csv")
hopkins_trees = hopkins_trees.drop("genus", "species")
hopkins_trees.sample(4)

plot	common name	count
p0925	Oak, red	5
p0402	Maple, sugar	5
p1119	Shadbush	2
p1039	Elm, American	1

Question. Create a table showing the number of trees of each species in whole forest, sorted from most to least.

tree_counts = hopkins_trees.drop("plot").group("common name", sum).sort("count sum", descending=True)
tree_counts = tree_counts.relabeled('count sum', 'count')
tree_counts.show(4)

common name	count
Beech, American	42922
Maple, striped	8939
Maple, red	5564
Maple, sugar	5193

... (73 rows omitted)

Question. Show the top ten tree species in a bar chart.

tree_counts.take(np.arange(0,10)).barh('common name', 'count')

Question. Create a table with counts for each type of maple for each plot. You should have a column for each species of Maple tree, and a row for each plot.

maples = hopkins_trees.where('common name', are.containing('Maple'))
maples.pivot('common name', 'plot', 'count', sum)

plot	Maple, mountain	Maple, red	Maple, striped	Maple, sugar
p00-1	0	8	28	12
p00-2	0	2	43	14
p0000	0	13	0	4
p0001	3	20	11	0
p0002	0	12	1	0
p0003	0	4	41	1
p0004	0	2	9	3
p0005	0	5	17	0
p0006	0	3	26	14
p0007	0	6	123	67

... (410 rows omitted)

Question. Make a histogram of the counts of red maples in each plot. Set bins to be np.arange(0,100,5) when creating the histogram.

red_maples = hopkins_trees.where("common name", are.equal_to("Maple, red"))
red_maples.show(3)
red_maples.hist('count', bins=np.arange(0,100,5))

plot	common name	count
p00-1	Maple, red	8
p00-2	Maple, red	2
p0000	Maple, red	13

... (348 rows omitted)

Questions about the plot you’ve created:

What are the units here?
How many plots have fewer than 10 red maples?

Question. Write a function that will produce a histogram of the counts of any given species in each plot.

def plot_counts(common_name):
    counts = hopkins_trees.where("common name", are.equal_to(common_name)).sort("count", descending=True)
    counts.hist('count')

with Figure(1,3):
    plot_counts('Maple, red')
    plot_counts('Oak, red')
    plot_counts('Beech, American')

Question. Write an interactive visualization that will show the histogram for the species selected from a popup menu.

all_names = hopkins_trees.sort('common name', distinct=True).column('common name')
interact(plot_counts, common_name=Choice(all_names))

common_name

3. Sky Scrapers#

Here’s a new dataset featuring skyscrapers in the US.

sky = Table.read_table('data/skyscrapers_v2.csv')
sky.show(5)

name	material	city	height	completed
One World Trade Center	mixed/composite	New York City	541.3	2014
Willis Tower	steel	Chicago	442.14	1974
432 Park Avenue	concrete	New York City	425.5	2015
Trump International Hotel & Tower	concrete	Chicago	423.22	2009
Empire State Building	steel	New York City	381	1931

... (1776 rows omitted)

Question. Add an age column that indicates the age of each skyscraper in years.

sky = sky.with_column('age', 2024 - sky.column('completed'))
sky.show(3)

name	material	city	height	completed	age
One World Trade Center	mixed/composite	New York City	541.3	2014	10
Willis Tower	steel	Chicago	442.14	1974	50
432 Park Avenue	concrete	New York City	425.5	2015	9

... (1778 rows omitted)

Question. Create an overlaid histogram of building age for the different materials. What does it tell you about skyscrapers?

sky.hist('age', group='material')

Question. For each city, what’s the tallest building for each material? Do it two ways: with group and with pivot.

tallest = sky.select('material', 'city', 'height').group(make_array('city', 'material'), max)
tallest.show(5)

city	material	height max
Atlanta	concrete	264.25
Atlanta	mixed/composite	311.8
Atlanta	steel	169.47
Austin	concrete	208.15
Austin	steel	93.6

... (86 rows omitted)

sky_pivot = sky.pivot('material', 'city', 'height', max)
sky_pivot.show(5)

city	concrete	mixed/composite	steel
Atlanta	264.25	311.8	169.47
Austin	208.15	0	93.6
Baltimore	161.24	0	155.15
Boston	121.92	139	240.79
Charlotte	265.48	239.7	179.23

... (30 rows omitted)

Question. For each city, what’s the height difference between the tallest steel building and the tallest concrete building?

sky_diff = sky_pivot.with_column(
    'difference', 
    abs(sky_pivot.column('steel') - sky_pivot.column('concrete'))
)
sky_diff.show(5)

city	concrete	mixed/composite	steel	difference
Atlanta	264.25	311.8	169.47	94.78
Austin	208.15	0	93.6	114.55
Baltimore	161.24	0	155.15	6.09001
Boston	121.92	139	240.79	118.87
Charlotte	265.48	239.7	179.23	86.25

... (30 rows omitted)

Question. Show a map with heights of tallest in each city.

# Load in some useful geo information
cities = Table().read_table('data/uscities.csv').where('population', are.above(300000))
cities = cities.select('city', 'lat', 'lng')
cities.show(5)

city	lat	lng
New York	40.6943	-73.9249
Los Angeles	34.1141	-118.407
Chicago	41.8375	-87.6866
Miami	25.784	-80.2101
Dallas	32.7935	-96.7667

... (141 rows omitted)

tallest = sky.select('material', 'city', 'height').group('city', max)
tallest = tallest.join('city', cities).select('lat', 'lng', 'city', 'height max')

points = tallest.with_columns('colors', 'blue',
                              'areas', tallest.column("height max"))
Circle.map_table(points).show()

Make this Notebook Trusted to load map: File -> Trust Notebook

(We haven’t done much with maps – you can use this as a reference for your own work, but we’d never ask you to create maps like this on, say, the midterm…)

CSCI 104: Data Science and Computing for All

Table Examples

Contents

Table Examples#

1. Group and Pivot Practice#

2. Hopkins Trees#

3. Sky Scrapers#