Week 5A
Getting Data Part 1: Working with APIs

Sep 29, 2020

Week #5 Agenda

  • Introduction to APIs
  • Pulling census data and shapefiles using Python
    • Example: Lead poisoning in Philadelphia
  • Using the Twitter API
    • Plotting geocoded tweets
    • Word frequencies
    • Sentiment analysis
In [1]:
import geopandas as gpd
import pandas as pd
import numpy as np
from shapely.geometry import Point

from matplotlib import pyplot as plt
import seaborn as sns

import hvplot.pandas
import holoviews as hv

hv.extension("bokeh")
%matplotlib inline
In [2]:
# UNCOMMENT TO SEE ALL ROWS/COLUMNS IN DATAFRAMES
# pd.options.display.max_columns = 999
# pd.options.display.max_rows = 999 

Introduction to APIs

Or, how to pull data from the web using Python

Example APIs

Note: many services require you to register for an API key before accessing data via their API. The key identifies your requests and helps the service prevent any single user from overloading it.

Part 1: Reading an automated data feed

USGS real-time earthquake feeds

This is an API for near-real-time data about earthquakes, and data is provided in GeoJSON format over the web.

The API has a separate endpoint for each version of the data that users might want. No authentication is required.
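As a rough sketch of what "a separate endpoint for each version" means: the summary feed URLs appear to follow a `{magnitude}_{period}.geojson` pattern. The helper name `feed_url` and the sets of allowed values below are assumptions (they are not given in this lecture), so check them against the USGS feed documentation before relying on them.

```python
# Base path adapted from the example URL used in this lecture
BASE = "https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary"

# Assumed option values; verify against the USGS feed documentation
MAGNITUDES = {"significant", "4.5", "2.5", "1.0", "all"}
PERIODS = {"hour", "day", "week", "month"}


def feed_url(magnitude="2.5", period="week"):
    """Build the URL of a GeoJSON summary feed (hypothetical helper)."""
    if magnitude not in MAGNITUDES:
        raise ValueError(f"unknown magnitude threshold: {magnitude!r}")
    if period not in PERIODS:
        raise ValueError(f"unknown time period: {period!r}")
    return f"{BASE}/{magnitude}_{period}.geojson"


print(feed_url("2.5", "week"))
```

The resulting string can be passed to `gpd.read_file()` in the same way as the hard-coded URL.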

GeoPandas can read GeoJSON from the web directly

Simply pass the URL to the gpd.read_file() function:

In [3]:
# Download data on magnitude 2.5+ quakes from the past week
endpoint_url = "http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_week.geojson"
df = gpd.read_file(endpoint_url)
In [4]:
df.head() 
Out[4]:
id mag place time updated tz url detail felt cdi ... sources types nst dmin rms gap magType type title geometry
0 us6000c2cm 5.20 south of the Kermadec Islands 1601338654970 1601339694040 None https://earthquake.usgs.gov/earthquakes/eventp... https://earthquake.usgs.gov/earthquakes/feed/v... NaN NaN ... ,us, ,origin,phase-data, NaN 4.3070 0.78 69.0 mww earthquake M 5.2 - south of the Kermadec Islands POINT Z (-177.87720 -33.56530 10.00000)
1 us6000c2ch 4.50 7 km NNW of San José del Progreso, Mexico 1601335641190 1601339637496 None https://earthquake.usgs.gov/earthquakes/eventp... https://earthquake.usgs.gov/earthquakes/feed/v... 2.0 2.0 ... ,us, ,dyfi,origin,phase-data, NaN 1.6140 0.78 176.0 mb earthquake M 4.5 - 7 km NNW of San José del Progreso, Mexico POINT Z (-97.71600 16.16140 10.00000)
2 pr2020272014 3.38 85 km NNW of San Antonio, Puerto Rico 1601328284180 1601331229927 None https://earthquake.usgs.gov/earthquakes/eventp... https://earthquake.usgs.gov/earthquakes/feed/v... NaN NaN ... ,us,pr, ,origin,phase-data, 23.0 0.7913 0.50 217.0 md earthquake M 3.4 - 85 km NNW of San Antonio, Puerto Rico POINT Z (-67.25560 19.24550 10.00000)
3 us6000c2ae 4.90 61 km NE of Hengchun, Taiwan 1601326256666 1601338972309 None https://earthquake.usgs.gov/earthquakes/eventp... https://earthquake.usgs.gov/earthquakes/feed/v... 6.0 3.8 ... ,us, ,dyfi,origin,phase-data, NaN 0.3650 0.73 103.0 mww earthquake M 4.9 - 61 km NE of Hengchun, Taiwan POINT Z (121.10190 22.45120 23.40000)
4 pr2020272011 2.51 3 km S of La Parguera, Puerto Rico 1601315730070 1601320335880 None https://earthquake.usgs.gov/earthquakes/eventp... https://earthquake.usgs.gov/earthquakes/feed/v... NaN NaN ... ,pr, ,origin,phase-data, 19.0 0.0301 0.08 241.0 md earthquake M 2.5 - 3 km S of La Parguera, Puerto Rico POINT Z (-67.05360 17.94080 12.00000)

5 rows × 28 columns
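The `time` and `updated` columns look like Unix timestamps in milliseconds (an assumption based on their size; the feed documentation is the authority). A minimal sketch of converting one value to a readable UTC datetime, using only the standard library:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ms(ms):
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    # Adding a timedelta avoids floating-point rounding of the milliseconds
    return EPOCH + timedelta(milliseconds=ms)


# "time" value of the first earthquake in the table above
first_quake = from_epoch_ms(1601338654970)
print(first_quake.isoformat())
```

For a whole column, `pd.to_datetime(df["time"], unit="ms")` should give the equivalent result in pandas.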

Let's plot them on a map:

In [5]:
fig, ax = plt.subplots(figsize=(10, 10))

# plot the country outline
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world.to_crs(epsg=3857).plot(ax=ax, facecolor="none", edgecolor="black")

# plot the earthquakes
df.to_crs(epsg=3857).plot(ax=ax, color="crimson")

ax.set_axis_off()

Part 2: GeoServices

A GeoService is a standardized format for returning GeoJSON files over the web.

Documentation: http://geoservices.github.io/

OpenDataPhilly provides GeoService API endpoints for the geometry hosted on its platform

In [6]:
# base URL
url = "https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Zipcodes_Poly/FeatureServer/0/"

Small utility package to make querying GeoServices easier: esri2gpd

https://github.com/PhiladelphiaController/esri2gpd
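To give a feel for what a GeoService request looks like, here is a rough sketch that builds a query URL for the ZIP code layer by hand. The `query` operation and the `where`, `outFields`, and `f` parameters are assumptions based on the ArcGIS REST API; this is not necessarily what esri2gpd sends internally.

```python
from urllib.parse import urlencode

# Layer URL used in this lecture
layer_url = "https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Zipcodes_Poly/FeatureServer/0/"

# Assumed query parameters (ArcGIS REST "query" operation)
params = {
    "where": "1=1",    # no filtering: match every feature
    "outFields": "*",  # return all attribute columns
    "f": "geojson",    # ask for GeoJSON output
}

# urlencode() escapes special characters such as "=" and "*"
query_url = layer_url.rstrip("/") + "/query?" + urlencode(params)
print(query_url)
```

Servers often cap the number of features returned per request, which is one reason a helper package is convenient for larger layers.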

In [7]:
import esri2gpd
In [8]:
zip_codes = esri2gpd.get(url)
In [9]:
zip_codes.head()
Out[9]:
   geometry                                           OBJECTID  CODE   COD  Shape__Area   Shape__Length
0  POLYGON ((-75.11107 40.04682, -75.11206 40.047...  1         19120  20   9.177970e+07  49921.544063
1  POLYGON ((-75.19227 39.99463, -75.19240 39.994...  2         19121  21   6.959879e+07  39534.887217
2  POLYGON ((-75.15406 39.98601, -75.15494 39.986...  3         19122  22   3.591632e+07  24124.645221
3  POLYGON ((-75.15190 39.97056, -75.15258 39.970...  4         19123  23   3.585175e+07  26421.728982
4  POLYGON ((-75.09660 40.04249, -75.09661 40.042...  5         19124  24   1.448080e+08  63658.770420
In [10]:
zip_codes.crs
Out[10]:
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

Let's plot it

In [11]:
fig, ax = plt.subplots(figsize=(6, 6))
zip_codes.to_crs(epsg=3857).plot(ax=ax, facecolor="none", edgecolor="black")
ax.set_axis_off()

Another useful example: census tracts

https://www.opendataphilly.org/dataset/census-tracts

Part 3: Downloading Philly data from CARTO

  • Philadelphia hosts the majority of its open data in the cloud using CARTO.
  • They provide an API to download the data
  • You can access the API documentation from OpenDataPhilly

Note: the "API documentation" on OpenDataPhilly will link to the documentation for the CARTO database

For example: shooting victims in Philadelphia

https://www.opendataphilly.org/dataset/shooting-victims

https://phl.carto.com/api/v2/sql?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+shootings&filename=shootings&format=csv&skipfields=cartodb_id

  • Before the "?": the base URL for the CARTO SQL API (https://phl.carto.com/api/v2/sql)
  • After the "?": the request parameters, including the SQL query (q=...)
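The example URL can be taken apart with the standard library to see the two pieces. This is only an illustration of the URL structure; the variable names are my own.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "https://phl.carto.com/api/v2/sql"
    "?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+shootings"
    "&filename=shootings&format=csv&skipfields=cartodb_id"
)

parts = urlsplit(url)

# Everything before the "?" is the API endpoint
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# Everything after the "?" is a set of key=value parameters;
# parse_qs() decodes each "+" back into a space
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(base_url)
print(params["q"])
```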

SQL Databases

The CARTO databases can be queried using SQL. This allows you to select specific data from the larger database.

CARTO API documentation: https://carto.com/developers/sql-api/

SQL documentation: https://www.postgresql.org/docs/9.1/sql.html

General Query Syntax

SELECT [field names] FROM [table name] WHERE [query]
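A small sketch of assembling queries that follow this syntax. `build_query` is a hypothetical helper written for illustration, and it only interpolates strings, so it should be used with trusted input only.

```python
def build_query(table, fields="*", where=None, limit=None):
    """Assemble a simple SELECT statement (hypothetical helper)."""
    # Accept either a ready-made string or a list of column names
    if not isinstance(fields, str):
        fields = ", ".join(fields)
    query = f"SELECT {fields} FROM {table}"
    if where:
        query += f" WHERE {where}"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


print(build_query("shootings"))
print(build_query("shootings", fields=["year", "fatal"], where="fatal = 1", limit=5))
```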

Let's try it out in Python

We'll use Python's requests library to query the API endpoint with our desired query.

In [12]:
import requests
In [13]:
# the API endpoint
API_endpoint = "https://phl.carto.com/api/v2/sql"

# the query
query = "SELECT * FROM shootings" # table name is "shootings"

# desired format of the returned features
output_format = 'geojson'

# fields to skip
skipfields = ["cartodb_id"]
In [14]:
# all of our request parameters
params = dict(q=query, format=output_format, skipfields=skipfields)

params
Out[14]:
{'q': 'SELECT * FROM shootings',
 'format': 'geojson',
 'skipfields': ['cartodb_id']}
In [15]:
# make the request
r = requests.get(API_endpoint, params=params)

r
Out[15]:
<Response [200]>
In [16]:
# Get the returned data in JSON format
# This is a dictionary
features = r.json()
In [17]:
# What are the keys?
list(features.keys())
Out[17]:
['type', 'features']
In [18]:
features['type']
Out[18]:
'FeatureCollection'
In [19]:
# Let's look at the first feature
features['features'][0]
Out[19]:
{'type': 'Feature',
 'geometry': {'type': 'Point', 'coordinates': [-75.164986, 39.987479]},
 'properties': {'objectid': 468410,
  'year': 2018,
  'dc_key': '201822058421',
  'code': '0111',
  'date_': '2018-07-27T00:00:00Z',
  'time': '12:39:00',
  'race': 'B',
  'sex': 'M',
  'age': '30',
  'wound': 'multi',
  'officer_involved': 'N',
  'offender_injured': 'N',
  'offender_deceased': 'N',
  'location': '2200 BLOCK UBER ST',
  'latino': 0,
  'point_x': -75.1649863,
  'point_y': 39.98747862,
  'dist': '22',
  'inside': 0,
  'outside': 1,
  'fatal': 1}}
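To make the GeoJSON structure concrete, here is a minimal sketch that flattens features into plain dictionaries by hand. `feature_to_row` is a hypothetical helper, and the second sample feature is made up to show what a record with no location might look like.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON Feature into a plain dict (hypothetical helper)."""
    row = dict(feature["properties"])
    geometry = feature.get("geometry")
    if geometry is None:
        # No location available: keep the row but mark the coordinates as missing
        row["lng"], row["lat"] = None, None
    else:
        # GeoJSON stores Point coordinates as [longitude, latitude]
        row["lng"], row["lat"] = geometry["coordinates"][:2]
    return row


# Abbreviated copy of the first feature above, plus a made-up feature
# with a null geometry
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-75.164986, 39.987479]},
            "properties": {"objectid": 468410, "year": 2018, "fatal": 1},
        },
        {"type": "Feature", "geometry": None, "properties": {"objectid": -1}},
    ],
}

rows = [feature_to_row(f) for f in sample["features"]]
print(rows[0])
```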

Use the GeoDataFrame.from_features() function to create a GeoDataFrame

In [20]:
shootings = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")
In [21]:
shootings.head()
Out[21]:
geometry objectid year dc_key code date_ time race sex age ... offender_injured offender_deceased location latino point_x point_y dist inside outside fatal
0 POINT (-75.16499 39.98748) 468410 2018 201822058421 0111 2018-07-27T00:00:00Z 12:39:00 B M 30 ... N N 2200 BLOCK UBER ST 0.0 -75.164986 39.987479 22 0.0 1.0 1.0
1 POINT (-75.13862 40.00885) 468411 2018 201825058365 0111 2018-07-27T00:00:00Z 23:12:00 B M 34 ... N N 600 BLOCK RISING SUN AV 0.0 -75.138621 40.008847 25 0.0 1.0 1.0
2 POINT (-75.15624 39.99421) 468412 2018 201839052180 0111 2018-07-27T00:00:00Z 00:39:00 B M 32 ... N N 2700 BLOCK N 15TH ST 0.0 -75.156238 39.994207 39 0.0 1.0 1.0
3 POINT (-75.24086 39.92980) 468413 2018 201812055743 0111 2018-07-31T00:00:00Z 13:25:00 B M 24 ... N N 1700 BLOCK S Avondale St 0.0 -75.240856 39.929799 12 0.0 1.0 1.0
4 POINT (-75.15011 39.99659) 468414 2018 201825059719 0111 2018-08-01T00:00:00Z 20:09:00 B M 17 ... N N 2900 BLOCK N 12th St 0.0 -75.150115 39.996590 25 0.0 1.0 1.0

5 rows × 22 columns

At home exercise: Visualizing shootings data

Step 1: Prep the data

  • Drop rows where the geometry is NaN
  • Convert to a projected CRS with units of meters (e.g., EPSG:3857, Web Mercator)
In [22]:
# make sure we remove missing geometries
shootings = shootings.dropna(subset=['geometry'])

# convert to a projected CRS (EPSG:3857, Web Mercator)
shootings = shootings.to_crs(epsg=3857)
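For intuition about what the conversion to EPSG:3857 does to each point, the spherical Web Mercator formulas can be written out directly. This is a sketch of the standard formulas, not of how geopandas performs the transformation internally.

```python
import math

EARTH_RADIUS = 6378137.0  # meters; sphere radius used by Web Mercator


def to_web_mercator(lng, lat):
    """Project longitude/latitude in degrees to EPSG:3857 meters."""
    x = EARTH_RADIUS * math.radians(lng)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


# Coordinates of the first shooting record shown earlier
x, y = to_web_mercator(-75.164986, 39.987479)
print(round(x, 1), round(y, 1))
```

The result should agree to within about a meter with the projected geometry that geopandas reports for the same record.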

Step 2: Plot the points

A quick plot with geopandas to show the shootings as points

In [23]:
fig, ax = plt.subplots(figsize=(6, 6))

# ZIP codes
zip_codes.to_crs(epsg=3857).plot(ax=ax, facecolor="none", edgecolor="black")

# Shootings
shootings.plot(ax=ax, color="crimson")
ax.set_axis_off()

Step 3: Make a (more useful) hex bin map

In [24]:
# initialize the axes
fig, ax = plt.subplots(figsize=(10, 10), facecolor=plt.get_cmap('viridis')(0))

# extract the x/y coordinates (already in Web Mercator) and plot the hexbins
x = shootings.geometry.x
y = shootings.geometry.y
ax.hexbin(x, y, gridsize=40, mincnt=1, cmap='viridis')

# overlay the ZIP codes
zip_codes.to_crs(epsg=3857).plot(ax=ax, 
                                 facecolor='none', 
                                 linewidth=0.5,
                                 edgecolor='white')

ax.set_axis_off()

Count the total number of rows in a table

The COUNT function can be applied to count all rows.

In [25]:
query = "SELECT COUNT(*) FROM shootings"
In [26]:
params = dict(q=query)
r = requests.get(API_endpoint, params=params)
In [27]:
r.json()
Out[27]:
{'rows': [{'count': 8238}],
 'time': 0.002,
 'fields': {'count': {'type': 'number', 'pgtype': 'int8'}},
 'total_rows': 1}

Important: it's always a good idea to check how many rows you would be downloading beforehand.
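One way to act on the row count is sketched below, using the response body from the COUNT(*) request. The `MAX_ROWS` threshold is arbitrary and chosen only for illustration.

```python
# Response body copied from the COUNT(*) request above
count_response = {
    "rows": [{"count": 8238}],
    "time": 0.002,
    "fields": {"count": {"type": "number", "pgtype": "int8"}},
    "total_rows": 1,
}

MAX_ROWS = 50_000  # arbitrary threshold, for illustration only


def row_count(response_json):
    """Pull the single COUNT(*) value out of the response dictionary."""
    return response_json["rows"][0]["count"]


n_rows = row_count(count_response)
if n_rows > MAX_ROWS:
    print(f"{n_rows} rows: consider adding a WHERE or LIMIT clause first")
else:
    print(f"{n_rows} rows: small enough to download in one request")
```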

Select all columns, limiting the total number returned

The LIMIT clause caps the number of returned rows. It is very useful for taking a quick look at the format of a table.

In [28]:
# select the first 5
query = "SELECT * FROM shootings LIMIT 5"
In [29]:
params = dict(q=query, format="geojson")
r = requests.get(API_endpoint, params=params)
In [30]:
df = gpd.GeoDataFrame.from_features(r.json(), crs="EPSG:4326")
df
Out[30]:
geometry cartodb_id objectid year dc_key code date_ time race sex ... offender_injured offender_deceased location latino point_x point_y dist inside outside fatal
0 POINT (-75.16499 39.98748) 1 468410 2018 201822058421 0111 2018-07-27T00:00:00Z 12:39:00 B M ... N N 2200 BLOCK UBER ST 0 -75.164986 39.987479 22 0 1 1
1 POINT (-75.13862 40.00885) 2 468411 2018 201825058365 0111 2018-07-27T00:00:00Z 23:12:00 B M ... N N 600 BLOCK RISING SUN AV 0 -75.138621 40.008847 25 0 1 1
2 POINT (-75.15624 39.99421) 3 468412 2018 201839052180 0111 2018-07-27T00:00:00Z 00:39:00 B M ... N N 2700 BLOCK N 15TH ST 0 -75.156238 39.994207 39 0 1 1
3 POINT (-75.24086 39.92980) 4 468413 2018 201812055743 0111 2018-07-31T00:00:00Z 13:25:00 B M ... N N 1700 BLOCK S Avondale St 0 -75.240856 39.929799 12 0 1 1
4 POINT (-75.15011 39.99659) 5 468414 2018 201825059719 0111 2018-08-01T00:00:00Z 20:09:00 B M ... N N 2900 BLOCK N 12th St 0 -75.150115 39.996590 25 0 1 1

5 rows × 23 columns
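LIMIT can also be combined with OFFSET (standard PostgreSQL) to fetch a large table in pages. The sketch below only generates the query strings; `paged_queries` is a hypothetical helper, and whether paging is needed for this particular API is an assumption. An ORDER BY is included because, without one, the database does not guarantee a stable row order between requests.

```python
def paged_queries(table, total_rows, page_size, order_by="cartodb_id"):
    """Yield one SELECT per page, covering total_rows rows (hypothetical helper)."""
    for offset in range(0, total_rows, page_size):
        yield (
            f"SELECT * FROM {table} ORDER BY {order_by} "
            f"LIMIT {page_size} OFFSET {offset}"
        )


# 8238 is the row count returned by the COUNT(*) query earlier
queries = list(paged_queries("shootings", total_rows=8238, page_size=5000))
for q in queries:
    print(q)
```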

Select by specific column values

In [31]:
shootings.head()
Out[31]:
geometry objectid year dc_key code date_ time race sex age ... offender_injured offender_deceased location latino point_x point_y dist inside outside fatal
0 POINT (-8367327.967 4864122.929) 468410 2018 201822058421 0111 2018-07-27T00:00:00Z 12:39:00 B M 30 ... N N 2200 BLOCK UBER ST 0.0 -75.164986 39.987479 22 0.0 1.0 1.0
1 POINT (-8364393.029 4867227.985) 468411 2018 201825058365 0111 2018-07-27T00:00:00Z 23:12:00 B M 34 ... N N 600 BLOCK RISING SUN AV 0.0 -75.138621 40.008847 25 0.0 1.0 1.0
2 POINT (-8366354.144 4865100.492) 468412 2018 201839052180 0111 2018-07-27T00:00:00Z 00:39:00 B M 32 ... N N 2700 BLOCK N 15TH ST 0.0 -75.156238 39.994207 39 0.0 1.0 1.0
3 POINT (-8375773.777 4855746.099) 468413 2018 201812055743 0111 2018-07-31T00:00:00Z 13:25:00 B M 24 ... N N 1700 BLOCK S Avondale St 0.0 -75.240856 39.929799 12 0.0 1.0 1.0
4 POINT (-8365672.535 4865446.760) 468414 2018 201825059719 0111 2018-08-01T00:00:00Z 20:09:00 B M 17 ... N N 2900 BLOCK N 12th St 0.0 -75.150115 39.996590 25 0.0 1.0 1.0

5 rows × 22 columns

Select fatal shootings only

In [32]:
query = "SELECT * FROM shootings WHERE fatal = 1" 
In [33]:
# Make the request
params = dict(q=query, format="geojson")
r = requests.get(API_endpoint, params=params)

# Make the GeoDataFrame
fatal = gpd.GeoDataFrame.from_features(r.json(), crs="EPSG:4326")
print("number of fatal shootings = ", len(fatal))
number of fatal shootings =  1589

Select shootings in 2020

In [34]:
query = "SELECT * FROM shootings WHERE date_ > '1/1/20'"
In [35]:
# Make the request
params = dict(q=query, format="geojson")
r = requests.get(API_endpoint, params=params)

# Make the GeoDataFrame
this_year = gpd.GeoDataFrame.from_features(r.json(), crs="EPSG:4326")
print("number of shootings this year = ", len(this_year))
number of shootings this year =  1553
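Date strings such as '1/1/20' depend on how the database interprets day/month order and two-digit years. A less ambiguous variant, sketched here, uses ISO dates and a half-open range. `year_filter` is a hypothetical helper; because this range includes January 1, it may return a slightly different count than the strict ">" comparison above.

```python
from datetime import date


def year_filter(year, column="date_"):
    """Return a WHERE condition covering one calendar year (hypothetical helper)."""
    start = date(year, 1, 1)
    end = date(year + 1, 1, 1)
    # Half-open range: includes Jan 1 of `year`, excludes Jan 1 of the next year
    return f"{column} >= '{start.isoformat()}' AND {column} < '{end.isoformat()}'"


where = year_filter(2020)
query = f"SELECT * FROM shootings WHERE {where}"
print(query)
```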

The easier way: carto2gpd

  • A small utility library to simplify the querying of CARTO APIs.
  • The get() function will query the database
  • The get_size() function will use COUNT() to get the total number of rows

https://github.com/PhiladelphiaController/carto2gpd

In [36]:
import carto2gpd
In [37]:
where = "date_ > '1/1/20' and fatal = 1.0"
df = carto2gpd.get(API_endpoint, 'shootings', where=where)
In [38]:
df.head()
Out[38]:
geometry cartodb_id objectid year dc_key code date_ time race sex ... offender_injured offender_deceased location latino point_x point_y dist inside outside fatal
0 POINT (-75.11775 39.99618) 27 468671 2020 202024041667 0111 2020-06-05T00:00:00Z 00:21:00 B M ... N N 600 BLOCK E LIPPINCOTT ST 0 -75.117747 39.996185 24 0 1 1
1 POINT (-75.12955 39.99694) 28 468672 2020 202024041828 0111 2020-06-05T00:00:00Z 20:19:00 B M ... N N 100 BLOCK E CLEARFIELD ST 0 -75.129551 39.996944 24 0 1 1
2 POINT (-75.20045 39.96783) 29 468673 2020 202016022790 0111 2020-06-07T00:00:00Z 13:20:00 B M ... N N 800 BLOCK N 39TH ST 0 -75.200450 39.967828 16 0 1 1
3 POINT (-75.22831 39.95989) 30 468674 2020 202018044208 0111 2020-06-07T00:00:00Z 11:46:00 B M ... N N 0 BLOCK S RUBY ST 0 -75.228312 39.959892 18 1 0 1
4 POINT (-75.17188 39.98360) 31 468675 2020 202022040610 0111 2020-06-07T00:00:00Z 14:43:00 B M ... N N 2300 BLOCK W BERKS ST 0 -75.171875 39.983597 22 0 1 1

5 rows × 23 columns

In [39]:
# Limit results to the first 5
df = carto2gpd.get(API_endpoint, 'shootings', limit=5)
In [40]:
df
Out[40]:
geometry cartodb_id objectid year dc_key code date_ time race sex ... offender_injured offender_deceased location latino point_x point_y dist inside outside fatal
0 POINT (-75.16499 39.98748) 1 468410 2018 201822058421 0111 2018-07-27T00:00:00Z 12:39:00 B M ... N N 2200 BLOCK UBER ST 0 -75.164986 39.987479 22 0 1 1
1 POINT (-75.13862 40.00885) 2 468411 2018 201825058365 0111 2018-07-27T00:00:00Z 23:12:00 B M ... N N 600 BLOCK RISING SUN AV 0 -75.138621 40.008847 25 0 1 1
2 POINT (-75.15624 39.99421) 3 468412 2018 201839052180 0111 2018-07-27T00:00:00Z 00:39:00 B M ... N N 2700 BLOCK N 15TH ST 0 -75.156238 39.994207 39 0 1 1
3 POINT (-75.24086 39.92980) 4 468413 2018 201812055743 0111 2018-07-31T00:00:00Z 13:25:00 B M ... N N 1700 BLOCK S Avondale St 0 -75.240856 39.929799 12 0 1 1
4 POINT (-75.15011 39.99659) 5 468414 2018 201825059719 0111 2018-08-01T00:00:00Z 20:09:00 B M ... N N 2900 BLOCK N 12th St 0 -75.150115 39.996590 25 0 1 1

5 rows × 23 columns

In [41]:
size = carto2gpd.get_size(API_endpoint, 'shootings')
print(size)
8238

Step 1: Prep the date columns

  • Convert the date column to DateTime objects
  • Add Month and Day of Week columns
In [42]:
shootings['date'] = pd.to_datetime(shootings['date_'])
In [43]:
shootings["Month"] = shootings["date"].dt.month
shootings["Day of Week"] = shootings["date"].dt.dayofweek # Monday is 0, Sunday is 6

Step 2: Calculate number of shootings by month and day of week

Use the familiar Groupby --> size()

In [44]:
count = shootings.groupby(['Month', 'Day of Week']).size()
count = count.reset_index(name='Count')
count.head()
Out[44]:
   Month  Day of Week  Count
0  1      0            62
1  1      1            82
2  1      2            69
3  1      3            75
4  1      4            74
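To see what the groupby and size() combination computes, here is the same tally done by hand with the standard library, using the `date_` values of the first five shootings shown earlier. It is a sketch for intuition rather than a replacement for the pandas version.

```python
from collections import Counter
from datetime import datetime

# "date_" values of the first five shootings shown earlier
dates = [
    "2018-07-27T00:00:00Z",
    "2018-07-27T00:00:00Z",
    "2018-07-27T00:00:00Z",
    "2018-07-31T00:00:00Z",
    "2018-08-01T00:00:00Z",
]

parsed = [datetime.strptime(d, "%Y-%m-%dT%H:%M:%SZ") for d in dates]

# weekday() uses the same convention as pandas: Monday is 0, Sunday is 6
counts = Counter((d.month, d.weekday()) for d in parsed)

# One line per (Month, Day of Week) group, like the table above
for (month, day_of_week), n in sorted(counts.items()):
    print(month, day_of_week, n)
```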

Step 3: Make a heatmap using hvplot

In [45]:
# Remember 0 is Monday and 6 is Sunday
count.hvplot.heatmap(
    x="Day of Week",
    y="Month",
    C="Count",
    cmap="viridis",
    width=400,
    height=500,
    flip_yaxis=True,
)
Out[45]:

Trends: more shootings on the weekends and in the summer months

API Example #1: Census

US Census data is foundational

  • Rich data sets with annual releases
  • Decennial results plus American Community Survey (ACS) results
  • Wide range of topics covered: sex, income, poverty, education, housing

Getting census data (the old way)
American Factfinder

Getting census data (the new way)
census.data.gov

Getting census data: other options

Several third-party options offer easier interfaces for accessing census data

Data USA

Census Reporter

NHGIS

Example: poverty data

Returns JSON
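The JSON returned by the Census API is typically an array of arrays whose first row holds the column names. A minimal sketch of turning that into one dictionary per row follows; the response body is made up for illustration (the column name `SOME_VARIABLE` and all numbers are placeholders, not real census data).

```python
import json

# Made-up response body for illustration only: "SOME_VARIABLE" and the
# numbers are placeholders, not real census data
raw = """
[["NAME", "SOME_VARIABLE", "state"],
 ["Pennsylvania", "123", "42"],
 ["New Jersey", "456", "34"]]
"""

header, *records = json.loads(raw)

# Pair each record with the header row to get one dict per geography
rows = [dict(zip(header, record)) for record in records]

# Values arrive as strings, so convert numeric columns explicitly
for row in rows:
    row["SOME_VARIABLE"] = int(row["SOME_VARIABLE"])

print(rows)
```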


How to find the right variable names?

The census provides web-based documentation:

Accessing the API is easier from Python

Several packages provide easier Python interfaces to census data based on the census API.

We'll focus on cenpy - "Explore and download data from Census APIs"

Example: the racial "dot" map

Source

Let's make this for Philadelphia in Python!

In [46]:
# First step: import cenpy
import cenpy

The "explorer" module

Functions to help you explore the Census API from Python

Step 1: Identify what dataset we want

  • Today, we'll use the 5-year American Community Survey (latest available year: 2018)
  • Other common datasets:
    • 1-year ACS (latest available year: 2019)
    • Decennial census, conducted every 10 years (latest available year: 2010)
In [47]:
available = cenpy.explorer.available()

available.head()
Out[47]:
c_isTimeseries publisher temporal spatial programCode modified keyword contactPoint distribution description ... c_isCube c_isAggregate c_valuesLink c_groupsLink c_examplesLink c_tagsLink c_variablesLink c_geographyLink c_dataset vintage
ABSCB2017 NaN U.S. Census Bureau unidentified NaN 006:007 2020-04-30 00:00:00.0 () {'fn': 'ASE Staff', 'hasEmail': 'mailto:erd.an... {'@type': 'dcat:Distribution', 'accessURL': 'h... The Annual Business Survey (ABS) provides info... ... NaN True https://api.census.gov/data/2017/abscb/values.... https://api.census.gov/data/2017/abscb/groups.... https://api.census.gov/data/2017/abscb/example... https://api.census.gov/data/2017/abscb/tags.json https://api.census.gov/data/2017/abscb/variabl... https://api.census.gov/data/2017/abscb/geograp... (abscb,) 2017.0
ABSCBO2017 NaN U.S. Census Bureau unidentified NaN 006:007 2020-04-30 00:00:00.0 () {'fn': 'ASE Staff', 'hasEmail': 'mailto:erd.an... {'@type': 'dcat:Distribution', 'accessURL': 'h... The Annual Business Survey (ABS) provides info... ... NaN True https://api.census.gov/data/2017/abscbo/values... https://api.census.gov/data/2017/abscbo/groups... https://api.census.gov/data/2017/abscbo/exampl... https://api.census.gov/data/2017/abscbo/tags.json https://api.census.gov/data/2017/abscbo/variab... https://api.census.gov/data/2017/abscbo/geogra... (abscbo,) 2017.0
ABSCS2017 NaN U.S. Census Bureau unidentified NaN 006:007 2020-04-30 00:00:00.0 () {'fn': 'ASE Staff', 'hasEmail': 'mailto:erd.an... {'@type': 'dcat:Distribution', 'accessURL': 'h... The Annual Business Survey (ABS) provides info... ... NaN True https://api.census.gov/data/2017/abscs/values.... https://api.census.gov/data/2017/abscs/groups.... https://api.census.gov/data/2017/abscs/example... https://api.census.gov/data/2017/abscs/tags.json https://api.census.gov/data/2017/abscs/variabl... https://api.census.gov/data/2017/abscs/geograp... (abscs,) 2017.0
ACSCD1132011 NaN U.S. Census Bureau 2011/2011 United States 006:004 2014-10-06 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is a natio... ... NaN True https://api.census.gov/data/2011/acs1/cd113/va... https://api.census.gov/data/2011/acs1/cd113/gr... https://api.census.gov/data/2011/acs1/cd113/ex... https://api.census.gov/data/2011/acs1/cd113/ta... https://api.census.gov/data/2011/acs1/cd113/va... https://api.census.gov/data/2011/acs1/cd113/ge... (acs1, cd113) 2011.0
ACSCD1152015 NaN U.S. Census Bureau 2015/2015 United States 006:004 2017-02-10 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... NaN True https://api.census.gov/data/2015/acs1/cd115/va... https://api.census.gov/data/2015/acs1/cd115/gr... https://api.census.gov/data/2015/acs1/cd115/ex... https://api.census.gov/data/2015/acs1/cd115/ta... https://api.census.gov/data/2015/acs1/cd115/va... https://api.census.gov/data/2015/acs1/cd115/ge... (acs1, cd115) 2015.0

5 rows × 24 columns

We can use the pandas filter() function to search for specific identifiers in the dataframe.

In this case, let's search for the American Community Survey datasets. We'll match index labels using regular expressions.

In particular, we'll search for labels that start with "ACS". In regular expressions, the "^" anchor means "match only at the start of the label".

For more info on regular expressions, the documentation for the re module is a good place to start.
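A quick sketch of the "^" anchor using Python's re module directly, on a handful of dataset identifiers taken from the tables in this section. As far as I know, pandas applies the pattern with a search over each label, which is why the anchor matters.

```python
import re

# A few dataset identifiers taken from the tables in this section
labels = ["ABSCB2017", "ACSCD1132011", "ACSCP1Y2010", "ACSDT5Y2016", "ACSST5Y2018"]

# re.search() scans the whole string, so "^" is what pins the match to the start
acs_labels = [label for label in labels if re.search(r"^ACS", label)]
print(acs_labels)

# Without the anchor, "5Y" matches anywhere in the label
anywhere = [label for label in labels if re.search(r"5Y", label)]
print(anywhere)

# With the anchor, nothing matches because no label starts with "5Y"
at_start = [label for label in labels if re.search(r"^5Y", label)]
print(at_start)
```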

In [48]:
# Return a dataframe of all datasets that start with "ACS"
# Axis=0 means to filter the index labels!
available.filter(regex="^ACS", axis=0)
Out[48]:
c_isTimeseries publisher temporal spatial programCode modified keyword contactPoint distribution description ... c_isCube c_isAggregate c_valuesLink c_groupsLink c_examplesLink c_tagsLink c_variablesLink c_geographyLink c_dataset vintage
ACSCD1132011 NaN U.S. Census Bureau 2011/2011 United States 006:004 2014-10-06 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is a natio... ... NaN True https://api.census.gov/data/2011/acs1/cd113/va... https://api.census.gov/data/2011/acs1/cd113/gr... https://api.census.gov/data/2011/acs1/cd113/ex... https://api.census.gov/data/2011/acs1/cd113/ta... https://api.census.gov/data/2011/acs1/cd113/va... https://api.census.gov/data/2011/acs1/cd113/ge... (acs1, cd113) 2011.0
ACSCD1152015 NaN U.S. Census Bureau 2015/2015 United States 006:004 2017-02-10 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... NaN True https://api.census.gov/data/2015/acs1/cd115/va... https://api.census.gov/data/2015/acs1/cd115/gr... https://api.census.gov/data/2015/acs1/cd115/ex... https://api.census.gov/data/2015/acs1/cd115/ta... https://api.census.gov/data/2015/acs1/cd115/va... https://api.census.gov/data/2015/acs1/cd115/ge... (acs1, cd115) 2015.0
ACSCP1Y2010 NaN U.S. Census Bureau unidentified United States 006:004 2018-09-18 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2010/acs/acs1/cpro... https://api.census.gov/data/2010/acs/acs1/cpro... https://api.census.gov/data/2010/acs/acs1/cpro... https://api.census.gov/data/2010/acs/acs1/cpro... https://api.census.gov/data/2010/acs/acs1/cpro... https://api.census.gov/data/2010/acs/acs1/cpro... (acs, acs1, cprofile) 2010.0
ACSCP1Y2011 NaN U.S. Census Bureau unidentified United States 006:004 2018-09-18 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2011/acs/acs1/cpro... https://api.census.gov/data/2011/acs/acs1/cpro... https://api.census.gov/data/2011/acs/acs1/cpro... https://api.census.gov/data/2011/acs/acs1/cpro... https://api.census.gov/data/2011/acs/acs1/cpro... https://api.census.gov/data/2011/acs/acs1/cpro... (acs, acs1, cprofile) 2011.0
ACSCP1Y2012 NaN U.S. Census Bureau unidentified United States 006:004 2018-09-18 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2012/acs/acs1/cpro... https://api.census.gov/data/2012/acs/acs1/cpro... https://api.census.gov/data/2012/acs/acs1/cpro... https://api.census.gov/data/2012/acs/acs1/cpro... https://api.census.gov/data/2012/acs/acs1/cpro... https://api.census.gov/data/2012/acs/acs1/cpro... (acs, acs1, cprofile) 2012.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
ACSST5Y2014 NaN U.S. Census Bureau unidentified United States 006:004 2018-06-29 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2014/acs/acs5/subj... https://api.census.gov/data/2014/acs/acs5/subj... https://api.census.gov/data/2014/acs/acs5/subj... https://api.census.gov/data/2014/acs/acs5/subj... https://api.census.gov/data/2014/acs/acs5/subj... https://api.census.gov/data/2014/acs/acs5/subj... (acs, acs5, subject) 2014.0
ACSST5Y2015 NaN U.S. Census Bureau unidentified United States 006:004 2018-06-29 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2015/acs/acs5/subj... https://api.census.gov/data/2015/acs/acs5/subj... https://api.census.gov/data/2015/acs/acs5/subj... https://api.census.gov/data/2015/acs/acs5/subj... https://api.census.gov/data/2015/acs/acs5/subj... https://api.census.gov/data/2015/acs/acs5/subj... (acs, acs5, subject) 2015.0
ACSST5Y2016 NaN U.S. Census Bureau unidentified United States 006:004 2017-12-07 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2016/acs/acs5/subj... https://api.census.gov/data/2016/acs/acs5/subj... https://api.census.gov/data/2016/acs/acs5/subj... https://api.census.gov/data/2016/acs/acs5/subj... https://api.census.gov/data/2016/acs/acs5/subj... https://api.census.gov/data/2016/acs/acs5/subj... (acs, acs5, subject) 2016.0
ACSST5Y2017 NaN U.S. Census Bureau unidentified NaN 006:004 2018-10-19 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2017/acs/acs5/subj... https://api.census.gov/data/2017/acs/acs5/subj... https://api.census.gov/data/2017/acs/acs5/subj... https://api.census.gov/data/2017/acs/acs5/subj... https://api.census.gov/data/2017/acs/acs5/subj... https://api.census.gov/data/2017/acs/acs5/subj... (acs, acs5, subject) 2017.0
ACSST5Y2018 NaN U.S. Census Bureau unidentified NaN 006:004 2019-10-22 15:36:29.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2018/acs/acs5/subj... https://api.census.gov/data/2018/acs/acs5/subj... https://api.census.gov/data/2018/acs/acs5/subj... https://api.census.gov/data/2018/acs/acs5/subj... https://api.census.gov/data/2018/acs/acs5/subj... https://api.census.gov/data/2018/acs/acs5/subj... (acs, acs5, subject) 2018.0

144 rows × 24 columns

Many flavors of ACS datasets are available. We want the detailed tables version, specifically the 5-year survey.

The relevant identifiers start with: "ACSDT5Y".

In [49]:
# Return a dataframe of all datasets that start with "ACSDT5Y"
available.filter(regex="^ACSDT5Y", axis=0)
Out[49]:
c_isTimeseries publisher temporal spatial programCode modified keyword contactPoint distribution description ... c_isCube c_isAggregate c_valuesLink c_groupsLink c_examplesLink c_tagsLink c_variablesLink c_geographyLink c_dataset vintage
ACSDT5Y2009 NaN U.S. Census Bureau unidentified NaN 006:004 2019-08-27 13:11:18.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2009/acs/acs5/valu... https://api.census.gov/data/2009/acs/acs5/grou... https://api.census.gov/data/2009/acs/acs5/exam... https://api.census.gov/data/2009/acs/acs5/tags... https://api.census.gov/data/2009/acs/acs5/vari... https://api.census.gov/data/2009/acs/acs5/geog... (acs, acs5) 2009.0
ACSDT5Y2010 NaN U.S. Census Bureau unidentified United States 006:004 2018-07-04 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2010/acs/acs5/valu... https://api.census.gov/data/2010/acs/acs5/grou... https://api.census.gov/data/2010/acs/acs5/exam... https://api.census.gov/data/2010/acs/acs5/tags... https://api.census.gov/data/2010/acs/acs5/vari... https://api.census.gov/data/2010/acs/acs5/geog... (acs, acs5) 2010.0
ACSDT5Y2011 NaN U.S. Census Bureau unidentified United States 006:004 2018-07-04 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2011/acs/acs5/valu... https://api.census.gov/data/2011/acs/acs5/grou... https://api.census.gov/data/2011/acs/acs5/exam... https://api.census.gov/data/2011/acs/acs5/tags... https://api.census.gov/data/2011/acs/acs5/vari... https://api.census.gov/data/2011/acs/acs5/geog... (acs, acs5) 2011.0
ACSDT5Y2012 NaN U.S. Census Bureau unidentified United States 006:004 2018-07-04 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2012/acs/acs5/valu... https://api.census.gov/data/2012/acs/acs5/grou... https://api.census.gov/data/2012/acs/acs5/exam... https://api.census.gov/data/2012/acs/acs5/tags... https://api.census.gov/data/2012/acs/acs5/vari... https://api.census.gov/data/2012/acs/acs5/geog... (acs, acs5) 2012.0
ACSDT5Y2013 NaN U.S. Census Bureau unidentified United States 006:004 2018-07-04 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2013/acs/acs5/valu... https://api.census.gov/data/2013/acs/acs5/grou... https://api.census.gov/data/2013/acs/acs5/exam... https://api.census.gov/data/2013/acs/acs5/tags... https://api.census.gov/data/2013/acs/acs5/vari... https://api.census.gov/data/2013/acs/acs5/geog... (acs, acs5) 2013.0
ACSDT5Y2014 NaN U.S. Census Bureau unidentified United States 006:004 2018-07-04 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2014/acs/acs5/valu... https://api.census.gov/data/2014/acs/acs5/grou... https://api.census.gov/data/2014/acs/acs5/exam... https://api.census.gov/data/2014/acs/acs5/tags... https://api.census.gov/data/2014/acs/acs5/vari... https://api.census.gov/data/2014/acs/acs5/geog... (acs, acs5) 2014.0
ACSDT5Y2015 NaN U.S. Census Bureau unidentified United States 006:004 2018-07-05 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2015/acs/acs5/valu... https://api.census.gov/data/2015/acs/acs5/grou... https://api.census.gov/data/2015/acs/acs5/exam... https://api.census.gov/data/2015/acs/acs5/tags... https://api.census.gov/data/2015/acs/acs5/vari... https://api.census.gov/data/2015/acs/acs5/geog... (acs, acs5) 2015.0
ACSDT5Y2016 NaN U.S. Census Bureau unidentified United States 006:004 2017-12-07 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2016/acs/acs5/valu... https://api.census.gov/data/2016/acs/acs5/grou... https://api.census.gov/data/2016/acs/acs5/exam... https://api.census.gov/data/2016/acs/acs5/tags... https://api.census.gov/data/2016/acs/acs5/vari... https://api.census.gov/data/2016/acs/acs5/geog... (acs, acs5) 2016.0
ACSDT5Y2017 NaN U.S. Census Bureau unidentified NaN 006:004 2018-08-21 07:11:43.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2017/acs/acs5/valu... https://api.census.gov/data/2017/acs/acs5/grou... https://api.census.gov/data/2017/acs/acs5/exam... https://api.census.gov/data/2017/acs/acs5/tags... https://api.census.gov/data/2017/acs/acs5/vari... https://api.census.gov/data/2017/acs/acs5/geog... (acs, acs5) 2017.0
ACSDT5Y2018 NaN U.S. Census Bureau unidentified NaN 006:004 2019-10-22 16:28:02.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Community Survey (ACS) is an ongo... ... True True https://api.census.gov/data/2018/acs/acs5/valu... https://api.census.gov/data/2018/acs/acs5/grou... https://api.census.gov/data/2018/acs/acs5/exam... https://api.census.gov/data/2018/acs/acs5/tags... https://api.census.gov/data/2018/acs/acs5/vari... https://api.census.gov/data/2018/acs/acs5/geog... (acs, acs5) 2018.0
ACSDT5YAIAN2010 NaN U.S. Census Bureau unidentified NaN 006:004 2019-10-24 07:18:57.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Indian and Alaska Native (AIAN) t... ... True True https://api.census.gov/data/2010/acs/acs5/aian... https://api.census.gov/data/2010/acs/acs5/aian... https://api.census.gov/data/2010/acs/acs5/aian... https://api.census.gov/data/2010/acs/acs5/aian... https://api.census.gov/data/2010/acs/acs5/aian... https://api.census.gov/data/2010/acs/acs5/aian... (acs, acs5, aian) 2010.0
ACSDT5YAIAN2015 NaN U.S. Census Bureau unidentified NaN 006:004 2020-02-13 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The American Indian and Alaska Native (AIAN) t... ... True True https://api.census.gov/data/2015/acs/acs5/aian... https://api.census.gov/data/2015/acs/acs5/aian... https://api.census.gov/data/2015/acs/acs5/aian... https://api.census.gov/data/2015/acs/acs5/aian... https://api.census.gov/data/2015/acs/acs5/aian... https://api.census.gov/data/2015/acs/acs5/aian... (acs, acs5, aian) 2015.0
ACSDT5YSPT2010 NaN U.S. Census Bureau unidentified NaN 006:004 2019-10-11 14:16:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The Selected Population Tables (SPT) are rele... ... True True https://api.census.gov/data/2010/acs/acs5/spt/... https://api.census.gov/data/2010/acs/acs5/spt/... https://api.census.gov/data/2010/acs/acs5/spt/... https://api.census.gov/data/2010/acs/acs5/spt/... https://api.census.gov/data/2010/acs/acs5/spt/... https://api.census.gov/data/2010/acs/acs5/spt/... (acs, acs5, spt) 2010.0
ACSDT5YSPT2015 NaN U.S. Census Bureau unidentified NaN 006:004 2020-02-18 00:00:00.0 () {'fn': 'American Community Survey Office', 'ha... {'@type': 'dcat:Distribution', 'accessURL': 'h... The Selected Population Tables (SPT) are rele... ... True True https://api.census.gov/data/2015/acs/acs5/spt/... https://api.census.gov/data/2015/acs/acs5/spt/... https://api.census.gov/data/2015/acs/acs5/spt/... https://api.census.gov/data/2015/acs/acs5/spt/... https://api.census.gov/data/2015/acs/acs5/spt/... https://api.census.gov/data/2015/acs/acs5/spt/... (acs, acs5, spt) 2015.0

14 rows × 24 columns

Let's use the latest available data (2018). We can use the explain() function to print out a description of the dataset:

In [50]:
cenpy.explorer.explain("ACSDT5Y2018")
Out[50]:
{'American Community Survey: 1-Year Estimates: Detailed Tables 5-Year': 'The American Community Survey (ACS) is an ongoing survey that provides data every year -- giving communities the current information they need to plan investments and services. The ACS covers a broad range of topics about social, economic, demographic, and housing characteristics of the U.S. population.  Summary files include the following geographies: nation, all states (including DC and Puerto Rico), all metropolitan areas, all congressional districts (114th congress), all counties, all places, and all tracts and block groups.  Summary files contain the most detailed cross-tabulations, many of which are published down to block groups. The data are population and housing counts. There are over 64,000 variables in this dataset.'}

Step 2: Initialize the API connection

Use the cenpy.remote.APIConnection object, and pass it the name of the dataset.

In [51]:
acs = cenpy.remote.APIConnection("ACSDT5Y2018")

Step 3: Find the variables we want to load

The .variables attribute stores the available variables (across all Census tables).

We can use the varslike() function to search the variables dataframe (it's just a simple wrapper around the pandas filter() function).

In [52]:
len(acs.variables)
Out[52]:
27037
In [53]:
acs.variables.head(n=10)
Out[53]:
label concept predicateType group limit predicateOnly attributes required
for Census API FIPS 'for' clause Census API Geography Specification fips-for N/A 0 True NaN NaN
in Census API FIPS 'in' clause Census API Geography Specification fips-in N/A 0 True NaN NaN
ucgid Uniform Census Geography Identifier clause Census API Geography Specification ucgid N/A 0 True NaN NaN
B24022_060E Estimate!!Total!!Female!!Service occupations!!... SEX BY OCCUPATION AND MEDIAN EARNINGS IN THE P... int B24022 0 NaN B24022_060M,B24022_060MA,B24022_060EA NaN
B19001B_014E Estimate!!Total!!$100,000 to $124,999 HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 201... int B19001B 0 NaN B19001B_014M,B19001B_014MA,B19001B_014EA NaN
B07007PR_019E Estimate!!Total!!Moved from different municipi... GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY CITI... int B07007PR 0 NaN B07007PR_019EA,B07007PR_019M,B07007PR_019MA NaN
B19101A_004E Estimate!!Total!!$15,000 to $19,999 FAMILY INCOME IN THE PAST 12 MONTHS (IN 2018 I... int B19101A 0 NaN B19101A_004M,B19101A_004MA,B19101A_004EA NaN
B24022_061E Estimate!!Total!!Female!!Service occupations!!... SEX BY OCCUPATION AND MEDIAN EARNINGS IN THE P... int B24022 0 NaN B24022_061M,B24022_061MA,B24022_061EA NaN
B19001B_013E Estimate!!Total!!$75,000 to $99,999 HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 201... int B19001B 0 NaN B19001B_013M,B19001B_013MA,B19001B_013EA NaN
B07007PR_018E Estimate!!Total!!Moved from different municipi... GEOGRAPHICAL MOBILITY IN THE PAST YEAR BY CITI... int B07007PR 0 NaN B07007PR_018EA,B07007PR_018M,B07007PR_018MA NaN

We're interested in variables about Hispanic origin broken down by race. Let's search for the variables whose "concept" column matches "HISPANIC OR LATINO ORIGIN BY RACE".

In [54]:
acs.varslike?
In [55]:
acs.varslike("HISPANIC OR LATINO ORIGIN BY RACE", by='concept').sort_index() # searches along concept column
Out[55]:
label concept predicateType group limit predicateOnly attributes required
B03002_001E Estimate!!Total HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_001EA,B03002_001M,B03002_001MA NaN
B03002_002E Estimate!!Total!!Not Hispanic or Latino HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_002EA,B03002_002M,B03002_002MA NaN
B03002_003E Estimate!!Total!!Not Hispanic or Latino!!White... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_003EA,B03002_003M,B03002_003MA NaN
B03002_004E Estimate!!Total!!Not Hispanic or Latino!!Black... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_004EA,B03002_004M,B03002_004MA NaN
B03002_005E Estimate!!Total!!Not Hispanic or Latino!!Ameri... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_005EA,B03002_005M,B03002_005MA NaN
B03002_006E Estimate!!Total!!Not Hispanic or Latino!!Asian... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_006EA,B03002_006M,B03002_006MA NaN
B03002_007E Estimate!!Total!!Not Hispanic or Latino!!Nativ... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_007EA,B03002_007M,B03002_007MA NaN
B03002_008E Estimate!!Total!!Not Hispanic or Latino!!Some ... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_008EA,B03002_008M,B03002_008MA NaN
B03002_009E Estimate!!Total!!Not Hispanic or Latino!!Two o... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_009EA,B03002_009M,B03002_009MA NaN
B03002_010E Estimate!!Total!!Not Hispanic or Latino!!Two o... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_010EA,B03002_010M,B03002_010MA NaN
B03002_011E Estimate!!Total!!Not Hispanic or Latino!!Two o... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_011EA,B03002_011M,B03002_011MA NaN
B03002_012E Estimate!!Total!!Hispanic or Latino HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_012EA,B03002_012M,B03002_012MA NaN
B03002_013E Estimate!!Total!!Hispanic or Latino!!White alone HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_013EA,B03002_013M,B03002_013MA NaN
B03002_014E Estimate!!Total!!Hispanic or Latino!!Black or ... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_014EA,B03002_014M,B03002_014MA NaN
B03002_015E Estimate!!Total!!Hispanic or Latino!!American ... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_015EA,B03002_015M,B03002_015MA NaN
B03002_016E Estimate!!Total!!Hispanic or Latino!!Asian alone HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_016EA,B03002_016M,B03002_016MA NaN
B03002_017E Estimate!!Total!!Hispanic or Latino!!Native Ha... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_017EA,B03002_017M,B03002_017MA NaN
B03002_018E Estimate!!Total!!Hispanic or Latino!!Some othe... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_018EA,B03002_018M,B03002_018MA NaN
B03002_019E Estimate!!Total!!Hispanic or Latino!!Two or mo... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_019EA,B03002_019M,B03002_019MA NaN
B03002_020E Estimate!!Total!!Hispanic or Latino!!Two or mo... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_020EA,B03002_020M,B03002_020MA NaN
B03002_021E Estimate!!Total!!Hispanic or Latino!!Two or mo... HISPANIC OR LATINO ORIGIN BY RACE int B03002 0 NaN B03002_021EA,B03002_021M,B03002_021MA NaN
GEO_ID Geography AGGREGATE HOUSEHOLD INCOME IN THE PAST 12 MONT... string B17015,B18104,B17016,B18105,B17017,B18106,B170... 0 NaN NAME NaN

It looks like the table we want is "B03002". We could filter for every variable in this table, but we only need a handful of them:
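
Filtering for every variable in a single table can be sketched as below. This is a minimal illustration on a hand-built stand-in for acs.variables (the three rows are copied from the outputs above). It assumes the real table exposes the same "group" column, as the output suggests.

```python
import pandas as pd

# Tiny stand-in for acs.variables (rows copied from the outputs above)
variables_df = pd.DataFrame(
    {
        "label": [
            "Estimate!!Total",
            "Estimate!!Total!!Hispanic or Latino",
            "Estimate!!Total!!$15,000 to $19,999",
        ],
        "group": ["B03002", "B03002", "B19101A"],
    },
    index=["B03002_001E", "B03002_012E", "B19101A_004E"],
)

# Keep only the rows whose "group" column equals the table ID
b03002 = variables_df.loc[variables_df["group"] == "B03002"]
print(b03002.index.tolist())
```
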

In [56]:
variables = [
    "NAME",
    "B03002_001E", # Total
    "B03002_003E", # Not Hispanic, White
    "B03002_004E", # Not Hispanic, Black
    "B03002_005E", # Not Hispanic, American Indian
    "B03002_006E", # Not Hispanic, Asian
    "B03002_007E", # Not Hispanic, Native Hawaiian
    "B03002_008E", # Not Hispanic, Other
    "B03002_009E", # Not Hispanic, Two or More Races
    "B03002_012E", # Hispanic
]

Note: we've also included the "NAME" variable, which returns the name of the Census geography we are querying.

Step 4: Identify the geographies to use

The Census API uses a hierarchy of geographies when requesting data.

For example, you cannot just request data for a specific county — you need to specify the state and the county.

Common hierarchies

  • State --> county
  • State --> place (e.g., cities)
  • State --> county --> tract
  • State --> county --> tract --> block group
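
To make the hierarchy concrete, the sketch below assembles the kind of request URL that such a query corresponds to: the target geography goes in a "for" clause and each parent level in an "in" clause (the same clause names that appear in the variables table above). The helper name and the exact parameter layout are assumptions made for illustration; cenpy builds the real request for you.

```python
from urllib.parse import urlencode

def build_census_url(base_url, variables, geo_unit, geo_filter):
    """Assemble a Census API query URL (hypothetical helper, for illustration)."""
    params = [("get", ",".join(variables)), ("for", geo_unit)]
    # Each parent level in the hierarchy becomes its own "in" clause
    params += [("in", f"{level}:{code}") for level, code in geo_filter.items()]
    # Keep ":", "*" and "," readable instead of percent-encoding them
    return base_url + "?" + urlencode(params, safe=":*,")

url = build_census_url(
    "https://api.census.gov/data/2018/acs/acs5",
    ["NAME", "B03002_001E"],
    "block group:*",
    {"state": "42", "county": "101", "tract": "*"},
)
print(url)
```
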

Tip: Use the .geographies attribute

This allows you to see:

  1. What geographies are available for a specific dataset
  2. The other required geographies in the hierarchy
In [57]:
acs.geographies['fips']
Out[57]:
name geoLevelDisplay referenceDate requires wildcard optionalWithWCFor
0 us 010 2018-01-01 NaN NaN NaN
1 region 020 2018-01-01 NaN NaN NaN
2 division 030 2018-01-01 NaN NaN NaN
3 state 040 2018-01-01 NaN NaN NaN
4 county 050 2018-01-01 [state] [state] state
... ... ... ... ... ... ...
80 public use microdata area 795 2018-01-01 [state] [state] state
81 zip code tabulation area 860 2018-01-01 NaN NaN NaN
82 school district (elementary) 950 2018-01-01 [state] NaN NaN
83 school district (secondary) 960 2018-01-01 [state] NaN NaN
84 school district (unified) 970 2018-01-01 [state] NaN NaN

85 rows × 6 columns

For the racial dot map, we'll use the most granular available geography: block group.

The hierarchy is state --> county --> tract --> block group. We can use the * wildcard for tracts, so we only need to know the FIPS codes for PA and Philadelphia County.

In [58]:
counties = cenpy.explorer.fips_table("COUNTY")
counties.head()
Out[58]:
0 1 2 3 4
0 AL 1 1 Autauga County H1
1 AL 1 3 Baldwin County H1
2 AL 1 5 Barbour County H1
3 AL 1 7 Bibb County H1
4 AL 1 9 Blount County H1
In [59]:
# Trim to just Philadelphia
# Search for rows where name contains "Philadelphia"
counties.loc[ counties[3].str.contains("Philadelphia") ]
Out[59]:
0 1 2 3 4
2294 PA 42 101 Philadelphia County H6

For Philadelphia County, the FIPS codes are:

  • Philadelphia County: "101"
  • PA: "42"
In [60]:
philly_county_code = "101"
pa_state_code = "42"

You can also look up FIPS codes on Google! Wikipedia is usually a trustworthy source...

Step 5: Build the query (finally)

We'll use the .query() function, which takes the following arguments:

  1. cols - the list of variables desired from the dataset
  2. geo_unit - string denoting the smallest geographic unit; syntax is "name:FIPS"
  3. geo_filter - dictionary containing groupings of geo_units, if required by the hierarchy
In [61]:
philly_demo_data = acs.query(
    cols=variables,
    geo_unit="block group:*",
    geo_filter={"state": pa_state_code, 
                "county": philly_county_code, 
                "tract": "*"},
)


philly_demo_data.head()
Out[61]:
NAME B03002_001E B03002_003E B03002_004E B03002_005E B03002_006E B03002_007E B03002_008E B03002_009E B03002_012E state county tract block group
0 Block Group 3, Census Tract 288, Philadelphia ... 1487 218 161 0 304 0 0 163 641 42 101 028800 3
1 Block Group 1, Census Tract 288, Philadelphia ... 1912 57 298 0 0 0 0 11 1546 42 101 028800 1
2 Block Group 2, Census Tract 298, Philadelphia ... 801 155 487 0 40 0 0 2 117 42 101 029800 2
3 Block Group 4, Census Tract 298, Philadelphia ... 1337 380 415 0 21 0 0 0 521 42 101 029800 4
4 Block Group 3, Census Tract 298, Philadelphia ... 519 186 268 0 24 0 0 10 31 42 101 029800 3

Important: data is returned as strings rather than numeric values

In [62]:
for variable in variables:
    
    # Convert all variables EXCEPT for NAME
    if variable != "NAME":
        philly_demo_data[variable] = philly_demo_data[variable].astype(float)
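
An alternative sketch of the same conversion uses pd.to_numeric, shown here on a two-row stand-in for philly_demo_data. The errors="coerce" option is a choice made for this illustration: it turns any value that cannot be parsed into NaN instead of raising an error.

```python
import pandas as pd

# Stand-in for the query result: every column arrives as strings
demo = pd.DataFrame(
    {
        "NAME": ["Block Group 3", "Block Group 1"],
        "B03002_001E": ["1487", "1912"],
        "B03002_003E": ["218", "57"],
    }
)

# Convert every column except NAME in one step
numeric_cols = [col for col in demo.columns if col != "NAME"]
demo[numeric_cols] = demo[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(demo.dtypes)
```
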

What if we mess up the geographic hierarchy?

If you forget to include required parts of the geography hierarchy, you'll get an error!

In [63]:
acs.query(
    cols=variables,
    geo_unit="block group:*",
    geo_filter={"state": pa_state_code},
)
---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
~/opt/miniconda3/envs/musa-550-fall-2020/lib/python3.7/site-packages/cenpy/remote.py in query(self, cols, geo_unit, geo_filter, apikey, **kwargs)
    218         try:
--> 219             json_content = res.json()
    220             df = pd.DataFrame().from_records(json_content[1:], columns=json_content[0])

~/opt/miniconda3/envs/musa-550-fall-2020/lib/python3.7/site-packages/requests/models.py in json(self, **kwargs)
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 

~/opt/miniconda3/envs/musa-550-fall-2020/lib/python3.7/site-packages/simplejson/__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    524             and not use_decimal and not kw):
--> 525         return _default_decoder.decode(s)
    526     if cls is None:

~/opt/miniconda3/envs/musa-550-fall-2020/lib/python3.7/site-packages/simplejson/decoder.py in decode(self, s, _w, _PY3)
    369             s = str(s, self.encoding)
--> 370         obj, end = self.raw_decode(s)
    371         end = _w(s, end).end()

~/opt/miniconda3/envs/musa-550-fall-2020/lib/python3.7/site-packages/simplejson/decoder.py in raw_decode(self, s, idx, _w, _PY3)
    399                 idx += 3
--> 400         return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)
<ipython-input-63-a4d77471a08c> in <module>
      2     cols=variables,
      3     geo_unit="block group:*",
----> 4     geo_filter={"state": pa_state_code},
      5 )

~/opt/miniconda3/envs/musa-550-fall-2020/lib/python3.7/site-packages/cenpy/remote.py in query(self, cols, geo_unit, geo_filter, apikey, **kwargs)
    228             if res.status_code == 400:
    229                 raise r.HTTPError(
--> 230                     "400 " + "\n".join(map(lambda x: x.decode(), res.iter_lines()))
    231                 )
    232             else:

HTTPError: 400 error: unknown/unsupported geography heirarchy
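
To catch this mistake before sending a request, one could compare the filter against the levels that the target geography requires. The sketch below is a hypothetical helper, not part of cenpy. Its REQUIRES table is hand-written for illustration from the common hierarchies listed earlier; in practice, the "requires" column of acs.geographies['fips'] would be the source.

```python
# Hand-written for illustration, based on the common hierarchies listed earlier
REQUIRES = {
    "county": ["state"],
    "tract": ["state", "county"],
    "block group": ["state", "county", "tract"],
}

def missing_levels(geo_unit, geo_filter):
    """Return the required parent geographies absent from geo_filter."""
    level = geo_unit.split(":")[0]
    return [parent for parent in REQUIRES.get(level, []) if parent not in geo_filter]

# The failing query above: only the state was supplied
print(missing_levels("block group:*", {"state": "42"}))

# The working query from Step 5
print(missing_levels("block group:*", {"state": "42", "county": "101", "tract": "*"}))
```
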

We need the block group geometries too!

cenpy includes an interface to the Census Bureau's TIGER shapefile database.

In [64]:
cenpy.tiger.available()
Out[64]:
[{'name': 'AIANNHA', 'type': 'MapServer'},
 {'name': 'CBSA', 'type': 'MapServer'},
 {'name': 'Hydro', 'type': 'MapServer'},
 {'name': 'Labels', 'type': 'MapServer'},
 {'name': 'Legislative', 'type': 'MapServer'},
 {'name': 'Places_CouSub_ConCity_SubMCD', 'type': 'MapServer'},
 {'name': 'PUMA_TAD_TAZ_UGA_ZCTA', 'type': 'MapServer'},
 {'name': 'Region_Division', 'type': 'MapServer'},
 {'name': 'School', 'type': 'MapServer'},
 {'name': 'Special_Land_Use_Areas', 'type': 'MapServer'},
 {'name': 'State_County', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2012', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2013', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2014', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2015', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2016', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2017', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2018', 'type': 'MapServer'},
 {'name': 'tigerWMS_ACS2019', 'type': 'MapServer'},
 {'name': 'tigerWMS_Census2010', 'type': 'MapServer'},
 {'name': 'tigerWMS_Current', 'type': 'MapServer'},
 {'name': 'tigerWMS_ECON2012', 'type': 'MapServer'},
 {'name': 'tigerWMS_PhysicalFeatures', 'type': 'MapServer'},
 {'name': 'Tracts_Blocks', 'type': 'MapServer'},
 {'name': 'Transportation', 'type': 'MapServer'},
 {'name': 'TribalTracts', 'type': 'MapServer'},
 {'name': 'Urban', 'type': 'MapServer'},
 {'name': 'USLandmass', 'type': 'MapServer'}]

Set the ACS2018 database as the desired GeoService

In [65]:
acs.set_mapservice("tigerWMS_ACS2018")
Out[65]:
Connection to American Community Survey: 1-Year Estimates: Detailed Tables 5-Year(ID: https://api.census.gov/data/id/ACSDT5Y2018)
With MapServer: Census Current (2018) WMS

The map service has many different layers — select the layer for our desired geography

In [66]:
acs.mapservice.layers
Out[66]:
[(ESRILayer) 2010 Census Public Use Microdata Areas,
 (ESRILayer) 2010 Census Public Use Microdata Areas Labels,
 (ESRILayer) 2010 Census ZIP Code Tabulation Areas,
 (ESRILayer) 2010 Census ZIP Code Tabulation Areas Labels,
 (ESRILayer) Tribal Census Tracts,
 (ESRILayer) Tribal Census Tracts Labels,
 (ESRILayer) Tribal Block Groups,
 (ESRILayer) Tribal Block Groups Labels,
 (ESRILayer) Census Tracts,
 (ESRILayer) Census Tracts Labels,
 (ESRILayer) Census Block Groups,
 (ESRILayer) Census Block Groups Labels,
 (ESRILayer) 2010 Census Blocks,
 (ESRILayer) 2010 Census Blocks Labels,
 (ESRILayer) Unified School Districts,
 (ESRILayer) Unified School Districts Labels,
 (ESRILayer) Secondary School Districts,
 (ESRILayer) Secondary School Districts Labels,
 (ESRILayer) Elementary School Districts,
 (ESRILayer) Elementary School Districts Labels,
 (ESRILayer) Estates,
 (ESRILayer) Estates Labels,
 (ESRILayer) County Subdivisions,
 (ESRILayer) County Subdivisions Labels,
 (ESRILayer) Subbarrios,
 (ESRILayer) Subbarrios Labels,
 (ESRILayer) Consolidated Cities,
 (ESRILayer) Consolidated Cities Labels,
 (ESRILayer) Incorporated Places,
 (ESRILayer) Incorporated Places Labels,
 (ESRILayer) Census Designated Places,
 (ESRILayer) Census Designated Places Labels,
 (ESRILayer) Alaska Native Regional Corporations,
 (ESRILayer) Alaska Native Regional Corporations Labels,
 (ESRILayer) Tribal Subdivisions,
 (ESRILayer) Tribal Subdivisions Labels,
 (ESRILayer) Federal American Indian Reservations,
 (ESRILayer) Federal American Indian Reservations Labels,
 (ESRILayer) Off-Reservation Trust Lands,
 (ESRILayer) Off-Reservation Trust Lands Labels,
 (ESRILayer) State American Indian Reservations,
 (ESRILayer) State American Indian Reservations Labels,
 (ESRILayer) Hawaiian Home Lands,
 (ESRILayer) Hawaiian Home Lands Labels,
 (ESRILayer) Alaska Native Village Statistical Areas,
 (ESRILayer) Alaska Native Village Statistical Areas Labels,
 (ESRILayer) Oklahoma Tribal Statistical Areas,
 (ESRILayer) Oklahoma Tribal Statistical Areas Labels,
 (ESRILayer) State Designated Tribal Statistical Areas,
 (ESRILayer) State Designated Tribal Statistical Areas Labels,
 (ESRILayer) Tribal Designated Statistical Areas,
 (ESRILayer) Tribal Designated Statistical Areas Labels,
 (ESRILayer) American Indian Joint-Use Areas,
 (ESRILayer) American Indian Joint-Use Areas Labels,
 (ESRILayer) 116th Congressional Districts,
 (ESRILayer) 116th Congressional Districts Labels,
 (ESRILayer) 2018 State Legislative Districts - Upper,
 (ESRILayer) 2018 State Legislative Districts - Upper Labels,
 (ESRILayer) 2018 State Legislative Districts - Lower,
 (ESRILayer) 2018 State Legislative Districts - Lower Labels,
 (ESRILayer) Census Divisions,
 (ESRILayer) Census Divisions Labels,
 (ESRILayer) Census Regions,
 (ESRILayer) Census Regions Labels,
 (ESRILayer) 2010 Census Urbanized Areas,
 (ESRILayer) 2010 Census Urbanized Areas Labels,
 (ESRILayer) 2010 Census Urban Clusters,
 (ESRILayer) 2010 Census Urban Clusters Labels,
 (ESRILayer) Combined New England City and Town Areas,
 (ESRILayer) Combined New England City and Town Areas Labels,
 (ESRILayer) New England City and Town Area Divisions,
 (ESRILayer) New England City and Town Area  Divisions Labels,
 (ESRILayer) Metropolitan New England City and Town Areas,
 (ESRILayer) Metropolitan New England City and Town Areas Labels,
 (ESRILayer) Micropolitan New England City and Town Areas,
 (ESRILayer) Micropolitan New England City and Town Areas Labels,
 (ESRILayer) Combined Statistical Areas,
 (ESRILayer) Combined Statistical Areas Labels,
 (ESRILayer) Metropolitan Divisions,
 (ESRILayer) Metropolitan Divisions Labels,
 (ESRILayer) Metropolitan Statistical Areas,
 (ESRILayer) Metropolitan Statistical Areas Labels,
 (ESRILayer) Micropolitan Statistical Areas,
 (ESRILayer) Micropolitan Statistical Areas Labels,
 (ESRILayer) States,
 (ESRILayer) States Labels,
 (ESRILayer) Counties,
 (ESRILayer) Counties Labels]
In [67]:
acs.mapservice.layers[10]
Out[67]:
(ESRILayer) Census Block Groups

Query the service using the .query() function.

We can use a SQL where clause to request block groups only for Philadelphia County by specifying the other geographies in the hierarchy (in this case, state and county).

In [68]:
acs.mapservice.layers[10].variables
Out[68]:
name type alias length domain
0 MTFCC esriFieldTypeString MTFCC 5.0 None
1 OID esriFieldTypeDouble OID NaN None
2 GEOID esriFieldTypeString GEOID 12.0 None
3 STATE esriFieldTypeString STATE 2.0 None
4 COUNTY esriFieldTypeString COUNTY 3.0 None
5 TRACT esriFieldTypeString TRACT 6.0 None
6 BLKGRP esriFieldTypeString BLKGRP 1.0 None
7 BASENAME esriFieldTypeString BASENAME 100.0 None
8 NAME esriFieldTypeString NAME 100.0 None
9 LSADC esriFieldTypeString LSADC 2.0 None
10 FUNCSTAT esriFieldTypeString FUNCSTAT 1.0 None
11 AREALAND esriFieldTypeDouble AREALAND NaN None
12 AREAWATER esriFieldTypeDouble AREAWATER NaN None
13 STGEOMETRY esriFieldTypeGeometry STGEOMETRY NaN None
14 CENTLAT esriFieldTypeString CENTLAT 11.0 None
15 CENTLON esriFieldTypeString CENTLON 12.0 None
16 INTPTLAT esriFieldTypeString INTPTLAT 11.0 None
17 INTPTLON esriFieldTypeString INTPTLON 12.0 None
18 OBJECTID esriFieldTypeOID OBJECTID NaN None
19 STGEOMETRY.AREA esriFieldTypeDouble STGEOMETRY.AREA NaN None
20 STGEOMETRY.LEN esriFieldTypeDouble STGEOMETRY.LEN NaN None
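
The filter can be written as a SQL-style where clause over the fields listed above. Since STATE and COUNTY are string fields, the sketch below quotes the values. That quoting is an assumption about what the service accepts, and build_where is a hypothetical helper, not a cenpy function.

```python
def build_where(conditions):
    """Join field/value pairs into a SQL-style where clause with quoted values."""
    return " AND ".join(f"{field} = '{value}'" for field, value in conditions.items())

where_clause = build_where({"STATE": "42", "COUNTY": "101"})
print(where_clause)
```
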
In [69]:
# Use SQL to return geometries only for Philadelphia County in PA
where_clause = f"STATE = {pa_state_code} AND COUNTY = {philly_county_code}"

# Query for block groups
philly_block_groups = acs.mapservice.layers[10].query(where=where_clause)
In [70]:
philly_block_groups.head()
Out[70]:
MTFCC OID GEOID STATE COUNTY TRACT BLKGRP BASENAME NAME LSADC ... AREALAND AREAWATER CENTLAT CENTLON INTPTLAT INTPTLON OBJECTID STGEOMETRY.AREA STGEOMETRY.LEN geometry
0 G5030 20858508931877 421010355002 42 101 035500 2 2 Block Group 2 BG ... 537129 0 +40.0853340 -075.0307522 +40.0853340 -075.0307522 1463 9.186724e+05 4215.789689 POLYGON ((-8353106.345 4878406.706, -8353053.5...
1 G5030 20858508931779 421010355003 42 101 035500 3 3 Block Group 3 BG ... 1046315 1067 +40.0924064 -075.0301624 +40.0929757 -075.0294999 1464 1.791745e+06 8235.537680 POLYGON ((-8353580.232 4878642.128, -8353218.1...
2 G5030 20858508932434 421010356011 42 101 035601 1 1 Block Group 1 BG ... 846947 8936 +40.0931910 -075.0389771 +40.0923618 -075.0369260 1466 1.464185e+06 6290.193745 POLYGON ((-8354256.944 4879308.848, -8354227.1...
3 G5030 20858508932102 421010356022 42 101 035602 2 2 Block Group 2 BG ... 859939 713 +40.1070708 -075.0438735 +40.1087070 -075.0462160 1467 1.472942e+06 5987.640635 POLYGON ((-8355008.796 4881749.699, -8354858.9...
4 G5030 208583717019680 421010386001 42 101 038600 1 1 Block Group 1 BG ... 3310650 32074 +40.0580462 -075.2103410 +40.0582762 -075.2110913 1883 5.712645e+06 15897.046276 POLYGON ((-8373602.156 4876473.490, -8373704.5...

5 rows × 21 columns

Merge the demographic data with geometries

Merge based on multiple columns: state, county, tract, and block group IDs.

The relevant columns are:

  • "STATE", "COUNTY", "TRACT", "BLKGRP" in the spatial data
  • "state", "county", "tract", "block group" in the non-spatial data
In [71]:
philly_demo_final = philly_block_groups.merge(
    philly_demo_data,
    left_on=["STATE", "COUNTY", "TRACT", "BLKGRP"],
    right_on=["state", "county", "tract", "block group"],
)
In [72]:
philly_demo_final.head()
Out[72]:
MTFCC OID GEOID STATE COUNTY TRACT BLKGRP BASENAME NAME_x LSADC ... B03002_005E B03002_006E B03002_007E B03002_008E B03002_009E B03002_012E state county tract block group
0 G5030 20858508931877 421010355002 42 101 035500 2 2 Block Group 2 BG ... 0.0 82.0 0.0 0.0 107.0 114.0 42 101 035500 2
1 G5030 20858508931779 421010355003 42 101 035500 3 3 Block Group 3 BG ... 38.0 431.0 0.0 0.0 124.0 235.0 42 101 035500 3
2 G5030 20858508932434 421010356011 42 101 035601 1 1 Block Group 1 BG ... 8.0 369.0 0.0 0.0 49.0 45.0 42 101 035601 1
3 G5030 20858508932102 421010356022 42 101 035602 2 2 Block Group 2 BG ... 0.0 410.0 0.0 0.0 9.0 172.0 42 101 035602 2
4 G5030 208583717019680 421010386001 42 101 038600 1 1 Block Group 1 BG ... 0.0 60.0 0.0 5.0 32.0 56.0 42 101 038600 1

5 rows × 35 columns
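
A quick sanity check on a multi-column merge can be sketched as below, on small stand-in tables with made-up totals. The how="left", indicator=True, and validate="one_to_one" arguments are additions for this illustration (the merge above uses the defaults). Together they flag geometries with no matching demographic row and raise an error if a key combination is duplicated.

```python
import pandas as pd

# Stand-ins for the spatial and non-spatial tables (totals are made up)
geo = pd.DataFrame(
    {
        "STATE": ["42", "42", "42"],
        "COUNTY": ["101", "101", "101"],
        "TRACT": ["035500", "035500", "035601"],
        "BLKGRP": ["2", "3", "1"],
    }
)
demo = pd.DataFrame(
    {
        "state": ["42", "42"],
        "county": ["101", "101"],
        "tract": ["035500", "035500"],
        "block group": ["2", "3"],
        "Total": [1000.0, 2000.0],
    }
)

merged = geo.merge(
    demo,
    left_on=["STATE", "COUNTY", "TRACT", "BLKGRP"],
    right_on=["state", "county", "tract", "block group"],
    how="left",             # keep every geometry row, matched or not
    indicator=True,         # adds a "_merge" column describing each row's match
    validate="one_to_one",  # raise if either side has duplicate keys
)

# Rows present only in the spatial table
unmatched = merged.loc[merged["_merge"] == "left_only"]
print(len(merged), len(unmatched))
```

Note that the key columns are compared as strings, so codes such as "035500" must keep their leading zeros on both sides.
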

Plot it to make sure it makes sense!

Using geopandas...

In [73]:
# Check the CRS
philly_demo_final.crs
Out[73]:
<Projected CRS: EPSG:3857>
Name: WGS 84 / Pseudo-Mercator
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: World - 85°S to 85°N
- bounds: (-180.0, -85.06, 180.0, 85.06)
Coordinate Operation:
- name: Popular Visualisation Pseudo-Mercator
- method: Popular Visualisation Pseudo Mercator
Datum: World Geodetic System 1984
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
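
The geometries come back in EPSG:3857 (Web Mercator), so the coordinates are in metres, not degrees. As a rough check, the sketch below applies the standard spherical Web Mercator inverse formulas to the first vertex of the first block group. The function name is hypothetical; in practice, GeoPandas' to_crs() would reproject a whole GeoDataFrame.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS 84 semi-major axis in metres, used by EPSG:3857

def web_mercator_to_lonlat(x, y):
    """Invert the spherical Web Mercator projection (returns degrees)."""
    lon = math.degrees(x / EARTH_RADIUS)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS)) - math.pi / 2)
    return lon, lat

# First vertex of the first block group polygon shown earlier
lon, lat = web_mercator_to_lonlat(-8353106.345, 4878406.706)
print(round(lon, 2), round(lat, 2))
```

The result should land close to the CENTLON and CENTLAT values reported for that block group.
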
In [74]:
fig, ax = plt.subplots(figsize=(10,10))

# Plot the choropleth
philly_demo_final.plot(ax=ax, column='B03002_001E', legend=True)

# Format
ax.set_title("Population of Philadelphia by Block Group", fontsize=16)
ax.set_axis_off()

Or using hvplot...

In [75]:
cols = ['NAME_x', 'B03002_001E', 'geometry']
philly_demo_final[cols].hvplot(c='B03002_001E',
                              geo=True, 
                              crs=3857, 
                              legend=True,
                              width=600, 
                              height=400, 
                              cmap='viridis')
Out[75]:

Let's prep the data for the dot map

  1. Rename columns to more user-friendly versions
  2. Add a general "Other" category
In [76]:
# Rename columns
philly_demo_final = philly_demo_final.rename(
    columns={
        "B03002_001E": "Total",  # Total population
        "B03002_003E": "White",  # Not Hispanic, White
        "B03002_004E": "Black",  # Not Hispanic, Black
        "B03002_005E": "AI/AN",  # Not Hispanic, American Indian and Alaska Native
        "B03002_006E": "Asian",  # Not Hispanic, Asian
        "B03002_007E": "NH/PI",  # Not Hispanic, Native Hawaiian and Other Pacific Islander
        "B03002_008E": "Other_",  # Not Hispanic, Some Other Race
        "B03002_009E": "Two Plus",  # Not Hispanic, Two or More Races
        "B03002_012E": "Hispanic",  # Hispanic (any race)
    }
)
In [77]:
# Add an "Other" column 
cols = ['AI/AN', 'NH/PI','Other_', 'Two Plus']
philly_demo_final['Other'] = philly_demo_final[cols].sum(axis=1)
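
As a quick check on the combined column, the five categories used in the dot map should add up to the total, since table B03002 splits the total into Hispanic and the non-Hispanic race groups. The sketch below uses a small hypothetical table (the numbers are made up) rather than the real `philly_demo_final` data.

```python
import pandas as pd

# Hypothetical counts for two block groups (illustrative values only)
toy = pd.DataFrame(
    {
        "Total": [100, 250],
        "White": [40, 100],
        "Black": [30, 80],
        "Asian": [10, 20],
        "Hispanic": [12, 35],
        "AI/AN": [1, 2],
        "NH/PI": [0, 1],
        "Other_": [2, 4],
        "Two Plus": [5, 8],
    }
)

# Same aggregation as above: sum the smaller groups row by row
toy["Other"] = toy[["AI/AN", "NH/PI", "Other_", "Two Plus"]].sum(axis=1)

# The five plotted categories should account for everyone
groups = ["White", "Black", "Asian", "Hispanic", "Other"]
matches_total = toy[groups].sum(axis=1) == toy["Total"]
print(matches_total.all())
```

Running the same comparison on the real data is a cheap way to catch a mistyped column code before plotting.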

Define a function to create random points

Given a polygon, create randomly distributed points that fall within the polygon.

In [78]:
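def random_points_in_polygon(number, polygon):
    """
    Generate `number` random points that fall within the
    specified polygon, using rejection sampling: draw points
    uniformly from the polygon's bounding box and keep only
    those the polygon contains.
    """
    points = []
    min_x, min_y, max_x, max_y = polygon.bounds
    i = 0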
    while i < number:
        point = Point(np.random.uniform(min_x, max_x), np.random.uniform(min_y, max_y))
        if polygon.contains(point):
            points.append(point)
            i += 1
    return points
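
The function above is a form of rejection sampling: candidates are drawn from the bounding box and discarded unless they land inside the polygon. The fraction kept is roughly the polygon's area divided by its bounding-box area, so long, thin, or diagonal shapes need more draws. A minimal pure-Python sketch of that idea, using a hypothetical right triangle and a hand-written containment test in place of shapely's `polygon.contains`:

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable


def in_triangle(x, y):
    """Containment test for the triangle (0,0), (1,0), (0,1)."""
    return x >= 0 and y >= 0 and x + y <= 1


draws = 20_000
kept = 0
for _ in range(draws):
    # Draw a candidate from the bounding box, here the unit square
    x, y = rng.uniform(0, 1), rng.uniform(0, 1)
    if in_triangle(x, y):
        kept += 1

# Triangle area 0.5 / bounding-box area 1.0, so about half are kept
acceptance_rate = kept / draws
print(round(acceptance_rate, 2))
```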

Random points example

In [79]:
# get the first block group polygon in the data set
geo = philly_demo_final.iloc[0].geometry

geo
Out[79]:
In [80]:
fig, ax = plt.subplots(figsize=(6, 6))

# Generate some random points
random_points = random_points_in_polygon(100, geo)

# Plot random points
gpd.GeoSeries(random_points).plot(ax=ax, markersize=20, color="red")

# Plot boundary of block group
gpd.GeoSeries([geo]).plot(ax=ax, facecolor="none", edgecolor="black")

ax.set_axis_off()
In [81]:
def generate_dot_map(data, people_per_dot):
    """
    Given a GeoDataFrame with demographic columns, generate a dot
    map with one randomly placed point for roughly every
    `people_per_dot` people in each geometry and demographic field.
    """
    results = []
    for field in ["White", "Hispanic", "Black", "Asian", "Other"]:

        # generate random points
        pts = data.apply(
            lambda row: random_points_in_polygon(
                row[field] / people_per_dot, row["geometry"]
            ),
            axis=1,
        )

        # combine into single GeoSeries
        pts = gpd.GeoSeries(pts.apply(pd.Series).stack(), crs=data["geometry"].crs)
        pts.name = "geometry"

        # make into a GeoDataFrame
        pts = gpd.GeoDataFrame(pts)
        pts["field"] = field

        # save
        results.append(pts)

    return gpd.GeoDataFrame(pd.concat(results), crs=data["geometry"].crs).reset_index(
        drop=True
    )
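
One detail worth knowing: `row[field] / people_per_dot` is usually not a whole number, and the `while i < number` loop in `random_points_in_polygon` keeps going until the counter reaches or passes it. The effect is that dot counts are rounded up, so any group with at least one person gets a dot. The helper below is a hypothetical stand-in that mirrors only the loop's counting logic:

```python
import math


def dots_generated(population, people_per_dot):
    """Count loop iterations the same way `while i < number` does."""
    number = population / people_per_dot
    i = 0
    while i < number:
        i += 1
    return i


for population in [0, 1, 50, 120]:
    n = dots_generated(population, people_per_dot=50)
    # Matches rounding up to the next whole dot
    assert n == math.ceil(population / 50)
    print(population, "people ->", n, "dots")
```

If over-representing very small groups is a concern, rounding the count before sampling is one possible alternative.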

Calculate the dot map

In [82]:
dot_map = generate_dot_map(philly_demo_final, people_per_dot=50)
In [83]:
print("number of points = ", len(dot_map))
number of points =  34134
In [84]:
dot_map.head()
Out[84]:
geometry field
0 POINT (-8352609.693 4878621.977) White
1 POINT (-8352435.400 4878744.252) White
2 POINT (-8352530.683 4878183.500) White
3 POINT (-8352346.485 4878117.165) White
4 POINT (-8352151.823 4878230.909) White

Now let's plot it

In [85]:
# Set up a custom color map based on ColorBrewer colors
from matplotlib.colors import ListedColormap

cmap = ListedColormap(
    ["#3a833c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33"]
)
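
The list above has six colors, but the dot map only has five categories. When a `ListedColormap` is sampled at evenly spaced values between 0 and 1, matplotlib scales each value by the number of colors and truncates, so not every color is necessarily used. The sketch below shows that lookup for five evenly spaced values; that the plotting call spaces the five categories exactly this way is an assumption, not something confirmed here.

```python
from matplotlib.colors import ListedColormap, to_hex

colors = ["#3a833c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#ffff33"]
cmap = ListedColormap(colors)

# Five evenly spaced positions, one per category (assumed spacing)
positions = [i / 4 for i in range(5)]
picked = [to_hex(cmap(p)) for p in positions]

print(picked)
print("unused:", [c for c in colors if c not in picked])
```

Passing exactly one color per category avoids any ambiguity about which color goes with which group.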
In [86]:
# Convert to Web Mercator (EPSG:3857); a no-op if the data is already in that CRS
dot_map_3857 = dot_map.to_crs(epsg=3857)
In [87]:
fig, ax = plt.subplots(figsize=(10, 10), facecolor="#cfcfcf")

# Plot
dot_map_3857.plot(
    ax=ax,
    column="field",
    categorical=True,
    legend=True,
    alpha=1,
    markersize=0.5,
    cmap=cmap,
)

# format
ax.set_title("Philadelphia, PA", fontsize=16)
ax.text(
    0.5, 0.95, "1 dot = 50 people", fontsize=12, transform=ax.transAxes, ha="center"
)
ax.set_axis_off()