9/3/2020
To follow along today, go to https://github.com/MUSA-550-Fall-2020/week-1
The following line imports the pandas package:
import pandas as pd
The primary objects in pandas are the:
DataFrame, which is very similar to an Excel spreadsheet, and has rows and named columns
Series, which represents a single column of data
A DataFrame contains one or more Series and a name for each Series. The data frame is a commonly used abstraction for data manipulation; similar implementations exist in R.
You can think of Series objects as fancier versions of Python's built-in list data type.
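To illustrate that "fancier list" idea, here is a toy sketch (not from the lecture data): a Series supports vectorized arithmetic and a labeled index, which a plain list does not.

```python
import pandas as pd

# a plain list and a Series holding the same values
values = [1, 2, 3]
s = pd.Series(values, index=["a", "b", "c"])

# lists require a loop or comprehension to scale values...
doubled_list = [v * 2 for v in values]

# ...while a Series applies the operation to every element at once
doubled_series = s * 2

print(doubled_list)          # [2, 4, 6]
print(doubled_series["b"])   # 4, accessed by label rather than position
```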
To create a Series
object:
# city names and population
city_names = pd.Series(['San Francisco', 'San Jose', 'Sacramento'])
population = pd.Series([852469, 1015785, 485199])
city_names
DataFrame objects can be created by passing a dict mapping string column names to their respective Series.
cities_df = pd.DataFrame({ 'City Name': city_names, 'Population': population })
cities_df
Note: always try to give your variables meaningful names. It will help immensely when you are trying to debug problems or when you're trying to understand the code you've written months later.
# access columns with a dict-like syntax
cities_df['Population']
# list slicing: get the elements with indices 1 and 2 (but NOT 3)
cities_list = ['San Francisco', 'San Jose', 'Sacramento']
cities_list[1:3]
Unfortunately the functionality for slicing lists is not that powerful...but pandas will have many more features!
# slicing data frame rows is very similar!
cities_df[1:3]
pandas includes functionality for many different ways of selecting data. See the documentation for many more examples.
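For instance, a few of the most common selection patterns look like this (a sketch that re-creates the small cities frame from above so the snippet is self-contained):

```python
import pandas as pd

cities_df = pd.DataFrame({
    "City Name": ["San Francisco", "San Jose", "Sacramento"],
    "Population": [852469, 1015785, 485199],
})

# select a column by name (dict-like syntax)
pop = cities_df["Population"]

# select a row by integer position with iloc
first_row = cities_df.iloc[0]

# select rows with a boolean mask via loc
big = cities_df.loc[cities_df["Population"] > 500000]

print(big["City Name"].tolist())  # ['San Francisco', 'San Jose']
```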
cities_df['Population'].min()
NumPy is a popular toolkit for scientific computing.
pandas Series objects can be used as arguments to most NumPy functions:
import numpy as np
# calculate the median population value
np.median(cities_df['Population'])
For more complex single-column transformations, you can use Series.apply. It accepts a function that is applied to each value in the Series.
For example, we can find which cities have a population greater than a million:
# define our function
def get_large_cities(population):
    return population > 1e6
large_cities_sel = cities_df['Population'].apply(get_large_cities)
large_cities_sel
# add the new computed column to our original data frame
cities_df["Large Cities"] = large_cities_sel
# we can also use lambda (unnamed, inline) functions
cities_df["Small Cities"] = cities_df["Population"].apply(
    lambda population: population < 1e6
)
# print out
cities_df
We can select the "large" cities by passing the boolean values to the .loc[] function of a DataFrame:
cities_df['Large Cities']
cities_df.loc[cities_df['Large Cities']]
cities_df['Population'] > 1e6
# this is equivalent to doing
cities_df.loc[cities_df['Population'] > 1e6]
Use the Python tilde operator to do a logical not operation:
# reverse the large cities boolean selection
cities_df.loc[~large_cities_sel]
# or equivalently:
# NOTE: you need to put the whole expression in () and then apply the tilde!
cities_df.loc[~(cities_df['Population'] > 1e6)]
Alternatively, we can use the pandas query() function, which returns a subset of your dataframe based on a string version of the boolean expression.
cities_df.query("Population > 1e6")
The documentation is your best friend! Use the question mark operator!
# use the question mark
pd.DataFrame.loc?
Let's load census data on rental rates from the 2017 American Community Survey.
We've stored the data locally in the comma-separated value (CSV) format. Later in the course, you'll learn how to download them directly using Python.
Use the read_csv() function from pandas. The first argument to read_csv() is the file path to load.
census_df = pd.read_csv("./data/census/ACS_17_1YR_B25003.csv")
census_df.head(n=5)
Select certain columns and rename them.
## FIRST STEP: let's trim to the columns we want
# columns we want
columns = ['GEO.display-label', 'HD01_VD01', 'HD01_VD03']
census_df = census_df[columns]
# Peek at the dataframe with trimmed columns
census_df.head()
## STEP 2: Let's rename the columns!
# rename columns
census_df.columns = ['City', 'Total', 'Rentals']
# Return the head (first 5 rows) from the cell
census_df.head()
# Step 3: Add the rental rate as a new column
census_df['Rental Rate'] = census_df['Rentals'] / census_df['Total'] * 100
census_df.head()
# How many rows are in the dataframe? --> use the len() operator
len(census_df)
We can select the largest cities by population using the Series.isin() function, which checks if each value in the Series is in the specified list.
# Define the list of city names we want
top_cities_list = [
    "Philadelphia city, Pennsylvania",
    "New York city, New York",
    "Los Angeles city, California",
    "Houston city, Texas",
    "Chicago city, Illinois",
    "Phoenix city, Arizona",
]
# Use isin() to test whether each row's City value is in the list
census_df["City"].isin(top_cities_list)
# Now do the selection!
census_df.loc[census_df["City"].isin(top_cities_list)] # select the valid rows
Only 6 rows as expected!
Alternatively, we could use the query() function. The syntax is a bit different (and I always forget and have to look at the documentation).
In the query() string, you can reference existing variables by prefacing the variable name with an "@" symbol.
For example:
# Get the rows where the City column is in "top_cities_list"
census_df.query("City in @top_cities_list")
Python uses the "%" operator to insert variable values into strings. For example, the file name of the data we want to load for 2017 is:
year_str = "17"
path = "./data/census/ACS_%s_1YR_B25003.csv" % year_str
print(path)
See this guide on this type of string formatting for more info.
Python version 3.6 introduced a new and improved string formatting syntax, where the variable that is being inserted can be referenced directly using the variable name.
The variable name should be enclosed in curly braces inside the string:
year_str = "17"
path = f"./data/census/ACS_{year_str}_1YR_B25003.csv"
print(path)
For more info: see this guide
Let's use a for loop to loop over each year value from 2005 to 2017:
all_rental_data = []
for year in range(2005, 2018):

    # ---------------------------------------------
    # Step 0: Print out year
    # The variable year changes for each iteration of the loop
    # ---------------------------------------------
    print(f"Loading data for year = {year}...")

    # ---------------------------------
    # Step 1: Read data for this year
    # ---------------------------------
    year_str = str(year)  # convert integer value of "year" to a string
    year_str = year_str[2:]  # extract the last two digits of the year string
    rental_df = pd.read_csv(f"./data/census/ACS_{year_str}_1YR_B25003.csv")

    # ---------------------------------
    # Step 2: Columns we want
    # ---------------------------------
    columns = ['GEO.display-label', 'HD01_VD01', 'HD01_VD03']
    rental_df = rental_df[columns]

    # ---------------------------------
    # Step 3: Rename columns
    # ---------------------------------
    rental_df.columns = ['City', 'Total', 'Rentals']

    # ----------------------------------
    # Step 4: Calculate the rental rate
    # ----------------------------------
    rental_df['Rental Rate'] = rental_df['Rentals'] / rental_df['Total'] * 100

    # ----------------------------------
    # Step 5: Select the cities we want
    # ----------------------------------
    selection = rental_df['City'].isin(top_cities_list)
    top_cities_df = rental_df.loc[selection].copy()

    # ----------------------------------
    # Step 6: Add a column for the year
    # ----------------------------------
    top_cities_df['Year'] = year

    # -----------------------------------
    # Step 7: Save this dataframe in the list
    # -----------------------------------
    all_rental_data.append(top_cities_df)

# ---------------------------------------------------
# Step 8: After the for loop, combine all of the
# data frames into one along the row axis
# ---------------------------------------------------
all_rental_data = pd.concat(all_rental_data, axis=0)
# Note: the pd.concat() took a list of dataframes and converted them
# into a single dataframe!
type(all_rental_data)
all_rental_data.head()
# What about the tail?
all_rental_data.tail()
# How many rows? Hopefully (6 cities x 13 years = 78!)
len(all_rental_data)
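The pd.concat() step above can be sketched with two toy dataframes (hypothetical values, not the census data):

```python
import pandas as pd

df_2005 = pd.DataFrame({"City": ["Philadelphia"], "Year": [2005]})
df_2006 = pd.DataFrame({"City": ["Philadelphia"], "Year": [2006]})

# stack the frames along the row axis (axis=0), just like the loop above
combined = pd.concat([df_2005, df_2006], axis=0)

print(len(combined))              # 2
print(combined["Year"].tolist())  # [2005, 2006]
```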
matplotlib
Much more to come next week. For now, we'll use some very simple plotting commands with Python's main plotting library, matplotlib.
Import the library we'll use from matplotlib:
from matplotlib import pyplot as plt
We'll use the plot() function, which plots a simple x vs. y line, to plot the sub-selection data frame.
To find the unique city names in the 'City' column, we can use the unique() function:
all_rental_data['City'].unique()
with plt.style.context('ggplot'):

    # Step 1: Create a figure and axes to plot on
    fig, ax = plt.subplots(figsize=(10, 6))

    # Step 2: Loop over each city and plot
    for city in all_rental_data['City'].unique():

        # Print out the city in each loop iteration
        print(city)

        # Select data for this city
        this_city = all_rental_data['City'] == city
        city_df = all_rental_data.loc[this_city]

        # Plot Year vs Rental Rate
        ax.plot(city_df['Year'], city_df['Rental Rate'], label=city, linewidth=4)

    # Add a legend
    ax.legend(loc=0, ncol=3, fontsize=12)

    # Set the y limit
    ax.set_ylim(30, 72)

    # Add a y label
    ax.set_ylabel("Rental Rate in Percent");
Note: we used the "ggplot" theme to style our plots. Matplotlib comes with a number of built in styles: see all of the examples here.
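You can list the styles bundled with your matplotlib installation directly (the exact set varies by version, but "ggplot" is a long-standing built-in):

```python
import matplotlib.pyplot as plt

# every style name usable with plt.style.context(...)
print(plt.style.available)

# the ggplot theme used above should be among them
print("ggplot" in plt.style.available)  # True
```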
# trim to just Philadelphia first, using this boolean selection index
all_rental_data['City']=='Philadelphia city, Pennsylvania'
# select Philadelphia only
philly = all_rental_data.loc[all_rental_data['City']=='Philadelphia city, Pennsylvania']
philly
Note how the first row of the above dataframe has an index label of 393. This means we can access it using the .loc[] function like:
philly.loc[393]
This is different from the .iloc[] function, which indexes based on the integer position of the row, regardless of what the labels in the index are. For example, the first row of a dataframe can always be accessed using:
philly.iloc[0]
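The label-vs-position distinction is easy to see on a toy dataframe whose index labels don't start at 0 (toy values, not the census frame):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]}, index=[393, 457, 521])

# .loc indexes by label...
by_label = df.loc[393, "value"]

# ...while .iloc indexes by integer position, regardless of labels
by_position = df.iloc[0]["value"]

print(by_label, by_position)  # 10 10
```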
We can reset the index labels so they range from 0 to the length of the dataframe, using the reset_index() function. For example:
philly.reset_index(drop=True)
Next, to calculate our percent difference we need to get the values for Philadelphia in 2005 and 2017:
# select 2005
philly_2005 = philly.loc[philly['Year']==2005]
value_2005 = philly_2005['Rental Rate'].iloc[0]
print("2005 value = ", value_2005)
philly_2005['Rental Rate']
# select 2017
philly_2017 = philly.loc[philly['Year']==2017]
value_2017 = philly_2017['Rental Rate'].iloc[0]
print("2017 value = ", value_2017)
# calculate percent change
percent_change = (value_2017 - value_2005)/ value_2005 * 100
print("change in rental rate in Philadelphia from 2005 to 2017 = ", percent_change)
For more details on the .iloc[] function, see the documentation on indexing by position.
Load citywide Zillow Rent Index (ZRI) and Zillow Home Value Index (ZHVI) data.
Files were downloaded from https://www.zillow.com/research/data/
home_values = pd.read_csv("./data/zillow/Metro_Zhvi_AllHomes.csv", engine='python')
rent_values = pd.read_csv('./data/zillow/Metro_Zri_SingleFamilyResidenceRental.csv', engine='python')
Note: the engine='python' keyword is necessary here to load the Zillow data, otherwise you will get a UnicodeDecodeError exception.
Peek at the first few rows of the ZRI data:
rent_values.head()
And do the same for the ZHVI data:
home_values.head()
valid_cities = [
    "New York, NY",
    "Chicago, IL",
    "Los Angeles-Long Beach-Anaheim, CA",
    "Philadelphia, PA",
    "Houston, TX",
    "Phoenix, AZ",
]
selection = home_values['RegionName'].isin(valid_cities)
home_values_trimmed = home_values.loc[selection]
selection = rent_values['RegionName'].isin(valid_cities)
rent_values_trimmed = rent_values.loc[selection]
rent_values_trimmed
Unwanted columns can be dropped from the data frame using the drop() function. Note that the column axis is the second axis (axis=1), and if you wanted to remove rows, you could use the first axis (axis=0).
# remember: give your variables meaningful names!
cols_to_drop = ['SizeRank', 'RegionID']
home_values_final = home_values_trimmed.drop(cols_to_drop, axis=1)
rent_values_final = rent_values_trimmed.drop(cols_to_drop, axis=1)
rent_values_final
Currently, our data is in wide format: each observation has its own column. This usually results in many columns but few rows.
home_values_final
Usually it's better to have data in tidy (also known as long) format.
Tidy datasets are arranged such that each variable is a column and each observation is a row.
In our case, we want to have a column called ZRI and one called ZHVI, and a row for each month that the indices were measured.
pandas provides the melt() function for converting from wide formats to tidy formats.
pd.melt?
Now, let's melt our datasets:
ZHVI = pd.melt(
    home_values_final, id_vars=["RegionName"], value_name="ZHVI", var_name="Date"
)
ZRI = pd.melt(
    rent_values_final, id_vars=["RegionName"], value_name="ZRI", var_name="Date"
)
and take a look:
ZRI.tail()
ZHVI.head()
Note that missing data values are represented as NaN ("not a number").
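For reference, NaN values can be detected (and dropped) with isna() and dropna() — a toy sketch, since these helpers aren't used in the lecture itself:

```python
import pandas as pd

# None is converted to NaN when the Series is created
s = pd.Series([852469.0, None, 485199.0])

# isna() flags missing entries; dropna() removes them
print(s.isna().tolist())    # [False, True, False]
print(s.dropna().tolist())  # [852469.0, 485199.0]
```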
Another common operation is merging, also known as joining, two datasets.
We can use the merge() function to merge observations that have the same Date and RegionName values.
zillow_data = pd.merge(ZRI, ZHVI, on=['Date', 'RegionName'], how='outer')
zillow_data.head()
Merging is very powerful and the merge can be done in a number of ways. See the infographic on joining in this repository.
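The different join types can be sketched on toy frames (hypothetical values; only how='outer' is used above):

```python
import pandas as pd

left = pd.DataFrame({"Date": ["2019-01", "2019-02"], "ZRI": [1500, 1520]})
right = pd.DataFrame({"Date": ["2019-02", "2019-03"], "ZHVI": [230000, 231000]})

# inner: only Dates present in BOTH frames survive
inner = pd.merge(left, right, on="Date", how="inner")

# outer: all Dates from either frame, with NaN filling the gaps
outer = pd.merge(left, right, on="Date", how="outer")

print(len(inner), len(outer))  # 1 3
```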
Currently our Date column is stored as a string.
pandas includes additional functionality for dates, but first we must convert the strings using the to_datetime() function.
# convert the Date strings to datetime objects
zillow_data['Date'] = pd.to_datetime(zillow_data['Date'])
Quick trick: Series that hold datetime objects have a dt attribute that lets you grab parts of the date easily.
For example, we can easily add new columns for the month and year using:
# Note that the dtype is now datetime64[ns]
zillow_data['Date'].head()
zillow_data['Month'] = zillow_data['Date'].dt.month
zillow_data['Year'] = zillow_data['Date'].dt.year
zillow_data.head()
Year
pandas is especially useful for grouping and aggregating data via the groupby() function.
From the pandas documentation, groupby refers to a process of splitting the data into groups, applying a function to each group independently, and combining the results into a data structure.
The documentation is available here.
We can calculate annual averages for each year by grouping by the RegionName and Year columns and taking the mean of our desired column. For example:
# calculate mean values for each Year and City (RegionName)
annual_ZHVI = zillow_data.groupby(['RegionName', 'Year'])['ZHVI'].mean()
annual_ZRI = zillow_data.groupby(['RegionName', 'Year'])['ZRI'].mean()
print(type(annual_ZHVI))
annual_ZHVI.head()
Note here that the result is indexed by the columns we grouped by (RegionName and Year).
We can reset the index so that the index values are listed as columns in the data frame again.
annual_ZHVI = annual_ZHVI.reset_index()
annual_ZRI = annual_ZRI.reset_index()
annual_ZHVI.head(n=50)
with plt.style.context("ggplot"):

    # Create figure and axes
    fig, ax = plt.subplots(figsize=(10, 6))

    # Plot for each unique city
    for city in annual_ZHVI["RegionName"].unique():

        # select the data for this city
        selection = annual_ZHVI["RegionName"] == city
        df = annual_ZHVI.loc[selection]

        # plot
        ax.plot(df["Year"], df["ZHVI"] / 1e3, label=city, linewidth=4)

    # Format the axes
    ax.set_ylim(50, 800)
    ax.legend(loc=0, ncol=2, fontsize=12)
    ax.set_ylabel("Zillow Home Value Index\n(in thousands of dollars)")
with plt.style.context('ggplot'):

    # Create the figure and axes
    fig, ax = plt.subplots(figsize=(10, 6))

    # Loop over the cities to plot each one
    for city in annual_ZRI['RegionName'].unique():

        # Select the city data
        selection = annual_ZRI['RegionName'] == city
        df = annual_ZRI.loc[selection]

        # Plot
        ax.plot(df['Year'], df['ZRI'], label=city, linewidth=4)

    # Format
    ax.set_ylim(1000, 3300)
    ax.legend(loc=0, ncol=2)
    ax.set_ylabel('Zillow Rent Index (in dollars)')
Available on GitHub:
Two parts:
Due date: next Thursday 9/10 by the start of class (6pm)
Announcements & reminders will go out on Piazza!