Oct 22, 2020
Two case studies:
Initialize our packages:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import geopandas as gpd
%matplotlib inline
import intake
import xarray as xr
import cartopy.crs as ccrs
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geoviews as gv
from geoviews.tile_sources import EsriImagery
hv.extension('bokeh')
Create the intake data catalog so we can load our datasets:
url = 'https://raw.githubusercontent.com/pyviz-topics/EarthML/master/examples/catalog.yml'
cat = intake.open_catalog(url)
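We can check which data sources the catalog defines:
# List the data sources defined in the catalog
list(cat)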
band_info = pd.DataFrame([
(1, "Aerosol", " 0.43 - 0.45", 0.440, "30", "Coastal aerosol"),
(2, "Blue", " 0.45 - 0.51", 0.480, "30", "Blue"),
(3, "Green", " 0.53 - 0.59", 0.560, "30", "Green"),
(4, "Red", " 0.64 - 0.67", 0.655, "30", "Red"),
(5, "NIR", " 0.85 - 0.88", 0.865, "30", "Near Infrared (NIR)"),
(6, "SWIR1", " 1.57 - 1.65", 1.610, "30", "Shortwave Infrared (SWIR) 1"),
(7, "SWIR2", " 2.11 - 2.29", 2.200, "30", "Shortwave Infrared (SWIR) 2"),
(8, "Panc", " 0.50 - 0.68", 0.590, "15", "Panchromatic"),
(9, "Cirrus", " 1.36 - 1.38", 1.370, "30", "Cirrus"),
(10, "TIRS1", "10.60 - 11.19", 10.895, "100 * (30)", "Thermal Infrared (TIRS) 1"),
(11, "TIRS2", "11.50 - 12.51", 12.005, "100 * (30)", "Thermal Infrared (TIRS) 2")],
columns=['Band', 'Name', 'Wavelength Range (µm)', 'Nominal Wavelength (µm)', 'Resolution (m)', 'Description']).set_index(["Band"])
band_info
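With the band number as the index, we can look up any band's properties directly. For example, the nominal wavelength of the NIR band (which we'll need later for the temperature calculation):
# Look up a single property for a single band, e.g., NIR (band 5)
band_info.loc[5, "Nominal Wavelength (µm)"]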
We'll be downloading publicly available Landsat data from Google Cloud Storage
See: https://cloud.google.com/storage/docs/public-datasets/landsat
The relevant information is stored in our intake catalog:
From our catalog.yml file:
google_landsat_band:
  description: Landsat bands from Google Cloud Storage
  driver: rasterio
  parameters:
    path:
      description: landsat path
      type: int
    row:
      description: landsat row
      type: int
    product_id:
      description: landsat file id
      type: str
    band:
      description: band
      type: int
  args:
    urlpath: https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/{{ '%03d' % path }}/{{ '%03d' % row }}/{{ product_id }}/{{ product_id }}_B{{ band }}.TIF
    chunks:
      band: 1
      x: 256
      y: 256
From the "urlpath" above, you can see we need "path", "row", and "product_id" variables to properly identify a Landsat image:
The path and row corresponding to the area over Philadelphia have already been selected using the USGS Earth Explorer. The same tool was used to find the product ID for the particular date of interest.
# Necessary variables
path = 14
row = 32
product_id = 'LC08_L1TP_014032_20160727_20170222_01_T1'
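As an optional sanity check, we can mirror the catalog's Jinja template with plain Python string formatting to see exactly which file will be requested for a given band:
# Render the urlpath template by hand for one band (band 4, as an example)
band = 4
print(
    "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/"
    f"{path:03d}/{row:03d}/{product_id}/{product_id}_B{band}.TIF"
)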
The function below returns a specific Landsat band as a dask array.
from random import random
from time import sleep

def get_band_with_exponential_backoff(path, row, product_id, band, maximum_backoff=32):
    """
    Load a Landsat band, retrying with exponential backoff.

    Google Cloud Storage recommends using exponential backoff
    when accessing the API:
    https://cloud.google.com/storage/docs/exponential-backoff
    """
    n = backoff = 0
    while backoff < maximum_backoff:
        try:
            return cat.google_landsat_band(
                path=path, row=row, product_id=product_id, band=band
            ).to_dask()
        except Exception:
            # Wait longer after each failed attempt, plus a random jitter
            backoff = min(2 ** n + random(), maximum_backoff)
            sleep(backoff)
            n += 1
    raise RuntimeError(f"Failed to load band {band} after repeated retries")
Loop over each band, load that band using the above function, and then concatenate the results together:
bands = [1, 2, 3, 4, 5, 6, 7, 9, 10, 11]
datasets = []
for band in bands:
da = get_band_with_exponential_backoff(
path=path, row=row, product_id=product_id, band=band
)
da = da.assign_coords(band=[band])
datasets.append(da)
ds = xr.concat(datasets, "band", compat="identical")
ds
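Because each band was opened with to_dask(), the concatenated array is backed by dask, and the full data hasn't been downloaded yet. We can inspect the chunk structure:
# The data is lazy: inspect the dask chunks before computing anything
ds.chunks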
The CRS of the data is EPSG:32618 (UTM Zone 18N).
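We can confirm this from the array's attributes (a quick check; the exact attribute name can vary across rasterio/xarray versions):
# Confirm the coordinate reference system from the data's attributes
ds.attrs.get("crs", ds.attrs)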
def load_google_landsat_metadata(path, row, product_id):
"""
Load Landsat metadata for path, row, product_id from Google Cloud Storage
"""
    def parse_type(x):
        try:
            # Convert numeric strings to Python numbers
            return eval(x)
        except Exception:
            # Otherwise, keep the raw string
            return x
baseurl = "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01"
suffix = f"{path:03d}/{row:03d}/{product_id}/{product_id}_MTL.txt"
df = intake.open_csv(
urlpath=f"{baseurl}/{suffix}",
csv_kwargs={
"sep": "=",
"header": None,
"names": ["index", "value"],
"skiprows": 2,
"converters": {"index": (lambda x: x.strip()), "value": parse_type},
},
).read()
metadata = df.set_index("index")["value"]
return metadata
metadata = load_google_landsat_metadata(path, row, product_id)
metadata.head(n=10)
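Individual values can be looked up by their MTL field names, for example:
# Keys follow the Landsat MTL file conventions
metadata["CLOUD_COVER"], metadata["DATE_ACQUIRED"]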
We can trim our data to the city limits using the slice() function discussed last lecture.
# Load the GeoJSON from the URL
url = "http://data.phl.opendata.arcgis.com/datasets/405ec3da942d4e20869d4e1449a2be48_0.geojson"
city_limits = gpd.read_file(url)
# Convert to the right CRS for this data
city_limits = city_limits.to_crs(epsg=32618)
The y coordinates are stored in descending order, so the slice will need to be in descending order as well.
# Look up the order of xmin/xmax and ymin/ymax
city_limits.total_bounds?
# Get x/y range of city limits from "total_bounds"
xmin, ymin, xmax, ymax = city_limits.total_bounds
# Slice our xarray data
subset = ds.sel(
x=slice(xmin, xmax),
y=slice(ymax, ymin)
) # NOTE: y coordinate system is in descending order!
subset = subset.compute()
subset
# Original data was about 8000 x 8000 in size
ds.shape
# Sliced data is only about 1000 x 1000 in size!
subset.shape
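Slicing before calling compute() is what keeps this workflow tractable. A rough back-of-the-envelope estimate of the memory savings, assuming 10 bands and 8 bytes per pixel once the data is in floating point:
# Back-of-the-envelope memory estimate
full_gb = 10 * 8000 * 8000 * 8 / 1e9
subset_gb = 10 * 1000 * 1000 * 8 / 1e9
print(f"Full scene: ~{full_gb:.1f} GB; subset: ~{subset_gb:.2f} GB")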
# City limits plot
limits_plot = city_limits.hvplot.polygons(
line_color="white", alpha=0, line_alpha=1, crs=32618, geo=True
)
# Landsat plot
landsat_plot = subset.sel(band=3).hvplot.image(
x="x",
y="y",
rasterize=True,
cmap="viridis",
frame_width=500,
frame_height=500,
geo=True,
crs=32618,
)
# Combined
landsat_plot * limits_plot
Remember, NDVI = (NIR - Red) / (NIR + Red)
band_info.head()
NIR is band 5 and Red is band 4
# Select the bands we need
NIR = subset.sel(band=5)
RED = subset.sel(band=4)
# Calculate the NDVI
NDVI = (NIR - RED) / (NIR + RED)
NDVI = NDVI.where(NDVI < np.inf)
NDVI.head()
# NEW: Use datashader to plot the NDVI
p = NDVI.hvplot.image(
x="x",
y="y",
geo=True,
crs=32618,
datashade=True, # NEW: USE DATASHADER
project=True, # NEW: PROJECT THE DATA BEFORE PLOTTING
frame_height=500,
frame_width=500,
cmap="viridis",
)
# NEW: Use a transparent rasterized version of the plot for hover text
# No hover text available with datashaded images so we can use the rasterized version
p_hover = NDVI.hvplot(
x="x",
y="y",
crs=32618,
geo=True,
rasterize=True,
frame_height=500,
frame_width=500,
alpha=0, # SET ALPHA=0 TO MAKE THIS TRANSPARENT
colorbar=False,
)
# City limits
limits_plot = city_limits.hvplot.polygons(
geo=True, crs=32618, line_color="white", line_width=2, alpha=0, line_alpha=1
)
p * p_hover * limits_plot
Note the new datashade=True keyword in the plot above.
Next, we'll calculate the land surface temperature, using the rio_toa package to help convert to at-satellite brightness temperature.
from rio_toa import brightness_temp, toa_utils
def land_surface_temp(BT, w, NDVI):
    """Calculate the land surface temperature of Landsat 8

    temp = BT / (1 + w * (BT / p) * ln(e))

    BT = at-satellite brightness temperature (K)
    w = wavelength of emitted radiance (µm)

    where p = h * c / s (1.439e-2 m K)

    h = Planck's constant (J s)
    s = Boltzmann constant (J/K)
    c = velocity of light (m/s)
    """
    h = 6.626e-34
    s = 1.38e-23
    c = 2.998e8

    p = (h * c / s) * 1e6  # convert from m K to µm K

    # Proportion of vegetation, estimated from the normalized NDVI
    Pv = ((NDVI - NDVI.min()) / (NDVI.max() - NDVI.min())) ** 2

    # Emissivity
    e = 0.004 * Pv + 0.986

    return BT / (1 + w * (BT / p) * np.log(e))
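As a quick scalar sanity check of the formula (with hypothetical values: a 300 K brightness temperature at band 10's nominal wavelength and an emissivity near 0.99), the emissivity correction should shift the temperature by less than a degree:
# Sanity check with hypothetical scalar values
BT, w, e = 300.0, 10.895, 0.99
p = (6.626e-34 * 2.998e8 / 1.38e-23) * 1e6  # ~14,395 µm K
print(BT / (1 + w * (BT / p) * np.log(e)))  # ~300.7 K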
def land_surface_temp_for_band(band):
"""
Utility function to get land surface temperature for a specific band
"""
# params from general Landsat info
w = band_info.loc[band]["Nominal Wavelength (µm)"]
# params from specific Landsat data text file for these images
ML = metadata[f"RADIANCE_MULT_BAND_{band}"]
AL = metadata[f"RADIANCE_ADD_BAND_{band}"]
K1 = metadata[f"K1_CONSTANT_BAND_{band}"]
K2 = metadata[f"K2_CONSTANT_BAND_{band}"]
at_satellite_temp = brightness_temp.brightness_temp(
subset.sel(band=band).values, ML, AL, K1, K2
)
return land_surface_temp(at_satellite_temp, w, NDVI)
# temperature for band 10
band = 10
temp_10 = land_surface_temp_for_band(band).compute()
# Convert to Fahrenheit
temp_10_f = toa_utils.temp_rescale(temp_10, 'F')
# Save the numpy array as an xarray
band_10 = subset.sel(band=band).copy(data=temp_10_f)
# Get the land surface temp for band 11
band = 11
temp_11 = land_surface_temp_for_band(band).compute()
# Convert to Fahrenheit
temp_11_f = toa_utils.temp_rescale(temp_11, "F")
# Save as an xarray DataArray
band_11 = subset.sel(band=band).copy(data=temp_11_f)
# plot for band 10
plot_10 = band_10.hvplot.image(
x="x",
y="y",
rasterize=True,
cmap="fire_r",
crs=32618,
geo=True,
frame_height=400,
frame_width=400,
title="Band 10",
)
# plot for band 11
plot_11 = band_11.hvplot.image(
x="x",
y="y",
rasterize=True,
cmap="fire_r",
crs=32618,
geo=True,
frame_height=400,
frame_width=400,
title="Band 11",
)
plot_10 + plot_11
# Let's average the results of these two bands
mean_temp_f = (band_10 + band_11) / 2
# IMPORTANT: copy over the metadata
mean_temp_f.attrs = band_10.attrs
mean_temp_f
p = mean_temp_f.hvplot.image(
x="x",
y="y",
rasterize=True,
crs=32618,
geo=True,
frame_width=600,
frame_height=600,
cmap="rainbow",
alpha=0.5,
legend=False,
title="Mean Surface Temp (F)"
)
img = p * EsriImagery
img
Key insight: the color of a building's roof plays a very important role in urban heat islands.
We can use the .where() function in xarray to identify the heat islands across the city. The where() function takes a boolean array, where each pixel is True/False based on the selection criteria.
hot_spots = mean_temp_f.where(mean_temp_f > 90)
hot_spots_plot = hot_spots.hvplot.image(
x="x",
y="y",
cmap="fire",
crs=32618,
geo=True,
frame_height=500,
frame_width=500,
colorbar=False,
legend=False,
rasterize=True,
title="Mean Temp (F) > 90",
alpha=0.7
)
hot_spots_plot * EsriImagery
Let's select the areas cooler than 75 degrees as well...
cold_spots = mean_temp_f.where(mean_temp_f < 75)
cold_spots_plot = cold_spots.hvplot.image(
x="x",
y="y",
cmap="fire",
geo=True,
crs=32618,
frame_height=500,
frame_width=500,
colorbar=False,
legend=False,
rasterize=True,
title="Mean Temp (F) < 75",
alpha=0.5
)
cold_spots_plot * EsriImagery
In this section, we'll use zonal statistics to calculate the average temperatures for neighborhoods and parks, and compare this to the number of street trees in the city.
First, we'll use the rioxarray package to save our mean temperature data to a raster .tif file.
import rioxarray
# Save to a raster GeoTIFF file!
mean_temp_f.rio.to_raster("./mean_temp_f.tif")
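As an optional check, we can read the file back with rioxarray to verify the round trip:
# Read the GeoTIFF back in to verify it was written correctly
check = rioxarray.open_rasterio("./mean_temp_f.tif")
check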
Next, we'll load the parks dataset (./data/parks.geojson) and use the rasterstats package to calculate the zonal stats.
from rasterstats import zonal_stats
import rasterio as rio
# Open the GeoTIFF file using rasterio
f = rio.open('./mean_temp_f.tif')
f
# Load the parks
parks = gpd.read_file("./data/parks.geojson")
# Convert to the CRS from the mean temp file
CRS = f.crs.data
print(CRS)
parks = parks.to_crs(CRS['init'])
parks.head()
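Note: with more recent versions of geopandas, you can usually pass the rasterio CRS object to to_crs() directly, avoiding the 'init' dictionary lookup. A minimal alternative, assuming newer library versions:
# Alternative for newer geopandas versions: pass the CRS object directly
parks = parks.to_crs(f.crs)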
# Use zonal_stats to get the mean temperature values within each park
# FIRST ARGUMENT: vector polygons
# SECOND ARGUMENT: mean temperature raster data
park_stats = zonal_stats(parks, mean_temp_f.values, affine=f.transform, stats=['mean'])
# Store the mean temp back in the original parks dataframe
parks['mean_temp'] = [s['mean'] for s in park_stats]
parks.head()
parks['mean_temp'].values
# Create our figure and axes
fig, ax = plt.subplots(figsize=(8,6))
# Citywide mean temp
citywide_mean_temp = np.nanmean(mean_temp_f) # use nanmean() to skip NaN pixels automatically!
# Subtract out the citywide mean
# NOTE: the ".values" here will grab the numpy array from the pandas dataframe
park_temps = parks['mean_temp'].values - citywide_mean_temp
# Plot
ax.hist(park_temps, bins='auto')
# Add a vertical line at x=0
ax.axvline(x=0, lw=2, color='black')
# Format
ax.set_xlabel("Difference from Citywide Mean Temp (Degrees F)", fontsize=18)
ax.set_ylabel("Number of Parks", fontsize=18);
# Load the neighborhoods
url = "https://github.com/azavea/geo-data/raw/master/Neighborhoods_Philadelphia/Neighborhoods_Philadelphia.geojson"
hoods = gpd.read_file(url)
f.crs.data
# Convert to the CRS of the Landsat data
hoods = hoods.to_crs(f.crs.data["init"])
# Get the mean temp values within the neighborhoods
# FIRST ARGUMENT: vector polygons
# SECOND ARGUMENT: raster temp data
stats = zonal_stats(hoods, mean_temp_f.values, affine=f.transform, stats=['mean'])
hoods['mean_temp'] = [s['mean'] for s in stats]
hoods.head()
# Initialize the figure/axes
fig, ax = plt.subplots(figsize=(10,10))
# Make the choropleth map
hoods.plot(cmap='viridis', column='mean_temp', ax=ax)
# Format
ax.set_axis_off()
ax.set_aspect("equal")
import altair as alt
# Plot interactive map
# IMPORTANT: remember you need to convert the input data to EPSG=4326 for altair
meanT_chart = (
alt.Chart(hoods.to_crs(epsg=4326))
.mark_geoshape(stroke="#444444")
.properties(
width=400,
height=450,
title="Mean Neighborhood Temperatures in Philadelphia",
)
.encode(
tooltip=[
alt.Tooltip("mean_temp:Q", format=".1f", title="Mean Temp (℉)"),
alt.Tooltip("listname:N", title="Neighborhood"),
],
color=alt.Color(
"mean_temp:Q",
scale=alt.Scale(scheme="viridis"),
legend=alt.Legend(orient="bottom", title="Mean Temp (℉)"),
),
)
.configure(background="white")
)
meanT_chart
sjoin()
function from geopandas to join the tree dataset to the neighborhoods dataset and then groupby the neighborhoods to find the number of trees per neighborhood# Load the street trees inventory
url = "http://data.phl.opendata.arcgis.com/datasets/957f032f9c874327a1ad800abd887d17_0.geojson"
trees = gpd.read_file(url)
trees.head()
# Convert to the CRS of the Landsat data
trees = trees.to_crs(f.crs.data['init'])
# Add the neighborhood name to the trees dataset
trees = gpd.sjoin(trees, hoods, op='within')
trees.head()
# Calculate the number of trees per neighborhood
trees_per_hood = trees.groupby('listname').size().reset_index(name='trees_per_hood')
trees_per_hood.head()
# Merge the hood geometries back in
hoods = hoods.merge(trees_per_hood, on='listname')
# Initialize the figure
fig, ax = plt.subplots(figsize=(10,10))
# Plot
hoods.to_crs(epsg=3857).plot(cmap='viridis', column='trees_per_hood', ax=ax)
# Format
ax.set_axis_off()
ax.set_aspect("equal")
# plot map, remembering to convert to EPSG=4326 first
ntrees_chart = (
alt.Chart(hoods.to_crs(epsg=4326))
.mark_geoshape(stroke="#444444")
.properties(
width=400,
height=450,
title="The Distribution of Street Trees in Philadelphia",
)
.encode(
tooltip=[
alt.Tooltip(f"trees_per_hood:Q", format=".1f", title="# Trees"),
alt.Tooltip("listname:N", title="Neighborhood"),
],
color=alt.Color(
f"trees_per_hood:Q",
scale=alt.Scale(scheme="viridis"),
legend=alt.Legend(orient="bottom", title="# Trees"),
),
)
)
ntrees_chart
Use seaborn's kdeplot() function to plot the relationship between the number of trees per neighborhood and the mean temperature per neighborhood. You'll need to merge() the relevant data frames together based on the neighborhood name. What are the major trends? Do you see the relationship you expect?
import seaborn as sns
fig, ax = plt.subplots(figsize=(8,6))
# plot
sns.kdeplot(hoods['trees_per_hood'], hoods['mean_temp'], shade=True, ax=ax)
# format
ax.set_xlabel("Number of Trees per Neighborhood", fontsize=18)
ax.set_ylabel("Mean Temperature per Neighborhood", fontsize=18);