Mapping Tasiyagnunpa (Western Meadowlark) migration¶

Introduction to vector data operations

Tasiyagnunpa (the Western Meadowlark, Sturnella neglecta) migrates each year to nest on the Great Plains of the United States. Using crowd-sourced observations of these birds, we can watch that migration unfold throughout the year.

Read more about the Lakota connection to Tasiyagnunpa from Native Sun News Today

Set up your reproducible workflow¶

Import Python libraries¶

We will be getting data from a source called GBIF (Global Biodiversity Information Facility). We need a package called pygbif to access the data, which is not included in your environment. Install it by running the cell below:

In [1]:
%%bash
pip install pygbif
Requirement already satisfied: pygbif in /opt/conda/lib/python3.11/site-packages (0.6.4)
Requirement already satisfied: requests>2.7 in /opt/conda/lib/python3.11/site-packages (from pygbif) (2.31.0)
Requirement already satisfied: requests-cache in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.2.0)
Requirement already satisfied: geojson-rewind in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.1.0)
Requirement already satisfied: geomet in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.1.0)
Requirement already satisfied: appdirs>=1.4.3 in /opt/conda/lib/python3.11/site-packages (from pygbif) (1.4.4)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.11/site-packages (from pygbif) (3.8.4)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.11/site-packages (from requests>2.7->pygbif) (2024.2.2)
Requirement already satisfied: click in /opt/conda/lib/python3.11/site-packages (from geomet->pygbif) (8.1.7)
Requirement already satisfied: contourpy>=1.0.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (4.51.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (1.4.4)
Requirement already satisfied: numpy>=1.21 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (1.24.3)
Requirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (24.0)
Requirement already satisfied: pillow>=8 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (10.3.0)
Requirement already satisfied: pyparsing>=2.3.1 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.11/site-packages (from matplotlib->pygbif) (2.9.0)
Requirement already satisfied: attrs>=21.2 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (23.2.0)
Requirement already satisfied: cattrs>=22.2 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (23.2.3)
Requirement already satisfied: platformdirs>=2.5 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (4.2.0)
Requirement already satisfied: url-normalize>=1.4 in /opt/conda/lib/python3.11/site-packages (from requests-cache->pygbif) (1.4.3)
Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.11/site-packages (from python-dateutil>=2.7->matplotlib->pygbif) (1.16.0)

Your Task: Import packages

Add imports for packages that will help you:

  1. Work with tabular data
  2. Work with geospatial vector data
  3. Make an interactive plot of tabular and/or vector data
In [2]:
# Standard library
import calendar
import os
import pathlib
import time
import zipfile
from getpass import getpass
from glob import glob

# Third-party: data access, tabular and vector data, plotting
import cartopy.crs as ccrs
import geopandas as gpd
import hvplot.pandas  # Registers the .hvplot accessor
import pandas as pd
import panel as pn
import pygbif.occurrences as occ
import pygbif.species as species
import requests
INFO:NumExpr defaulting to 2 threads.

Create a folder for your data¶

For this challenge, you will need to save some data to your computer. We suggest saving to somewhere in your home folder (e.g. /home/username), rather than to your GitHub repository, since data files can easily become too large for GitHub.

Warning

The home directory is different for every user! Your home directory probably won’t exist on someone else’s computer. Make sure to use code like pathlib.Path.home() to compute the home directory on the computer the code is running on. This is key to writing reproducible and interoperable code.
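
As a minimal sketch of the idea in this warning, the snippet below builds a data path relative to whatever home directory the code is running in. The folder name `example-project` is a hypothetical placeholder, not a name used elsewhere in this notebook.

```python
import pathlib

# Compute the home directory of whoever runs the code
home_dir = pathlib.Path.home()

# Build a path below it; 'example-project' is a hypothetical name
example_dir = home_dir / 'earth-analytics' / 'data' / 'example-project'

# The path is only computed here; nothing is created on disk yet
print(example_dir)
```

Because the path is computed at run time, the same code should point to a sensible location on another person's computer without any edits.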

Your Task: Create a project folder

The code below will help you get started with making a project directory

  1. Replace 'your-project-directory-name-here' and 'your-gbif-data-directory-name-here' with descriptive names
  2. Run the cell
  3. (OPTIONAL) Check in the terminal that you created the directory using the command ls ~/earth-analytics/data
In [3]:
# Create data directory in the home folder
data_dir = os.path.join(
    # Home directory
    pathlib.Path.home(),
    # Earth analytics data directory
    'earth-analytics',
    'data',
    # Project directory
    'species-distribution-ESIIL',
)
os.makedirs(data_dir, exist_ok=True)

# Define the directory name for GBIF data
gbif_dir = os.path.join(data_dir, 'meadowlark-data')

Define your study area – the ecoregions of North America¶

Track observations of Tasiyagnunpa across the different ecoregions of North America! You should be able to see changes in the number of observations in each ecoregion throughout the year.

Download and save ecoregion boundaries¶

Your Task

  1. Find the URL for the level III ecoregion boundaries. You can get ecoregion boundaries from the Environmental Protection Agency (EPA).
  2. Replace your/url/here with the URL you found, making sure to format it so it is easily readable.
  3. Change all the variable names to descriptive variable names
  4. Run the cell to download and save the data.
In [4]:
# Set up the ecoregions level III boundary URL
ecoregion_url = ("https://gaftp.epa.gov/EPADataCommons/ORD/Ecoregions/"
                  "cec_na/NA_CEC_Eco_Level3.zip")
# Set up a path to save the data on your machine
ecoregionpath = os.path.join(data_dir, 'NA_CEC_Eco_Level3.zip')

# Don't download twice
if not os.path.exists(ecoregionpath):
    # Download, and don't check the certificate for the EPA
    ecoregions_response = requests.get(ecoregion_url, verify=False)
    # Stop here if the server returned an error instead of the file
    ecoregions_response.raise_for_status()
    # Save the binary data to a file
    with open(ecoregionpath, 'wb') as ecoregions_file:
        ecoregions_file.write(ecoregions_response.content)
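
Because the download is skipped whenever the file already exists, an interrupted or failed download can leave a broken file behind that never gets replaced. One way to guard against that, sketched here with only the standard library, is to confirm that the saved file really is a zip archive. The helper name `is_valid_zip` and the demo file names are hypothetical.

```python
import os
import tempfile
import zipfile


def is_valid_zip(path):
    """Return True if path exists and is a readable zip archive."""
    if not os.path.exists(path):
        return False
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() returns the first corrupt member name, or None
        return archive.testzip() is None


# Demonstrate on throwaway files rather than the real download
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, 'good.zip')
    with zipfile.ZipFile(good_path, 'w') as archive:
        archive.writestr('hello.txt', 'hello')

    bad_path = os.path.join(tmp_dir, 'bad.zip')
    with open(bad_path, 'wb') as bad_file:
        bad_file.write(b'not a zip archive')

    good_result = is_valid_zip(good_path)
    bad_result = is_valid_zip(bad_path)

print(good_result, bad_result)  # prints: True False
```

A check like this could be combined with the existence test, so that a damaged file is downloaded again instead of being reused.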

Load the ecoregions into Python¶

Your task

Load the ecoregion boundaries you downloaded:

  1. Replace a_path with the path you created for your ecoregions file.
  2. (optional) Consider renaming and selecting columns to make your GeoDataFrame easier to work with.
  3. Make a quick plot with .plot() to make sure the download worked.
  4. Run the cell to load the data into Python
In [5]:
# Open up the ecoregions boundaries
ecoregions_gdf = (
    gpd.read_file(ecoregionpath)
    .rename(columns={
        'NA_L3NAME': 'name',
        'Shape_Area': 'area'})
    [['name', 'area', 'geometry']]
)
# Name the index so it will match the other data later on
ecoregions_gdf.index.name = 'ecoregion'

# Plot the ecoregions to check download
ecoregions_gdf.plot()
Out[5]:
<Axes: >

Create a simplified GeoDataFrame for plotting¶

Plotting larger files can be time consuming. The code below will streamline plotting with hvplot by simplifying the geometry, projecting it to a Mercator projection that is compatible with geoviews, and cropping off areas in the Arctic.
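
To see why the far north is awkward on a Mercator map, it helps to look at how the projection stretches the vertical axis. The sketch below uses the simple spherical Mercator formula, which is an assumption made for illustration; `ccrs.Mercator()` may use a different (ellipsoidal) definition, so the numbers are indicative only. The function name `spherical_mercator_y` and the radius constant are likewise illustrative.

```python
import math

EARTH_RADIUS_M = 6378137.0  # assumed sphere radius, in meters


def spherical_mercator_y(latitude_deg):
    """Northing in meters for a latitude on a spherical Mercator map."""
    latitude_rad = math.radians(latitude_deg)
    return EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + latitude_rad / 2))


# Equal 20-degree steps in latitude take up more and more map height
for latitude in (0, 20, 40, 60, 80):
    print(latitude, round(spherical_mercator_y(latitude) / 1000), 'km')
```

The growing gaps between successive rows show that high-latitude areas take up a disproportionate share of the map.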

Your task

Simplify and reproject the ecoregion boundaries so they plot quickly:

  1. Make a copy of your ecoregions GeoDataFrame with the .copy() method, and save it to another variable name. Make sure to do everything else in this cell with your new copy!
  2. Simplify the ecoregions with .simplify(1000), and save it back to the geometry column.
  3. Change the Coordinate Reference System (CRS) to Mercator with .to_crs(ccrs.Mercator())
  4. Use the plotting code in the cell to check that the plotting runs quickly and looks the way you want, making sure to change gdf to YOUR GeoDataFrame name.
In [6]:
# Make a copy of the ecoregions
ecoregion_plot = ecoregions_gdf.copy()

# Simplify the geometry to speed up processing
ecoregion_plot.geometry = ecoregion_plot.simplify(1000)

# Change the CRS to Mercator for mapping
ecoregion_plot = ecoregion_plot.to_crs(ccrs.Mercator())

# Check that the plot runs
ecoregion_plot.hvplot(geo=True, crs=ccrs.Mercator())
Out[6]:

Access locations and times of Tasiyagnunpa encounters¶

For this challenge, you will use a database called the Global Biodiversity Information Facility (GBIF). GBIF is compiled from species observation data all over the world, and includes everything from museum specimens to photos taken by citizen scientists in their backyards.

Your task: Explore GBIF

Before you get started, go to the GBIF occurrences search page and explore the data.

Contribute to open data

You can get your own observations added to GBIF using iNaturalist!

Register and log in to GBIF¶

You will need a GBIF account to complete this challenge. You can use your GitHub account to authenticate with GBIF. Then, run the following code to save your credentials on your computer.

Tip

If you accidentally enter your credentials wrong, you can set reset_credentials=True instead of reset_credentials=False

In [7]:
reset_credentials = False
# GBIF needs a username, password, and email
credentials = dict(
    GBIF_USER=(input, 'GBIF username: '),
    GBIF_PWD=(getpass, 'GBIF password: '),
    GBIF_EMAIL=(input, 'GBIF email: '),
)
for env_variable, (prompt_func, prompt_text) in credentials.items():
    # Delete credential from environment if requested
    if reset_credentials and (env_variable in os.environ):
        os.environ.pop(env_variable)
    # Ask for credential and save to environment
    if env_variable not in os.environ:
        os.environ[env_variable] = prompt_func(prompt_text)

Get the species key¶

Your task

  1. Replace the species_name with the name of the species you want to look up
  2. Run the code to get the species key
In [8]:
# Query species
species_info = species.name_lookup('sturnella neglecta', rank='SPECIES')

# Get the first result
first_result = species_info['results'][0]

# Get the species key (nubKey)
species_key = first_result['nubKey']

# Check the result
first_result['species'], species_key
Out[8]:
('Sturnella neglecta', 9596413)

Download data from GBIF¶

Your task

  1. Replace csv_file_pattern with a string that will match any .csv file when used in the glob function. HINT: the character * represents any number of any values except the file separator (e.g. /)

  2. Add parameters to the GBIF download function, occ.download() to limit your query to:

    • Sturnella neglecta observations
    • in North America (NORTH_AMERICA)
    • from 2023
    • with spatial coordinates.
  3. Then, run the download. This can take a few minutes.
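
To illustrate the hint about `*`, here is a small sketch using a temporary folder with made-up file names. It shows that a `*.csv` pattern matches files directly inside the folder but not files in subfolders, because `*` does not cross the file separator.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create hypothetical files: two in the folder, one in a subfolder
    os.makedirs(os.path.join(tmp_dir, 'nested'))
    for relative_path in ['a.csv', 'notes.txt', os.path.join('nested', 'b.csv')]:
        with open(os.path.join(tmp_dir, relative_path), 'w') as new_file:
            new_file.write('')

    # '*' matches any characters in a file name, but not the separator
    csv_pattern = os.path.join(tmp_dir, '*.csv')
    matched_names = [os.path.basename(path) for path in glob(csv_pattern)]

print(matched_names)  # prints: ['a.csv']
```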

In [9]:
# Only download once
gbif_pattern = os.path.join(gbif_dir, '*.csv')
if not glob(gbif_pattern):
    # Only submit a new query if no download key has been saved yet
    if 'GBIF_DOWNLOAD_KEY' not in os.environ:
        # Submit query to GBIF
        gbif_query = occ.download([
            "continent = NORTH_AMERICA",
            f"speciesKey = {species_key}",
            "year = 2023",
            "hasCoordinate = TRUE",
        ])
        # Save the download key so the query is not submitted again
        os.environ['GBIF_DOWNLOAD_KEY'] = gbif_query[0]

    # Wait for the download to build
    download_key = os.environ['GBIF_DOWNLOAD_KEY']
    while occ.download_meta(download_key)['status'] != 'SUCCEEDED':
        time.sleep(5)

    # Download GBIF data
    download_info = occ.download_get(download_key, path=data_dir)

    # Unzip GBIF data
    with zipfile.ZipFile(download_info['path']) as download_zip:
        download_zip.extractall(path=gbif_dir)

# Find the extracted .csv file path
gbif_path = glob(gbif_pattern)[0]

Load the GBIF data into Python¶

Your task

  1. Look at the beginning of the file you downloaded using the code below. What do you think the delimiter is?
  2. Run the following code cell. What happens?
  3. Uncomment and modify the parameters of pd.read_csv() below until your data loads successfully and you have only the columns you want.

You can use the following code to look at the beginning of your file:

In [10]:
!head $gbif_path
gbifID	datasetKey	occurrenceID	kingdom	phylum	class	order	family	genus	species	infraspecificEpithet	taxonRank	scientificName	verbatimScientificName	verbatimScientificNameAuthorship	countryCode	locality	stateProvince	occurrenceStatus	individualCount	publishingOrgKey	decimalLatitude	decimalLongitude	coordinateUncertaintyInMeters	coordinatePrecision	elevation	elevationAccuracy	depth	depthAccuracy	eventDate	day	month	year	taxonKey	speciesKey	basisOfRecord	institutionCode	collectionCode	catalogNumber	recordNumber	identifiedBy	dateIdentified	license	rightsHolder	recordedBy	typeStatus	establishmentMeans	lastInterpreted	mediaType	issue
4413934032	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/181625787	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		US		South Dakota	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	43.624742	-103.421674	31.0						2023-08-31T14:18	31	8	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	181625787		Kevin Mortensen	2023-09-03T20:19:03	CC_BY_NC_4_0	Kevin Mortensen	Kevin Mortensen			2024-05-28T04:06:24.994Z	StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4028941879	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/148038814	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		US		Utah	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	41.04703	-112.223504	7200.0						2023-02-04T12:20:58	4	2	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	148038814		coloradomiks	2023-02-04T19:24:02	CC_BY_NC_4_0	coloradomiks	coloradomiks			2024-05-28T04:05:55.748Z	StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4046729169	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/149757860	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		US		California	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	38.590682	-121.66885							2023-02-09T13:40	9	2	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	149757860		Jonathan Eisen	2023-02-26T20:15:31	CC_BY_4_0	Jonathan Eisen	Jonathan Eisen			2024-05-28T04:05:46.697Z	StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4063055845	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/150707489	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		US		New Mexico	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	34.210922	-103.318527	4.0						2023-03-09T06:57:47	9	3	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	150707489		Christopher Rustay	2023-03-10T03:45:03	CC_BY_NC_4_0	Christopher Rustay	Christopher Rustay			2024-05-28T04:00:21.207Z	Sound	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4067332956	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/151302997	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		US		Colorado	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	40.183999	-105.16944	8.0						2023-03-15T15:54	15	3	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	151302997		Phyllis Holst	2023-03-16T15:16:28	CC_BY_NC_4_0	Phyllis Holst	Phyllis Holst			2024-05-28T03:28:13.096Z	StillImage;StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4096667337	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/156167917	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		CA		Alberta	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	51.620539	-113.956079	16779.0						2023-04-22T12:18	22	4	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	156167917		David Severson	2023-04-23T02:21:07	CC_BY_NC_4_0	David Severson	David Severson			2024-05-28T04:06:33.703Z	StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4436055115	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/187160665	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		US		Arizona	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	32.259986	-110.872393	183.0						2023-10-09T16:01	9	10	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	187160665		Mike Ostrowski	2023-10-11T20:17:50	CC_BY_4_0	Mike Ostrowski	Mike Ostrowski			2024-05-28T03:38:26.531Z	StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4116081880	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/160994890	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		CA		Manitoba	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	49.66161	-97.093047	26486.0						2023-05-10T18:28:34	10	5	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	160994890		cmjmousseau	2023-05-10T23:48:38	CC_BY_NC_4_0	cmjmousseau	cmjmousseau			2024-05-28T04:06:34.444Z	StillImage;StillImage;Sound;StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
4111990413	50c9509d-22c7-4a22-a47d-8c48425ef4a7	https://www.inaturalist.org/observations/159939260	Animalia	Chordata	Aves	Passeriformes	Icteridae	Sturnella	Sturnella neglecta		SPECIES	Sturnella neglecta Audubon, 1844	Sturnella neglecta		CA		Alberta	PRESENT		28eb1a3f-1c15-4a95-931a-4af90ecb574d	50.566925	-113.726747	4.0						2023-05-05T08:45	5	5	2023	9596413	9596413	HUMAN_OBSERVATION	iNaturalist	Observations	159939260		pchristensen	2023-05-05T20:38:20	CC_BY_NC_4_0	pchristensen	pchristensen			2024-05-28T04:01:16.895Z	StillImage	COORDINATE_ROUNDED;CONTINENT_DERIVED_FROM_COORDINATES;TAXON_MATCH_TAXON_ID_IGNORED
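
If the delimiter is hard to spot by eye, one option is to count candidate separator characters in the header line. The sketch below does this for a shortened, hand-copied version of the header above rather than the real file; the helper name `guess_delimiter` and the list of candidates are assumptions made for illustration.

```python
def guess_delimiter(header_line, candidates=('\t', ',', ';', '|')):
    """Return the candidate character that occurs most often in the line."""
    return max(candidates, key=header_line.count)


# Shortened copy of the header; the real file has many more columns
sample_header = 'gbifID\tdatasetKey\toccurrenceID\tkingdom\tphylum'

delimiter = guess_delimiter(sample_header)
print(repr(delimiter))  # prints: '\t'
```

A simple count like this can be fooled by free-text columns, so it is best treated as a hint to confirm by looking at the file.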
In [11]:
# Load the GBIF data
gbif_df = pd.read_csv(
    gbif_path, 
    delimiter='\t',
    index_col='gbifID',
    usecols=['gbifID', 'decimalLatitude', 'decimalLongitude', 'month']
)
gbif_df.head()
Out[11]:
            decimalLatitude  decimalLongitude  month
gbifID
4413934032        43.624742       -103.421674      8
4028941879        41.047030       -112.223504      2
4046729169        38.590682       -121.668850      2
4063055845        34.210922       -103.318527      3
4067332956        40.183999       -105.169440      3

Convert the GBIF data to a GeoDataFrame¶

To plot the GBIF data, we need to convert it to a GeoDataFrame first.

Your task

  1. Replace your_dataframe with the name of the DataFrame you just got from GBIF
  2. Replace longitude_column_name and latitude_column_name with column names from your DataFrame
  3. Run the code to get a GeoDataFrame of the GBIF data.
In [12]:
gbif_gdf = (
    gpd.GeoDataFrame(
        gbif_df, 
        geometry=gpd.points_from_xy(
            gbif_df.decimalLongitude, 
            gbif_df.decimalLatitude), 
        crs="EPSG:4326")
    # Select the desired columns
    [['month', 'geometry']]
)
gbif_gdf
Out[12]:
            month  geometry
gbifID
4413934032      8  POINT (-103.42167 43.62474)
4028941879      2  POINT (-112.22350 41.04703)
4046729169      2  POINT (-121.66885 38.59068)
4063055845      3  POINT (-103.31853 34.21092)
4067332956      3  POINT (-105.16944 40.18400)
...           ...  ...
4813536284     11  POINT (-121.50886 45.71418)
4629803773      6  POINT (-105.21397 41.87483)
4619275310      4  POINT (-122.23353 37.73908)
4685971001      4  POINT (-122.72205 45.10067)
4617103900      9  POINT (-108.83559 39.23686)

249048 rows × 2 columns

Count the number of observations in each ecoregion, during each month of 2023¶

Identify the ecoregion for each observation¶

You can combine the ecoregions and the observations spatially using a method called .sjoin(), which stands for spatial join.

Further reading

Check out the geopandas documentation on spatial joins to help you figure this one out. You can also ask your favorite LLM (Large Language Model, like ChatGPT).
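
To get a feel for what a spatial join does, here is a small sketch on made-up data: two square regions and three points, one of which falls outside both regions. The names `toy_regions`, `toy_points`, and `toy_joined` are hypothetical, and the `how=` and `predicate=` values shown are just one possible combination.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two made-up square regions, side by side
toy_regions = gpd.GeoDataFrame(
    {'name': ['west', 'east']},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs='EPSG:4326',
)
toy_regions.index.name = 'region'

# Three made-up observations; the last one falls outside both regions
toy_points = gpd.GeoDataFrame(
    {'month': [1, 2, 2]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(5, 5)],
    crs='EPSG:4326',
)

# Keep one row per (region, point) pair where the region contains the point
toy_joined = toy_regions.sjoin(
    toy_points, how='inner', predicate='contains')[['month', 'name']]
print(toy_joined)
```

With an inner join, the point outside both regions is dropped, so the result has fewer rows than there are points.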

Your task

  1. Identify the correct values for the how= and predicate= parameters of the spatial join.
  2. Select only the columns you will need for your plot.
  3. Run the code.
In [13]:
gbif_ecoregion_gdf = (
    ecoregions_gdf
    # Match the CRS of the GBIF data and the ecoregions
    .to_crs(gbif_gdf.crs)
    # Find ecoregion for each observation
    .sjoin(
        gbif_gdf,
        how='inner', 
        predicate='contains')
    # Select the required columns
    [['month', 'name']]
)
gbif_ecoregion_gdf
Out[13]:
           month  name
ecoregion
57             6  Thompson-Okanogan Plateau
57             9  Thompson-Okanogan Plateau
57             6  Thompson-Okanogan Plateau
57             6  Thompson-Okanogan Plateau
57             8  Thompson-Okanogan Plateau
...          ...  ...
2545           6  Eastern Cascades Slopes and Foothills
2545           6  Eastern Cascades Slopes and Foothills
2545           5  Eastern Cascades Slopes and Foothills
2545           5  Eastern Cascades Slopes and Foothills
2545           4  Eastern Cascades Slopes and Foothills

248065 rows × 2 columns

Count the observations in each ecoregion each month¶

Your task:

  1. Replace columns_to_group_by with a list of columns. Keep in mind that you will end up with one row for each group – you want to count the observations in each ecoregion by month.
  2. Select only month/ecoregion combinations that have more than one occurrence recorded, since a single occurrence could be an error.
  3. Use the .groupby() and .mean() methods to compute the mean occurrences by ecoregion and by month.
  4. Run the code – it will normalize the number of occurrences by month and ecoregion.
In [14]:
occurrence_df = (
    gbif_ecoregion_gdf
    # For each ecoregion, for each month...
    .groupby(['ecoregion', 'month'])
    # ...count the number of occurrences
    .agg(occurrences=('name', 'count'))
)

# Get rid of rare observations (possible misidentification?)
occurrence_df = occurrence_df[occurrence_df.occurrences>1]

# Take the mean by ecoregion
mean_occurrences_by_ecoregion = (
    occurrence_df
    .groupby(['ecoregion'])
    .mean()
)

# Take the mean by month
mean_occurrences_by_month = (
    occurrence_df
    .groupby(['month'])
    .mean()
)

# Normalize the observations by the monthly mean throughout the year
occurrence_df['norm_occurrences'] = (
    occurrence_df
    / mean_occurrences_by_ecoregion 
    / mean_occurrences_by_month
)
occurrence_df
Out[14]:
                 occurrences  norm_occurrences
ecoregion month
57        3              132          0.003020
          4              397          0.004641
          5              660          0.004941
          6              481          0.005170
          7              182          0.003507
...       ...            ...               ...
2545      8               76          0.003036
          9               63          0.002618
          10              78          0.002695
          11              45          0.001367
          12              61          0.001663

983 rows × 2 columns
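
To make the normalization step more concrete, here is a small sketch on made-up counts. It swaps in `.transform('mean')` so that each group mean lines up row by row with the counts, rather than relying on index alignment as the cell above does; the intent is the same, dividing each count by its ecoregion mean and by its month mean. The numbers and the `toy_counts` name are invented for illustration.

```python
import pandas as pd

# Made-up counts for two ecoregions over two months
toy_counts = pd.DataFrame({
    'ecoregion': [1, 1, 2, 2],
    'month': [1, 2, 1, 2],
    'occurrences': [10, 30, 20, 20],
}).set_index(['ecoregion', 'month'])

# Mean count for each row's ecoregion and for each row's month
ecoregion_mean = toy_counts.groupby('ecoregion')['occurrences'].transform('mean')
month_mean = toy_counts.groupby('month')['occurrences'].transform('mean')

# Divide out both means so regions and months become comparable
toy_counts['norm_occurrences'] = (
    toy_counts['occurrences'] / ecoregion_mean / month_mean)
print(toy_counts)
```

Dividing by the ecoregion mean is meant to offset regions that simply have more observers, and dividing by the month mean is meant to offset months in which more people are out recording birds.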

Plot the Tasiyagnunpa observations by month¶

Your task

  1. If applicable, replace any variable names with the names you defined previously.
  2. Replace column_name_used_for_ecoregion_color and column_name_used_for_slider with the column names you wish to use.
  3. Customize your plot with your choice of title, tile source, color map, and size.
In [29]:
# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = ecoregion_plot.join(occurrence_df)

# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = occurrence_gdf.total_bounds

# Define slider widget
slider = pn.widgets.DiscreteSlider(
    name='month',
    options={calendar.month_name[i]: i for i in range(1,13)}
)

# Plot occurrence by ecoregion and month
migration_plot = (
    occurrence_gdf
    .hvplot(
        c='norm_occurrences',
        groupby='month',
        # Use background tiles
        geo=True, crs=ccrs.Mercator(), tiles='EsriWorldLightGrayBase',
        title="Tasiyagnunpa Observations by Month",
        xlim=(xmin, xmax), ylim=(ymin, ymax),
        frame_height=550,
        widgets={'month': slider},
        widget_location='bottom',
        colormap='reds'
    )
)

# Save the plot
migration_plot.save('migration.html', embed=True)

# Show the plot
migration_plot
                                               
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='p46489', ...)

Out[29]:
[Interactive map: Tasiyagnunpa Observations by Month, with a month slider]
In [31]:
from bokeh.models import HoverTool
import holoviews as hv
In [32]:
# Updating plot to remove scientific notation in hover tool

# Join the occurrences with the plotting GeoDataFrame
occurrence_gdf = ecoregion_plot.join(occurrence_df)

# Get the plot bounds so they don't change with the slider
xmin, ymin, xmax, ymax = occurrence_gdf.total_bounds

# Define slider widget
slider = pn.widgets.DiscreteSlider(
    name='month',
    options={calendar.month_name[i]: i for i in range(1,13)}
)

# Creating hover tool to show numbers as decimals and not in sci notation
hover = HoverTool(tooltips=[("norm_occurrences", "@norm_occurrences{0.0000}")])

# Plot occurrence by ecoregion and month
migration_plot = (
    occurrence_gdf
    .hvplot(
        c='norm_occurrences',
        groupby='month',
        # Use background tiles
        geo=True, crs=ccrs.Mercator(), tiles='EsriWorldLightGrayBase',
        title="Tasiyagnunpa Observations by Month",
        xlim=(xmin, xmax), ylim=(ymin, ymax),
        frame_height=550,
        widgets={'month': slider},
        widget_location='bottom',
        colormap='reds',
        yformatter='%.0f',
        tools= [hover]
    )
)

# Save the plot
migration_plot.save('migration_no_sci.html', embed=True)

# Show the plot
migration_plot
                                               
WARNING:bokeh.core.validation.check:W-1005 (FIXED_SIZING_MODE): 'fixed' sizing mode requires width and height to be set: figure(id='p51439', ...)

Out[32]:
[Interactive map: Tasiyagnunpa Observations by Month, with a month slider and decimal hover values]
In [ ]:
%%capture
%%bash
jupyter nbconvert *.ipynb --to html


Want an EXTRA CHALLENGE?

Notice that the month slider displays numbers instead of the month name. Use pn.widgets.DiscreteSlider() with the options= parameter set to give the months names. You might want to try asking ChatGPT how to do this, or look at the documentation for pn.widgets.DiscreteSlider(). This is pretty tricky!
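
One way to approach this, sketched with the standard library only, is to build a dictionary that maps month names to month numbers and pass it as `options=`. The variable name `month_options` is hypothetical, and the month names follow the current locale (English by default).

```python
import calendar

# calendar.month_name[0] is an empty string, so start at 1
month_options = {calendar.month_name[i]: i for i in range(1, 13)}

# Show the first few entries of the mapping
print(list(month_options.items())[:3])
```

The dictionary keys become the labels shown on the slider, while the values are what gets matched against the month column in the data.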