Climate change is impacting the way people live around the world
Higher highs, lower lows, storms, and smoke – we’re all feeling the effects of climate change. In this workflow, you will take a look at trends in temperature over time in Rapid City, SD.
Important
Get started with open reproducible science!
Open reproducible science makes scientific methods, data and outcomes available to everyone. That means that everyone who wants should be able to find, read, understand, and run your workflows for themselves.
Image from https://www.earthdata.nasa.gov/esds/open-science/oss-for-eso-workshops
Few if any science projects are 100% open and reproducible (yet!). However, members of the open science community have developed open source tools and practices that can help you move toward that goal. You will learn about many of those tools in the Intro to Earth Data Science textbook. Don’t worry about learning all the tools at once – we’ve picked a few for you to get started with.
Further reading
What does open reproducible science mean to you?
Create a new Markdown cell below this one using the + Markdown button in the upper left. In the new cell, answer the following questions using a numbered list in Markdown:
- In 1-2 sentences, define open reproducible science.
- In 1-2 sentences, choose one of the open source tools that you have learned about (e.g. Shell, Git/GitHub, Jupyter Notebook, Python) and explain how it supports open reproducible science.
- Open reproducible science is about making research, its methods, and its outcomes available to anyone. This means that the research can be repeated, the data used are available to the public, and the methods and results are transparent.
- GitHub supports open reproducible science because it provides a space where findings can be shared easily with the public and other researchers. With Codespaces and the ability to fork repositories, others can take your code, use it, add to it, and continue to advance the research.
Human-readable and Machine-readable
Create a new Markdown cell below this one using the ESC + b keyboard shortcut.
In the new cell, answer the following question in a Markdown quote: In 1-2 sentences, does this Jupyter Notebook file have a machine-readable name? Explain your answer.
This Jupyter notebook doesn't have a machine-readable name because the name contains spaces and an exclamation point.
What the fork?! Who wrote this?
Below is a scientific Python workflow. But something’s wrong – the code won’t run! Your task is to follow the instructions below to clean and debug the Python code so that it runs.
Tip
Don’t worry if you can’t solve every bug right away. We’ll get there! The most important thing is to identify problems with the code and write high-quality GitHub Issues.
At the end, you’ll repeat the workflow for a location and measurement of your choosing.
Alright! Let’s clean up this code. First things first…
Machine-readable file names
Rename this notebook (if necessary) with an expressive and machine-readable file name
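As a rough illustration of what "machine-readable" can mean in practice, the sketch below rewrites a file name so that it contains only lowercase letters, digits, hyphens, and underscores. The function name `to_machine_readable` and the example file name are hypothetical, and this is only one of several reasonable naming conventions.

```python
import re


def to_machine_readable(name):
    """Return a lowercase, hyphen-separated version of a file name."""
    stem, dot, extension = name.rpartition('.')
    if not dot:  # the name has no extension
        stem, extension = name, ''
    # Replace each run of characters other than letters, digits, or
    # underscores with one hyphen, then trim hyphens from both ends
    stem = re.sub(r'[^A-Za-z0-9_]+', '-', stem).strip('-').lower()
    return stem + dot + extension


# Hypothetical notebook name with spaces and punctuation
print(to_machine_readable('Get Started with Open Science!.ipynb'))
```

A name in this form is easier to type in a shell, sort, and match with patterns, while still telling a human what the file contains.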
Python packages let you use code written by experts around the world
Because Python is open source, lots of different people and organizations can contribute (including you!). Many contributions are in the form of packages which do not come with a standard Python download.
Read more
In the cell below, someone was trying to import the pandas package, which helps us to work with tabular data such as comma-separated value or csv files.
Your task
- Correct the typo below to properly import the pandas package under its alias pd.
- Run the cell to import pandas
NOTE: **Run your code in the right environment to avoid import errors**
We’ve created a coding environment for you to use that already has all the software and libraries you will need! When you try to run some code, you may be prompted to select a kernel. The kernel refers to the version of Python you are using. You should use the base kernel, which should be the default option.
# Import pandas
import pandas as pd
Once you have run the cell above and imported pandas, run the cell below. It is a test cell that will tell you if you completed the task successfully. If a test cell isn’t working the way you expect, check that you ran your code immediately before running the test.
# DO NOT MODIFY THIS TEST CELL
points = 0
try:
    pd.DataFrame()
    points += 5
    print('\u2705 Great work! You correctly imported the pandas library.')
except:
    print('\u274C Oops - pandas was not imported correctly.')
print('You earned {} of 5 points for importing pandas'.format(points))
✅ Great work! You correctly imported the pandas library.
You earned 5 of 5 points for importing pandas
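If the import fails instead, it can help to confirm which Python interpreter the kernel is using and whether that environment has the package installed. The following is a minimal sketch using only the standard library; the helper name `is_installed` is an assumption made for illustration.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the active environment can locate the package."""
    return importlib.util.find_spec(package_name) is not None


# Path of the Python interpreter behind the current kernel
print(sys.executable)
print('pandas installed:', is_installed('pandas'))
```

If the second line prints `False`, the notebook is probably running in a different environment from the one that was set up for this workflow.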
There are more Earth Observation data online than any one person could ever look at
NASA’s Earth Observing System Data and Information System (EOSDIS) alone manages over 9PB of data. 1 PB is roughly 100 times the entire Library of Congress (a good approximation of all the books available in the US). It’s all available to you once you learn how to download what you want.
Here we’re using the NOAA National Centers for Environmental Information (NCEI) Access Data Service application programming interface (API) to request data from their web servers. We will be using data collected as part of the Global Historical Climatology Network daily (GHCNd) from their Climate Data Online library program at NOAA.
For this example we’re requesting daily summary data in Rapid City, SD (station ID USC00396947).
Your task:
- Research the Global Historical Climatology Network - Daily data source.
- In the cell below, write a 2-3 sentence description of the data source. You should describe:
  - who takes the data
  - where the data were taken
  - what the maximum temperature units are
  - how the data are collected
- Include a citation of the data (HINT: See the ‘Data Citation’ tab on the GHCNd overview page).
The data are drawn from roughly 30 sources and are collected at climate stations around the world, including stations in the World Meteorological Organization, Cooperative, and CoCoRaHS networks. More than 25,000 stations are regularly updated with observations from within roughly the last month. NOAA's National Centers for Environmental Information houses the data. Maximum temperature is recorded in tenths of degrees Celsius. The dataset is also routinely reconstructed (usually every week) from its source datasets so that GHCN-Daily stays in sync with its growing list of constituent sources.
Citation: Menne, Matthew J., Imke Durre, Bryant Korzeniewski, Shelley McNeill, Kristy Thomas, Xungang Yin, Steven Anthony, Ron Ray, Russell S. Vose, Byron E. Gleason, and Tamara G. Houston (2012): Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. [indicate subset used]. NOAA National Climatic Data Center. doi:10.7289/V5D21VHZ [May 13, 2024].
You can access NCEI GHCNd Data from the internet using its API 🖥️ 📡 🖥️
The cell below contains the URL for the data you will use in this part of the notebook. We created this URL by generating what is called an API endpoint using the NCEI API documentation.
Note
An application programming interface (API) is a way for two or more computer programs or components to communicate with each other. It is a type of software interface, offering a service to other pieces of software (Wikipedia).
However, we still have a problem - we can’t get the URL back later on because it isn’t saved in a variable. In other words, we need to give the URL a name so that we can request it from Python later (sadly, Python has no ‘hey what was that thingy I typed yesterday?’ function).
Read more
Check out the textbook section on variables
Your task
- Pick an expressive variable name for the URL. HINT: click on the Variables button up top to see all your variables. Your new url variable will not be there until you define it and run the code.
- Reformat the URL so that it adheres to the 79-character PEP-8 line limit. You should see two vertical lines in each cell - don’t let your code go past the second line.
- At the end of the cell where you define your url variable, call your variable (type out its name) so it can be tested.
# Getting data from NCEI for Rapid City, SD
rapidcityurl = (
'https://www.ncei.noaa.gov/access/services/data/v1?'
'dataset=daily-summaries'
'&dataTypes=TOBS,PRCP'
'&stations=USC00396947'
'&startDate=1949-10-01'
'&endDate=2024-05-03'
'&includeStationName=true'
'&includeStationLocation=1'
'&units=standard')
rapidcityurl
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00396947&startDate=1949-10-01&endDate=2024-05-03&includeStationName=true&includeStationLocation=1&units=standard'
# DO NOT MODIFY THIS TEST CELL
resp_url = _
points = 0
if type(resp_url)==str:
    points += 3
    print('\u2705 Great work! You correctly called your url variable.')
else:
    print('\u274C Oops - your url variable was not called correctly.')
if len(resp_url)==218:
    points += 3
    print('\u2705 Great work! Your url is the correct length.')
else:
    print('\u274C Oops - your url variable is not the correct length.')
print('You earned {} of 6 points for defining a url variable'.format(points))
✅ Great work! You correctly called your url variable.
✅ Great work! Your url is the correct length.
You earned 6 of 6 points for defining a url variable
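An API endpoint like the one above is a base address followed by a list of `key=value` pairs, so the same URL can also be assembled from a dictionary. The sketch below is one possible way to do that with the standard library; the variable names `base_url`, `query_parameters`, and `rapid_city_url` are chosen here for illustration, while the parameter names and values are copied from the URL above.

```python
from urllib.parse import urlencode

base_url = 'https://www.ncei.noaa.gov/access/services/data/v1'
query_parameters = {
    'dataset': 'daily-summaries',
    'dataTypes': 'TOBS,PRCP',
    'stations': 'USC00396947',
    'startDate': '1949-10-01',
    'endDate': '2024-05-03',
    'includeStationName': 'true',
    'includeStationLocation': '1',
    'units': 'standard',
}
# safe=',' keeps the comma in 'TOBS,PRCP' from being percent-encoded
rapid_city_url = base_url + '?' + urlencode(query_parameters, safe=',')
print(rapid_city_url)
```

Keeping the parameters in a dictionary makes it easier to change a single value, such as the station ID or date range, when repeating the workflow for another location.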
Download and get started working with NCEI data
The pandas library you imported can download data from the internet directly into a type of Python object called a DataFrame. In the code cell below, you can see an attempt to do just this. But there are some problems…
You’re ready to fix some code!
Your task is to:
- Leave a space between the # and text in the comment and try making the comment more informative
- Make any changes needed to get this code to run. HINT: The my_url variable doesn’t exist - you need to replace it with the variable name you chose.
- Modify the .read_csv() statement to include the following parameters:
  - index_col='DATE' – this sets the DATE column as the index. Needed for subsetting and resampling later on
  - parse_dates=True – this lets Python know that you are working with time-series data, and values in the indexed column are date time objects
  - na_values=['NaN'] – this lets Python know how to handle missing values
- Clean up the code by using expressive variable names, expressive column names, PEP-8 compliant code, and descriptive comments
Make sure to call your DataFrame by typing its name as the last line of your code cell. Then, you will be able to run the test cell below and find out if your answer is correct.
# Download the Rapid City daily summaries into a DataFrame
rapidcity_df = pd.read_csv(
    rapidcityurl,
    index_col='DATE',
    parse_dates=True,
    na_values=['NaN'])
rapidcity_df
DATE | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | PRCP | TOBS |
---|---|---|---|---|---|---|---|
1949-10-01 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 51.0 |
1949-10-02 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 51.0 |
1949-10-03 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 52.0 |
1949-10-04 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 45.0 |
1949-10-05 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 50.0 |
... | ... | ... | ... | ... | ... | ... | ... |
2024-04-28 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | NaN |
2024-04-29 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.37 | 30.0 |
2024-04-30 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 44.0 |
2024-05-01 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.00 | 33.0 |
2024-05-02 | USC00396947 | RAPID CITY 4 NW, SD US | 44.12055 | -103.28417 | 1060.4 | 0.35 | 39.0 |
26109 rows × 7 columns
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0
if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
print('You earned {} of 2 points for downloading data'.format(points))
✅ Great work! You called a DataFrame.
You earned 1 of 2 points for downloading data
HINT: Check out the type() function below - you can use it to check that your data is now in a DataFrame type object
# Check that the data was imported into a pandas DataFrame
type(rapidcity_df)
pandas.core.frame.DataFrame
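To see what the three `read_csv` parameters do without making a network request, the sketch below reads a small in-memory CSV. The station ID `EXAMPLE001` and all values are hypothetical and only mimic the layout of the NCEI response.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as the NCEI response
sample_csv = io.StringIO(
    'STATION,DATE,PRCP,TOBS\n'
    'EXAMPLE001,1949-10-01,0.00,51\n'
    'EXAMPLE001,1949-10-02,0.00,NaN\n'
    'EXAMPLE001,1949-10-03,0.10,52\n'
)
sample_df = pd.read_csv(
    sample_csv,
    index_col='DATE',      # use the DATE column as the row labels
    parse_dates=True,      # convert those labels to datetime values
    na_values=['NaN'])     # treat the text 'NaN' as a missing value

# The index is a DatetimeIndex, which later subsetting and resampling rely on
print(type(sample_df.index))
# Count of missing temperature observations in the sample
print(sample_df['TOBS'].isna().sum())
```

pandas already treats the text `NaN` as missing by default, so `na_values` mainly documents the intent here; it becomes necessary when a data source marks missing values with some other code.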
Clean up your DataFrame
Use double brackets to only select the columns you want in your DataFrame
Make sure to call your DataFrame by typing its name as the last line of your code cell. Then, you will be able to run the test cell below and find out if your answer is correct.
# Checking column names to know all columns in data
rapidcity_df.columns
Index(['STATION', 'NAME', 'LATITUDE', 'LONGITUDE', 'ELEVATION', 'PRCP', 'TOBS'], dtype='object')
# Rewriting data frame to only have precipitation and TOBS data
rapidcity_df = rapidcity_df[['PRCP', 'TOBS']]
rapidcity_df
DATE | PRCP | TOBS |
---|---|---|
1949-10-01 | 0.00 | 51.0 |
1949-10-02 | 0.00 | 51.0 |
1949-10-03 | 0.00 | 52.0 |
1949-10-04 | 0.00 | 45.0 |
1949-10-05 | 0.00 | 50.0 |
... | ... | ... |
2024-04-28 | 0.00 | NaN |
2024-04-29 | 0.37 | 30.0 |
2024-04-30 | 0.00 | 44.0 |
2024-05-01 | 0.00 | 33.0 |
2024-05-02 | 0.35 | 39.0 |
26109 rows × 2 columns
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [0.05, 54.53]:
    points += 4
    print('\u2705 Great work! You correctly downloaded data.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for downloading data'.format(points))
❌ Oops - your data are not correct.
You earned 0 of 5 points for downloading data
Plot the precipitation column (PRCP) vs time to explore the data
Plotting in Python is easy, but not quite this easy:
# Plotting Rapid City PRCP and TOBS
rapidcity_df.plot()
<Axes: xlabel='DATE'>
**Label and describe your plots**
Make sure each plot has:
- A title that explains where and when the data are from
- x- and y- axis labels with units where appropriate
- A legend where appropriate
You’ll always need to add some instructions on labels and how you want your plot to look.
Your task:
- Change dataframe to your DataFrame name.
- Change y= to the name of your observed temperature column name.
- Use the title, ylabel, and xlabel parameters to add key text to your plot.
- Adjust the size of your figure using figsize=(x,y) where x is figure width and y is figure height

HINT: labels have to be a type in Python called a string. You can make a string by putting quotes around your label, just like the column names in the sample code (e.g. y='TOBS').
# Plotting Daily Observed Temperature for Rapid City from 1949-2024
rapidcity_df.plot(
    y='TOBS',
    title='Rapid City Daily Observed Temperature 1949-2024',
    xlabel='Date',
    ylabel='Temperature (F)',
    legend=False,
    color='blue',
    figsize=(10,5),
    fontsize=14)
<Axes: title={'center': 'Rapid City Daily Observed Temperature 1949-2024'}, xlabel='Date', ylabel='Temperature (F)'>
Want an EXTRA CHALLENGE?
There are many other things you can do to customize your plot. Take a look at the pandas plotting galleries and the documentation of plot to see if there’s other changes you want to make to your plot. Some possibilities include:
- Remove the legend since there’s only one data series
- Increase the figure size
- Increase the font size
- Change the colors
- Use a bar graph instead (usually we use lines for time series, but since this is annual it could go either way)
- Add a trend line
Not sure how to do any of these? Try searching the internet, or asking an AI!
Convert units
Modify the code below to add a column that includes temperature in Celsius. The code below was written by your colleague. Can you fix this so that it correctly calculates temperature in Celsius and adds a new column?
# Convert to Celsius
rapidcity_df.loc[:,'TCel'] = (rapidcity_df['TOBS'] - 32) * 5 / 9
rapidcity_df
/tmp/ipykernel_6497/1789627478.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rapidcity_df.loc[:,'TCel'] = (rapidcity_df['TOBS'] - 32) * 5 / 9
DATE | PRCP | TOBS | TCel |
---|---|---|---|
1949-10-01 | 0.00 | 51.0 | 10.555556 |
1949-10-02 | 0.00 | 51.0 | 10.555556 |
1949-10-03 | 0.00 | 52.0 | 11.111111 |
1949-10-04 | 0.00 | 45.0 | 7.222222 |
1949-10-05 | 0.00 | 50.0 | 10.000000 |
... | ... | ... | ... |
2024-04-28 | 0.00 | NaN | NaN |
2024-04-29 | 0.37 | 30.0 | -1.111111 |
2024-04-30 | 0.00 | 44.0 | 6.666667 |
2024-05-01 | 0.00 | 33.0 | 0.555556 |
2024-05-02 | 0.35 | 39.0 | 3.888889 |
26109 rows × 3 columns
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0
if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [0.05, 54.53, 12.52]:
    points += 4
    print('\u2705 Great work! You correctly converted to Celsius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for converting to Celsius'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for converting to Celsius
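The SettingWithCopyWarning in the conversion output most likely appears because the DataFrame was created by selecting columns from another DataFrame, so pandas cannot tell whether the new column is meant for the original or for the selection. One common way to avoid it, sketched below with hypothetical stand-in data and variable names (`full_df`, `climate_df`), is to take an explicit copy when selecting the columns.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded data
full_df = pd.DataFrame(
    {
        'ELEVATION': [1060.4, 1060.4],
        'PRCP': [0.0, 0.37],
        'TOBS': [51.0, 30.0],
    },
    index=pd.to_datetime(['1949-10-01', '2024-04-29']))
full_df.index.name = 'DATE'

# .copy() makes the column selection an independent DataFrame
climate_df = full_df[['PRCP', 'TOBS']].copy()

# Adding a column now changes only climate_df, so no warning is needed
climate_df['TCel'] = (climate_df['TOBS'] - 32) * 5 / 9
print(climate_df)
```

The calculation itself is unchanged; only the way the two-column DataFrame is created differs.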
Want an EXTRA CHALLENGE?
- As you did above, rewrite the code to be more expressive
- Using the code below as a framework, write and apply a function that converts to Celsius. Functions let you reuse code you have already written.
- You should also rewrite this function and parameter names to be more expressive.
# Creating a function to convert from fahrenheit to celsius
def to_celsius(fahrenheit):
    """Convert temperature to Celsius"""
    return (fahrenheit - 32) * 5 / 9

# Displaying dataframe with new column, celsius
rapidcity_df['celsius'] = rapidcity_df['TOBS'].apply(to_celsius)
rapidcity_df
/tmp/ipykernel_6497/2859914288.py:7: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rapidcity_df['celsius'] = rapidcity_df['TOBS'].apply(to_celsius)
DATE | PRCP | TOBS | TCel | celsius |
---|---|---|---|---|
1949-10-01 | 0.00 | 51.0 | 10.555556 | 10.555556 |
1949-10-02 | 0.00 | 51.0 | 10.555556 | 10.555556 |
1949-10-03 | 0.00 | 52.0 | 11.111111 | 11.111111 |
1949-10-04 | 0.00 | 45.0 | 7.222222 | 7.222222 |
1949-10-05 | 0.00 | 50.0 | 10.000000 | 10.000000 |
... | ... | ... | ... | ... |
2024-04-28 | 0.00 | NaN | NaN | NaN |
2024-04-29 | 0.37 | 30.0 | -1.111111 | -1.111111 |
2024-04-30 | 0.00 | 44.0 | 6.666667 | 6.666667 |
2024-05-01 | 0.00 | 33.0 | 0.555556 | 0.555556 |
2024-05-02 | 0.35 | 39.0 | 3.888889 | 3.888889 |
26109 rows × 4 columns
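Because the conversion uses only arithmetic, a function like this can also be called on a whole column at once, without `.apply`. The sketch below compares the two approaches on a few hypothetical values; the function and variable names are illustrative, not part of the assignment.

```python
import pandas as pd


def fahrenheit_to_celsius(temperature_f):
    """Convert a number or pandas Series from Fahrenheit to Celsius."""
    return (temperature_f - 32) * 5 / 9


# Hypothetical sample of observed temperatures in degrees Fahrenheit
temperature_f = pd.Series([51.0, 45.0, 32.0])

# Pass the whole Series; pandas applies the arithmetic to every value
vectorized = fahrenheit_to_celsius(temperature_f)
# Call the function once per value, as .apply does
elementwise = temperature_f.apply(fahrenheit_to_celsius)

print(vectorized)
print(elementwise)
```

Both approaches give the same numbers; calling the function on the whole column is usually shorter and faster on large datasets.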
Subsetting and Resampling
Often when working with time-series data you may want to focus on a shorter window of time, or look at weekly, monthly, or annual summaries to help make the analysis more manageable.
Read more
Read more about subsetting and resampling time-series data in our Learning Portal.
For this demonstration, we will look at the last 40 years worth of data and resample to explore a summary from each year that data were recorded.
Your task
- Replace start-year and end-year with 1983 and 2023
- Replace dataframe with the name of your data
- Replace new_dataframe with something more expressive
- Call your new variable
- Run the cell
# Subset the data
rapidcitysubset = rapidcity_df.loc['1983':'2023']
rapidcitysubset
DATE | PRCP | TOBS | TCel | celsius |
---|---|---|---|---|
1983-01-01 | 0.00 | 30.0 | -1.111111 | -1.111111 |
1983-01-02 | 0.00 | 29.0 | -1.666667 | -1.666667 |
1983-01-03 | 0.00 | 40.0 | 4.444444 | 4.444444 |
1983-01-04 | 0.00 | 33.0 | 0.555556 | 0.555556 |
1983-01-05 | 0.00 | 43.0 | 6.111111 | 6.111111 |
... | ... | ... | ... | ... |
2023-12-27 | 0.31 | 32.0 | 0.000000 | 0.000000 |
2023-12-28 | 0.00 | 17.0 | -8.333333 | -8.333333 |
2023-12-29 | 0.00 | 28.0 | -2.222222 | -2.222222 |
2023-12-30 | 0.00 | NaN | NaN | NaN |
2023-12-31 | 0.00 | NaN | NaN | NaN |
13939 rows × 4 columns
# DO NOT MODIFY THIS TEST CELL
df_resp = _
points = 0
if isinstance(df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
summary = [round(val, 2) for val in df_resp.mean().values]
if summary == [0.06, 55.67, 13.15]:
    points += 5
    print('\u2705 Great work! You correctly subset the data.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for subsetting'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for subsetting
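Slicing a date-indexed DataFrame or Series with year strings, as in `.loc['1983':'2023']`, keeps every row from the first day of the start year through the last day of the end year. The sketch below checks this on a hypothetical daily series; the values are just a counter and carry no meaning.

```python
import pandas as pd

# Hypothetical daily series reaching a few days past both ends of the window
dates = pd.date_range('1982-12-30', '2024-01-02', freq='D')
daily = pd.Series(range(len(dates)), index=dates)

# Partial-string slicing on a DatetimeIndex includes both endpoint years
subset = daily.loc['1983':'2023']
print(subset.index.min(), subset.index.max())
print(subset.index.year.nunique(), 'calendar years')
```

Note that both endpoints are included, so 1983 through 2023 covers 41 calendar years rather than 40.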
Now we are ready to calculate annual statistics
Here you will resample the 1983-2023 data to look at the annual mean values.
Resample your data
- Replace new_dataframe with the variable you created in the cell above where you subset the data
- Replace 'TIME' with a 'W', 'M', or 'Y' depending on whether you’re doing a weekly, monthly, or yearly summary
- Replace STAT with a sum, min, max, or mean depending on what kind of statistic you’re interested in calculating.
- Replace resampled_data with a more expressive variable name
- Call your new variable
- Run the cell
# Resample the data to look at yearly mean values
rapidyearly = rapidcitysubset.resample('YE').mean()
rapidyearly
DATE | PRCP | TOBS | TCel | celsius |
---|---|---|---|---|
1983-12-31 | 0.038849 | 59.302632 | 15.168129 | 15.168129 |
1984-12-31 | 0.026145 | 54.458182 | 12.476768 | 12.476768 |
1985-12-31 | 0.039091 | 50.691667 | 10.384259 | 10.384259 |
1986-12-31 | 0.069551 | 53.672673 | 12.040374 | 12.040374 |
1987-12-31 | 0.039011 | 56.988950 | 13.882750 | 13.882750 |
1988-12-31 | 0.028017 | 56.983240 | 13.879578 | 13.879578 |
1989-12-31 | 0.056359 | 38.072829 | 3.373794 | 3.373794 |
1990-12-31 | 0.039068 | 40.363112 | 4.646174 | 4.646174 |
1991-12-31 | 0.056875 | 39.945869 | 4.414372 | 4.414372 |
1992-12-31 | 0.036714 | 39.525862 | 4.181034 | 4.181034 |
1993-12-31 | 0.055881 | 35.522581 | 1.956989 | 1.956989 |
1994-12-31 | 0.034540 | 39.479769 | 4.155427 | 4.155427 |
1995-12-31 | 0.063609 | 39.150568 | 3.972538 | 3.972538 |
1996-12-31 | 0.058785 | 36.547486 | 2.526381 | 2.526381 |
1997-12-31 | 0.057634 | 38.825073 | 3.791707 | 3.791707 |
1998-12-31 | 0.068343 | 40.563739 | 4.757633 | 4.757633 |
1999-12-31 | 0.073104 | 41.688202 | 5.382335 | 5.382335 |
2000-12-31 | 0.050771 | 39.750751 | 4.305973 | 4.305973 |
2001-12-31 | 0.049639 | 43.371134 | 6.317297 | 6.317297 |
2002-12-31 | 0.036126 | 33.482143 | 0.823413 | 0.823413 |
2003-12-31 | 0.039186 | 40.455253 | 4.697363 | 4.697363 |
2004-12-31 | 0.030242 | 38.877828 | 3.821016 | 3.821016 |
2005-12-31 | 0.044620 | 40.627119 | 4.792844 | 4.792844 |
2006-12-31 | 0.042870 | 40.873278 | 4.929599 | 4.929599 |
2007-12-31 | 0.038515 | 34.806931 | 1.559406 | 1.559406 |
2008-12-31 | 0.025892 | 34.204969 | 1.224983 | 1.224983 |
2009-12-31 | 0.053828 | 35.871324 | 2.150735 | 2.150735 |
2010-12-31 | 0.056767 | 39.012384 | 3.895769 | 3.895769 |
2011-12-31 | 0.060282 | 40.313846 | 4.618803 | 4.618803 |
2012-12-31 | 0.019341 | 42.008746 | 5.560415 | 5.560415 |
2013-12-31 | 0.060685 | 38.392638 | 3.551466 | 3.551466 |
2014-12-31 | 0.057726 | 39.211310 | 4.006283 | 4.006283 |
2015-12-31 | 0.057260 | 41.351275 | 5.195153 | 5.195153 |
2016-12-31 | 0.039508 | 42.161644 | 5.645358 | 5.645358 |
2017-12-31 | 0.034082 | 41.013889 | 5.007716 | 5.007716 |
2018-12-31 | 0.057335 | 36.670732 | 2.594851 | 2.594851 |
2019-12-31 | 0.085056 | 36.159544 | 2.310858 | 2.310858 |
2020-12-31 | 0.044006 | 41.023438 | 5.013021 | 5.013021 |
2021-12-31 | 0.032225 | 40.363248 | 4.646249 | 4.646249 |
2022-12-31 | 0.028421 | 39.331395 | 4.072997 | 4.072997 |
2023-12-31 | 0.046313 | 40.144578 | 4.524766 | 4.524766 |
# DO NOT MODIFY THIS TEST CELL
df_resp = _
points = 0
if isinstance(df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
summary = [round(val, 2) for val in df_resp.mean().values]
if summary == [0.06, 55.37, 12.99]:
    points += 5
    print('\u2705 Great work! You correctly resampled the data.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for resampling'.format(points))
✅ Great work! You called a DataFrame.
❌ Oops - your data are not correct.
You earned 1 of 5 points for resampling
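The task text lists 'Y' for a yearly summary while the cell above uses 'YE'. To the best of our knowledge, pandas 2.2 and later spell the year-end alias 'YE' and deprecate 'Y', while older releases accept only 'Y'. The sketch below shows an alternative that does not depend on either alias: grouping by the calendar year of the index. The data are hypothetical constants chosen so the expected means are obvious.

```python
import pandas as pd

# Hypothetical daily values: 1.0 throughout 2022 and 3.0 throughout 2023
dates = pd.date_range('2022-01-01', '2023-12-31', freq='D')
values = [1.0 if date.year == 2022 else 3.0 for date in dates]
daily = pd.Series(values, index=dates)

# Grouping by calendar year avoids version-specific frequency aliases
annual_mean = daily.groupby(daily.index.year).mean()
print(annual_mean)
```

The result is indexed by the year as an integer (2022, 2023) rather than by a year-end date such as 2022-12-31, which can also make axis labels on an annual plot easier to read.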
Plot your resampled data
# Plot mean annual temperature values from 1983 to 2023
rapidyearly.plot(
y='TOBS',
title='Rapid City Annual Mean Temperatures 1983-2023',
xlabel='Year',
ylabel='Temperature (F)',
legend=False,
color='blue',
figsize=(10,5),
fontsize=14
)
<Axes: title={'center': 'Rapid City Annual Mean Temperatures 1983-2023'}, xlabel='Year', ylabel='Temperature (F)'>
Describe your plot
We like to use an approach called “Assertion-Evidence” for presenting scientific results. There’s a lot of video tutorials and example talks available on the Assertion-Evidence web page. The main thing you need to do now is to practice writing a message or headline rather than descriptions or topic sentences for the plot you just made (what they refer to as “visual evidence”).
For example, it would be tempting to write something like “A plot of maximum annual temperature in Rapid City, South Dakota over time (1983-2023)”. However, this doesn’t give the reader anything to look at, or explain why we made this particular plot (we know, you made this one because we told you to).
Some alternatives for different plots of Rapid City temperature that are more of a starting point for a presentation or conversation are:
- Rapid City, SD experienced cooler than average temperatures in 1995
- Temperatures in Rapid City, SD appear to be on the rise over the past 40 years
- Maximum annual temperatures in Rapid City, SD are becoming more variable over the previous 40 years
We could back up some of these claims with further analysis included later on, but we want to make sure that our audience has some guidance on what to look for in the plot.
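One simple form of further analysis is a linear trend fitted to the annual means. The sketch below uses NumPy's `polyfit` on made-up numbers; the years and temperatures are hypothetical and are not taken from the Rapid City data, so the printed slope says nothing about the real record.

```python
import numpy as np

# Hypothetical annual means (degrees F), made up for illustration only
years = np.array([2019, 2020, 2021, 2022, 2023])
annual_mean_f = np.array([40.0, 40.5, 41.0, 41.5, 42.0])

# Centering the years keeps the fit numerically well behaved
years_centered = years - years.mean()

# Fit a straight line; polyfit returns the slope first, then the intercept
slope, intercept = np.polyfit(years_centered, annual_mean_f, 1)
print(f'Trend: {slope:.2f} degrees F per year')
```

With real data, a fitted slope like this would be a starting point only; a headline claim would also need some measure of uncertainty and a check for changes in how the station recorded its observations.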
**Rapid City, SD colder than 40 years ago and temperatures continue to shift!** 📰 🗞️ 📻
In the late eighties, Rapid City, South Dakota experienced a drastic decrease in average annual temperature of almost 20 degrees Fahrenheit. Today the annual mean remains low and shows starker changes from year to year.
Image credit: https://www.craiyon.com/image/OAbZtyelSoS7FdGko6hvQg
THIS ISN’T THE END! 😄
Don’t forget to reproduce your analysis in a new location or time!
Image source: https://www.independent.co.uk/climate-change/news/by-the-left-quick-march-the-emperor-penguins-migration-1212420.html
Your turn: pick a new location and/or measurement to plot 🌏 📈
Below (or in a new notebook!), recreate the workflow you just did in a place that interests you OR with a different measurement. See the instructions above to adapt the URL that we created for Rapid City, SD using the NCEI API. You will need to make your own new Markdown and Code cells below this one, or create a new notebook.
Congratulations, you’re almost done with this coding challenge 🤩 – now make sure that your code is reproducible
Image source: https://dfwurbanwildlife.com/2018/03/25/chris-jacksons-dfw-urban-wildlife/snow-geese-galore/
Your task
- If you didn’t already, go back to the code you modified above and write more descriptive comments so the next person to use this code knows what it does.
- Make sure to Restart and Run all up at the top of your notebook. This will clear all your variables and make sure that your code runs in the correct order. It will also export your work in Markdown format, which you can put on your website.

Always run your code start to finish before submitting!
Before you commit your work, make sure it runs reproducibly by clicking:
- Restart (this button won’t appear until you’ve run some code), then
- Run All
BONUS: Create a shareable Markdown of your work
Below is some code that you can run that will save a Markdown file of your work that is easily shareable and can be uploaded to GitHub Pages. You can use it as a starting point for writing your portfolio post!
# Recreating the workflow for Rainier Paradise Ranger Station, Mount Rainier
# USC00456898 46.7858 -121.7425 1654.1 WA RAINIER PARADISE RS
rainierurl = (
'https://www.ncei.noaa.gov/access/services/data/v1?'
'dataset=daily-summaries'
'&dataTypes=TOBS,PRCP'
'&stations=USC00456898'
'&startDate=1916-12-01'
'&endDate=2024-05-12'
'&includeStationName=true'
'&includeStationLocation=1'
'&units=standard')
rainierurl
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=TOBS,PRCP&stations=USC00456898&startDate=1916-12-01&endDate=2024-05-12&includeStationName=true&includeStationLocation=1&units=standard'
# creating a data frame for Rainier
rainier_df = pd.read_csv(
rainierurl,
index_col='DATE',
parse_dates=True,
na_values=['NaN'])
rainier_df
DATE | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | PRCP | TOBS |
---|---|---|---|---|---|---|---|
1916-12-01 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | 29.0 |
1916-12-02 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | 28.0 |
1916-12-03 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | 21.0 |
1916-12-04 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | 23.0 |
1916-12-05 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | 22.0 |
... | ... | ... | ... | ... | ... | ... | ... |
2024-05-08 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | NaN |
2024-05-09 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | NaN | 54.0 |
2024-05-10 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | 0.0 | 61.0 |
2024-05-11 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | 0.0 | 59.0 |
2024-05-12 | USC00456898 | RAINIER PARADISE RANGER STATION, WA US | 46.78639 | -121.74222 | 1650.5 | 0.0 | 54.0 |
35888 rows × 7 columns
# Keeping only precipitation and TOBS columns
rainier_df = rainier_df[['PRCP', 'TOBS']]
rainier_df
DATE | PRCP | TOBS |
---|---|---|
1916-12-01 | NaN | 29.0 |
1916-12-02 | NaN | 28.0 |
1916-12-03 | NaN | 21.0 |
1916-12-04 | NaN | 23.0 |
1916-12-05 | NaN | 22.0 |
... | ... | ... |
2024-05-08 | NaN | NaN |
2024-05-09 | NaN | 54.0 |
2024-05-10 | 0.0 | 61.0 |
2024-05-11 | 0.0 | 59.0 |
2024-05-12 | 0.0 | 54.0 |
35888 rows × 2 columns
# Plotting Rainier Daily Observed Temperature
rainier_df.plot(
    y='TOBS',
    title=(
        'Rainier Paradise Ranger Station '
        'Daily Observed Temperature 1916-2024'),
    xlabel='Date',
    ylabel='Temperature (F)',
    legend=False,
    color='blue',
    figsize=(10,5),
    fontsize=14)
<Axes: title={'center': 'Rainier Paradise Ranger Station Daily Observed Temperature 1916-2024'}, xlabel='Date', ylabel='Temperature (F)'>
# Convert to Celsius
rainier_df.loc[:,'TCel'] = (rainier_df['TOBS'] - 32) * 5 / 9
rainier_df
/tmp/ipykernel_6497/1784484435.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rainier_df.loc[:,'TCel'] = (rainier_df['TOBS'] - 32) * 5 / 9
DATE | PRCP | TOBS | TCel |
---|---|---|---|
1916-12-01 | NaN | 29.0 | -1.666667 |
1916-12-02 | NaN | 28.0 | -2.222222 |
1916-12-03 | NaN | 21.0 | -6.111111 |
1916-12-04 | NaN | 23.0 | -5.000000 |
1916-12-05 | NaN | 22.0 | -5.555556 |
... | ... | ... | ... |
2024-05-08 | NaN | NaN | NaN |
2024-05-09 | NaN | 54.0 | 12.222222 |
2024-05-10 | 0.0 | 61.0 | 16.111111 |
2024-05-11 | 0.0 | 59.0 | 15.000000 |
2024-05-12 | 0.0 | 54.0 | 12.222222 |
35888 rows × 3 columns
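As a quick sanity check on the conversion, the same formula can be applied to a few individual readings and compared with the `TCel` column in the table. This is a minimal sketch; the helper name `fahrenheit_to_celsius` is an illustration and is not part of the notebook.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# Spot-check readings that appear in the TOBS column
for temp_f in (29.0, 54.0, 32.0):
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f"{temp_f} F = {round(temp_c, 6)} C")
```

Freezing (32 °F) maps to 0 °C, and 29 °F and 54 °F map to the values shown for 1916-12-01 and 2024-05-12.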
# creating subset from 1980 to 2023
rainiersubset = rainier_df.loc['1980':'2023']
rainiersubset
DATE | PRCP | TOBS | TCel |
---|---|---|---|
1980-01-01 | 1.29 | 29.0 | -1.666667 |
1980-01-02 | 0.08 | 31.0 | -0.555556 |
1980-01-03 | 0.74 | 18.0 | -7.777778 |
1980-01-04 | 0.10 | 21.0 | -6.111111 |
1980-01-05 | 1.58 | 18.0 | -7.777778 |
... | ... | ... | ... |
2023-12-27 | NaN | NaN | NaN |
2023-12-28 | NaN | 36.0 | 2.222222 |
2023-12-29 | 0.00 | 40.0 | 4.444444 |
2023-12-30 | 0.20 | 34.0 | 1.111111 |
2023-12-31 | 0.39 | 32.0 | 0.000000 |
15280 rows × 3 columns
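The row count is worth comparing with the number of calendar days in the same period, because dates that are absent from the record do not show up as `NaN` rows. A small arithmetic sketch using only the standard library (the variable names are illustrative):

```python
from datetime import date

# Row count reported for the 1980-2023 subset above
rows_in_subset = 15280

# Calendar days from 1980-01-01 through 2023-12-31, inclusive
calendar_days = (date(2023, 12, 31) - date(1980, 1, 1)).days + 1

# Dates with no row at all in the station record
missing_dates = calendar_days - rows_in_subset
print(calendar_days, missing_dates)  # 16071 791
```

So roughly 5% of the dates in this window have no row at all, in addition to any rows where `TOBS` is `NaN`.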
# Resampling to get only mean yearly values
rainieryearly = rainiersubset.resample('YE').mean()
rainieryearly
DATE | PRCP | TOBS | TCel |
---|---|---|---|
1980-12-31 | 0.340847 | 34.983516 | 1.657509 |
1981-12-31 | 0.337315 | 36.912088 | 2.728938 |
1982-12-31 | 0.316418 | 34.668524 | 1.482513 |
1983-12-31 | 0.355205 | 35.173077 | 1.762821 |
1984-12-31 | 0.337637 | 33.704918 | 0.947177 |
1985-12-31 | 0.228000 | 35.640110 | 2.022283 |
1986-12-31 | 0.310932 | 38.142466 | 3.412481 |
1987-12-31 | 0.199331 | 39.781818 | 4.323232 |
1988-12-31 | 0.328297 | 37.186301 | 2.881279 |
1989-12-31 | 0.274848 | 35.632597 | 2.018109 |
1990-12-31 | 0.405439 | 35.107042 | 1.726135 |
1991-12-31 | 0.347582 | 36.430137 | 2.461187 |
1992-12-31 | 0.272932 | 37.699454 | 3.166363 |
1993-12-31 | 0.218822 | 33.498630 | 0.832572 |
1994-12-31 | 0.384620 | 35.877483 | 2.154157 |
1995-12-31 | 0.382644 | 35.853968 | 2.141093 |
1996-12-31 | 0.354772 | 36.933824 | 2.741013 |
1997-12-31 | 0.398710 | 35.944984 | 2.191658 |
1998-12-31 | 0.330840 | 36.889205 | 2.716225 |
1999-12-31 | 0.342132 | 35.739394 | 2.077441 |
2000-12-31 | 0.261652 | 33.385382 | 0.769657 |
2001-12-31 | 0.332028 | 33.288401 | 0.715778 |
2002-12-31 | 0.289858 | 37.241379 | 2.911877 |
2003-12-31 | 0.357840 | 35.927419 | 2.181900 |
2004-12-31 | 0.301335 | 40.064706 | 4.480392 |
2005-12-31 | 0.297229 | 38.962536 | 3.868076 |
2006-12-31 | 0.378636 | 38.075758 | 3.375421 |
2007-12-31 | 0.299623 | 37.391975 | 2.995542 |
2008-12-31 | 0.318908 | 35.789272 | 2.105151 |
2009-12-31 | 0.264627 | 36.655052 | 2.586140 |
2010-12-31 | 0.276537 | 38.596330 | 3.664628 |
2011-12-31 | 0.370717 | 34.744681 | 1.524823 |
2012-12-31 | 0.339748 | 36.509434 | 2.505241 |
2013-12-31 | 0.286890 | 38.206294 | 3.447941 |
2014-12-31 | 0.345017 | 38.678457 | 3.710254 |
2015-12-31 | 0.321603 | 41.861314 | 5.478508 |
2016-12-31 | 0.375076 | 38.674658 | 3.708143 |
2017-12-31 | 0.297438 | 40.143791 | 4.524328 |
2018-12-31 | 0.385768 | 38.235880 | 3.464378 |
2019-12-31 | 0.256429 | 38.764331 | 3.757962 |
2020-12-31 | 0.488738 | 37.295276 | 2.941820 |
2021-12-31 | 0.340103 | 40.833333 | 4.907407 |
2022-12-31 | 0.227745 | 41.918301 | 5.510167 |
2023-12-31 | 0.232778 | 43.404669 | 6.335927 |
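Because `.mean()` skips `NaN` values, each annual mean above is based only on the days that have an observation. If the missing days cluster in one season, that year's mean can be biased. The sketch below shows one way to check coverage per year. It uses a made-up two-year series (not the Rainier data) and groups by the index year instead of resampling, an alternative chosen here so that it does not depend on the `'YE'` alias; the names `demo_df` and `coverage` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series covering two years; the values are made up
dates = pd.date_range('2000-01-01', '2001-12-31', freq='D')
demo_df = pd.DataFrame({'TOBS': 40.0}, index=dates)
demo_df.index.name = 'DATE'

# Blank out January through June 2001 to mimic a long gap in the record
demo_df.loc['2001-01':'2001-06', 'TOBS'] = np.nan

# Group by calendar year: mean() skips NaN, count() counts non-missing days,
# size() counts all rows including missing ones
by_year = demo_df.groupby(demo_df.index.year)['TOBS']
coverage = pd.DataFrame({
    'mean_tobs': by_year.mean(),
    'n_obs': by_year.count(),
    'n_days': by_year.size(),
})
coverage['fraction_observed'] = coverage['n_obs'] / coverage['n_days']
print(coverage)
```

Applying the same grouping to `rainiersubset` would show how many observations stand behind each annual mean.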
# Plot mean annual temperature values for Rainier from 1980 to 2023
rainieryearly.plot(
    y='TOBS',
    title='Rainier Paradise Ranger Station Annual Mean Temperatures 1980-2023',
    xlabel='Year',
    ylabel='Temperature (F)',
    legend=False,
    color='blue',
    figsize=(10, 5),
    fontsize=14
)
<Axes: title={'center': 'Rainier Paradise Ranger Station Annual Mean Temperatures 1980-2023'}, xlabel='Year', ylabel='Temperature (F)'>
**Temperatures on the rise at Mount Rainier, WA over the last 40 years!** 📰 🗞️ 📻
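Between 1980 and 2023, the annual mean observed temperature at the Rainier Paradise Ranger Station rose from about 35 °F to more than 43 °F, with large swings from year to year. Both the cool years and the warm years are getting warmer: no annual mean after 2011 has fallen below 36 °F, and the two warmest years in this record, 2022 and 2023, are also the two most recent.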
Image credit: https://www.craiyon.com/image/OAbZtyelSoS7FdGko6hvQg
%%capture
%%bash
jupyter nbconvert *.ipynb --to markdown
%%capture
%%bash
jupyter nbconvert *.ipynb --to html