US Weather Data API
November 16, 2019

First post on this topic! How exciting! Did you know that the National Oceanic and Atmospheric Administration (NOAA) has an awesome weather data API?
In this post we will explore how to quickly obtain the weather data needed to plot the distribution of average temperature in four American cities, just like in the chart below.
I highly encourage you to request a token here and play with it! I’m sure I’ll write a lot more interesting things in the future. Let’s get a quick, basic Python setup ready.
Token = "copy_paste_your_token_here"
#needed libraries
import requests
import pandas as pd
import json
import numpy as np
from datetime import datetime
You can easily find weather stations here that have daily data. Let’s save the state and zip code of each station for later use.
#Overall dataframe with all cities
all_cities_df = pd.DataFrame()
all_cities_df['date'] = []
weather_station_dict = {}
#Add each city's weather station: [station id, state, zip code]
weather_station_dict['Long Beach'] = ['GHCND:USW00023129', 'CA', '90808']
weather_station_dict['Washington'] = ['GHCND:USW00013743', 'DC–VA–MD–WV', '22202']
weather_station_dict['Orlando'] = ['GHCND:USW00012815', 'FL', '32827']
weather_station_dict['Seattle'] = ['GHCND:USW00024233', 'WA', '98158']
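Each request in the loop that follows hits the same `data` endpoint with the same query parameters; only the station and the year change. Here is a minimal sketch of assembling that URL with the standard library instead of string concatenation. `BASE_URL` and `build_tavg_url` are hypothetical names introduced for this illustration, and the parameters are simply the ones already used in this post.

```python
from urllib.parse import urlencode

# Endpoint taken from the request used in this post.
BASE_URL = "https://www.ncdc.noaa.gov/cdo-web/api/v2/data"


def build_tavg_url(station_id, year, limit=1000):
    """Return the daily average temperature query URL for one station and year."""
    params = {
        "datasetid": "GHCND",
        "datatypeid": "TAVG",
        "limit": limit,
        "stationid": station_id,
        "startdate": f"{year}-01-01",
        "enddate": f"{year}-12-31",
    }
    # safe=":" leaves the colon in ids such as GHCND:USW00023129 unescaped.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


print(build_tavg_url("GHCND:USW00023129", 2019))
```

The resulting string can be passed to `requests.get` together with the `token` header, exactly as in the loop.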
Let’s now loop through our weather stations and collect the last 5 years of average temperatures.
for x in weather_station_dict:
    #initialize lists to store data
    dates_temp = []
    dates_prcp = []
    temps = []
    prcp = []
    station_id = weather_station_dict[x][0]
    #for each year from 2015-2019 ...
    for year in range(2015, 2019 + 1):
        year = str(year)
        #make the api call
        r = requests.get(
            'https://www.ncdc.noaa.gov/cdo-web/api/v2/data?datasetid=GHCND&datatypeid=TAVG&limit=1000&stationid='+\
            station_id+'&startdate='+year+'-01-01&enddate='+year+'-12-31', headers={'token':Token}
        )
        #load the api response as a json
        d = json.loads(r.text)
        #get all items in the response which are average temperature readings
        #(.get avoids a KeyError if a response has no 'results' key)
        avg_temps = [item for item in d.get('results', []) if item['datatype']=='TAVG']
        #get the date field from all average temperature readings
        dates_temp += [item['date'] for item in avg_temps]
        #get the actual average temperature from all average temperature readings
        temps += [item['value'] for item in avg_temps]
    #initialize dataframe
    df_temp = pd.DataFrame()
    #populate date and average temperature fields
    # (cast string date to datetime and convert temperature from tenths of Celsius to Fahrenheit)
    df_temp['date'] = [datetime.strptime(ds, "%Y-%m-%dT%H:%M:%S") for ds in dates_temp]
    df_temp[''.join( map(str.lower, x.split() ) )] = [float(v)/10.0*1.8 + 32 for v in temps]
    #merge this city's column into the overall dataframe on the date
    all_cities_df = all_cities_df.merge(df_temp, how='outer', left_on='date', right_on='date')
And voilà! A quick look at the output with:
all_cities_df.dropna().describe()
This returns something like the table below. As expected, the average temperature in Orlando, FL is higher than in Seattle, WA (a mean of about 73°F versus 54°F). All that’s left to do is to plot it! For example, we can draw a histogram and compare the temperatures between cities.
| | longbeach | washington | orlando | seattle |
|---|---|---|---|---|
| count | 1621 | 1621 | 1621 | 1621 |
| mean | 65.3847 | 59.6348 | 73.3121 | 53.7652 |
| std | 7.05271 | 17.3509 | 8.53603 | 10.598 |
| min | 46.58 | 12.92 | 40.82 | 26.6 |
| 25% | 60.62 | 45.14 | 68.72 | 45.86 |
| 50% | 64.58 | 60.62 | 75.38 | 53.24 |
| 75% | 70.7 | 75.74 | 79.88 | 61.7 |
| max | 87.26 | 91.04 | 87.26 | 81.68 |
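To draw the histogram mentioned above, one option is pandas' built-in plotting, which relies on matplotlib. The sketch below is only one way to do it and is not necessarily how the original chart was made: `plot_city_histograms` is a hypothetical helper, and the bin count, transparency and figure size are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs without a display
import pandas as pd


def plot_city_histograms(df, bins=30):
    """Overlay one temperature histogram per city column of `df`."""
    # Keep only the days where every city has a reading, as in the describe() call.
    temps = df.drop(columns=["date"], errors="ignore").dropna()
    # One semi-transparent histogram per column, all on the same axes.
    ax = temps.plot.hist(bins=bins, alpha=0.5, figsize=(10, 5))
    ax.set_xlabel("Daily average temperature (°F)")
    ax.set_ylabel("Number of days")
    ax.set_title("Distribution of daily average temperature by city")
    return ax


# Usage, assuming all_cities_df was built as shown earlier:
# ax = plot_city_histograms(all_cities_df)
# ax.figure.savefig("temperature_histograms.png")
```

In an interactive session or notebook, the `matplotlib.use("Agg")` line can be dropped so that the figure is displayed directly.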
We can see that the average temperatures in Long Beach, CA and Washington, DC are close (the means are about 6°F apart and the medians about 4°F apart), even though the weather in DC is far more volatile, with warmer summers and colder winters.
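The comparison can be checked directly from the summary table. Here is a quick sketch using the `mean` and `std` rows for Long Beach and Washington; the numbers are copied from the table above, and the variable names are only illustrative.

```python
# Values copied from the `mean` and `std` rows of the summary table.
summary = {
    "longbeach": {"mean": 65.3847, "std": 7.05271},
    "washington": {"mean": 59.6348, "std": 17.3509},
}

# How far apart the two means are, in °F.
mean_gap = summary["longbeach"]["mean"] - summary["washington"]["mean"]
# How much more spread out Washington's temperatures are than Long Beach's.
std_ratio = summary["washington"]["std"] / summary["longbeach"]["std"]

print(f"Gap between means: {mean_gap:.2f} °F")
print(f"Washington's std is {std_ratio:.2f}x Long Beach's")
```

With the real data, the same figures can be read off `all_cities_df.dropna().describe()` instead of being typed in by hand.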