Choropleth Map not Displaying Correctly

Hello all,
I am having trouble getting my choropleth map to display any data.

I am using the crime.csv table from https://www.kaggle.com/ankkur13/boston-crime-data and a police-district GeoJSON file from Boston Analytics (fetched in the script). The code runs without errors, but no data shows up on the map.

import pandas as pd
import plotly.graph_objs as go
from urllib.request import urlopen
import json

# Read Dataset
# Located at: https://www.kaggle.com/ankkur13/boston-crime-data
df = pd.read_csv("crime.csv")

with urlopen('http://bostonopendata-boston.opendata.arcgis.com/datasets/9a3a8c427add450eaf45a470245680fc_5.geojson?outSR={%22latestWkid%22:2249,%22wkid%22:102686}') as response:
    pd_districts = json.load(response)

df_agg = df.groupby("DISTRICT").agg(CRIMES=("YEAR","count")) 
df_agg.reset_index(inplace=True)
df_agg = df_agg[df_agg['DISTRICT'] != 'nan']

df['DISTRICT'] = df['DISTRICT'].apply(lambda x: str(x))
fig = go.Figure(go.Choroplethmapbox(geojson=pd_districts, 
    locations=df_agg['DISTRICT'], 
    z=df_agg['CRIMES']))

fig.update_layout(mapbox_style="carto-positron")
fig.update_geos(fitbounds="locations")
fig.show()
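A side note on the aggregation: in current pandas, groupby skips rows whose key is NaN (dropna=True by default), so the `!= 'nan'` filter removes nothing. The str conversion of df['DISTRICT'] also runs after df_agg has been built, so it does not affect df_agg. A minimal sketch with a few made-up rows (hypothetical, not taken from crime.csv):

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for crime.csv; one row has no DISTRICT.
df = pd.DataFrame({
    "DISTRICT": ["A1", "B2", "A1", np.nan, "C11"],
    "YEAR": [2016, 2017, 2018, 2016, 2017],
})

# groupby skips NaN keys by default, so the NaN row never reaches df_agg.
df_agg = df.groupby("DISTRICT").agg(CRIMES=("YEAR", "count")).reset_index()

# NaN was never converted to the string 'nan', so this filter keeps every row.
filtered = df_agg[df_agg["DISTRICT"] != "nan"]
print(filtered.equals(df_agg))  # True

# Converting df afterwards does not touch df_agg, which was already computed.
df["DISTRICT"] = df["DISTRICT"].astype(str)
print(df_agg["DISTRICT"].tolist())  # ['A1', 'B2', 'C11']
```

If the conversion were moved before the groupby, 'nan' would become a real group and the filter would then be needed to remove it.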

I believe the problem is related to featureidkey, but I have tried several values, such as properties.DISTRICT and properties.ID, with no success.
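One way to compare candidate featureidkey values without rendering anything is to list the entries of locations that have no matching feature. The helpers below are hypothetical, not part of plotly: they follow a dotted path such as 'properties.DISTRICT' into each feature (plotly's default featureidkey is 'id') and compare values as strings, which is a simplifying assumption. The toy geojson is made up for illustration:

```python
def resolve_key(feature, featureidkey):
    """Follow a dotted path such as 'properties.DISTRICT' into one geojson feature."""
    value = feature
    for part in featureidkey.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def unmatched_locations(geojson, locations, featureidkey="id"):
    """Entries of `locations` that no feature provides under `featureidkey`."""
    ids = set()
    for feat in geojson["features"]:
        value = resolve_key(feat, featureidkey)
        if value is not None:
            ids.add(str(value))  # compare as strings (a simplifying assumption)
    return [loc for loc in locations if str(loc) not in ids]


# Toy geojson (made up): the district code sits under properties, not at feature['id'].
toy = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"DISTRICT": "A1", "ID": 1}, "geometry": None},
    {"type": "Feature", "properties": {"DISTRICT": "B2", "ID": 2}, "geometry": None},
]}
locations = ["A1", "B2"]
for key in ("id", "properties.ID", "properties.DISTRICT"):
    print(key, unmatched_locations(toy, locations, key))
# id ['A1', 'B2']
# properties.ID ['A1', 'B2']
# properties.DISTRICT []
```

Run against pd_districts and df_agg['DISTRICT'], an empty list means every district has a matching feature for that key. It says nothing about the geometry, so a map can still come out empty if the polygons are not in lon/lat.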

I also ran a few sanity checks to make sure that the data was accessible, and here they are:

print(df_agg['DISTRICT'].unique())
print(pd_districts['features'][0]['properties']['ID'])
print(df['DISTRICT'].dtype)
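Another sanity check worth running is whether the geometry is actually in longitude/latitude, since Choroplethmapbox places polygons by lon/lat. The hypothetical helpers below compute the bounding box of all feature coordinates; the sample uses three vertices of the first district in the downloaded geojson:

```python
def iter_positions(coords):
    """Yield (x, y) for every position in a nested geojson coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords[0], coords[1]
    else:
        for item in coords:
            yield from iter_positions(item)


def bounding_box(geojson):
    """(min_x, min_y, max_x, max_y) over all features that have coordinates."""
    xs, ys = [], []
    for feat in geojson["features"]:
        geom = feat.get("geometry") or {}
        for x, y in iter_positions(geom.get("coordinates", [])):
            xs.append(x)
            ys.append(y)
    return min(xs), min(ys), max(xs), max(ys)


def looks_like_lonlat(bbox):
    """True if the box fits inside the valid longitude/latitude ranges."""
    min_x, min_y, max_x, max_y = bbox
    return -180 <= min_x <= max_x <= 180 and -90 <= min_y <= max_y <= 90


# Three vertices copied from the first district of the downloaded geojson.
projected = {"features": [{"geometry": {"type": "MultiPolygon", "coordinates": [[[
    [771204.9004331976, 2967614.94987002],
    [771204.1002379358, 2967616.399998352],
    [771201.8000456989, 2967620.999726683],
]]]}}]}
print(looks_like_lonlat(bounding_box(projected)))  # False
```

For Boston, a box in lon/lat should sit around -71 and 42; bounding_box raises ValueError if no feature has coordinates.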

I can load this data into a folium choropleth map if I remove every other entry:

import pandas as pd
import folium

m = folium.Map(location=[42.3, -71.05], zoom_start=11, tiles='cartodbpositron')
folium.Choropleth(geo_data='police_districts_tst.geojson',
    data=df_agg,
    columns=['DISTRICT', 'CRIMES'],
    key_on='properties.DISTRICT',
    fill_color='YlGn',
    fill_opacity=0.6,
    line_opacity=0.2,
).add_to(m)

m

Any help would be appreciated: I would eventually like to build a Dash application, which I cannot do with a Folium choropleth.

Hi @hunterb9101, welcome to the forum! I gave it a try with go.Choropleth instead of go.Choroplethmapbox, and it displayed the attached figure. It looks like the geojson file has a lot of detail (either that, or it is not being processed correctly); maybe mapbox is having a hard time with it? Any chance you could use a simpler geojson file?


Below is the code I used, but my computer was clearly having a hard time rendering the figure, probably because the geojson file is very large.

import pandas as pd
import plotly.graph_objs as go
from urllib.request import urlopen
import json

# Read Dataset
# Located at: https://www.kaggle.com/ankkur13/boston-crime-data
df = pd.read_csv("~/Downloads/crime/crime.csv", encoding='latin-1')

with urlopen('http://bostonopendata-boston.opendata.arcgis.com/datasets/9a3a8c427add450eaf45a470245680fc_5.geojson?outSR={%22latestWkid%22:2249,%22wkid%22:102686}') as response:
    pd_districts = json.load(response)

df_agg = df.groupby("DISTRICT").agg(CRIMES=("YEAR","count")) 
df_agg.reset_index(inplace=True)
df_agg = df_agg[df_agg['DISTRICT'] != 'nan']

df['DISTRICT'] = df['DISTRICT'].apply(lambda x: str(x))
fig = go.Figure(go.Choropleth(
    geojson=pd_districts, 
    featureidkey='properties.ID',
    locations=df_agg['DISTRICT'], 
    z=df_agg['CRIMES']))

fig.show()
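If a simpler geojson file is not available, the polygons can be simplified before plotting. A minimal sketch of the Ramer-Douglas-Peucker algorithm in plain Python, applied ring by ring; the function names are hypothetical, and `tolerance` is in the units of the coordinates, with its value left to trial and error:

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all [x, y] pairs)."""
    px, py = p[0], p[1]
    ax, ay = a[0], a[1]
    bx, by = b[0], b[1]
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring's first/last point
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the parameter to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, tolerance):
    """Iterative Ramer-Douglas-Peucker: keep only vertices farther than `tolerance`."""
    if len(points) < 3:
        return list(points)
    keep = [False] * len(points)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        index, dmax = -1, -1.0
        for i in range(start + 1, end):
            d = _point_segment_distance(points[i], points[start], points[end])
            if d > dmax:
                index, dmax = i, d
        if index != -1 and dmax > tolerance:
            keep[index] = True
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]


def simplify_ring(ring, tolerance):
    simplified = rdp(ring, tolerance)
    # A closed linear ring needs at least 4 positions; fall back to the original otherwise.
    return simplified if len(simplified) >= 4 else ring


def simplify_geometry(geom, tolerance):
    """Simplified copy of a Polygon/MultiPolygon; other geometries are returned unchanged."""
    if not geom:
        return geom
    if geom["type"] == "Polygon":
        return {"type": "Polygon",
                "coordinates": [simplify_ring(r, tolerance) for r in geom["coordinates"]]}
    if geom["type"] == "MultiPolygon":
        return {"type": "MultiPolygon",
                "coordinates": [[simplify_ring(r, tolerance) for r in poly]
                                for poly in geom["coordinates"]]}
    return geom
```

Applied as `feat['geometry'] = simplify_geometry(feat['geometry'], tolerance)` for each feature. Because each polygon is simplified independently, neighbouring districts can develop small gaps or overlaps along shared borders. If shapely is installed, `shape(geom).simplify(tolerance, preserve_topology=True)` is a ready-made per-geometry alternative.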

Hi @hunterb9101,

There are several issues with your data and with how you are instantiating the go.Choroplethmapbox class:

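  • first, by default each feature in the geojson file must have a top-level key 'id', i.e. feature['id'], not
    feature['properties']['id'] (alternatively, featureidkey='properties.DISTRICT' tells plotly to read the id from the properties);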

  • the z-values are linked to the geographical regions (Polygon or MultiPolygon) through locations: the i-th z-value is drawn on the feature whose id equals the i-th location. In the code below the rows are also put in the order in which the corresponding feature['id'] values appear in the geojson data; in your case the two orders differ.

  • the region geometries must be given in lon and lat coordinates, but your geojson file describes them in unusual coordinates, thousands of times larger than Boston's lon and lat.

Here is code that prepares the data for a Choroplethmapbox:

import pandas as pd
import plotly.graph_objs as go
from urllib.request import urlopen
import json

with urlopen('http://bostonopendata-boston.opendata.arcgis.com/datasets/9a3a8c427add450eaf45a470245680fc_5.geojson?outSR={%22latestWkid%22:2249,%22wkid%22:102686}') as response:
    jdata = json.load(response)
distr_ids = [feat['properties']['DISTRICT'] for feat in jdata['features']]
#distr_ids
# Extend each feature in jdata['features'] with the key 'id' equal to the district id, because
# Choroplethmapbox looks for this id when it maps the corresponding z-value to a color in the colorscale:
for k, feat in enumerate(jdata['features']):
    jdata['features'][k]['id']= distr_ids[k]
df = pd.read_csv("crime.csv", encoding ='iso-8859-1')

df_agg = df.groupby("DISTRICT").agg(CRIMES=("YEAR","count")) 
df_agg.reset_index(inplace=True)
df_agg = df_agg[df_agg['DISTRICT'] != 'nan']

df['DISTRICT'] = df['DISTRICT'].apply(lambda x: str(x))
# Reorder the rows of df_agg so that the elements of the column 'DISTRICT' appear in the same
# order as the elements of the list distr_ids (otherwise the Choroplethmapbox does not work)
df_ids = list(df_agg['DISTRICT'])
J = [df_ids.index(elem) for elem in distr_ids]

df_agg_new = df_agg.iloc[J]

#df_agg_new
fig = go.Figure(go.Choroplethmapbox(geojson=jdata, 
                locations=df_agg_new['DISTRICT'], 
                z=df_agg_new['CRIMES']))

fig.update_layout(mapbox_style="carto-positron",
                 mapbox_center=dict(lat=42.361145, lon=-71.057083),
                 mapbox_zoom=10)

fig.show('browser')
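The reordering step `J = [df_ids.index(elem) for elem in distr_ids]` raises ValueError as soon as a district in the geojson has no rows in the CSV. A sketch of a key-based alternative with pandas that reindexes df_agg by the feature order; the counts and the extra district E18 are made up for illustration, and filling missing districts with 0 is an assumption (dropping them would be equally valid):

```python
import pandas as pd

# Hypothetical per-district counts (standing in for df_agg) and feature order from the geojson.
df_agg = pd.DataFrame({"DISTRICT": ["A1", "B2", "C11"], "CRIMES": [120, 80, 95]})
distr_ids = ["C11", "A1", "B2", "E18"]  # E18 has no rows in df_agg

# Look rows up by district label instead of by position; missing districts get 0 crimes.
aligned = (
    df_agg.set_index("DISTRICT")
    .reindex(distr_ids, fill_value=0)
    .rename_axis("DISTRICT")
    .reset_index()
)
print(aligned)
```

aligned['DISTRICT'] and aligned['CRIMES'] can then be passed as locations and z.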

The coordinates of the first region in the geojson are as follows:

jdata['features'][0]['geometry']
{'type': 'MultiPolygon',
 'coordinates': [[[[771204.9004331976, 2967614.94987002],
    [771204.1002379358, 2967616.399998352],
    [771201.8000456989, 2967620.999726683],
    [771200.9004411846, 2967622.000052765],
    [771199.9001151025, 2967623.3002470136],
    [771198.8997890204, 2967624.199851513],
    [771197.2997266054, 2967625.4997176826],
    [771195.9998604357, 2967626.5000437647],
    [771194.6996661872, 2967627.199517444],
    [771193.0996037722, 2967628.1001061797],
    [771191.0999358594, 2967628.799579859],
    [771189.200005278, 2967629.799905941],

but the Boston lon and lat set above are lon=-71.057083 and lat=42.361145.
What are those big numbers listed in the geojson file??!
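Those numbers most likely come from the outSR parameter in the download URL: outSR={"latestWkid":2249,"wkid":102686} asks ArcGIS for EPSG:2249 (NAD83 / Massachusetts Mainland), a Lambert Conformal Conic state-plane projection measured in US survey feet. The values are therefore eastings and northings in feet, not lon/lat. The simplest fix is probably to download the file without that parameter, assuming the portal then returns WGS84 lon/lat as the GeoJSON specification (RFC 7946) requires. Otherwise the coordinates can be converted. The sketch below is a plain-Python inverse Lambert Conformal Conic following Snyder's ellipsoidal formulas; the projection parameters are assumed from the EPSG:2249 definition and should be double-checked, the function names are hypothetical, and the small NAD83/WGS84 datum difference is ignored:

```python
import math

# Assumed parameters of EPSG:2249 (NAD83 / Massachusetts Mainland, US survey feet).
A = 6378137.0                                  # GRS80 semi-major axis (m)
F_INV = 298.257222101                          # GRS80 inverse flattening
E = math.sqrt(2 / F_INV - 1 / F_INV ** 2)      # first eccentricity
LAT1, LAT2 = math.radians(42 + 41 / 60), math.radians(41 + 43 / 60)  # standard parallels
LAT0, LON0 = math.radians(41.0), math.radians(-71.5)                  # projection origin
X0, Y0 = 200000.0, 750000.0                    # false easting / northing (m)
US_FOOT = 1200 / 3937                          # metres per US survey foot


def _m(phi):
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi):
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


N = (math.log(_m(LAT1)) - math.log(_m(LAT2))) / (math.log(_t(LAT1)) - math.log(_t(LAT2)))
F = _m(LAT1) / (N * _t(LAT1) ** N)
RHO0 = A * F * _t(LAT0) ** N


def state_plane_ft_to_lonlat(x_ft, y_ft):
    """Inverse Lambert Conformal Conic (2SP): state-plane feet -> (lon, lat) in degrees."""
    x = x_ft * US_FOOT - X0
    y = RHO0 - (y_ft * US_FOOT - Y0)
    rho = math.copysign(math.hypot(x, y), N)
    theta = math.atan2(x, y)  # valid for N > 0, as here
    t = (rho / (A * F)) ** (1 / N)
    lon = theta / N + LON0
    # Latitude has no closed form on the ellipsoid: iterate until it settles.
    lat = math.pi / 2 - 2 * math.atan(t)
    for _ in range(20):
        es = E * math.sin(lat)
        new_lat = math.pi / 2 - 2 * math.atan(t * ((1 - es) / (1 + es)) ** (E / 2))
        if abs(new_lat - lat) < 1e-12:
            lat = new_lat
            break
        lat = new_lat
    return math.degrees(lon), math.degrees(lat)


def reproject(coords):
    """Return a converted copy of a nested geojson coordinate array."""
    if coords and isinstance(coords[0], (int, float)):
        return list(state_plane_ft_to_lonlat(coords[0], coords[1]))
    return [reproject(c) for c in coords]
```

Applied as `feat['geometry']['coordinates'] = reproject(feat['geometry']['coordinates'])` for each feature, this should bring the polygons near the mapbox_center used above. With pyproj installed, `Transformer.from_crs("EPSG:2249", "EPSG:4326", always_xy=True).transform(x, y)` performs the same conversion using PROJ.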