How to plot data on a Map using GeoPandas in Python.
I love analyzing data related to India. There is a ton of data available on Kaggle, and the data sets are a great way of practicing what I have learned. The fun thing about Kaggle is that you can publish your entire analysis, along with the code, in a Jupyter Notebook for others to see. That way you can get great feedback from the community, and it also helps you stay engaged in the process of learning Data Science. So I recommend doing that if you're learning Data Science.
Now, before I get sidetracked by the awesomeness of Kaggle, let me tell you how to represent the data on the Map of India because it can be much more impactful than your plain old Bar Plots. You can replicate this process for any country as long as you have the related GIS Data.
STEP 1: Getting GIS Data for plotting on the India map
Along with great data sets for Data Science, Kaggle also has tons of Geographic Information System (GIS) data. You can find India’s GIS data here. The only folder you need is the Indian States folder; we don't need the Indian Boundary folder.
STEP 2: Installing GeoPandas
GeoPandas is a great library for working with GIS data. It is built on top of several other libraries, so those dependencies have to be installed for GeoPandas to work. If you have the Anaconda Distribution, this becomes a much easier process, so if you don't have it installed on your system I recommend installing it first. Assuming you have the distribution, all you have to do is open Anaconda Prompt (you can easily find it in the search), type the following command, and press Enter.
conda install --channel conda-forge geopandas
Wait for the installation to complete and we are good to go for the next step.
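A quick way to confirm that the installation worked is to import the library and print its version. This is only a small sanity check, not part of the original steps; run it in a notebook or Python session that uses the same environment you installed into.

```python
import geopandas as gpd

# If this import fails with ImportError, GeoPandas is not installed
# in the environment this Python session is using.
print("GeoPandas version:", gpd.__version__)
```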
What exactly are Shapefiles?
Before going to Step 3, let me explain what a shapefile is. It is worth having an idea about shapefiles because they are at the heart of visualizing the shape of a map. In simple terms, a shapefile is a nontopological format for storing the geometric location and attribute information of geographic features. Geographic features in a shapefile can be represented by points, lines, or polygons (areas). That said, knowing exactly what a shapefile is isn't crucial for our task, so don't sweat it.
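To make "geometry plus attributes" a little more concrete, here is a minimal sketch that builds a tiny GeoDataFrame by hand instead of reading a shapefile. The feature names and coordinates are made up for illustration; the shapely geometry classes used here are installed as a GeoPandas dependency.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point, Polygon

# Illustrative features only: the names and coordinates are made up.
features = gpd.GeoDataFrame(
    {"name": ["a city", "a road", "a region"]},  # attribute information
    geometry=[
        Point(77.2, 28.6),  # a single location
        LineString([(77.0, 28.0), (77.5, 28.5), (78.0, 28.4)]),  # a line
        Polygon([(76.0, 27.0), (79.0, 27.0), (79.0, 30.0), (76.0, 30.0)]),  # an area
    ],
    crs="EPSG:4326",  # coordinates are longitude/latitude
)

# Each row pairs its attributes with one geometry of a given type.
print(features.geom_type.tolist())
```

A state map like the one used in this article only needs the polygon case: one row per state, with the state's outline in the geometry column.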
STEP 3: Loading the Shapefiles and Plotting the map
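Before loading the shapefile with the GeoPandas library, just make sure you have the path to the file Indian_states.shp. A shapefile is really a set of files, so keep the .shx and .dbf files that share its name in the same folder.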
GeoPandas loads the shapefile (Indian_states.shp) into a GeoDataFrame, which I assigned the name “india”.
The object “india” is a GeoDataFrame. It is very similar to a Pandas DataFrame, and many of the operations that we do on Pandas DataFrames also apply to it. GeoDataFrames usually have a column (in this case ‘geometry’) containing values that represent different shapes. Now, to visualize the shape of the map, all we have to do is call the .plot() method on the “india” GeoDataFrame.
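Here is a small sketch of that plotting call. So that it runs without the data set, it uses a stand-in GeoDataFrame made of two made-up square "states"; with the real data you would call .plot() on “india” in the same way. The figure size and colors are my own choices, not requirements.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Stand-in for the real "india" GeoDataFrame: two made-up square "states".
demo = gpd.GeoDataFrame(
    {"st_nm": ["State A", "State B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
)

fig, ax = plt.subplots(figsize=(8, 8))
# Draw every shape in the geometry column onto the axes.
demo.plot(ax=ax, color="lightgrey", edgecolor="black")
ax.set_axis_off()  # hide the coordinate axes so only the map remains

# In a Jupyter Notebook the figure is displayed inline;
# in a plain script, call plt.show() to open a window.
```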
STEP 4: Representing Data on the visualized map
To plot the data, all we have to do is add a column to the ‘india’ GeoDataFrame containing the value we want to represent for each state in the ‘st_nm’ column. For example, take the “Total_Suicides” column that I added to the “india” GeoDataFrame.
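One possible way to add such a column is to look up each state name in a table of values. The sketch below assumes the values are available as a pandas Series keyed by state name; the stand-in states and the numbers are placeholders, not real statistics.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Stand-in for "india": three made-up states with a 'st_nm' column.
demo = gpd.GeoDataFrame(
    {"st_nm": ["State A", "State B", "State C"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

# Placeholder totals keyed by state name (not real statistics).
totals = pd.Series({"State A": 120, "State B": 45})

# Look up each state's value by name; names without a match become NaN.
demo["Total_Suicides"] = demo["st_nm"].map(totals)

# List the states that did not match, usually a sign of a spelling difference.
unmatched = demo.loc[demo["Total_Suicides"].isna(), "st_nm"].tolist()
print(demo[["st_nm", "Total_Suicides"]])
print("Unmatched states:", unmatched)
```

Checking for unmatched names is worth the extra line, because state names in a shapefile often differ slightly in spelling from the names in a statistics table.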
The Total_Suicides column represents the total number of suicides in each state from 2001–2012. Now, to represent this data on the map of India, all we have to do is pass the name of the column to the “column” argument of the GeoDataFrame's .plot() method.
NOTE: Passing the argument legend=True adds the color bar on the right of the map, which shows how the colors correspond to the values.
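A minimal sketch of such a call, again on a made-up stand-in GeoDataFrame with placeholder numbers so that it runs without the data set. The colormap, edge color, and color bar label are my own choices; only the column and legend arguments come from the steps above.

```python
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Stand-in for "india" with a value column already added.
demo = gpd.GeoDataFrame(
    {
        "st_nm": ["State A", "State B", "State C"],
        "Total_Suicides": [120, 45, 80],  # placeholder numbers, not real statistics
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
)

fig, ax = plt.subplots(figsize=(8, 6))
demo.plot(
    column="Total_Suicides",  # color each shape by this column's value
    cmap="Reds",              # darker red means a higher value
    legend=True,              # adds the color bar next to the map
    legend_kwds={"label": "Total_Suicides (placeholder values)"},
    edgecolor="black",
    ax=ax,
)
ax.set_axis_off()
```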
As this example shows, we can easily visualize data on any map if we have the relevant GIS data. I recommend going to Kaggle, downloading a data set for whichever country you are interested in, and starting to visualize the data on a map.
I hope you liked the article. Stay tuned for much more.
Thanks for reading!