Making sense of COVID-19 data using Pandas & Python with VS Code on WSL

Viral Patel
3 min read · Mar 30, 2020

(Once a helicopter parent, now busy doing nothing on weekends!! )

Overview

This article should give you a high-level understanding of using Python and Pandas for basic data digging, and hopefully open a window onto the amazing world of data science.

I used the names of a few tools in the title of this post; here is a breakdown of them.

  1. WSL (Windows Subsystem for Linux) with Windows 10. It’s amazing what you can do with it now: grep, tail -f, bash... and everything else that you can do with Linux.
  2. VS Code. Unless you are living under a rock, you know what this is. One of the best IDEs out there, lightweight, powerful and functional.
  3. Python. An interpreted, high-level programming language, first released in 1991, that has recently become popular after its adoption by major data science projects and the information security industry. Download Python Interpreter.
  4. Pandas. An open source data analysis library built for Python.

Steps

Install WSL

Here are the steps to install and troubleshoot: https://docs.microsoft.com/en-us/windows/wsl/install-win10. I have Ubuntu 18.04.

Install VS Code

Download here

Install Remote VS Code

Start Ubuntu from Windows, navigate to the folder where you have your code, and type

code .

This installs the necessary tools, opens a remote VS Code window, and connects it to your WSL session.

Start a WSL session

Select a Python interpreter if prompted.

Select Python 3 interpreter

Validate the Python installation by typing the following

python --version
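The same check can be done from inside Python, which also shows which interpreter VS Code actually picked up. This is only a minimal sketch; the helper name `describe_interpreter` is made up for this example.

```python
import sys


def describe_interpreter():
    """Return the path and version string of the interpreter running this script."""
    major, minor, micro = sys.version_info[:3]
    return sys.executable, "{}.{}.{}".format(major, minor, micro)


path, version = describe_interpreter()
print("Interpreter:", path)
print("Version:    ", version)

# The rest of this post assumes Python 3.
if sys.version_info[0] < 3:
    print("This is Python 2; select a Python 3 interpreter in VS Code.")
```

Save it to a file and run it from the VS Code terminal to confirm that the version matches the interpreter you selected.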

Install Pandas

According to pandas.pydata.org, Pandas is part of the Anaconda distribution. If you prefer pip, install it using the following commands.

Install pip (this may take more than 5 minutes if you are on WSL; it's faster on WSL2)

sudo apt install python-pip

and install pandas

pip install pandas

If that doesn’t work, try this

python -m pip install --user pandas

You might have to install these again in the remote VS Code session, even if you have already installed them for the full VS Code on Windows.

You might have to install xlrd for this.

pip install xlrd

or

python -m pip install --user xlrd
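Once the installs finish, a short check like the one below can confirm that the interpreter used by the remote VS Code session can actually find the packages. It is a sketch only; the helper name `is_installed` is made up for this example, and it looks packages up without importing them.

```python
import importlib.util


def is_installed(package_name):
    """Return True if this interpreter can locate the package (without importing it)."""
    return importlib.util.find_spec(package_name) is not None


# Report on the two packages installed in the steps above.
for name in ("pandas", "xlrd"):
    status = "found" if is_installed(name) else "missing"
    print(name, "->", status)
```

If a package shows up as missing here but installed fine on Windows, it most likely needs to be installed again inside WSL.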

Write and execute Python with pandas

Now the real stuff!!

import pandas as pnds

covid19_worldwide_df = pnds.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-28-2020.csv')

# Sum the numeric columns per country, then list the 10 countries with the most deaths
print(covid19_worldwide_df.groupby('Country_Region').sum(numeric_only=True).sort_values(by='Deaths', ascending=False).head(10))

And run….

This takes the data from the daily report of 03-28-2020, groups the rows by country, sums the numeric columns for each country, and shows the 10 countries with the most deaths.
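To see what each step in that chain does without downloading anything, here is a small sketch on a hand-made table. The country names and numbers are invented purely for illustration; only the column names mirror the daily report.

```python
import pandas as pd

# Invented rows shaped like the daily report: several rows per country.
sample_df = pd.DataFrame({
    "Country_Region": ["Atlantis", "Atlantis", "Borduria", "Coruscant", "Coruscant"],
    "Province_State": ["North", "South", "Central", "East", "West"],
    "Confirmed": [100, 50, 300, 20, 30],
    "Deaths": [5, 1, 12, 0, 2],
})

totals = (
    sample_df.groupby("Country_Region")[["Confirmed", "Deaths"]]  # one group per country
    .sum()                                                        # add up the rows in each group
    .sort_values(by="Deaths", ascending=False)                    # most deaths first
)

# head(2) keeps only the top two rows, just as head(10) keeps the top ten.
print(totals.head(2))
```

Here the two Atlantis rows collapse into a single row with 150 confirmed cases and 6 deaths, and Borduria comes first because it has the highest death count.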

Everything Else

Resources

  1. Very useful info at 10 minutes to pandas
  2. Data is from 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
  3. And a lot of articles on medium.com

Extras

Using the Hack font in VS Code was amazing; download it from here.

Next steps

https://towardsdatascience.com/ has amazing articles and useful tips to take your data digging to the next level.

Viral Patel

I code, I manage teams, I contribute, I learn every day and write on some days. Passionate about using technology to achieve greater heights, for everyone.