In an effort to slow-learn Python pandas myself, I'm going to write a series of small posts aimed at people like me who get stuck on small things that hold up bigger progress. It sometimes helps to see something laid out with a little more explanation than just RTFM, though once you're familiar with a library, reading the manual can be much more efficient. If you just want to RTFM on importing a csv file, go here.
note: this is a work-in-progress post
The Dataset
Let's look at leading causes of death in the United States. Download the csv file here. Mine was saved to my Downloads folder. Take special note of where you save this file (we'll need to know for line #2 of our program).
The Code
We're going to slowly walk through the following code:
import pandas as pd
path = "/home/kyconway/Downloads/NCHS_-_Leading_Causes_of_Death__United_States.csv"
df = pd.read_csv(path)
df.head()
It's not very long and there's not much there, but it's a wonderful incantation. If we do everything just right, we'll see an output of data something like the following. If yours looks less pretty, that's okay. I'm using Jupyter Notebook for this (which you're free to use). You can also use any other editor, like Atom, Geany, or Notepad++, if you like.
At the moment this blog doesn't have a post on text editors, installing python, or running your code. My apologies for this. If you're just starting out I suggest you look at Zed Shaw's excellent Learn Python the Hard Way.
A small step, but an important one: we will have successfully imported data from a csv file into a pandas dataframe! But don't copy the code just yet. We're slow learning. Learning slowly will pay off in unexpected ways when you go off to RTFM.
Line 1: Import Pandas (and other libraries)
import pandas as pd
Let's explain that line (as best I know how). At the top of my new file I type the keyword import, then a space, then the library name (in this case "pandas"), then a space, then the keyword as, then a space, and finally a shorter name (in this case "pd") that I'll use to refer to the library in the rest of my code.
As an example, instead of having to write pandas.read_csv(path), I can write pd.read_csv(path). So "import ... as" isn't really a function. It's a statement that loads the library and gives it a name of your choosing, which makes the library easier to call in your code.
While you could just as readily write import pandas as treeman and then call treeman.read_csv(path), it is probably best to use pd as the shorthand for pandas, as it's quite common in code you will read and write with others.
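To convince myself that the alias is only a name, here is a small sketch (my own illustration, reusing the "treeman" name from above) that imports pandas three ways and checks that every name points at the same library:

```python
import pandas
import pandas as pd
import pandas as treeman  # legal, but nobody else will recognise it

# All three names are bound to the very same module object.
same_module = (pandas is pd) and (pd is treeman)
print(same_module)

# So the function reached through either alias is the same function too.
print(pd.read_csv is treeman.read_csv)
```

Importing the same library more than once doesn't load it again; Python just hands back the module it already has under whatever name you asked for.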
Line 2: Where is the File
path = "/home/kyconway/Downloads/NCHS_-_Leading_Causes_of_Death__United_States.csv"
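The path above only works on my machine, so yours will look different. As a rough sketch, assuming your browser saved the file to a folder called "Downloads" inside your home directory (that's an assumption, so adjust it to wherever your file really is), you could build the path with the standard library's pathlib and check that the file is there before going any further:

```python
from pathlib import Path

# Assumption: the csv was saved to a "Downloads" folder in your home directory.
filename = "NCHS_-_Leading_Causes_of_Death__United_States.csv"
path = Path.home() / "Downloads" / filename

if path.exists():
    print(f"Found it: {path}")
else:
    print(f"No file at {path}; check where your browser saved the download.")
```

pd.read_csv accepts a Path object like this one as well as a plain string, so either style works for line #3.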
Line 3: Create a Dataframe
df = pd.read_csv(path)
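If you'd like to see what read_csv hands back without needing the download, here is a minimal sketch that reads csv text from memory instead of from a file. The fruit table is made up purely for illustration and has nothing to do with the real dataset:

```python
import io

import pandas as pd

# Made-up csv text standing in for a real file on disk.
csv_text = """fruit,count
apple,3
banana,5
cherry,7
"""

# io.StringIO wraps the text so read_csv can treat it like an open file.
df = pd.read_csv(io.StringIO(csv_text))

print(type(df))          # the result is a pandas DataFrame
print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # the first row of the csv became the column names
```

The name df is only a convention, much like pd. Any variable name would do.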
Line 4: Print 1st Five Rows
df.head()
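By default head() returns the first five rows, and you can pass a number to ask for more or fewer. A small sketch with a toy ten-row dataframe (again, made up for illustration):

```python
import pandas as pd

# Toy dataframe with ten rows, numbered 0 through 9.
df = pd.DataFrame({"n": range(10)})

first_five = df.head()    # no argument: the first 5 rows
first_three = df.head(3)  # or ask for a specific number of rows

# print() so the rows also show up when this runs as a plain script.
print(first_five)
print(first_three)
```

In a notebook, the last expression in a cell is displayed automatically, which is why a bare df.head() shows a table there. In a script run from an editor or the command line, nothing appears unless you wrap it in print().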