Using the csv module

Published

2023-07-31

The csv module

There’s a very common format in use for tabular data, the CSV or comma separated value format. Many on-line data sources publish data in this format, and all spreadsheet software can read from and write to this format. The idea is simple: columns of data are separated by commas. That’s it!

Here’s an example of some tabular data:

Year FIFA champion
2018 France
2014 Germany
2010 Spain
2006 Italy
2002 Brazil

Here’s how it might be represented in CSV format:

Year,FIFA champion
2018,France
2014,Germany
2010,Spain
2006,Italy
2002,Brazil

Pretty simple.

What happens if we have commas in our data? Usually numbers don’t include comma separators when in CSV format. Instead, commas are added only when data are displayed. So, for example, we might have data like this (using format specifiers):

Country 2018 population
China 1,427,647,786
India 1,352,642,280
USA 327,096,265
Indonesia 267,670,543
Pakistan 212,228,286
Brazil 209,469,323
Nigeria 195,874,683
Bangladesh 161,376,708
Russia 145,734,038

and the CSV data would look like this:

Country,2018 population
China,1427647786
India,1352642280
USA,327096265
Indonesia,267670543
Pakistan,212228286
Brazil,209469323
Nigeria,195874683
Bangladesh,161376708
Russia,145734038

But what if we really wanted commas in our data?

Building Address
Waterman 85 Prospect St, Burlington, VT
Innovation 82 University Pl, Burlington, VT

We’d probably break this into additional columns.

Building,Street,City,State
Waterman,85 Prospect St,Burlington,VT
Innovation,82 University Pl,Burlington,VT

What if we really, really had to have commas in our data? Oh, OK. Here are cousin David’s favorite bands of all time:

Band Rank
Lovin’ Spoonful 1
Sly and the Family Stone 2
Crosby, Stills, Nash and Young 3
Earth, Wind and Fire 4
Herman’s Hermits 5
Iron Butterfly 6
Blood, Sweat & Tears 7
The Monkees 8
Peter, Paul & Mary 9
Ohio Players 10

Now there’s no way around commas in the data. For this we wrap the data including commas in quotation marks.

Band,Rank
Lovin' Spoonful,1
Sly and the Family Stone,2
"Crosby, Stills, Nash and Young",3
"Earth, Wind and Fire",4
Herman's Hermits,5
Iron Butterfly,6
"Blood, Sweat & Tears",7
The Monkees,8
"Peter, Paul & Mary",9
Ohio Players,10

(We’ll save the case of data which includes commas and quotation marks for another day.)

We can read data like this using Python’s csv module.

import csv
with open('bands.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)

This prints:

['Band', 'Rank']
["Lovin' Spoonful", '1']
['Sly and the Family Stone', '2']
['Crosby, Stills, Nash and Young', '3']
['Earth, Wind and Fire', '4']
["Herman's Hermits", '5']
['Iron Butterfly', '6']
['Blood, Sweat & Tears', '7']
['The Monkees', '8']
['Peter, Paul & Mary', '9']
['Ohio Players', '10']

Notice that we have to create a special object, a CSV reader. We instantiate this object by calling the constructor function, csv.reader(), and we pass to this function the file object we wish to read. Notice also that we read each row of our data file into a list, where the columns are separated by commas. That’s very handy!

We can write data to a CSV file as well.

import csv

bands = [['Deerhoof', 1],
         ['Lightning Bolt', 2],
         ['Radiohead', 3],
         ['Big Thief', 4],
         ['King Crimson', 5],
         ['French for Rabbits', 6],
         ['Yak', 7],
         ['Boygenius', 8],
         ['Tipsy', 9],
         ['My Bloody Valentine', 10]]

with open('bands.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for item in bands:
        writer.writerow(item)

This writes

Deerhoof,1
Lightning Bolt,2
Radiohead,3
Big Thief,4
King Crimson,5
French for Rabbits,6
Yak,7
Boygenius,8
Tipsy,9
My Bloody Valentine,10

to the file.

The newline='' keyword argument

If you’re using a Mac or a Linux machine, the newline='' keyword argument may not be strictly necessary when opening a file for use with a csv reader or writer. However, omitting it could cause problems on a Windows machine and so it’s probably best to include it for maximum portability. The Python documentation recommends using it.

Iterating CSV reader objects

Unlike a list, tuple, or string, a CSV reader object can only be iterated once. This is because a CSV reader object is a type of iterator (which is different from an iterable). An iterator can be iterated only once. Accordingly, if you wish to iterate over a CSV file more than once, you’ll need to create a new CSV reader object each time.

Original author: Clayton Cafiero < [given name] DOT [surname] AT uvm DOT edu >

No generative AI was used in producing this material. This was written the old-fashioned way.

This material is for free use under either the GNU Free Documentation License or the Creative Commons Attribution-ShareAlike 3.0 United States License (take your pick).