I recently discovered that once the field names of a .csv file have been read through Python’s csv.DictReader class, rewinding the underlying file with seek(0) makes the header line come back as an extra row of data. This seemed like unexpected behavior to me; it looks as if the field names are inserted back into the DictReader object. The header is consumed only once, the first time the field names are read. I’m using Python 3.3, and found no explicit mention of this in the Python documentation. According to the documentation, fieldnames is a public attribute of DictReader objects. Unless passed as a parameter when the DictReader is created, it is initialized when the first record is read from the file, or when the attribute is first accessed. The DictReader does not store any rows itself; it simply reads lines from the file. Once fieldnames has been set, a header line that is read again after seek(0) is therefore treated as ordinary data. This caused me much frustration as I was developing an application for work.
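To make the lazy initialization concrete, here is a minimal sketch that uses an in-memory io.StringIO in place of a real file. The sample data and variable names are made up for illustration. It suggests that the first access to fieldnames consumes one line from the underlying file and that later accesses read nothing further.

```python
import csv
import io

# Hypothetical sample data standing in for a .csv file on disk.
sample = io.StringIO("ID,Name\n1,Bill Boulder\n2,Jill Jenkins\n")

reader = csv.DictReader(sample)
line_num_before = reader.line_num   # 0: nothing has been read yet

first = reader.fieldnames           # first access reads the header line
line_num_after_first = reader.line_num

second = reader.fieldnames          # second access reuses the stored names
line_num_after_second = reader.line_num

rows = list(reader)                 # the remaining lines are data rows

print(line_num_before, line_num_after_first, line_num_after_second)  # 0 1 1
print(first)                        # ['ID', 'Name']
print(len(rows))                    # 2
```

The line_num attribute is used here only as a convenient way to see how many lines the reader has pulled from the file.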
I was attempting to print out the field names of a newly loaded .csv file so that the user could see them without opening the file in Excel, Notepad, etc. After displaying the field names, my program would do some processing. Unfortunately, this was resulting in an extra “row” of data coming out of the DictReader. If I imported a .csv file with 4 rows of data, I would get 5 rows once the field names had been read and the file rewound with seek(0). The field names come back as the first row, and every real row appears one position later, at row i + 1. I have pasted some sample code that I wrote separately from my application to duplicate and demonstrate the problem:
    import sys
    import os
    import csv

    '''
    Opens and imports a .csv file into a DictReader object. Once opened,
    the rows of the DictReader are printed, the field names are accessed,
    and the rows are printed again.
    '''

    #=======================================================================
    # Set the file name of the .csv file (hard-coded for this test program)
    #=======================================================================
    filename = "TestList.csv"

    #=======================================================================
    # Open input file and establish a DictReader file for easy file reading.
    # Use the csv sniffer to determine the .csv file's dialect.
    #=======================================================================
    _openedfile = open(filename, 'r')
    idialect = csv.Sniffer().sniff(_openedfile.read(2048))
    _openedfile.seek(0)
    _dictfile = csv.DictReader(_openedfile, dialect=idialect)

    #=======================================================================
    # Print each row of the DictReader
    #=======================================================================
    _openedfile.seek(0)
    for row in _dictfile:
        print(row)
    print()

    #=======================================================================
    # Access the field names in the DictReader object to display them to the
    # user (not shown)
    #=======================================================================
    fields = _dictfile.fieldnames

    #=======================================================================
    # Print each row of the DictReader. An extra row with the field names
    # was added!
    #=======================================================================
    _openedfile.seek(0)
    for row in _dictfile:
        print(row)
    print()

    _openedfile.close()
print() results:
    {'Address': '123 Any St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '1', 'Name': 'Bill Boulder'}
    {'Address': '234 Another St', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '2', 'Name': 'Jill Jenkins'}
    {'Address': '345 Main St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '3', 'Name': 'Mark Masters'}
    {'Address': '456 That Ave', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '4', 'Name': 'Sarah Skunk'}

    {'Address': 'Address', 'CitySTZIP': 'CitySTZIP', 'ID': 'ID', 'Name': 'Name'}
    {'Address': '123 Any St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '1', 'Name': 'Bill Boulder'}
    {'Address': '234 Another St', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '2', 'Name': 'Jill Jenkins'}
    {'Address': '345 Main St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '3', 'Name': 'Mark Masters'}
    {'Address': '456 That Ave', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '4', 'Name': 'Sarah Skunk'}
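The same effect can be reproduced without a file on disk. The sketch below is a cut-down version of the program above under several assumptions: it swaps the real file for an io.StringIO, skips the Sniffer step, and uses invented two-column data. It never touches fieldnames explicitly, which suggests that the extra row comes from rewinding the file after the header has already been consumed, rather than from reading the attribute itself.

```python
import csv
import io

# Hypothetical sample data standing in for TestList.csv.
sample = io.StringIO("ID,Name\n1,Bill Boulder\n2,Jill Jenkins\n")
reader = csv.DictReader(sample)

# First pass: the header line is consumed implicitly and becomes fieldnames.
first_pass = list(reader)

# Rewind the underlying file. fieldnames is already set, so the reader
# has no reason to treat the first line specially any more.
sample.seek(0)
second_pass = list(reader)

print(len(first_pass), len(second_pass))  # 2 3
print(dict(second_pass[0]))               # {'ID': 'ID', 'Name': 'Name'}
```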
In order to solve this problem, I had to advance the DictReader past the header row using the next() function. This means that you have to call next() on the DictReader object immediately after calling seek(0) on the opened file (to return the file pointer to the beginning of the file). This must be done every time that you want to read data from the DictReader starting at the beginning. It also means that you must access the field names immediately after creating the DictReader. This ensures that the header line has already been consumed and fieldnames is set before the first seek(0). If you don’t do this, the first pass consumes the header on its own, and the call to next() then skips the first row of actual data.
    import sys
    import os
    import csv

    '''
    Opens and imports a .csv file into a DictReader object. Once opened,
    the rows of the DictReader are printed, the field names are accessed,
    and the rows are printed again.
    '''

    #=======================================================================
    # Set the file name of the .csv file (hard-coded for this test program)
    #=======================================================================
    filename = "TestList.csv"

    #=======================================================================
    # Open input file and establish a DictReader file for easy file reading.
    # Use the csv sniffer to determine the .csv file's dialect.
    #=======================================================================
    _openedfile = open(filename, 'r')
    idialect = csv.Sniffer().sniff(_openedfile.read(2048))
    _openedfile.seek(0)
    _dictfile = csv.DictReader(_openedfile, dialect=idialect)
    fields = _dictfile.fieldnames  # access immediately so the header is consumed

    #=======================================================================
    # Print each row of the DictReader
    #=======================================================================
    _openedfile.seek(0)
    next(_dictfile)  # skip the header line, which is now read as data
    for row in _dictfile:
        print(row)
    print()

    #=======================================================================
    # Accessing the field names in the DictReader a second time
    # does not insert a second row of field names
    #=======================================================================
    fields = _dictfile.fieldnames

    #=======================================================================
    # Print each row of the DictReader again. The header line is skipped
    # with next(), so no extra row of field names appears.
    #=======================================================================
    _openedfile.seek(0)
    next(_dictfile)  # skip the header line again after rewinding
    for row in _dictfile:
        print(row)
    print()

    _openedfile.close()
print() results:
    {'Address': '123 Any St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '1', 'Name': 'Bill Boulder'}
    {'Address': '234 Another St', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '2', 'Name': 'Jill Jenkins'}
    {'Address': '345 Main St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '3', 'Name': 'Mark Masters'}
    {'Address': '456 That Ave', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '4', 'Name': 'Sarah Skunk'}

    {'Address': '123 Any St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '1', 'Name': 'Bill Boulder'}
    {'Address': '234 Another St', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '2', 'Name': 'Jill Jenkins'}
    {'Address': '345 Main St', 'CitySTZIP': 'San Francisco CA 94124', 'ID': '3', 'Name': 'Mark Masters'}
    {'Address': '456 That Ave', 'CitySTZIP': 'Burlingame CA 94010', 'ID': '4', 'Name': 'Sarah Skunk'}
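If the file is small enough to hold in memory, one alternative to rewinding is to read the rows into a list once and loop over that list as often as needed, which avoids seek(0) and next() altogether. The following is only a sketch under that assumption: load_rows is a hypothetical helper name (not part of the csv module), and the in-memory sample data stands in for TestList.csv.

```python
import csv
import io


def load_rows(fileobj, **reader_kwargs):
    """Read an open text file once and return (fieldnames, rows).

    Hypothetical helper for illustration; not part of the csv module.
    Extra keyword arguments (for example dialect=...) go to DictReader.
    """
    reader = csv.DictReader(fileobj, **reader_kwargs)
    fieldnames = reader.fieldnames  # consumes the header line
    rows = list(reader)             # consumes every remaining data row
    return fieldnames, rows


# Hypothetical sample data standing in for a .csv file on disk.
sample = io.StringIO("ID,Name\n1,Bill Boulder\n2,Jill Jenkins\n")
fields, rows = load_rows(sample)

print(fields)               # ['ID', 'Name']
for _ in range(2):          # two passes over the data, no seek(0) or next()
    for row in rows:
        print(row["ID"], row["Name"])
```

The trade-off is memory: every row stays in the list, so for very large files the seek(0) plus next() approach shown earlier may still be preferable.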