Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Parse CSV file and aggregate the values

I'd like to parse a CSV file and aggregate the values. The CITY column has repeating values (sample):

CITY,AMOUNT
London,20
Tokyo,45
London,55
New York,25

After parsing, the result should look something like this:

CITY,AMOUNT
London,75
Tokyo,45
New York,25

I've written the following code to extract the unique city names:

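import csv

def main():
    # Read every row into memory as a dict keyed by column name.
    with open('contributions.csv', newline='') as f:
        contrib_data = list(csv.DictReader(f))
    # Collect each distinct city name once (column is CITY in the sample above).
    combined = []
    for row in contrib_data:
        if row['CITY'] not in combined:
            combined.append(row['CITY'])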

How do I then aggregate values?

1 Reply

Tested in Python 3.2.2:

import csv
from collections import defaultdict

# Sum AMOUNT per CITY; a missing city starts at 0.
cities = defaultdict(int)
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        cities[row["CITY"]] += int(row["AMOUNT"])

# Write one row per city with its total; the with block closes (and flushes) the file.
with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT"])
    writer.writerows([city, cities[city]] for city in cities)

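Result (row order depends on the Python version; from Python 3.7 on, dicts keep insertion order, so cities appear in order of first occurrence):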

CITY,AMOUNT
New York,25
London,75
Tokyo,45
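
To try the aggregation without creating any files, here is a minimal sketch that wraps the same defaultdict approach in a function and feeds it the sample data from an in-memory buffer. The function name aggregate_amounts and the use of io.StringIO are my own choices for illustration, not part of the original answer.

```python
import csv
import io
from collections import defaultdict


def aggregate_amounts(csv_file):
    """Return {city: total amount} for a file-like object with CITY and AMOUNT columns.

    Hypothetical helper name, used here for illustration only.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        totals[row["CITY"]] += int(row["AMOUNT"])
    return dict(totals)


# The sample data from the question, held in memory instead of test.csv.
sample = io.StringIO("CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n")
totals = aggregate_amounts(sample)
print(totals)
```

The same function should also accept a real file opened with open('test.csv', newline=''), since csv.DictReader only needs an iterable of lines.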

As for your added requirements (maximum, minimum, and mean per city):

import csv
from collections import defaultdict

def default_factory():
    # Per-city stats: [total, max, min, count]; the count is later replaced by the mean.
    return [0, None, None, 0]

cities = defaultdict(default_factory)
with open('test.csv', newline='') as infile:
    for row in csv.DictReader(infile):
        amount = int(row["AMOUNT"])
        stats = cities[row["CITY"]]
        stats[0] += amount
        # Use the built-ins max() and min() instead of shadowing them with variables.
        stats[1] = amount if stats[1] is None else max(stats[1], amount)
        stats[2] = amount if stats[2] is None else min(stats[2], amount)
        stats[3] += 1

for stats in cities.values():
    stats[3] = stats[0] / stats[3]  # replace the count with the mean

with open('out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["CITY", "AMOUNT", "max", "min", "mean"])
    writer.writerows([city] + cities[city] for city in cities)

This gives you

CITY,AMOUNT,max,min,mean
New York,25,25,25,25.0
London,75,55,20,37.5
Tokyo,45,45,45,45.0
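
If the file fits comfortably in memory, a simpler variant is possible. The sketch below swaps the running-update list for a different technique: it first collects every amount per city, then computes the statistics with the built-ins sum(), max(), and min(). The name city_stats and the in-memory sample are assumptions for illustration only.

```python
import csv
import io
from collections import defaultdict


def city_stats(csv_file):
    """Return {city: (total, max, min, mean)} for CITY/AMOUNT rows.

    Hypothetical helper; it keeps all amounts in memory, unlike the
    running-update version above.
    """
    amounts = defaultdict(list)
    for row in csv.DictReader(csv_file):
        amounts[row["CITY"]].append(int(row["AMOUNT"]))
    return {
        city: (sum(values), max(values), min(values), sum(values) / len(values))
        for city, values in amounts.items()
    }


sample = io.StringIO("CITY,AMOUNT\nLondon,20\nTokyo,45\nLondon,55\nNew York,25\n")
stats = city_stats(sample)
print(stats["London"])  # (75, 55, 20, 37.5)
```

The trade-off is memory: this version stores one integer per input row, while the running-update version stores only four values per city.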

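Note that under Python 2, dividing two integers with / truncates the result (London's mean would come out as 37 instead of 37.5), so you'll need the additional line from __future__ import division at the top to get correct results. Python 2's open() also has no newline argument; there, open the files in binary mode ('rb' and 'wb') instead.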
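
To see why the division note matters, here is a small Python 3 sketch comparing true division with floor division. Floor division (//) gives the same result that Python 2's / gives for two integers when the __future__ import is missing. The variable names are mine.

```python
total, count = 75, 2  # London's total and row count from the sample data

true_mean = total / count    # true division, the Python 3 behaviour of "/"
floor_mean = total // count  # floor division, what Python 2's "/" does for two ints

print(true_mean)   # 37.5
print(floor_mean)  # 37
```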

