Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

python - Pandas column formatting

I have a pandas column with strings in the form of: '0.47±0.1'. What would be the best way of summing the entire column with an overall uncertainty?

Thanks!


The most straightforward way is to use pandas.DataFrame.apply(..., axis=1), which applies a function to each row of the table.

import numpy as np
import pandas as pd

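df = pd.DataFrame({'msr': ['1.82±0.10', '1.72±.8', '1.93±.7']})

def fun(row):
    # split a string such as '1.82±0.10' into value and uncertainty
    v, e = row['msr'].split('±')
    row['val'] = float(v)
    row['err'] = float(e)
    return row

# call fun on every row; the result gains the 'val' and 'err' columns
df = df.apply(fun, axis=1)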

# add a systematic uncertainty to each row's error in quadrature
syst_err = 0.05
df['tot_err'] = np.sqrt(df['err']**2 + syst_err**2)
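
The code above stops at per-row uncertainties, while the question asks for the sum of the whole column. A minimal sketch of that last step follows, assuming the uncertainties are independent (uncorrelated), so they add in quadrature: σ_total = sqrt(σ_1² + σ_2² + …). The names total_val and total_err are illustrative only. If part of the error is fully correlated across rows (for example a shared systematic), that part adds linearly instead, and quadrature would understate it.

```python
import numpy as np
import pandas as pd

# Same example data as in the answer
df = pd.DataFrame({'msr': ['1.82±0.10', '1.72±.8', '1.93±.7']})

# Split each string at '±' and take the first/second piece of each list
split = df['msr'].str.split('±')
df['val'] = split.str[0].astype(float)
df['err'] = split.str[1].astype(float)

# Central value of the sum: simply add the values
total_val = df['val'].sum()

# Uncertainty of the sum, assuming independent errors: add them in quadrature
total_err = np.sqrt((df['err'] ** 2).sum())

print(f"{total_val:.2f}±{total_err:.2f}")  # prints 5.47±1.07
```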

Efficiency-wise, it is usually better not to write a row-wise function yourself but to use pandas' built-in vectorized methods, such as pandas.Series.str.split().

# split every string at '±' and convert the [value, error] pairs to floats
val, err = np.array(df['msr'].str.split('±').to_list(), dtype=float).transpose()
df['val'] = val
df['err'] = err
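
The same split can also stay entirely inside pandas: with expand=True, Series.str.split returns a DataFrame with one column per piece (labelled 0 and 1 here), which can be cast to float in one call. This is a sketch of an alternative idiom, not a claim that it is faster than the NumPy version; time it on your own data.

```python
import pandas as pd

df = pd.DataFrame({'msr': ['1.82±0.10', '1.72±.8', '1.93±.7']})

# expand=True gives a two-column DataFrame: column 0 = value, column 1 = error
parts = df['msr'].str.split('±', expand=True).astype(float)
df['val'] = parts[0]
df['err'] = parts[1]

print(df)
```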

A quick run-time comparison gives 2.75 ms ± 134 μs for the first approach and 425 μs ± 5.35 μs for the second approach.
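
A rough way to reproduce this kind of comparison on your own machine is sketched below. The 900-row test frame (the three example strings repeated) and the helper names with_apply and with_str_split are assumptions for illustration; absolute timings depend on hardware, pandas version and data size, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Illustrative test data: the answer's three strings repeated 300 times
df = pd.DataFrame({'msr': ['1.82±0.10', '1.72±.8', '1.93±.7'] * 300})


def split_row(row):
    # first approach: plain Python split, called once per row via apply
    v, e = row['msr'].split('±')
    row['val'] = float(v)
    row['err'] = float(e)
    return row


def with_apply(frame):
    return frame.apply(split_row, axis=1)


def with_str_split(frame):
    # second approach: one vectorized split over the whole column
    out = frame.copy()
    val, err = np.array(out['msr'].str.split('±').to_list(), dtype=float).transpose()
    out['val'] = val
    out['err'] = err
    return out


# average time per call over 3 runs of each approach
t_apply = timeit.timeit(lambda: with_apply(df), number=3) / 3
t_split = timeit.timeit(lambda: with_str_split(df), number=3) / 3
print(f"apply: {t_apply * 1e3:.1f} ms/call, str.split: {t_split * 1e3:.1f} ms/call")
```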


...