Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Function for scaling numbers in a pd.DF with a list of numbers

I'm pretty new to Python.

I'm trying to create a function that reads a CSV file containing an ID number, a name, and then N columns of numbers from different tests, and then rounds each number to the nearest grade on the Danish grading scale [-3, 00, 02, 4, 7, 10, 12].

My script below does exactly that, but my function only returns the result for the last row of the DataFrame.

Here's the CSV I use for testing:

StudentID,Name,Assignment1,Assignment2,Assignment3 
s123456,Michael Andersen,7,5,4 
s123789,Bettina Petersen,12,3,10 
s123468,Thomas Nielsen,-3,7,2 
s123579,Marie Hansen,10,12,12
s123579,Marie Hansen,10,12,12
s127848, Andreas Nielsen,2,2,2
s120799, Mads Westergaard,12,12,10

It's worth mentioning that I need these functions to be separate, for my main script.

I've made a simple function which loads the file using pandas:

import pandas as pd

def dataLoad(filename):
    grades = pd.read_csv(filename)
    return grades

Then I made this script for rounding the numbers:

# Importing modules
import pandas as pd
import numpy as np
# Loading the function dataLoad
from dataLoad import dataLoad

# Defining my data with the function
grades=dataLoad('Karakterer.csv')
def roundGrade(grades):
    # Dropping the first two columns of the pd.DF
    grades=grades.drop(['StudentID','Name'],axis=1)
    # Turning the pd.DF into a numpy array
    sample_grades=np.array(grades)
    # Setting the values of the scale to round to
    grade_Scale = np.array([-3,0,2,4,7,10,12])
    # Defining i, so it gets gradually bigger with each cycle
    i=0
    # A for loop which rounds every number in every row of the given array
    for i in range(0,len(grades)):

        grouped = [min(grade_Scale,key=lambda x:abs(grade-x)) for grade in sample_grades[i,:]]
        # Making i 1 bigger for each cycle
        i=i+1

    return grouped

Let me know if you need more information about the script. Cheers, guys!

  ask by Mads Westergaard translate from so


1 Reply


To improve performance, use numpy:

import numpy as np
from dataLoad import dataLoad

# Keep the full frame in df (not grades) so the rounded values can be assigned back in the last step
df = dataLoad('Karakterer.csv')
grades = df.drop(['StudentID','Name'],axis=1)

grade_Scale = np.array([-3,0,2,4,7,10,12])
print (grades)
   Assignment1  Assignment2  Assignment3
0            7            5            4
1           12            3           10
2           -3            7            2
3           10           12           12
4           10           12           12
5            2            2            2
6           12           12           10

arr = grades.values
a = grade_Scale[np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2)]
print (a)
[[ 7  4  4]
 [12  2 10]
 [-3  7  2]
 [10 12 12]
 [10 12 12]
 [ 2  2  2]
 [12 12 10]]

Finally, if you need to assign the output back to the columns:

df[grades.columns] = a
print (df)
  StudentID               Name  Assignment1  Assignment2  Assignment3
0   s123456   Michael Andersen            7            4            4
1   s123789   Bettina Petersen           12            2           10
2   s123468     Thomas Nielsen           -3            7            2
3   s123579       Marie Hansen           10           12           12
4   s123579       Marie Hansen           10           12           12
5   s127848    Andreas Nielsen            2            2            2
6   s120799   Mads Westergaard           12           12           10
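
Since the question asks for separate functions, the steps above could be wrapped roughly as follows. This is only a sketch: the name roundGrade comes from the question, while GRADE_SCALE, the demo frame, and the assumption that every column other than StudentID and Name is numeric are mine.

```python
import numpy as np
import pandas as pd

# The grading scale from the question
GRADE_SCALE = np.array([-3, 0, 2, 4, 7, 10, 12])

def roundGrade(grades):
    """Return a copy of grades with each assignment value snapped to the nearest scale value.

    Assumes all columns except StudentID and Name are numeric.
    """
    out = grades.copy()
    cols = out.columns.drop(['StudentID', 'Name'])
    arr = out[cols].to_numpy()
    # (rows, cols, 1) minus (7,) broadcasts to (rows, cols, 7)
    idx = np.argmin(np.abs(arr[:, :, None] - GRADE_SCALE), axis=2)
    out[cols] = GRADE_SCALE[idx]
    return out

# Small made-up frame with the same layout as the test CSV
demo = pd.DataFrame({
    'StudentID': ['s1', 's2'],
    'Name': ['A', 'B'],
    'Assignment1': [7, 12],
    'Assignment2': [5, 3],
})
rounded = roundGrade(demo)
print(rounded)
```

Working on a copy keeps the loaded DataFrame untouched, so the main script can still show the raw grades next to the rounded ones.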

Explanation:

This is the solution for a single column, extended to multiple columns:

The idea is to compare the 2D array arr, created from all the grade columns of the DataFrame, with the array grade_Scale.

You can use broadcasting to create a 3D array of the absolute differences between them:

print (np.abs(arr[:,:, None] - grade_Scale[None,:]))

[[[10  7  5  3  0  3  5]
  [ 8  5  3  1  2  5  7]
  [ 7  4  2  0  3  6  8]]

 [[15 12 10  8  5  2  0]
  [ 6  3  1  1  4  7  9]
  [13 10  8  6  3  0  2]]

 [[ 0  3  5  7 10 13 15]
  [10  7  5  3  0  3  5]
  [ 5  2  0  2  5  8 10]]

 [[13 10  8  6  3  0  2]
  [15 12 10  8  5  2  0]
  [15 12 10  8  5  2  0]]

 [[13 10  8  6  3  0  2]
  [15 12 10  8  5  2  0]
  [15 12 10  8  5  2  0]]

 [[ 5  2  0  2  5  8 10]
  [ 5  2  0  2  5  8 10]
  [ 5  2  0  2  5  8 10]]

 [[15 12 10  8  5  2  0]
  [15 12 10  8  5  2  0]
  [13 10  8  6  3  0  2]]]
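
To see why the result is 3D, it can help to look only at the shapes. Here is a small sketch using just the first two rows of the data above (the variable diff is a name introduced for illustration):

```python
import numpy as np

arr = np.array([[7, 5, 4],
                [12, 3, 10]])                       # shape (2, 3): 2 students, 3 assignments
grade_Scale = np.array([-3, 0, 2, 4, 7, 10, 12])    # shape (7,)

# arr[:, :, None] adds a trailing axis      -> shape (2, 3, 1)
# grade_Scale[None, :] adds a leading axis  -> shape (1, 7)
# Broadcasting lines up the trailing axes   -> result shape (2, 3, 7)
diff = np.abs(arr[:, :, None] - grade_Scale[None, :])
print(arr[:, :, None].shape, grade_Scale[None, :].shape, diff.shape)

# diff[0, 1] holds the distances from the grade 5 to every scale value
print(diff[0, 1])
```

Each grade gets its own row of seven distances, which is what lets argmin pick the closest scale value per cell.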

Then get the positions of the minimal values with numpy.argmin along axis=2 (the third axis of the 3D array):

print (np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2))
[[4 3 3]
 [6 2 5]
 [0 4 2]
 [5 6 6]
 [5 6 6]
 [2 2 2]
 [6 6 5]]
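
One detail worth knowing: when a value lies exactly halfway between two scale values, np.argmin returns the first minimum, so the lower grade wins (the 3 in the data maps to position 2, i.e. grade 2). If ties should go to the higher grade instead, one possible variation is to search the reversed scale. This variation is my own assumption about what might be wanted, not something the question asks for:

```python
import numpy as np

grade_Scale = np.array([-3, 0, 2, 4, 7, 10, 12])
arr = np.array([[3, 1, 11]])  # each value is exactly halfway between two scale values

dist = np.abs(arr[:, :, None] - grade_Scale[None, :])

# Default behaviour: the first minimum wins, so ties go to the lower grade
down = grade_Scale[np.argmin(dist, axis=2)]

# Variation: reverse the last axis so the first minimum is the higher grade
rev = grade_Scale[::-1]
up = rev[np.argmin(dist[:, :, ::-1], axis=2)]

print(down)
print(up)
```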

Finally, index grade_Scale with those positions to get the rounded values:

print (grade_Scale[np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2)])
[[ 7  4  4]
 [12  2 10]
 [-3  7  2]
 [10 12 12]
 [10 12 12]
 [ 2  2  2]
 [12 12 10]]
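
As for why the function in the question returns only one row: grouped is overwritten on every pass of the loop, so the return after the loop only sees the last row. If you would rather keep the loop, a minimal sketch that collects every row could look like this (roundGradeLoop and the demo frame are names made up for illustration):

```python
import numpy as np
import pandas as pd

def roundGradeLoop(grades):
    # Hypothetical loop-based variant of the question's roundGrade
    sample_grades = grades.drop(['StudentID', 'Name'], axis=1).to_numpy()
    grade_Scale = [-3, 0, 2, 4, 7, 10, 12]
    rounded = []  # one list per row, appended instead of overwritten
    for row in sample_grades:
        rounded.append([min(grade_Scale, key=lambda x: abs(grade - x)) for grade in row])
    return np.array(rounded)

# Made-up frame holding the first two rows of the test CSV
demo = pd.DataFrame({
    'StudentID': ['s123456', 's123789'],
    'Name': ['Michael Andersen', 'Bettina Petersen'],
    'Assignment1': [7, 12],
    'Assignment2': [5, 3],
    'Assignment3': [4, 10],
})
result = roundGradeLoop(demo)
print(result)
```

Iterating over the rows directly also removes the need for the manual i counter. The numpy version above should still be faster on large files.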
