python - Function for scaling numbers in a pd.DF with a list of numbers

Question

posted Feb 21, 2021 in Technique[技术] by 深蓝 (71.8m points)

Pretty new to Python.

I'm trying to create a function that reads a CSV file containing an ID number, a name, and then N columns of numbers from different tests, and then scales/rounds the numbers so they can be compared to the Danish grading system: [-3, 00, 02, 4, 7, 10, 12].

My script below does exactly that, but my function only returns the result for the last row of the DF.

Here's the CSV I use for testing:

StudentID,Name,Assignment1,Assignment2,Assignment3 
s123456,Michael Andersen,7,5,4 
s123789,Bettina Petersen,12,3,10 
s123468,Thomas Nielsen,-3,7,2 
s123579,Marie Hansen,10,12,12
s123579,Marie Hansen,10,12,12
s127848, Andreas Nielsen,2,2,2
s120799, Mads Westergaard,12,12,10

It's worth mentioning that I need these functions to be separate, for my main script.

I've made a simple function which loads the file using pandas:

import pandas as pd

def dataLoad(filename):
    grades = pd.read_csv(filename)
    return grades

Then I wrote this script to round the numbers:

# Importing modules
import pandas as pd
import numpy as np
#Loading in the function dataLoad
from dataLoad import dataLoad

#Defining my data with the function
grades=dataLoad('Karakterer.csv')
def roundGrade(grades):
    #Dropping the first two columns of the pd.DF
    grades=grades.drop(['StudentID','Name'],axis=1)
    #Making the pd.DF into a numpy array
    sample_grades=np.array(grades)
    #Setting the parameters of the scale to round up to
    grade_Scale = np.array([-3,0,2,4,7,10,12])
    #Defining i, so i get gradually bigger with each cycle
    i=0
    #Making a for loop, which rounds every number in every row of the given array
    for i in range(0,len(grades)):

        grouped = [min(grade_Scale,key=lambda x:abs(grade-x)) for grade in sample_grades[i,:]]
        #Making i 1 time bigger for each cycle
        i=i+1

    return grouped

Let me know if you need more information about the script. Cheers, guys!

asked by Mads Westergaard, translated from SO

1 Reply

深蓝 · Answer 1 · 2021-02-20T21:50:04+0000

To improve performance, use numpy:

import numpy as np
from dataLoad import dataLoad

# Assign the loaded data to df rather than grades, so the rounded values can be assigned back in the last step
df = dataLoad('Karakterer.csv')
grades = df.drop(['StudentID','Name'],axis=1)

grade_Scale = np.array([-3,0,2,4,7,10,12])
print (grades)
   Assignment1  Assignment2  Assignment3
0            7            5            4
1           12            3           10
2           -3            7            2
3           10           12           12
4           10           12           12
5            2            2            2
6           12           12           10

arr = grades.values
a = grade_Scale[np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2)]
print (a)
[[ 7  4  4]
 [12  2 10]
 [-3  7  2]
 [10 12 12]
 [10 12 12]
 [ 2  2  2]
 [12 12 10]]

Finally, if you need to assign the output back to the columns:

df[grades.columns] = a
print (df)
  StudentID               Name  Assignment1  Assignment2  Assignment3
0   s123456   Michael Andersen            7            4            4
1   s123789   Bettina Petersen           12            2           10
2   s123468     Thomas Nielsen           -3            7            2
3   s123579       Marie Hansen           10           12           12
4   s123579       Marie Hansen           10           12           12
5   s127848    Andreas Nielsen            2            2            2
6   s120799   Mads Westergaard           12           12           10
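Since the question asks for a separate function, the steps above could be wrapped roughly as follows. This is only a sketch under a few assumptions: the name GRADE_SCALE and the inline sample frame are illustrative, the columns are assumed to be named StudentID and Name as in the test CSV, and the function returns a copy rather than modifying its input.

```python
import numpy as np
import pandas as pd

# The seven grades of the Danish scale, as used in the answer
GRADE_SCALE = np.array([-3, 0, 2, 4, 7, 10, 12])

def roundGrade(grades):
    """Return a copy of `grades` with every assignment column snapped to the scale."""
    result = grades.copy()
    # Assumption: everything except StudentID and Name is a numeric grade column
    cols = result.columns.drop(['StudentID', 'Name'])
    arr = result[cols].to_numpy()
    # Position of the nearest scale value for every cell (ties pick the lower grade)
    nearest = np.argmin(np.abs(arr[:, :, None] - GRADE_SCALE), axis=2)
    result[cols] = GRADE_SCALE[nearest]
    return result

# Small inline sample standing in for Karakterer.csv
sample = pd.DataFrame({
    'StudentID': ['s123456', 's123789'],
    'Name': ['Michael Andersen', 'Bettina Petersen'],
    'Assignment1': [7, 12],
    'Assignment2': [5, 3],
    'Assignment3': [4, 10],
})
rounded = roundGrade(sample)
print(rounded)
```

In the main script this would be called as roundGrade(dataLoad('Karakterer.csv')), keeping loading and rounding in separate functions.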

Explanation:

It uses this solution, but applied to multiple columns:

The idea is to compare the 2D array arr, created from all the grade columns of the DataFrame, with the array grade_Scale.

So you can use broadcasting to create a 3D array of the absolute differences between them:

print (np.abs(arr[:,:, None] - grade_Scale[None,:]))

[[[10  7  5  3  0  3  5]
  [ 8  5  3  1  2  5  7]
  [ 7  4  2  0  3  6  8]]

 [[15 12 10  8  5  2  0]
  [ 6  3  1  1  4  7  9]
  [13 10  8  6  3  0  2]]

 [[ 0  3  5  7 10 13 15]
  [10  7  5  3  0  3  5]
  [ 5  2  0  2  5  8 10]]

 [[13 10  8  6  3  0  2]
  [15 12 10  8  5  2  0]
  [15 12 10  8  5  2  0]]

 [[13 10  8  6  3  0  2]
  [15 12 10  8  5  2  0]
  [15 12 10  8  5  2  0]]

 [[ 5  2  0  2  5  8 10]
  [ 5  2  0  2  5  8 10]
  [ 5  2  0  2  5  8 10]]

 [[15 12 10  8  5  2  0]
  [15 12 10  8  5  2  0]
  [13 10  8  6  3  0  2]]]
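To make the shapes involved in that broadcast easier to follow, here is a small sketch on just the first two students. The intermediate names expanded, scale and diffs are illustrative and not part of the answer.

```python
import numpy as np

grade_Scale = np.array([-3, 0, 2, 4, 7, 10, 12])
arr = np.array([[7, 5, 4],
                [12, 3, 10]])       # 2 students x 3 assignments

expanded = arr[:, :, None]          # shape (2, 3, 1): one trailing axis per grade
scale = grade_Scale[None, :]        # shape (1, 7): aligned with that trailing axis
diffs = np.abs(expanded - scale)    # shape (2, 3, 7): every grade vs. every scale value

print(expanded.shape, scale.shape, diffs.shape)
print(diffs[0, 1])  # distances from the raw grade 5 to each scale value: [8 5 3 1 2 5 7]
```

The last axis of diffs has one entry per scale value, which is why the next step reduces along axis=2.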

Then get the positions of the minimal values with numpy.argmin along axis=2 (the third axis of the 3D array):

print (np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2))
[[4 3 3]
 [6 2 5]
 [0 4 2]
 [5 6 6]
 [5 6 6]
 [2 2 2]
 [6 6 5]]
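One detail worth knowing here: when two scale values are equally close, numpy.argmin returns the first position, so a raw 3 becomes 2 rather than 4 (as in the second row above). The sketch below illustrates that, and shows one possible way to round ties upward instead by searching a reversed scale. The names raw, rev, rounded_down and rounded_up are illustrative only.

```python
import numpy as np

grade_Scale = np.array([-3, 0, 2, 4, 7, 10, 12])
raw = np.array([3, 1, 11])  # each value sits exactly between two scale grades

# Default behaviour: argmin keeps the first minimum, i.e. the lower grade
idx = np.argmin(np.abs(raw[:, None] - grade_Scale), axis=1)
rounded_down = grade_Scale[idx]

# Searching the reversed scale makes the higher grade come first in a tie
rev = grade_Scale[::-1]
idx_up = np.argmin(np.abs(raw[:, None] - rev), axis=1)
rounded_up = rev[idx_up]

print(rounded_down)  # [ 2  0 10]
print(rounded_up)    # [ 4  2 12]
```

Which direction is appropriate depends on the grading rules you want to apply; the answer as written rounds ties down.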

Finally, index grade_Scale with those positions to get the rounded grades:

print (grade_Scale[np.argmin(np.abs(arr[:,:, None] - grade_Scale[None,:]), axis=2)])
[[ 7  4  4]
 [12  2 10]
 [-3  7  2]
 [10 12 12]
 [10 12 12]
 [ 2  2  2]
 [12 12 10]]
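For comparison, the loop in the question only returns the last row because grouped is overwritten on every iteration and returned after the loop ends. A minimal sketch of a loop-based fix, which collects each row's result in a list, might look like this. The name roundGradeLoop and the inline sample are hypothetical; the vectorized version above is likely faster on large files.

```python
import numpy as np
import pandas as pd

def roundGradeLoop(grades):
    """Loop-based variant: returns one list of rounded grades per student."""
    grade_Scale = [-3, 0, 2, 4, 7, 10, 12]
    sample_grades = grades.drop(['StudentID', 'Name'], axis=1).to_numpy()
    rounded_rows = []
    for row in sample_grades:
        # Nearest scale value for each grade in this row (ties pick the lower grade)
        rounded_rows.append([min(grade_Scale, key=lambda x: abs(grade - x)) for grade in row])
    return rounded_rows

# Small inline sample standing in for Karakterer.csv
sample = pd.DataFrame({
    'StudentID': ['s123456', 's123789', 's123468'],
    'Name': ['Michael Andersen', 'Bettina Petersen', 'Thomas Nielsen'],
    'Assignment1': [7, 12, -3],
    'Assignment2': [5, 3, 7],
    'Assignment3': [4, 10, 2],
})
result = roundGradeLoop(sample)
print(result)  # [[7, 4, 4], [12, 2, 10], [-3, 7, 2]]
```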
