Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Python polynomial regression plotting wrong?

I'm new to Python and trying to fit a third-order polynomial regression to some data.

When I use polynomial regression, I don't get the fit I am expecting.

I am trying to understand why the polynomial regression in Python is worse than in Excel.

When I fit the same data in Excel, I get a coefficient of determination (R²) of ≈ 0.95, and the plot looks like a third-order polynomial.

However, with scikit-learn R² is ≈ 0.78 and the fit looks almost linear.

Is this happening because I do not have enough data?

Also, does having x as datetime64[ns] type on my x-axis affect the regression?

The code runs.

However, I am not sure whether this is a coding problem or something else.

I am using Anaconda (Python 3.7) and running the code in Spyder.

import operator
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score
#import data
data = pd.read_excel(r'D:AnacondaAnacondaXDatadata.xlsx', skiprows = 0)

x=np.c_[data['Date']]
y=np.c_[data['level']]
#regression
polynomial_features= PolynomialFeatures(degree=3)
x_poly = polynomial_features.fit_transform(x)

model = LinearRegression()
model.fit(x_poly, y)
y_poly_pred = model.predict(x_poly)
#check regression stats
rmse = np.sqrt(mean_squared_error(y,y_poly_pred))
r2 = r2_score(y,y_poly_pred)
print(rmse)
print(r2)

#plot
plt.scatter(x, y, s=10)

# sort the values of x before line plot
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x,y_poly_pred), key=sort_axis)
x, y_poly_pred = zip(*sorted_zip)
plt.plot(x, y_poly_pred, color='m')
plt.show()

(Figure: Python plot)

asked by G_EXL_snake, translated from Stack Overflow

Fight dragons for too long and you become a dragon yourself; gaze into the abyss for too long and the abyss gazes back…

1 Reply

The problem is the datetime64[ns] type on the x-axis.

There is an issue on GitHub about how sklearn handles datetime64[ns].

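The thing is, a datetime64[ns] value is the number of nanoseconds since 1970-01-01, so after conversion the feature is on the order of 10^18, and its square and cube are on the order of 10^36 and 10^54. Columns that differ by that many orders of magnitude make the least-squares fit numerically unreliable. In this case: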

x_poly
Out[91]: 
array([[1.00000000e+00, 1.29911040e+18, 1.68768783e+36, 2.19249281e+54],
       [1.00000000e+00, 1.33617600e+18, 1.78536630e+36, 2.38556361e+54],
       [1.00000000e+00, 1.39129920e+18, 1.93571346e+36, 2.69315659e+54],
       [1.00000000e+00, 1.41566400e+18, 2.00410456e+36, 2.83713868e+54],
       [1.00000000e+00, 1.43354880e+18, 2.05506216e+36, 2.94603190e+54],
       [1.00000000e+00, 1.47061440e+18, 2.16270671e+36, 3.18050764e+54],
       [1.00000000e+00, 1.49670720e+18, 2.24013244e+36, 3.35282236e+54],
       [1.00000000e+00, 1.51476480e+18, 2.29451240e+36, 3.47564662e+54],
       [1.00000000e+00, 1.57610880e+18, 2.48411895e+36, 3.91524174e+54]])
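A minimal NumPy sketch of why the scale matters, using five made-up dates (not the asker's data) and a hypothetical helper `cubic_design` that builds the same 1, x, x², x³ columns as `PolynomialFeatures(degree=3)`. It compares the condition number of that design matrix for raw nanoseconds, for nanoseconds divided by 1e18 (the answer's `/ 10e17` is the same number), and for centered-and-scaled values; a larger condition number means more floating-point error can leak into the fitted coefficients.

```python
import numpy as np

# Hypothetical dates (not the asker's data), stored as datetime64[ns]
dates = np.array(
    ["2011-03-03", "2012-05-06", "2014-02-01", "2015-09-12", "2019-12-11"],
    dtype="datetime64[ns]",
)

# A datetime64[ns] value is an int64 count of nanoseconds since 1970-01-01
ns = dates.astype(np.int64).astype(np.float64)


def cubic_design(x):
    """Hypothetical helper: columns 1, x, x^2, x^3, as PolynomialFeatures(degree=3) builds."""
    return np.column_stack([np.ones_like(x), x, x**2, x**3])


# Raw nanoseconds: columns span roughly 1 to 1e54
cond_raw = np.linalg.cond(cubic_design(ns))
# Divided by 1e18, as in the answer (10e17 == 1e18)
cond_div = np.linalg.cond(cubic_design(ns / 1e18))
# Centered and scaled, which is what StandardScaler does
cond_std = np.linalg.cond(cubic_design((ns - ns.mean()) / ns.std()))

print(f"largest x^3 entry (raw): {(ns**3).max():.2e}")
print(f"cond raw: {cond_raw:.2e}, /1e18: {cond_div:.2e}, standardized: {cond_std:.2e}")
```

Dividing by 1e18 already removes most of the problem; centering as well usually helps further, because over a narrow date range x, x² and x³ are nearly collinear.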

The easiest way to handle it is to use StandardScaler, or to convert the dates with pd.to_numeric and scale them:

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
x_scaled = scaler.fit_transform(np.c_[data['Date']])

or simply

x_scaled = np.c_[pd.to_numeric(data['Date'])] / 10e17  # convert and scale

That gives appropriately scaled features:

x_poly = polynomial_features.fit_transform(x_scaled)
x_poly
Out[94]: 
array([[1.        , 1.2991104 , 1.68768783, 2.19249281],
       [1.        , 1.336176  , 1.7853663 , 2.38556361],
       [1.        , 1.3912992 , 1.93571346, 2.69315659],
       [1.        , 1.415664  , 2.00410456, 2.83713868],
       [1.        , 1.4335488 , 2.05506216, 2.9460319 ],
       [1.        , 1.4706144 , 2.16270671, 3.18050764],
       [1.        , 1.4967072 , 2.24013244, 3.35282236],
       [1.        , 1.5147648 , 2.2945124 , 3.47564662],
       [1.        , 1.5761088 , 2.48411895, 3.91524174]])

Afterwards, the result looks like this:

(Figure: resulting plot after scaling)
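Putting the pieces together, here is a sketch of the StandardScaler route wrapped in a scikit-learn pipeline, so the same scaling is applied whenever the model predicts. The data are synthetic (20 hypothetical month-start dates with a "level" that is an exact cubic in time), so R² should come out essentially 1; real data will score lower. The dates are converted with `.astype(np.int64)`, which gives the same nanosecond integers as `pd.to_numeric`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the spreadsheet: 20 monthly dates and a cubic "level"
dates = pd.date_range("2011-01-01", periods=20, freq="MS")
days = (dates - dates[0]).days.to_numpy(dtype=float)
level = 0.5 + 1e-3 * days - 2e-6 * days**2 + 1e-9 * days**3

# datetime64[ns] -> int64 nanoseconds, shaped as a single feature column
x = np.asarray(dates, dtype="datetime64[ns]").astype(np.int64).reshape(-1, 1)

# Scale first, then expand to 1, x, x^2, x^3, then fit by least squares
model = make_pipeline(StandardScaler(), PolynomialFeatures(degree=3), LinearRegression())
model.fit(x, level)
pred = model.predict(x)

rmse = np.sqrt(mean_squared_error(level, pred))
r2 = r2_score(level, pred)
print(rmse, r2)
```

To plot against the original dates, keep `dates` for the x-axis and pass the converted `x` to `model.predict`.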

...