Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - Pandas read excel returning type object when time is 00:00

In more recent versions of pandas (I am using 1.2.3), reading times from an Excel file goes wrong when the time is 00:00:00. In the script below, filepath is the path to my Excel file, which contains a column with the header 'Time'.

import pandas as pd

df = pd.read_excel(filepath)
print(df['Time'])

Output:

0                20:00:00
1                22:00:00
2                23:00:00
3     1899-12-30 00:00:00
4                02:00:00
5                02:45:00
6                03:30:00
7                04:00:00
8                04:45:00
9                05:30:00
10               07:00:00
11               08:00:00
12               08:45:00
13               09:30:00
14               10:30:00
15               10:45:00
16               11:45:00
17               12:30:00
18               13:15:00
19               14:00:00
20               14:45:00
21               15:45:00
22               23:00:00
23    1899-12-30 00:00:00

This was not the case in version 1.0.5.

Is there a way to read in these times correctly, without the date on rows 3 and 23 above?



1 Reply


I can reproduce this behavior (pandas 1.2.3); it leaves you with a mix of datetime.datetime and datetime.time objects in the 'Time' column.
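
To make that mix concrete, here is a minimal sketch, not part of the original answer, that normalizes such values by hand. The values list is only a stand-in for what read_excel returns, and to_time is a hypothetical helper name; on a real DataFrame it could presumably be applied as df['Time'].map(to_time).

```python
import datetime


def to_time(value):
    """Return a datetime.time for either a datetime.datetime or a datetime.time."""
    # pandas Timestamps subclass datetime.datetime, so they are covered as well
    if isinstance(value, datetime.datetime):
        return value.time()
    return value


# Stand-in for the mixed column shown in the question (assumed sample values)
values = [
    datetime.time(23, 0),
    datetime.datetime(1899, 12, 30, 0, 0),
    datetime.time(2, 0),
]

# Which Python types are present before normalizing
types_before = {type(v).__name__ for v in values}
normalized = [to_time(v) for v in values]

print(sorted(types_before))
print(normalized)
```

This keeps the values as datetime.time objects, so the column stays at dtype object; whether that is acceptable depends on what you do with the times afterwards.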


One workaround is to import the Time column as strings; you can specify that explicitly with

df = pd.read_excel(path_to_your_excelfile, dtype={'Time': str})

This prefixes the "Excel day zero" date (1899-12-30) to some entries. You can remove it by splitting on the space and taking the last element of the result:

df['Time'].str.split(' ').str[-1]

Now you can proceed by converting string to datetime, timedelta etc. - whatever makes sense in your context.
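
As a rough sketch of that last step, the snippet below cleans a few sample strings and converts them both ways. The raw Series is made-up sample data standing in for the column read with dtype str, and the variable names are only illustrative.

```python
import datetime

import pandas as pd

# Sample strings as they might look after reading with dtype={'Time': str}
raw = pd.Series(["20:00:00", "1899-12-30 00:00:00", "02:45:00"], name="Time")

# Drop the "Excel day zero" prefix: keep only the part after the last space
cleaned = raw.str.split(" ").str[-1]

# Option 1: durations since midnight (dtype timedelta64[ns])
as_timedelta = pd.to_timedelta(cleaned)

# Option 2: plain datetime.time objects (dtype object)
as_time = pd.to_datetime(cleaned, format="%H:%M:%S").dt.time

print(cleaned.tolist())
print(as_timedelta)
print(as_time.tolist())
```

Timedeltas are convenient if you need arithmetic or sorting, while datetime.time objects are closer to what the Excel cells display.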


Another option is to tell pandas to parse this column as datetime:

df = pd.read_excel(path_to_your_excelfile, parse_dates=['Time'])

Then you get pandas datetimes, carrying either today's date or "Excel day zero":

df['Time']

0    2021-03-04 20:00:00
1    2021-03-04 22:00:00
2    2021-03-04 23:00:00
3    1899-12-30 00:00:00
4    2021-03-04 02:00:00
...
23   1899-12-30 00:00:00
Name: Time, dtype: datetime64[ns]

Now you have some options, depending on what you intend to do with the data next. You could ignore the date, strip it (df['Time'].dt.time), or format the time as a string (df['Time'].dt.strftime('%H:%M:%S')), and so on.
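
A minimal sketch of those options, using a small hand-built datetime Series in place of the parsed column (the sample values are assumptions modeled on the output above). The subtraction of the normalized date is one possible way to "ignore the date"; it is an addition here, not something the original answer spells out.

```python
import pandas as pd

# Stand-in for df['Time'] after reading with parse_dates=['Time']
parsed = pd.to_datetime(
    pd.Series(
        ["2021-03-04 20:00:00", "1899-12-30 00:00:00", "2021-03-04 02:45:00"],
        name="Time",
    )
)

# Strip the date: datetime.time objects, dtype object
times = parsed.dt.time

# Format as strings, dropping the date part
strings = parsed.dt.strftime("%H:%M:%S")

# Ignore the date: elapsed time since midnight of each row's own date
since_midnight = parsed - parsed.dt.normalize()

print(times.tolist())
print(strings.tolist())
print(since_midnight)
```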

