Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others



pandas - An easy way to calculate time intervals between dates in a column in Python

Suppose I have a Pandas DataFrame like this:

 item   event      date 
  A       1     2020-03-09
  B       1     2020-03-09
  A       2     2020-05-01
  B       2     2020-05-01
  C       2     2020-05-01
  A       3     2020-06-25
  C       3     2020-06-25
  B       4     2020-07-18
  C       4     2020-07-18
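To reproduce the example, the frame can be built as below. This is a minimal sketch that assumes the dates start out as strings and parses them with pd.to_datetime. The .diff() and pd.Timedelta arithmetic used in the answer needs a datetime64 column, not strings.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "item":  ["A", "B", "A", "B", "C", "A", "C", "B", "C"],
    "event": [1, 1, 2, 2, 2, 3, 3, 4, 4],
    "date":  ["2020-03-09", "2020-03-09", "2020-05-01", "2020-05-01",
              "2020-05-01", "2020-06-25", "2020-06-25", "2020-07-18",
              "2020-07-18"],
})

# Parse the strings into datetime64 so that date arithmetic works.
df["date"] = pd.to_datetime(df["date"])
print(df)
```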

This DataFrame has a unique date for each 'event' of each 'item', so every item has several events with distinct dates.

Now I would like to calculate, per item, the average number of days between consecutive event dates. The value differs per item, so it means averaging the intervals between successive events within each item.

So the expected output would look like:

  item   average_interval_in_days
    A              54
    B              65.5
    C              39
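The expected values can be checked by hand: each item has two gaps between consecutive dates, and the average is their mean. A small sketch using only the standard-library datetime and statistics modules, with the dates copied from the table above:

```python
from datetime import date
from statistics import mean

# Dates per item, copied from the example table (already in chronological order).
dates = {
    "A": [date(2020, 3, 9), date(2020, 5, 1), date(2020, 6, 25)],
    "B": [date(2020, 3, 9), date(2020, 5, 1), date(2020, 7, 18)],
    "C": [date(2020, 5, 1), date(2020, 6, 25), date(2020, 7, 18)],
}

# Days between each pair of consecutive dates, then their mean per item.
gaps = {k: [(b - a).days for a, b in zip(v, v[1:])] for k, v in dates.items()}
averages = {k: mean(g) for k, g in gaps.items()}

print(gaps)      # {'A': [53, 55], 'B': [53, 78], 'C': [55, 23]}
print(averages)  # {'A': 54, 'B': 65.5, 'C': 39}
```

Item C has gaps of 55 and 23 days, so its average interval is 39 days.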

Does anyone have an idea how to do this?

Question from: https://stackoverflow.com/questions/65901247/an-easy-way-to-calculate-time-intervals-between-dates-in-a-column-in-python


1 Reply


Very similar to @BradSolomon's answer, with two small differences:

import pandas as pd
# df['date'] must be datetime64, e.g. df['date'] = pd.to_datetime(df['date'])
df.sort_values(['item', 'date']).groupby('item')['date'].agg(
    lambda g: g.diff().mean() / pd.Timedelta(days=1))

# gives:
item
A    54.0
B    65.5
C    39.0

Notes:

  1. Ensure that dates are sorted within each group; otherwise the mean depends on the row order. In your example the dates happen to be sorted already, so if you can guarantee that, you may skip .sort_values().
  2. Use ... / pd.Timedelta(days=1) to express the mean difference directly in days.
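Note 1 can be illustrated with a hypothetical single-item frame whose rows are out of chronological order (the dates are made up for the demonstration, not taken from the question). Because .diff() works in row order, the unsorted version mixes a negative and a positive gap:

```python
import pandas as pd

# Hypothetical single-item data, deliberately out of chronological order.
shuffled = pd.DataFrame({
    "item": ["A", "A", "A"],
    "date": pd.to_datetime(["2020-05-01", "2020-03-09", "2020-06-25"]),
})

def mean_gap_days(frame):
    # Mean of consecutive differences within each item, expressed in days.
    return frame.groupby("item")["date"].agg(
        lambda g: g.diff().mean() / pd.Timedelta(days=1))

unsorted_result = mean_gap_days(shuffled)
sorted_result = mean_gap_days(shuffled.sort_values(["item", "date"]))
print(unsorted_result["A"], sorted_result["A"])  # 27.5 54.0
```

Unsorted, the gaps are -53 and +108 days (mean 27.5); after sorting they are 53 and 55 days (mean 54.0).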

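Alternative for speed (no sort, no lambda, but a bit more opaque). It works because the gaps between sorted dates telescope: they add up to max - min, so their mean is (max - min) / (count - 1):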

gb = df.groupby('item')['date']
# mean gap = total span (max - min) / number of gaps (count - 1), in days
(gb.max() - gb.min()) / (gb.count() - 1) / pd.Timedelta(days=1)

# gives:
item
A    54.0
B    65.5
C    39.0
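As a quick consistency check, here is a sketch that rebuilds the question's data (assuming the date column is parsed with pd.to_datetime) and runs both approaches side by side; on this data they agree:

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["A", "B", "A", "B", "C", "A", "C", "B", "C"],
    "date": pd.to_datetime([
        "2020-03-09", "2020-03-09", "2020-05-01", "2020-05-01", "2020-05-01",
        "2020-06-25", "2020-06-25", "2020-07-18", "2020-07-18",
    ]),
})

# Approach 1: mean of consecutive differences after sorting within each item.
by_diff = df.sort_values(["item", "date"]).groupby("item")["date"].agg(
    lambda g: g.diff().mean() / pd.Timedelta(days=1))

# Approach 2: (max - min) / (count - 1), no sort needed.
gb = df.groupby("item")["date"]
by_range = (gb.max() - gb.min()) / (gb.count() - 1) / pd.Timedelta(days=1)

print(by_diff.to_dict())
print(by_range.to_dict())
```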
