I have several timeseries arranged in a data frame, similar to below:
category value time_idx
0 810 0.118794 0
1 830 0.552947 0
2 1120 0.133193 0
3 1370 0.840183 0
4 810 0.129385 1
... ... ... ...
6095 1370 0.157391 1523
6096 810 0.141377 1524
6097 830 0.212254 1524
6098 1120 0.069970 1524
6099 1370 0.134947 1524
Some values are NaN. What I would like is to replace any NaN values that are NOT repeated with 0, as I am assuming the value is 0 for that category at that time. However, any time that every single category has a value of NaN at that same time (i.e. at the same time_idx), then I want to replace every value with -1.
Just replacing the NaNs with a value is of course trivial in Pandas, but the added complexity of specifically replacing NaNs that are NaN for every category at a given time has stumped me. I know I can just loop through the time indices, but my actual datasets will have over 900 categories, so I would like to find a more efficient Pandas-esque method.
The only thing I could think of was list comprehension, which I don't think is even necessarily more efficient than an explicit loop anyway, plus I couldn't come up with one that worked properly.
I know that I can just replace all NaNs like so:
data["value"] = data["value"].replace(np.nan, 0)
but I'm not sure how to implement this in my case, where I only want to replace long NaNs with 0. This is the loop I have currently:
num_channels = data["category"].nunique()
nan_vals = data[lambda x: np.isnan(x.value)]
nan_times = nan_vals["time_idx"]
for time in nan_times:
if nan_vals[lambda x: x.time_idx == time]["category"].nunique() < num_channels:
# Set 0 for every channel that has nan at time t
index = nan_vals[lambda x: x.time_idx == time].index
data.loc[index, ["value"]] = data.loc[index, "value"].replace(np.nan, 0)
else:
index = nan_vals[lambda x: x.time_idx == time].index
data.loc[index, ["value"]] = data[lambda x: x.time_idx == time]["value"].replace(np.nan, -1)
Any ideas are appreciated.
Here is an example:
given the following data frame:
category value time_idx
0 810 NaN 0
1 830 NaN 0
2 1120 NaN 0
3 1370 NaN 0
4 810 0.129385 1
5 830 NaN 1
6 1120 0.144378 1
7 1370 NaN 1
8 810 0.124334 2
9 830 0.487274 2
10 1120 0.119153 2
11 1370 0.871687 2
I would like this output:
category value time_idx
0 810 -1.000000 0
1 830 -1.000000 0
2 1120 -1.000000 0
3 1370 -1.000000 0
4 810 0.129385 1
5 830 0.000000 1
6 1120 0.144378 1
7 1370 0.000000 1
8 810 0.124334 2
9 830 0.487274 2
10 1120 0.119153 2
11 1370 0.871687 2
In this example, at time = 0 every category's value was NaN, so they would be replaced with -1. At time = 1, there were non-NaN values, so any NaN values present (category 830 and 1370) would be replaced with 0.
question from:
https://stackoverflow.com/questions/65939149/how-to-replace-repeated-nans-with-a-different-value-from-lone-nans-from-pandas-d