Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - How to sort a numpy array with key as isnan?

I have a NumPy array like this:

np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
          [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
          [2.0, 5.0, np.nan, 3, False, False, True, np.nan]], dtype=object)

I want to sort each row using isnan as the key, so that the NaN values move to the end of the row while the other values keep their original order. How can I do that? The result should be this array:

np.array([[1.0, 5.0, 1, True, True, True, np.nan, np.nan],
          [4.0, 7.0, 2, True, False, True, np.nan, np.nan],
          [2.0, 5.0, 3, False, False, True, np.nan, np.nan]], dtype=object)

np.sort() didn't work. The same result can be achieved in pandas by applying sorted to each row (axis=1) with pd.isnull as the key, but I am looking for a NumPy answer for speed.

In pandas

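import numpy as np
import pandas as pd

# Columns are listed in the order the outputs on this page display them
data = pd.DataFrame({'ID_1': [1, np.nan, 2], 'ID_2': [np.nan, 4, 5], 'ID_3': [5, 7, np.nan],
                     'Key': [1, 2, 3], 'Var': [True, True, False], 'Var_1': [True, np.nan, False],
                     'Var_2': [np.nan, False, True], 'Var_3': [True, True, np.nan]})

# result_type='expand' keeps the result 2-D on pandas 0.23 and later
data.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type='expand').values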

Output :

array([[1.0, 5.0, 1, True, True, True, nan, nan],
       [4.0, 7.0, 2, True, False, True, nan, nan],
       [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)


1 Reply


Approach #1

Here's a vectorized approach borrowing the concept of masking from this post -

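import numpy as np

def mask_app(a):
    out = np.empty_like(a)
    # True where a holds NaN; the float cast is needed because a has dtype=object
    mask = np.isnan(a.astype(float))
    # Sorting booleans along each row moves the True (NaN) slots to the end
    mask_sorted = np.sort(mask, 1)
    out[mask_sorted] = a[mask]    # NaNs go into the trailing slots
    out[~mask_sorted] = a[~mask]  # other values keep their original order
    return out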

Sample run -

# Input dataframe
In [114]: data
Out[114]: 
   ID_1  ID_2  ID_3  Key    Var  Var_1  Var_2 Var_3
0   1.0   NaN   5.0    1   True   True    NaN  True
1   NaN   4.0   7.0    2   True    NaN  False  True
2   2.0   5.0   NaN    3  False  False   True   NaN

# Use pandas approach for verification    
In [115]: data.apply(lambda x : sorted(x,key=pd.isnull),1).values
Out[115]: 
array([[1.0, 5.0, 1, True, True, True, nan, nan],
       [4.0, 7.0, 2, True, False, True, nan, nan],
       [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)

# Use proposed approach and verify
In [116]: mask_app(data.values)
Out[116]: 
array([[1.0, 5.0, 1, True, True, True, nan, nan],
       [4.0, 7.0, 2, True, False, True, nan, nan],
       [2.0, 5.0, 3, False, False, True, nan, nan]], dtype=object)
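
To see why the two boolean assignments line up row by row, it can help to print the masks for the sample array. The following is only a small sketch, assuming the same input as in the question; the variable names mirror the ones used inside mask_app.

```python
import numpy as np

a = np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
              [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
              [2.0, 5.0, np.nan, 3, False, False, True, np.nan]], dtype=object)

# True where the original array holds NaN
mask = np.isnan(a.astype(float))

# Sorting booleans puts False first, so each row's True flags move to the end
mask_sorted = np.sort(mask, axis=1)

print(mask[0].astype(int))         # [0 1 0 0 0 0 1 0]
print(mask_sorted[0].astype(int))  # [0 0 0 0 0 0 1 1]

# Sorting does not change how many True flags a row has, so row-major
# boolean assignment fills every row with its own values, in order
assert (mask.sum(axis=1) == mask_sorted.sum(axis=1)).all()
```

Because boolean indexing reads and writes in row-major order, matching per-row counts are what keep values from spilling into a neighbouring row.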

Approach #2

With a few more modifications, here's a simplified version based on the idea from this post -

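def mask_app2(a):
    # Start from an all-NaN array so only the valid values need placing
    out = np.full(a.shape, np.nan, dtype=a.dtype)
    mask = ~np.isnan(a.astype(float))  # True where a holds a non-NaN value
    # Sort the flags and reverse each row so the True slots come first
    out[np.sort(mask, 1)[:, ::-1]] = a[mask]
    return out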
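
No sample run was given for this version, so here is a quick check on the array from the question. Note that argsort_app is a hypothetical helper added for comparison and is not part of the original answer: it swaps in a different technique, a stable argsort on the NaN mask followed by np.take_along_axis, which should produce the same ordering.

```python
import numpy as np

def mask_app2(a):
    out = np.full(a.shape, np.nan, dtype=a.dtype)
    mask = ~np.isnan(a.astype(float))
    out[np.sort(mask, 1)[:, ::-1]] = a[mask]
    return out

def argsort_app(a):
    # Hypothetical alternative: sort column indices by the NaN flag.
    # kind='stable' keeps the original order among the non-NaN values.
    mask = np.isnan(a.astype(float))
    order = np.argsort(mask, axis=1, kind='stable')
    return np.take_along_axis(a, order, axis=1)

a = np.array([[1.0, np.nan, 5.0, 1, True, True, np.nan, True],
              [np.nan, 4.0, 7.0, 2, True, np.nan, False, True],
              [2.0, 5.0, np.nan, 3, False, False, True, np.nan]], dtype=object)

res2 = mask_app2(a)
res3 = argsort_app(a)

print(res2[0].tolist())  # [1.0, 5.0, 1, True, True, True, nan, nan]

# assert_array_equal treats NaNs in matching positions as equal;
# the float cast is only for the comparison and drops the int/bool distinction
np.testing.assert_array_equal(res2.astype(float), res3.astype(float))
```

Which variant is faster will likely depend on the array shape and the share of NaNs, so it is worth timing them on your own data.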
