python - Finding the top n values in a row of a scipy sparse matrix

Question

Welcome To Ask or Share your Answers For Others

python - Finding the top n values in a row of a scipy sparse matrix

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

python - Finding the top n values in a row of a scipy sparse matrix

I have a scipy sparse matrix in CSR format. It's 72665x72665 so it's impractical to convert this matrix to a dense matrix to perform operations on (the dense representation of this matrix is like 40 gigs). The matrix is symmetric, and has about 82 million non-zero entries (~1.5%).

What I would like to be able to do is, for each row, I want to get the indices of the largest N values. If this were a numpy array, I would use np.argpartition to do it like so:

    for row in matrix:
        top_n_idx = np.argpartition(row,-n)[-n:]

Is there something similar to this I can do for a sparse matrix?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T20:07:11+0000

Improve from @Paul Panzer's solution. Now it can handle the case when any row has less than n values.

def top_n_idx_sparse(matrix, n):
    '''Return index of top n values in each row of a sparse matrix'''
    top_n_idx = []
    for le, ri in zip(matrix.indptr[:-1], matrix.indptr[1:]):
        n_row_pick = min(n, ri - le)
        top_n_idx.append(matrix.indices[le + np.argpartition(matrix.data[le:ri], -n_row_pick)[-n_row_pick:]])
    return top_n_idx

Categories

python - Finding the top n values in a row of a scipy sparse matrix

python - Finding the top n values in a row of a scipy sparse matrix

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags