Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.1k views
in Technique[技术] by (71.8m points)

performance - Efficiently counting number of unique elements - NumPy / Python

When running np.unique(), it first flattens the array, sorts the array, then finds the unique values. When I have arrays with shape (10, 3000, 3000), it takes about a second to find the uniques, but this quickly adds up as I need to call np.unique() multiple times. Since I only care about the total number of unique numbers in an array, sorting seems like a waste of time.

Is there a faster method of find the total number of unique values in a large array other than np.unique()?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here's a method that works for an array with dtype np.uint8 that is faster than np.unique.

First, create an array to work with:

In [128]: a = np.random.randint(1, 128, size=(10, 3000, 3000)).astype(np.uint8)

For later comparison, find the unique values using np.unique:

In [129]: u = np.unique(a)

Here's the faster method; v will contain the result:

In [130]: q = np.zeros(256, dtype=int)

In [131]: q[a.ravel()] = 1

In [132]: v = np.nonzero(q)[0]

Verify that we got the same result:

In [133]: np.array_equal(u, v)
Out[133]: True

Timing:

In [134]: %timeit u = np.unique(a)
2.86 s ± 9.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [135]: %timeit q = np.zeros(256, dtype=int); q[a.ravel()] = 1; v = np.nonzero(q)
300 ms ± 5.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So 2.86 seconds for np.unique(), and 0.3 seconds for the alternative method.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...