I'd like to represent an arbitrary list as two other lists.
The first, call it values, contains the unique elements of the original list, and the second, call it codes, contains the index in values of each element of the original list, in such a way that the original list can be reconstructed as:
orig_list = [values[c] for c in codes]
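As a small illustration (the list below is made up for this example and is not part of my real data), the two lists and the reconstruction could look like this:

```python
# Hypothetical example data, chosen only to illustrate the representation.
orig_list = ["b", "a", "b", "c"]

# Unique elements (sorted), and the position in `values` of each original element.
values = ["a", "b", "c"]
codes = [1, 0, 1, 2]

# Reconstruction exactly as described above.
rebuilt = [values[c] for c in codes]
print(rebuilt == orig_list)  # True
```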
(Note: this is similar to how pandas.Categorical represents a series.)
I've created the function below to do this decomposition:
def decompose(x):
    values = sorted(list(set(x)))
    codes = [0 for _ in x]
    for i, value in enumerate(values):
        codes = [i if elem == value else code for elem, code in zip(x, codes)]
    return values, codes
This works, but I would like to know if there is a better/more efficient way of achieving this (no double loop?), or if there's something in the standard library that could do this for me.
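For reference, one way to avoid the nested loop is to build a value-to-index dictionary once and then map every element through it. This is only a sketch, assuming the elements are hashable and mutually comparable; the name decompose_dict is made up here and is not necessarily any of the answers' functions.

```python
def decompose_dict(x):
    """Split x into (values, codes) such that x == [values[c] for c in codes]."""
    # Sorted unique elements, as in the original decompose().
    values = sorted(set(x))
    # Map each unique value to its position in `values` (built once).
    index_of = {value: i for i, value in enumerate(values)}
    # Single pass over x, with an average O(1) dictionary lookup per element.
    codes = [index_of[elem] for elem in x]
    return values, codes


values, codes = decompose_dict([3, 1, 3, 2])
print(values)  # [1, 2, 3]
print(codes)   # [2, 0, 2, 1]
```

The dictionary replaces the inner comparison loop, so the work after sorting grows with the length of the list rather than with the length times the number of unique values.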
Update:
The answers below are great and a big improvement to my function.
I've timed all of the ones that worked as intended:
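import random

# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
# The decompose_* functions other than decompose come from the answers.
test_list = [random.randint(1, 10) for _ in range(10000)]
functions = [decompose, decompose_boris1, decompose_boris2,
             decompose_alexander, decompose_stuart1, decompose_stuart2,
             decompose_dan1]
for f in functions:
    print("-- " + f.__name__)
    # test: the decomposition must reconstruct the original list
    values, codes = f(test_list)
    decoded_list = [values[c] for c in codes]
    if decoded_list == test_list:
        print("Test passed")
        %timeit f(test_list)
    else:
        print("Test failed")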
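Outside IPython or Jupyter, a rough equivalent of the timing step can be sketched with the standard timeit module. The helper name time_function, the stand-in function decompose_sorted_index, and the repeat counts are arbitrary choices for illustration, and absolute numbers will differ from machine to machine.

```python
import random
import timeit


def time_function(f, data, number=100, repeat=3):
    """Return the best average time per call, in seconds."""
    # timeit.repeat returns one total time per repetition of `number` calls.
    totals = timeit.repeat(lambda: f(data), number=number, repeat=repeat)
    return min(totals) / number


def decompose_sorted_index(x):
    # Stand-in for the demo only; any decompose variant can be passed instead.
    values = sorted(set(x))
    index_of = {v: i for i, v in enumerate(values)}
    return values, [index_of[e] for e in x]


test_list = [random.randint(1, 10) for _ in range(10000)]
seconds = time_function(decompose_sorted_index, test_list)
print(f"{decompose_sorted_index.__name__}: {seconds * 1e6:.0f} μs per call")
```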
Results:
-- decompose
Test passed
12.4 ms ± 269 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-- decompose_boris1
Test passed
1.69 ms ± 21.9 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_boris2
Test passed
1.63 ms ± 18.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_alexander
Test passed
681 μs ± 2.15 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_stuart1
Test passed
1.7 ms ± 3.42 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_stuart2
Test passed
682 μs ± 5.98 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_dan1
Test passed
896 μs ± 19.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
I'm accepting Stuart's answer for being the simplest and one of the fastest.
Asked by foglerit (translated from Stack Overflow)