python - Encode Python lists as indexes of unique values

Question

posted Feb 21, 2021 in Technique by 深蓝 (71.8m points)

I'd like to represent an arbitrary list as two other lists.

The first, call it values, contains the unique elements of the original list. The second, call it codes, contains the index in values of each element of the original list, so that the original list can be reconstructed as:

orig_list = [values[c] for c in codes]
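To make the encoding concrete, here is a small worked example. The sample data is hypothetical and chosen only for illustration; it shows what values and codes would look like for a short list and checks that the round trip holds.

```python
# Hypothetical sample data, chosen only for illustration.
orig_list = ["b", "a", "c", "a", "b"]

values = ["a", "b", "c"]   # unique elements of orig_list, sorted
codes = [1, 0, 2, 0, 1]    # position in `values` of each original element

# Rebuild the original list from the two encoded lists.
decoded = [values[c] for c in codes]
assert decoded == orig_list
```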

(Note: this is similar to how pandas.Categorical represents series)

I've created the function below to do this decomposition:

def decompose(x):
    values = sorted(list(set(x)))
    codes = [0 for _ in x]
    for i, value in enumerate(values):
        codes = [i if elem == value else code for elem, code in zip(x, codes)]
    return values, codes

This works, but I would like to know if there is a better/more efficient way of achieving this (no double loop?), or if there's something in the standard library that could do this for me.

Update:

The answers below are great and a big improvement to my function.

I've timed all the answers that worked as intended:

import random

test_list = [random.randint(1, 10) for _ in range(10000)]
functions = [decompose, decompose_boris1, decompose_boris2,
             decompose_alexander, decompose_stuart1, decompose_stuart2,
             decompose_dan1]
for f in functions:
    print("-- " + f.__name__)
    # Round-trip test: decoding must reproduce the input list
    values, codes = f(test_list)
    decoded_list = [values[c] for c in codes]
    if decoded_list == test_list:
        print("Test passed")
        # %timeit is an IPython magic; run this in IPython or Jupyter
        %timeit f(test_list)
    else:
        print("Test failed")

Results:

-- decompose
Test passed
12.4 ms ± 269 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
-- decompose_boris1
Test passed
1.69 ms ± 21.9 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_boris2
Test passed
1.63 ms ± 18.6 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_alexander
Test passed
681 μs ± 2.15 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_stuart1
Test passed
1.7 ms ± 3.42 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_stuart2
Test passed
682 μs ± 5.98 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
-- decompose_dan1
Test passed
896 μs ± 19.5 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
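The timings above rely on IPython's %timeit magic. Outside IPython, a similar check could be sketched with the standard-library timeit module. The names decompose_dict and check_and_time are hypothetical helpers introduced here, and only one candidate function is shown; absolute numbers will differ by machine.

```python
import random
import timeit


def decompose_dict(x):
    """Dictionary-based decomposition (hypothetical name for this sketch)."""
    values = sorted(set(x))
    index_of = {value: i for i, value in enumerate(values)}
    return values, [index_of[v] for v in x]


def check_and_time(func, data, repeats=20):
    """Return (round_trip_ok, seconds_per_call); time only if the check passes."""
    values, codes = func(data)
    ok = [values[c] for c in codes] == data
    if not ok:
        return ok, None
    total = timeit.timeit(lambda: func(data), number=repeats)
    return ok, total / repeats


random.seed(0)  # fixed seed so repeated runs use the same input
test_list = [random.randint(1, 10) for _ in range(10000)]
ok, seconds = check_and_time(decompose_dict, test_list)
print("--", decompose_dict.__name__, "passed" if ok else "failed", seconds)
```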

I'm accepting Stuart's answer for being the simplest and one of the fastest.

Asked by foglerit (translated from SO)

1 Reply

深蓝 · Answer 1 · 2021-02-21T04:09:39+0000

I would expect this to be more efficient than your code, even though it has to use index to look up each value in x, and I suspect it is the fastest approach for most x without using numpy or pandas:

def decompose(x):
    values = sorted(set(x))
    return values, [values.index(v) for v in x]

Mapping each value to its position with a dictionary might bring some extra speed if needed, since a dictionary lookup avoids the linear scan that index performs on the list:

def decompose(x):
    values = sorted(set(x))
    d = {value: index for index, value in enumerate(values)}
    return values, [d[v] for v in x]
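Both versions above sort the unique values, which requires the elements to be mutually comparable. If sorted order is not needed, one possible variation (not part of the original answer) assigns codes in order of first appearance. The name decompose_unsorted is hypothetical, and the sketch assumes Python 3.7+ so that dictionaries preserve insertion order; elements still need to be hashable.

```python
def decompose_unsorted(x):
    """Encode x without sorting; values keep first-appearance order."""
    index_of = {}
    # setdefault stores the next free index the first time a value is seen
    # and returns the stored index on every later occurrence.
    codes = [index_of.setdefault(v, len(index_of)) for v in x]
    return list(index_of), codes


# Mixed types cannot be sorted together, but they can still be encoded.
mixed = [3, "a", 3, None, "a"]
values, codes = decompose_unsorted(mixed)
assert [values[c] for c in codes] == mixed
```

The trade-off is that values is no longer sorted, so this is only a fit when the order of the unique values does not matter.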
