Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
304 views
in Technique[技术] by (71.8m points)

python - 如何在Python中将一个字符串附加到另一个字符串?(How do I append one string to another in Python?)

I want an efficient way to append one string to another in Python.

(我想要一种有效的方法在Python中将一个字符串附加到另一个字符串。)

var1 = "foo"
var2 = "bar"
var3 = var1 + var2

Is there any good built-in method to use?

(有没有什么好的内置方法可供使用?)

  ask by user469652 translate from so

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you only have one reference to a string and you concatenate another string to the end, CPython now special cases this and tries to extend the string in place.

(如果你只有一个字符串的引用,并且你将另一个字符串连接到结尾,CPython现在特殊情况,并尝试扩展字符串。)

The end result is that the operation is amortized O(n).

(最终结果是操作是摊销O(n)。)

eg

(例如)

s = ""
for i in range(n):
    s+=str(i)

used to be O(n^2), but now it is O(n).

(曾经是O(n ^ 2),但现在是O(n)。)

From the source (bytesobject.c):

(从源代码(bytesobject.c):)

void
PyBytes_ConcatAndDel(register PyObject **pv, register PyObject *w)
{
    PyBytes_Concat(pv, w);
    Py_XDECREF(w);
}


/* The following function breaks the notion that strings are immutable:
   it changes the size of a string.  We get away with this only if there
   is only one module referencing the object.  You can also think of it
   as creating a new string object and destroying the old one, only
   more efficiently.  In any case, don't use this if the string may
   already be known to some other part of the code...
   Note that if there's not enough memory to resize the string, the original
   string object at *pv is deallocated, *pv is set to NULL, an "out of
   memory" exception is set, and -1 is returned.  Else (on success) 0 is
   returned, and the value in *pv may or may not be the same as on input.
   As always, an extra byte is allocated for a trailing  byte (newsize
   does *not* include that), and a trailing  byte is stored.
*/

int
_PyBytes_Resize(PyObject **pv, Py_ssize_t newsize)
{
    register PyObject *v;
    register PyBytesObject *sv;
    v = *pv;
    if (!PyBytes_Check(v) || Py_REFCNT(v) != 1 || newsize < 0) {
        *pv = 0;
        Py_DECREF(v);
        PyErr_BadInternalCall();
        return -1;
    }
    /* XXX UNREF/NEWREF interface should be more symmetrical */
    _Py_DEC_REFTOTAL;
    _Py_ForgetReference(v);
    *pv = (PyObject *)
        PyObject_REALLOC((char *)v, PyBytesObject_SIZE + newsize);
    if (*pv == NULL) {
        PyObject_Del(v);
        PyErr_NoMemory();
        return -1;
    }
    _Py_NewReference(*pv);
    sv = (PyBytesObject *) *pv;
    Py_SIZE(sv) = newsize;
    sv->ob_sval[newsize] = '';
    sv->ob_shash = -1;          /* invalidate cached hash value */
    return 0;
}

It's easy enough to verify empirically.

(通过经验验证很容易。)

$ python -m timeit -s"s=''" "for i in xrange(10):s+='a'"
1000000 loops, best of 3: 1.85 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(100):s+='a'"
10000 loops, best of 3: 16.8 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
10000 loops, best of 3: 158 usec per loop
$ python -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
1000 loops, best of 3: 1.71 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 14.6 msec per loop
$ python -m timeit -s"s=''" "for i in xrange(1000000):s+='a'"
10 loops, best of 3: 173 msec per loop

It's important however to note that this optimisation isn't part of the Python spec.

(需要注意的是这种优化是不是Python的规范的一部分也挺重要 。)

It's only in the cPython implementation as far as I know.

(据我所知,这只是在cPython实现中。)

The same empirical testing on pypy or jython for example might show the older O(n**2) performance .

(例如,对于pypy或jython的相同经验测试可能会显示较旧的O(n ** 2)性能。)

$ pypy -m timeit -s"s=''" "for i in xrange(10):s+='a'"
10000 loops, best of 3: 90.8 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(100):s+='a'"
1000 loops, best of 3: 896 usec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(1000):s+='a'"
100 loops, best of 3: 9.03 msec per loop
$ pypy -m timeit -s"s=''" "for i in xrange(10000):s+='a'"
10 loops, best of 3: 89.5 msec per loop

So far so good, but then,

(到目前为止很好,但是,)

$ pypy -m timeit -s"s=''" "for i in xrange(100000):s+='a'"
10 loops, best of 3: 12.8 sec per loop

ouch even worse than quadratic.

(哎哟比二次更糟糕。)

So pypy is doing something that works well with short strings, but performs poorly for larger strings.

(因此pypy正在做一些适用于短字符串的东西,但对于较大的字符串表现不佳。)


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...