python - TensorFlow element-wise gradient slower when using tf.function

I need to compute an element-wise gradient as input for a neural network. I decided to use a tf.data.Dataset to store the input data. Preprocessing the data and computing the gradients is expensive, so I want to do it in batches and then store the result.

I simplified the function to work with inputs of shape (batch_size, x, y), and I want to compute the gradient for every y separately.

Using tf.GradientTape this looks as follows:

import tensorflow as tf

# @tf.function
def test(inp):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(inp)
        out = inp**2
        # split the last axis so the gradient of each y slice can be taken separately
        out = tf.unstack(out, axis=-1)
    grad = []
    for x in out:
        # gradient of one y slice of the output with respect to the full input
        grad.append(tape.gradient(x, inp))
    del tape
    return tf.stack(grad, axis=-1)

inp = tf.random.normal((32, 100, 50))
test(inp)

This code takes roughly 76 ms to execute without the tf.function decorator and 3.1 s with it. Unfortunately the same slowdown occurs when using it with tf.data.Dataset.map, which I assume converts it to a tf.function.
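For reference, this is roughly how I apply it in the pipeline (the dataset size here is only a placeholder):

# placeholder-sized dataset; map() traces `test` the same way tf.function does
dataset = tf.data.Dataset.from_tensor_slices(tf.random.normal((320, 100, 50)))
dataset = dataset.batch(32).map(test)
grads = list(dataset)  # each element has shape (32, 100, 50, 50)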

I tried using tape.batch_jacobian instead, which does not suffer from the tf.function slowdown but computes far more gradients than I need, so I have to reduce them afterwards. It takes around 15 s to execute.

@tf.function
def test(inp):
    with tf.GradientTape() as tape:
        tape.watch(inp)
        out = inp**2
    # full jacobian of shape (batch_size, x, y, x, y), reduced to (batch_size, x, y, y)
    grad = tape.batch_jacobian(out, inp)
    return tf.math.reduce_sum(grad, axis=3)

x = test(inp)

For the larger dataset and more resource-heavy computations I want to avoid such a slowdown, but I haven't found a solution yet, nor do I understand why it computes so much slower. Is there a way to reshape the data and use the jacobian method, or some other approach, that can overcome this issue?

Question from: https://stackoverflow.com/questions/65846305/tensorflow-element-wise-gradient-slower-when-using-tf-function


1 Reply


So let's do a quick experiment with IPython's %timeit. I defined two functions, one with the tf.function decorator and one without:

import tensorflow as tf

def test_no_tracing(inp):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(inp)
        out = inp**2
        out = tf.unstack(out, axis=-1)
    grad = []
    for x in out:
        grad.append(tape.gradient(x, inp))
    del tape
    return tf.stack(grad, axis=-1)

@tf.function
def test_tracing(inp):
    print("Tracing")
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(inp)
        out = inp**2
        out = tf.unstack(out, axis=-1)
    grad = []
    for x in out:
        grad.append(tape.gradient(x, inp))
    del tape
    return tf.stack(grad, axis=-1)

inp = tf.random.normal((32, 100, 50))

Let's see the results:

With the tf.function decorator:

In [2]: %timeit test_tracing(inp)
Tracing
2021-01-22 15:22:15.003262: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-01-22 15:22:15.076448: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2599990000 Hz
10.3 ms ± 579 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

And without:

In [3]: %timeit test_no_tracing(inp)
71.7 ms ± 1.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The function decorated with tf.function is roughly 7 times faster. It can seem slower if you run the function only once, because the decorated function has the overhead of tracing, i.e. converting the Python code into a graph. Once that tracing is done, the code runs much faster.

This can be verified by running the function a single time, before it has been traced. We can do that by telling %timeit to do only one loop and one repetition:

In [2]: %timeit -r 1 -n 1 test_tracing(inp)
Tracing
2021-01-22 15:29:47.189850: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-01-22 15:29:47.284413: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2599990000 Hz
4.97 s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

Here the time is much larger, closer to what you report in your question. But once the tracing is done, the traced function is a lot quicker. Let's run it again:

In [3]: %timeit -r 1 -n 1 test_tracing(inp)
29.1 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)
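
If that one-off tracing cost matters in your pipeline, one option is to pay it explicitly before the function is used. A minimal sketch, assuming the input shape and dtype stay fixed:

# trace once up front, then reuse the cached graph for later calls
concrete = test_tracing.get_concrete_function(inp)  # tracing (and the print) happens here
result = concrete(inp)                              # runs the already-built graph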

You can read more about how to achieve better performance with tf.function in the guide: Better performance with tf.function.
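
One of the techniques covered there is pinning an input_signature so the function is not re-traced when, for example, the batch size changes between calls. A minimal sketch, assuming x and y stay fixed at 100 and 50:

# a fixed input_signature avoids re-tracing when only the batch size changes
@tf.function(input_signature=[tf.TensorSpec(shape=(None, 100, 50), dtype=tf.float32)])
def test_fixed_signature(inp):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(inp)
        out = tf.unstack(inp**2, axis=-1)  # the last axis has a static size, so unstack works
    grad = [tape.gradient(x, inp) for x in out]
    del tape
    return tf.stack(grad, axis=-1)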

