Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
257 views
in Technique[技术] by (71.8m points)

TensorFlow: slow performance when getting gradients at inputs

I'm building a simple multilayer perceptron with TensorFlow, and I also need to obtain the gradients (or error signal) of the loss at the neural network's inputs.

Here's my code, which works:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(self.network, self.y))
optimizer = tf.train.AdagradOptimizer(learning_rate=nn_learning_rate).minimize(cost)
...
for i in range(epochs):
    ....
    for batch in batches:
        ...
        sess.run(optimizer, feed_dict=feed_dict)
        grads_wrt_input = sess.run(tf.gradients(cost, self.x), feed_dict=feed_dict)[0]

(edited to include training loop)

Without the last line (grads_wrt_input...), this runs really fast on a CUDA machine. However, tf.gradients() reduces performance greatly by tenfold or more.

I recall that the error signals at the nodes are computed as intermediate values in the backpropagation algorithm, and I have successfully done this using the Java library DeepLearning4j. I was also under the impression that this would be a slight modification to the computation graph already built by optimizer.

How can this be made faster, or is there any other way to compute the gradients of the loss w.r.t. the inputs?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The tf.gradients() function builds a new backpropagation graph each time it is called, so the reason for the slowdown is that TensorFlow has to parse a new graph on each iteration of the loop. (This can be surprisingly expensive: the current version of TensorFlow is optimized for executing the same graph a large number of times.)

Fortunately the solution is easy: just compute the gradients once, outside the loop. You can restructure your code as follows:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(self.network, self.y))
optimizer = tf.train.AdagradOptimizer(learning_rate=nn_learning_rate).minimize(cost)
grads_wrt_input_tensor = tf.gradients(cost, self.x)[0]
# ...
for i in range(epochs):
    # ...
    for batch in batches:
        # ...
        _, grads_wrt_input = sess.run([optimizer, grads_wrt_input_tensor],
                                      feed_dict=feed_dict)

Note that, for performance, I also combined the two sess.run() calls. This ensures that the forward propagation, and much of the backpropagation, will be reused.


As an aside, one tip to find performance bugs like this is to call tf.get_default_graph().finalize() before starting your training loop. This will raise an exception if you inadvertantly add any nodes to the graph, which makes it easier to trace the cause of these bugs.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...