
python - Using tf.data.Dataset with Keras on a TPU

I am training a model with Keras that consists of a Hugging Face RoBERTa model as a backbone with a downstream task of span prediction and binary prediction for text.

I have been training the model regularly with datasets under 2 GB in size, which has worked fine. The dataset has grown in recent weeks and is now around 2.3 GB, which puts it over the 2 GB Google protobuf hard limit. This makes it impossible to train the model on TPUs with Keras and plain numpy tensors without a generator, since TensorFlow uses protobuf to buffer the tensors for the TPUs, and trying to serve all the data at once fails. With a dataset under 2 GB, everything works fine. TPUs don't support Keras generators yet, so I was looking into using the tf.data.Dataset API instead.

After seeing this question, I adapted code from this gist to try to get this working, resulting in the following:

import tensorflow as tf

def tfdata_generator(x, y, is_training, batch_size=384):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))

    if is_training:
        dataset = dataset.shuffle(1000)
    dataset = dataset.map(map_fn)  # map_fn is defined elsewhere
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)

    return dataset

The model is created and compiled for TPU use as before, which has never caused any problems, and then I create the generator and call the fit function:

train_gen = tfdata_generator(x_train, y_train, is_training=True)

model.fit(
  train_gen,
  steps_per_epoch=10000,
  epochs=1,
)

This results in the following error:

FetchOutputs node : not found [Op:AutoShardDataset]

Edit: Colab with bare-minimum code and a dummy dataset. Unfortunately, because of Colab RAM restrictions, building a dummy dataset exceeding 2 GB in size crashes the notebook, but it still shows code that runs and works on CPU/TPU with a smaller dataset.

This code does, however, work on a CPU. I can't find any further information on this error online, and I haven't been able to find more detailed information on how to feed training data to Keras on TPUs using generators. I have looked into TFRecords a bit, but find the documentation on TPUs lacking there as well. All help appreciated!

Question from: https://stackoverflow.com/questions/65835572/using-tf-data-dataset-with-keras-on-a-tpu


1 Reply


For numpy tensors, 2 GB seems to be a hard limit for TPU training (as of now). I see two workarounds that you could use.

  1. Write your data to a GCS bucket as TFRecord/CSV using TFRecordWriter and let the TPU read the training data from that bucket (first sketch below).
  2. Use the tf.data service for your input pipeline. It's a relatively new service that lets you run the data pipeline on separate workers. For details on how to run it, see running_the_tfdata_service (second sketch below).
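
A minimal sketch of the first workaround, assuming x_train is an array of fixed-length token-id sequences and y_train holds integer labels. The bucket path, feature names, sequence length of 512, and single-label layout are placeholders you would adapt to your actual span/binary targets:

import tensorflow as tf

# Serialize one (input_ids, label) pair into a tf.train.Example.
def serialize_example(input_ids, label):
    feature = {
        "input_ids": tf.train.Feature(int64_list=tf.train.Int64List(value=input_ids)),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

# Write the records straight to a GCS bucket the TPU can read from.
with tf.io.TFRecordWriter("gs://your-bucket/train-00000.tfrecord") as writer:
    for ids, label in zip(x_train, y_train):
        writer.write(serialize_example(ids, label))

# Read side: parse the records back into tensors and feed the result to model.fit.
feature_spec = {
    "input_ids": tf.io.FixedLenFeature([512], tf.int64),  # assumed max sequence length
    "label": tf.io.FixedLenFeature([1], tf.int64),
}

def parse_fn(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    return parsed["input_ids"], parsed["label"]

dataset = (
    tf.data.TFRecordDataset("gs://your-bucket/train-00000.tfrecord")
    .map(parse_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    .shuffle(1000)
    .batch(384, drop_remainder=True)  # fixed batch shapes help on TPU
    .repeat()
    .prefetch(tf.data.experimental.AUTOTUNE)
)

Because the data now lives in GCS instead of being embedded in the graph as constants, the 2 GB protobuf limit no longer applies.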
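
A sketch of the second workaround (tf.data service), reusing the tfdata_generator from the question. "grpc://dispatcher:5000" is a placeholder for the address of a dispatcher you would need to start separately, as described in the linked guide:

import tensorflow as tf

# Build the pipeline as before, then hand its execution off to tf.data service workers.
dataset = tfdata_generator(x_train, y_train, is_training=True)
dataset = dataset.apply(
    tf.data.experimental.service.distribute(
        processing_mode="parallel_epochs",
        service="grpc://dispatcher:5000",  # placeholder dispatcher address
    )
)

model.fit(dataset, steps_per_epoch=10000, epochs=1)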
