I am training a model in Keras that consists of a Hugging Face RoBERTa backbone with a downstream task of span prediction and binary prediction for text.
I have been training the model regularly with datasets under 2 GB in size, which has worked fine. The dataset has grown in recent weeks and is now around 2.3 GB, which puts it over the 2 GB Google protobuf hard limit. Because TensorFlow uses protobuf to buffer the tensors for the TPUs, this makes it impossible to train the model with Keras on TPUs using plain NumPy tensors without a generator; trying to serve all the data at once fails. If I use a dataset under 2 GB, everything works fine. TPUs don't support Keras generators yet, so I was looking into using the tf.data.Dataset API instead.
After seeing this question, I adapted code from this gist to try to get this to work, resulting in the following code:
    def tfdata_generator(x, y, is_training, batch_size=384):
        dataset = tf.data.Dataset.from_tensor_slices((x, y))
        if is_training:
            dataset = dataset.shuffle(1000)
        dataset = dataset.map(map_fn)
        dataset = dataset.batch(batch_size)
        dataset = dataset.repeat()
        dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
        return dataset
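For reference, here is a runnable CPU sanity check of the pipeline above, with an identity map_fn standing in for the real one (which is defined elsewhere) and small dummy arrays; the array shapes and batch size are illustrative assumptions:

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the real map_fn, which is defined elsewhere.
def map_fn(x, y):
    return x, y

def tfdata_generator(x, y, is_training, batch_size=384):
    dataset = tf.data.Dataset.from_tensor_slices((x, y))
    if is_training:
        dataset = dataset.shuffle(1000)
    dataset = dataset.map(map_fn)
    dataset = dataset.batch(batch_size)
    dataset = dataset.repeat()
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset

# Dummy data: 1000 examples with 8 features each, binary labels.
x = np.random.rand(1000, 8).astype(np.float32)
y = np.random.randint(0, 2, size=(1000,)).astype(np.int32)

ds = tfdata_generator(x, y, is_training=True, batch_size=32)
xb, yb = next(iter(ds))
# xb.shape == (32, 8), yb.shape == (32,)
```

Note that from_tensor_slices still captures the full arrays, so when the dataset graph is serialized and shipped to a remote TPU worker it can run into the same 2 GB limit even though it works locally.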
The model is created and compiled for TPU use as before, which has never caused any problems. Then I create the generator and call the fit function:
    train_gen = tfdata_generator(x_train, y_train, is_training=True)
    model.fit(
        train_gen,
        steps_per_epoch=10000,
        epochs=1,
    )
This results in the following error:
FetchOutputs node : not found [Op:AutoShardDataset]
Edit: Colab with bare-minimum code and a dummy dataset. Unfortunately, because of Colab RAM restrictions, building a dummy dataset exceeding 2 GB crashes the notebook. But it still shows code that runs and works on CPU/TPU with a smaller dataset.
This code does, however, work on a CPU. I can't find any further information on this error online, and I haven't been able to find more detailed information on how to feed training data to Keras on TPUs using generators. I have looked into TFRecords a bit, but the documentation on TPUs is lacking there as well. All help appreciated!
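Since TFRecords came up: for context, here is a minimal sketch of writing NumPy arrays to a TFRecord file and reading them back as a tf.data.Dataset. The file path, feature names, and single-integer label layout are illustrative assumptions; for a Cloud TPU the file would need to live in a GCS bucket that the TPU worker can read:

```python
import numpy as np
import tensorflow as tf

def write_tfrecord(x, y, path):
    # Serialize each (x, y) pair as a tf.train.Example:
    # float features via FloatList, integer labels via Int64List.
    with tf.io.TFRecordWriter(path) as writer:
        for xi, yi in zip(x, y):
            feature = {
                "x": tf.train.Feature(
                    float_list=tf.train.FloatList(value=xi.ravel())),
                "y": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(yi)])),
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

def load_tfrecord(path, feature_dim, batch_size=384):
    # Parse fixed-length features back into (x, y) tensors.
    feature_spec = {
        "x": tf.io.FixedLenFeature([feature_dim], tf.float32),
        "y": tf.io.FixedLenFeature([1], tf.int64),
    }

    def parse(record):
        parsed = tf.io.parse_single_example(record, feature_spec)
        return parsed["x"], parsed["y"]

    return (tf.data.TFRecordDataset(path)
            .map(parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.experimental.AUTOTUNE))

# Small round-trip demo.
x = np.random.rand(10, 4).astype(np.float32)
y = np.random.randint(0, 2, size=(10,))
write_tfrecord(x, y, "/tmp/demo.tfrecord")
ds = load_tfrecord("/tmp/demo.tfrecord", feature_dim=4, batch_size=5)
xb, yb = next(iter(ds))
# xb.shape == (5, 4), yb.shape == (5, 1)
```

Because the data then lives in files rather than in the serialized dataset graph, this approach sidesteps the 2 GB protobuf limit entirely.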
question from:
https://stackoverflow.com/questions/65835572/using-tf-data-dataset-with-keras-on-a-tpu