I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like the trained model to have the exact architecture of the original BERT model.
The original paper states: “BERT is trained on two tasks: predicting randomly masked tokens (MLM) and predicting whether two sentences follow each other (NSP). SCIBERT follows the same architecture as BERT but is instead pretrained on scientific text.”
I’m trying to understand how to train the model on the two tasks above. At the moment, I initialise the model as below:
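from transformers import BertConfig, BertForMaskedLM
config = BertConfig()  # assumption: library defaults (BERT-base sized); replace with your own config
model = BertForMaskedLM(config=config)  # randomly initialised weights, MLM head only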
However, this covers only MLM and not NSP. How can I initialize and train the model with NSP as well? Or is my original approach fine as it is?
My assumptions would be either:
1. Initialize with BertForPreTraining (for both MLM and NSP), OR
2. After finishing training with BertForMaskedLM, initialize the same model and train again with BertForNextSentencePrediction (but this approach would cost twice the computation and resources…)
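For reference, a minimal sketch of what the first option could look like. The tiny config values, the random token IDs, and the hand-made labels are placeholders for illustration only, not a real preprocessing pipeline; actual training would use a tokenizer, sentence-pair construction, and token masking on the real corpus.

```python
import torch
from transformers import BertConfig, BertForPreTraining

# Hypothetical tiny config so the sketch runs quickly; use the real config for training.
config = BertConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    max_position_embeddings=64,
)

# Randomly initialised model that carries both the MLM head and the NSP head.
model = BertForPreTraining(config)

batch_size, seq_len = 2, 16

# Placeholder inputs: random token IDs standing in for tokenized sentence pairs.
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
attention_mask = torch.ones_like(input_ids)

# Segment IDs: first half is "sentence A" (0), second half is "sentence B" (1).
token_type_ids = torch.zeros_like(input_ids)
token_type_ids[:, seq_len // 2:] = 1

# MLM labels: -100 marks positions that the loss ignores.
# Here only position 3 is scored; in real preprocessing the input token
# at a scored position would typically be replaced by [MASK].
labels = torch.full_like(input_ids, -100)
labels[:, 3] = input_ids[:, 3]

# NSP labels: 0 = sentence B really follows sentence A, 1 = B is a random sentence.
next_sentence_label = torch.tensor([0, 1])

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=labels,
    next_sentence_label=next_sentence_label,
)

loss = outputs.loss  # combined MLM + NSP loss
loss.backward()      # gradients flow through both heads and the shared encoder
```

When both label tensors are passed, outputs.loss is the sum of the MLM and NSP cross-entropy losses, so a single training run would cover both tasks.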
I’m not sure which one is the correct way. Any insights or advice would be greatly appreciated.
question from:
https://stackoverflow.com/questions/65646925/how-to-train-bert-from-scratch-on-a-new-domain-for-both-mlm-and-nsp