deep learning - How to train BERT from scratch on a new domain for both MLM and NSP?

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

I’m trying to train a BERT model from scratch on my own dataset using the HuggingFace library. I would like the model to have exactly the architecture of the original BERT model.

The original paper states: “BERT is trained on two tasks: predicting randomly masked tokens (MLM) and predicting whether two sentences follow each other (NSP). SCIBERT follows the same architecture as BERT but is instead pretrained on scientific text.”
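To make the NSP task concrete, here is a minimal sketch of how sentence pairs for it could be built from a corpus that is already split into documents and sentences. The function name `make_nsp_pairs` and the toy corpus are hypothetical. The 50/50 split between true and random continuations follows the BERT setup, and the label convention (0 = B follows A, 1 = B is random) matches the one used by the HuggingFace BERT classes.

```python
import random


def make_nsp_pairs(documents, rng):
    """Build (sentence_a, sentence_b, label) triples for NSP.

    documents: list of documents, each a list of sentences (needs >= 2 documents).
    label 0: sentence_b really follows sentence_a.
    label 1: sentence_b is a random sentence from a different document.
    """
    pairs = []
    for doc_idx, sentences in enumerate(documents):
        for i in range(len(sentences) - 1):
            if rng.random() < 0.5:
                # Positive example: the actual next sentence
                pairs.append((sentences[i], sentences[i + 1], 0))
            else:
                # Negative example: a sentence drawn from another document
                other_indices = [j for j in range(len(documents)) if j != doc_idx]
                other_doc = documents[rng.choice(other_indices)]
                pairs.append((sentences[i], rng.choice(other_doc), 1))
    return pairs


# Toy corpus (hypothetical data, for illustration only)
documents = [
    ["a1", "a2", "a3"],
    ["b1", "b2"],
]
pairs = make_nsp_pairs(documents, random.Random(0))
for sentence_a, sentence_b, label in pairs:
    print(sentence_a, sentence_b, label)
```

Each pair would then be tokenized as one sequence with two segments, and MLM masking would be applied on top of it.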

I’m trying to understand how to train the model on two tasks as above. At the moment, I initialised the model as below:

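from transformers import BertConfig, BertForMaskedLM
config = BertConfig()  # default BERT-base hyperparameters; set vocab_size to match your tokenizer
model = BertForMaskedLM(config=config)  # randomly initialised, MLM head only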

However, this covers only MLM, not NSP. How can I initialize and train the model with NSP as well? Or is my original approach fine as it is?

My assumption is that I should do one of the following:

Initialize with BertForPreTraining (for both MLM and NSP), OR
After training with BertForMaskedLM finishes, initialize the same model and train again with BertForNextSentencePrediction (but this approach would cost twice the computation and resources…)
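For reference, the first option can be sketched roughly as follows. `BertForPreTraining` accepts both `labels` (for MLM) and `next_sentence_label` (for NSP) and returns the sum of the two losses, so a single pass trains both heads. The tiny configuration, the random input ids, and the choice of id 4 as the mask token are assumptions made only so the sketch runs quickly. In real training the batches would come from a tokenizer and a masking data collator.

```python
import torch
from transformers import BertConfig, BertForPreTraining

# Tiny, hypothetical configuration so the sketch runs quickly on CPU
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = BertForPreTraining(config)  # randomly initialised, MLM + NSP heads

batch_size, seq_len = 2, 16
input_ids = torch.randint(5, config.vocab_size, (batch_size, seq_len))
attention_mask = torch.ones_like(input_ids)

# Segment ids: first half is sentence A (0), second half is sentence B (1)
token_type_ids = torch.zeros_like(input_ids)
token_type_ids[:, seq_len // 2:] = 1

# MLM labels: -100 is ignored by the loss; only masked positions carry a target
labels = torch.full_like(input_ids, -100)
labels[:, 3] = input_ids[:, 3]
input_ids[:, 3] = 4  # assumption: id 4 plays the role of [MASK] in this toy vocab

# NSP labels: 0 = B follows A, 1 = B is a random sentence
next_sentence_label = torch.tensor([0, 1])

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=labels,
    next_sentence_label=next_sentence_label,
)
loss = outputs.loss  # MLM loss + NSP loss
loss.backward()
print(outputs.prediction_logits.shape, outputs.seq_relationship_logits.shape)
```

The combined loss is only returned when both label tensors are passed.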

I’m not sure which one is the correct way. Any insights or advice would be greatly appreciated.

question from:https://stackoverflow.com/questions/65646925/how-to-train-bert-from-scratch-on-a-new-domain-for-both-mlm-and-nsp

1 Reply

深蓝 · Answer 1 · 2021-10-06T18:44:39+0000

I would suggest doing the following:

First, pre-train BERT on the MLM objective. HuggingFace provides a script, run_mlm.py (among the language-modeling examples in the transformers repository), specifically for training BERT on the MLM objective with your own data. The script uses AutoModelForMaskedLM, so you can specify any architecture you want.
Second, if you really want to train on the next sentence prediction task (although it has been shown to contribute little to BERT's NLU capabilities), you can define a BertForPreTraining model (which has both the MLM and NSP heads on top), load the weights from the model you trained in step 1, and then further pre-train it on the next sentence prediction task.
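A minimal sketch of the weight hand-off in step 2, assuming the step 1 model was saved with `save_pretrained`. The tiny configuration and the temporary directory are placeholders, and the freshly constructed `BertForMaskedLM` merely stands in for a trained checkpoint. Since `BertForMaskedLM` has no pooler and no NSP head, those parts of `BertForPreTraining` start from random initialisation (the library prints a warning about this), while the encoder and MLM head are reused.

```python
import tempfile

import torch
from transformers import BertConfig, BertForMaskedLM, BertForPreTraining

# Tiny, hypothetical configuration; use your real config in practice
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)

# Stand-in for the model trained on MLM in step 1
mlm_model = BertForMaskedLM(config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    mlm_model.save_pretrained(checkpoint_dir)
    # Encoder and MLM head come from the checkpoint;
    # pooler and NSP head are newly initialised
    pretraining_model = BertForPreTraining.from_pretrained(checkpoint_dir)

# Check that the encoder weights were carried over
same_embeddings = torch.equal(
    mlm_model.bert.embeddings.word_embeddings.weight,
    pretraining_model.bert.embeddings.word_embeddings.weight,
)
print("embeddings reused:", same_embeddings)
```

From here, `pretraining_model` can be trained further by passing `next_sentence_label` (and, if desired, MLM `labels`) in each batch.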
