I'm following this tutorial (https://mccormickml.com/2019/07/22/BERT-fine-tuning/#a1-saving--loading-fine-tuned-model) to fine-tune a BertForSequenceClassification. After training the model, I want to load it in a function classify_sentence(sentence): it takes a sentence and returns a logit vector of predictions.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

def classify_sentence(self, sentence):
    self.model = BertForSequenceClassification.from_pretrained(output_dir)
    self.tokenizer = BertTokenizer.from_pretrained(output_dir)
    encoded_dict = self.tokenizer.encode_plus(
        sentence,                       # Sentence to encode.
        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
        max_length = 64,                # Pad & truncate all sentences.
        pad_to_max_length = True,
        return_attention_mask = True,   # Construct attn. masks.
        return_tensors = 'pt',          # Return pytorch tensors.
    )
    # The encoded sentence; already batched because of return_tensors='pt'.
    input_id = encoded_dict['input_ids']
    # And its attention mask (simply differentiates padding from non-padding).
    attention_mask = encoded_dict['attention_mask']
    with torch.no_grad():
        output = self.model(input_id,
                            token_type_ids=None,
                            attention_mask=attention_mask)
    logits = output[0]
    return logits
output_dir is a directory that contains these files: config.json, pytorch_model.bin, special_tokens_map.json, tokenizer_config.json and vocab.txt.
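For context, those five files are what save_pretrained writes for the model and the tokenizer. Here is a minimal sketch of the saving step; it uses a tiny, randomly initialized config and a toy vocabulary purely so the example is self-contained (in the tutorial, the real fine-tuned bert-base-uncased model and its tokenizer are saved the same way):

```python
import os
import tempfile

from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

# Tiny stand-in model (assumption for illustration only; not the fine-tuned one).
config = BertConfig(vocab_size=16, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# Toy vocabulary file so a BertTokenizer can be built offline.
tmp_dir = tempfile.mkdtemp()
vocab_file = os.path.join(tmp_dir, "vocab.txt")
with open(vocab_file, "w") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]))
tokenizer = BertTokenizer(vocab_file)

# Saving both produces config.json, the weights file, vocab.txt,
# special_tokens_map.json, and tokenizer_config.json.
output_dir = os.path.join(tmp_dir, "saved")
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(sorted(os.listdir(output_dir)))
```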
When I run this function, I get an error:
AttributeError: 'BertTokenizer' object has no attribute 'encode_plus'
However, I used this very method to encode sentences during training. Is there an alternative way to tokenize a sentence after loading a trained BERT model?
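For example, if I understand the docs correctly, calling the tokenizer object directly is supposed to be equivalent to encode_plus in recent transformers versions. Is this the right replacement? Sketched here with a toy vocabulary so it runs offline; with the real model I would load the tokenizer with BertTokenizer.from_pretrained(output_dir):

```python
import os
import tempfile

from transformers import BertTokenizer

# Toy vocab so the example is self-contained (assumption for illustration).
vocab_file = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_file, "w") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]))
tokenizer = BertTokenizer(vocab_file)

# Calling the tokenizer directly, with the same options as encode_plus.
encoded = tokenizer(
    "hello world",
    add_special_tokens=True,
    max_length=8,
    padding="max_length",     # replaces the deprecated pad_to_max_length=True
    truncation=True,
    return_attention_mask=True,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)   # torch.Size([1, 8])
```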
question from:
https://stackoverflow.com/questions/65846926/how-to-load-a-fine-tuned-model-from-bertforsequenceclassification-and-use-it-to