I'm following this tutorial (https://mccormickml.com/2019/07/22/BERT-fine-tuning/#a1-saving--loading-fine-tuned-model) to fine-tune a BertForSequenceClassification. After training the model, I want to load it in a function classify_sentence(sentence): it takes a sentence and returns a logit vector of predictions.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

def classify_sentence(self, sentence):
    self.model = BertForSequenceClassification.from_pretrained(output_dir)
    self.tokenizer = BertTokenizer.from_pretrained(output_dir)
    encoded_dict = self.tokenizer.encode_plus(
        sentence,                       # Sentence to encode.
        add_special_tokens = True,      # Add '[CLS]' and '[SEP]'
        max_length = 64,                # Pad & truncate all sentences.
        pad_to_max_length = True,
        return_attention_mask = True,   # Construct attn. masks.
        return_tensors = 'pt',          # Return pytorch tensors.
    )
    # The encoded sentence; already batched because of return_tensors='pt'.
    input_id = encoded_dict['input_ids']
    # And its attention mask (simply differentiates padding from non-padding).
    attention_mask = encoded_dict['attention_mask']
    with torch.no_grad():
        output = self.model(input_id,
                            token_type_ids=None,
                            attention_mask=attention_mask)
    logits = output[0]
    return logits
output_dir is a directory that contains these files: config.json, pytorch_model.bin, special_tokens_map.json, tokenizer_config.json and vocab.txt.
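For context, those five files are what save_pretrained writes for the model and the tokenizer. Here is a minimal sketch of the saving step; it uses a tiny, randomly initialized config and a toy vocabulary purely so the example is self-contained (in the tutorial, the real fine-tuned bert-base-uncased model and its tokenizer are saved the same way):

```python
import os
import tempfile

from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

# Tiny stand-in model (assumption for illustration only; not the fine-tuned one).
config = BertConfig(vocab_size=16, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# Toy vocabulary file so a BertTokenizer can be built offline.
tmp_dir = tempfile.mkdtemp()
vocab_file = os.path.join(tmp_dir, "vocab.txt")
with open(vocab_file, "w") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]))
tokenizer = BertTokenizer(vocab_file)

# Saving both produces config.json, the weights file, vocab.txt,
# special_tokens_map.json, and tokenizer_config.json.
output_dir = os.path.join(tmp_dir, "saved")
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(sorted(os.listdir(output_dir)))
```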
When I run this function, I get an error:
AttributeError: 'BertTokenizer' object has no attribute 'encode_plus'
However, I used this very method to encode sentences during training. Is there an alternative way to tokenize a sentence after loading a trained BERT model?
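For example, if I understand the docs correctly, calling the tokenizer object directly is supposed to be equivalent to encode_plus in recent transformers versions. Is this the right replacement? Sketched here with a toy vocabulary so it runs offline; with the real model I would load the tokenizer with BertTokenizer.from_pretrained(output_dir):

```python
import os
import tempfile

from transformers import BertTokenizer

# Toy vocab so the example is self-contained (assumption for illustration).
vocab_file = os.path.join(tempfile.mkdtemp(), "vocab.txt")
with open(vocab_file, "w") as f:
    f.write("\n".join(["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]))
tokenizer = BertTokenizer(vocab_file)

# Calling the tokenizer directly, with the same options as encode_plus.
encoded = tokenizer(
    "hello world",
    add_special_tokens=True,
    max_length=8,
    padding="max_length",     # replaces the deprecated pad_to_max_length=True
    truncation=True,
    return_attention_mask=True,
    return_tensors="pt",
)
print(encoded["input_ids"].shape)   # torch.Size([1, 8])
```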
question from:
https://stackoverflow.com/questions/65846926/how-to-load-a-fine-tuned-model-from-bertforsequenceclassification-and-use-it-to