Open-source project: codertimo/BERT-pytorch
Repository: https://github.com/codertimo/BERT-pytorch
Language: Python 99.8%

# BERT-pytorch

PyTorch implementation of Google AI's 2018 BERT, with simple annotation.
## Introduction

Google AI's BERT paper reports amazing results on a wide range of NLP tasks (new state of the art on 17 NLP tasks), including outperforming the human F1 score on the SQuAD v1.1 QA task. The paper shows that a Transformer (self-attention) based encoder, given a suitable language-model training method, can be a powerful alternative to previous language models. More importantly, it shows that this pre-trained language model can be transferred to any NLP task without a task-specific model architecture. This result will be remembered in NLP history, and I expect many further papers about BERT to be published very soon.

This repo is an implementation of BERT. The code is very simple and easy to understand quickly. Some of it is based on The Annotated Transformer.

Currently this project is a work in progress, and the code is not yet verified.

## Installation
## Quickstart

**NOTICE:** your corpus should be prepared with two sentences per line, separated by a tab (`\t`).

### 0. Prepare your corpus

You can also use an already tokenized corpus (tokenization is not included in this package).
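As a minimal sketch of the expected corpus format, here is how a toy file could be written; the file name matches the commands below, but the sentences themselves are made up for illustration:

```python
import os

# Each corpus line holds two sentences separated by a tab (\t).
# The sentence pairs here are illustrative, not from the repo.
pairs = [
    ("Welcome to the jungle", "It gets worse here every day"),
    ("The weather is nice today", "Let us go for a walk"),
]

os.makedirs("data", exist_ok=True)
with open("data/corpus.small", "w", encoding="utf-8") as f:
    for first, second in pairs:
        f.write(f"{first}\t{second}\n")
```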
### 1. Build vocab based on your corpus

```
bert-vocab -c data/corpus.small -o data/vocab.small
```

### 2. Train your own BERT model

```
bert -c data/corpus.small -v data/vocab.small -o output/bert.model
```

## Language Model Pre-training

In the paper, the authors introduce two new language-model training methods: the "masked language model" and "predict next sentence".

### Masked Language Model
Rules: randomly, 15% of input tokens are changed into something else, according to the following sub-rules:

1. 80% of the selected tokens are replaced with the `[MASK]` token
2. 10% are replaced with a random token from the vocabulary
3. 10% are left unchanged, but must still be predicted
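The masking scheme above can be sketched as follows; this is a simplified illustration of the 15% / 80-10-10 rule, not the repo's actual implementation, and the function and argument names are made up:

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, seed=None):
    """Sketch of BERT-style masked-LM corruption (illustrative names).

    Each token is independently selected with probability `mask_prob`
    (15%). A selected token becomes `mask_token` 80% of the time, a
    random vocabulary token 10% of the time, and stays unchanged 10%
    of the time; all selected positions become prediction targets.
    """
    rng = random.Random(seed)
    output, targets = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets.append((i, tok))           # model must predict the original
            r = rng.random()
            if r < 0.8:
                output.append(mask_token)      # 80%: replace with [MASK]
            elif r < 0.9:
                output.append(rng.choice(vocab))  # 10%: random token
            else:
                output.append(tok)             # 10%: keep as-is, still predicted
        else:
            output.append(tok)                 # unselected tokens pass through
    return output, targets
```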
### Predict Next Sentence

"Can this sentence be a continuation of the previous one?" This objective teaches the model the relationship between two sentences, which is not directly captured by language modeling.

Rules:

1. 50% of the time, the second sentence is the actual next sentence
2. 50% of the time, the second sentence is a random sentence from the corpus
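The 50/50 pair sampling above can be sketched like this; it is a minimal illustration with made-up names, assuming the corpus has already been loaded as a list of sentence pairs:

```python
import random

def make_nsp_pair(corpus_pairs, index, seed=None):
    """Sketch of next-sentence-prediction sampling (illustrative names).

    With 50% probability return the true second sentence (label 1,
    "IsNext"); otherwise return the second sentence of a random pair
    drawn from elsewhere in the corpus (label 0, "NotNext").
    `corpus_pairs` is a list of (sentence_a, sentence_b) tuples.
    """
    rng = random.Random(seed)
    first, second = corpus_pairs[index]
    if rng.random() < 0.5:
        return first, second, 1                 # actual continuation
    random_second = rng.choice(corpus_pairs)[1] # sentence drawn at random
    return first, random_second, 0
```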
## Author

Junseong Kim, Scatter Lab ([email protected] / [email protected])

## License

This project follows the Apache 2.0 License, as written in the LICENSE file.

Copyright 2018 Junseong Kim, Scatter Lab, and the respective BERT contributors

Copyright (c) 2018 Alexander Rush : The Annotated Transformer