Project: robrua/easy-bert
Repository: https://github.com/robrua/easy-bert
Primary language: Java (62.2%)

# easy-bert

easy-bert is a dead simple API for using Google's high-quality BERT language model in Python and Java.

Currently, easy-bert is focused on getting embeddings from pre-trained BERT models in both Python and Java. Support for fine-tuning and pre-training in Python will be added in the future, as well as support for using easy-bert for other tasks besides getting embeddings.

## Python

### How To Get It

easy-bert is available on PyPI. You can install it with:

```shell
pip install easybert
```

### Usage

You can use easy-bert with pre-trained BERT models from TensorFlow Hub or with local models in TensorFlow's saved model format.

To create a BERT embedder from a TensorFlow Hub model, simply instantiate a `Bert` object with the target tf-hub URL:

```python
from easybert import Bert
bert = Bert("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1")
```

You can also load a local model in TensorFlow's saved model format:

```python
from easybert import Bert
bert = Bert.load("/path/to/your/model/")
```

Once you have a BERT model loaded, you can get sequence embeddings using `bert.embed`:

```python
x = bert.embed("A sequence")
y = bert.embed(["Multiple", "Sequences"])
```

If you want per-token embeddings, set `per_token=True`:

```python
x = bert.embed("A sequence", per_token=True)
y = bert.embed(["Multiple", "Sequences"], per_token=True)
```

easy-bert returns BERT embeddings as numpy arrays. Each call to `embed` creates a new TensorFlow session for the computation; if you're calling `embed` multiple times, it's more efficient to keep a persistent session open by running the calls inside a `with bert:` block:

```python
with bert:
    x = bert.embed("A sequence", per_token=True)
    y = bert.embed(["Multiple", "Sequences"], per_token=True)
```

You can save a BERT model with:

```python
bert.save("/path/to/your/model/")
```
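Because the embeddings above are plain numpy arrays, standard numpy operations apply directly; for instance, per-token embeddings can be mean-pooled back into a single sequence-level vector, and vectors can be compared with cosine similarity. A minimal sketch, using a random dummy array in place of a real `bert.embed(..., per_token=True)` result (the 768-dimension shape is an assumption matching the `H-768` model above; the helper here is illustrative, not part of easy-bert):

```python
import numpy as np

# Dummy stand-in for bert.embed("A sequence", per_token=True):
# one vector per token; 768 dimensions assumed to match the H-768 model above.
tokens = np.random.rand(7, 768)

# Mean-pool the per-token vectors into a single sequence-level vector.
sequence_vector = tokens.mean(axis=0)

# Cosine similarity, a common way to compare two embedding vectors.
def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(sequence_vector.shape)  # (768,)
```

The same pooling and similarity code works unchanged on the `float[]` arrays the Java API returns, since both sides produce raw embedding vectors.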
A saved model can be loaded again later with `Bert.load`:

```python
bert = Bert.load("/path/to/your/model/")
```

### CLI

easy-bert also provides a CLI tool to conveniently do one-off embeddings of sequences with BERT. It can also convert a TensorFlow Hub model to a saved model. Run the tool with `--help` for usage details.

### Docker

easy-bert comes with a Docker build that can be used as a base image for applications that rely on BERT embeddings, or to run the CLI tool without needing to install an environment.

## Java

### How To Get It

easy-bert is available on Maven Central. It is also distributed through the releases page. To add the latest easy-bert release version to your Maven project, add the dependency to your POM:

```xml
<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.3</version>
  </dependency>
</dependencies>
```

Or, if you want the latest development version, add the Sonatype Snapshot Repository to your POM as well:
```xml
<dependencies>
  <dependency>
    <groupId>com.robrua.nlp</groupId>
    <artifactId>easy-bert</artifactId>
    <version>1.0.4-SNAPSHOT</version>
  </dependency>
</dependencies>

<repositories>
  <repository>
    <id>snapshots-repo</id>
    <url>https://oss.sonatype.org/content/repositories/snapshots</url>
    <releases>
      <enabled>false</enabled>
    </releases>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>
```

### Usage

You can use easy-bert with pre-trained BERT models generated with easy-bert's Python tools. You can also use pre-generated models published on Maven Central.

To load a model from your local filesystem, you can use:

```java
try(Bert bert = Bert.load(new File("/path/to/your/model/"))) {
    // Embed some sequences
}
```

If the model is on your classpath (e.g. if you're pulling it in via Maven), you can use:

```java
try(Bert bert = Bert.load("/resource/path/to/your/model")) {
    // Embed some sequences
}
```

Once you have a BERT model loaded, you can get sequence embeddings using `embedSequence` or `embedSequences`:

```java
float[] embedding = bert.embedSequence("A sequence");
float[][] embeddings = bert.embedSequences("Multiple", "Sequences");
```

If you want per-token embeddings, use `embedTokens`:

```java
float[][] embedding = bert.embedTokens("A sequence");
float[][][] embeddings = bert.embedTokens("Multiple", "Sequences");
```

### Pre-Generated Maven Central Models

Various TensorFlow Hub BERT models are available in easy-bert format on Maven Central. To use one in your project, add the following to your POM:
```xml
<dependencies>
  <dependency>
    <groupId>com.robrua.nlp.models</groupId>
    <artifactId>ARTIFACT-ID</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
```

Once you've pulled in the dependency, you can load the model using this code, substituting the appropriate resource path from the list below in place of `RESOURCE-PATH`:

```java
try(Bert bert = Bert.load("RESOURCE-PATH")) {
    // Embed some sequences
}
```

### Available Models
## Creating Your Own Models

For now, easy-bert can only use pre-trained TensorFlow Hub BERT models that have been converted using the Python tools. We will be adding support for easily fine-tuning and pre-training new models, but there are no plans to support these on the Java side. You'll need to train in Python, save the model, then load it in Java.

## Bugs

If you find bugs, please let us know via a pull request or issue.

## Citing easy-bert

If you used easy-bert for your research, please cite the project.
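The Python side of that train-in-Python, load-in-Java workflow can be sketched by composing the `Bert` constructor and `save` calls shown earlier. This is a sketch only: it assumes the `easybert` package is installed and that TensorFlow Hub is reachable, and the function name is illustrative, not part of easy-bert:

```python
# Fetch a pre-trained BERT model from TensorFlow Hub and write it out in
# TensorFlow's saved-model format so the Java API can load it from disk.
# Illustrative helper; assumes `pip install easybert` and network access.
def export_for_java(tfhub_url, out_dir):
    from easybert import Bert  # imported lazily so this sketch loads without TensorFlow
    bert = Bert(tfhub_url)     # download and wrap the pre-trained TF Hub model
    bert.save(out_dir)         # writes a directory that Java's Bert.load() can read

# Usage (downloads the multilingual model shown earlier):
# export_for_java("https://tfhub.dev/google/bert_multi_cased_L-12_H-768_A-12/1",
#                 "/path/to/your/model/")
```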