Open-source project: mmhanif/nlp_workshop
Repository: https://github.com/mmhanif/nlp_workshop
Language: Jupyter Notebook 100.0%

An NLP workshop - Categorizing tweets into relevant or non-relevant

adapted from https://github.com/hundredblocks/concrete_NLP_tutorial.git

Workshop Overview

In this workshop you will combine a few different NLP techniques to classify tweets. You will:
We will present you with the code for three different word embeddings (i.e. ways to turn text into vectors of numbers) and two different classification algorithms. Your job is to select an embedding and a model, train the model on the data provided (you may want to clean the data further beforehand), and then test the accuracy of your trained model. The team with the most accurate model wins!

Environment Setup

If you are using Anaconda, you can set up a new conda environment as follows:
This creates a new conda environment called nlpw. To add this as a kernel to Jupyter, use the following command:
Download pre-trained Word2Vec model

You can download a pre-trained Word2Vec model that was trained on a large corpus of Google News articles here: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing

Note for MacOS

Use env-MacOS.yml instead of env.yml to create the conda environment on MacOS. By default, conda installs MKL versions of libraries such as numpy and tensorflow, and this seems to cause some issues on MacOS. To work around this, we include the 'nomkl' package, which forces installation of the non-MKL alternatives. In addition, Tensorflow on MacOS seems to have problems with protobuf versions later than 3.8, so we pin protobuf to version 3.8.
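
To make the workflow from the overview concrete (pick an embedding, pick a model, train, then measure accuracy), here is a minimal sketch in plain Python. It is not the workshop's code: the bag-of-words embedding and the perceptron classifier are stand-ins chosen because they fit in a few lines, and the tweets, labels and function names are all made up for illustration. The notebooks in the repository remain the reference for the actual embeddings and classifiers.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocabulary(texts):
    """Map each distinct token to a column index, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in tokenize(text):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(text, vocab):
    """Turn a text into a vector of token counts; unknown tokens are ignored."""
    vector = [0] * len(vocab)
    for token, count in Counter(tokenize(text)).items():
        if token in vocab:
            vector[vocab[token]] = count
    return vector


def predict(vector, weights, bias):
    """Return 1 (relevant) if the linear score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, vector)) + bias
    return 1 if score > 0 else 0


def train_perceptron(vectors, labels, epochs=20):
    """Classic perceptron rule: adjust weights only on misclassified examples."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for vector, label in zip(vectors, labels):
            error = label - predict(vector, weights, bias)
            if error != 0:
                weights = [w + error * x for w, x in zip(weights, vector)]
                bias += error
    return weights, bias


def accuracy(vectors, labels, weights, bias):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(v, weights, bias) == y for v, y in zip(vectors, labels))
    return correct / len(labels)


def run_demo():
    # Hypothetical toy data: 1 = relevant, 0 = not relevant.
    train = [
        ("storm warning for the coast", 1),
        ("road closed after the flood", 1),
        ("i love this new song", 0),
        ("great coffee this morning", 0),
    ]
    test = [
        ("flood warning for the town", 1),
        ("i love coffee", 0),
    ]
    vocab = build_vocabulary(text for text, _ in train)
    train_x = [bag_of_words(text, vocab) for text, _ in train]
    train_y = [label for _, label in train]
    test_x = [bag_of_words(text, vocab) for text, _ in test]
    test_y = [label for _, label in test]

    weights, bias = train_perceptron(train_x, train_y)
    return (accuracy(train_x, train_y, weights, bias),
            accuracy(test_x, test_y, weights, bias))


if __name__ == "__main__":
    train_acc, test_acc = run_demo()
    print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The important design point is that accuracy is measured on tweets the model never saw during training. With a real dataset you would hold out part of the provided data for this, and compare embedding and model choices on that held-out part only.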