
mmhanif/nlp_workshop: Jupyter notebook for NLP Workshop


Project name:

mmhanif/nlp_workshop

Project URL:

https://github.com/mmhanif/nlp_workshop

Language:

Jupyter Notebook 100.0%

Project description:

An NLP workshop: categorizing tweets as relevant or irrelevant

Adapted from https://github.com/hundredblocks/concrete_NLP_tutorial.git

Workshop Overview

In this workshop you will combine a few different NLP techniques to classify tweets. You will:

  • clean up your input data
  • transform your text data into numerical vectors (because machine learning algorithms need numbers as inputs!)
  • use those vectors as input to machine learning classifiers
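The first two steps in the list (cleaning the input and turning text into numbers) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's own code. `clean_tweet`, `build_vocab` and `bag_of_words` are hypothetical helpers, and the cleaning rules (lowercasing, removing URLs and @mentions, keeping only letters, digits and `#`) are assumptions you may want to adjust.

```python
import re
from collections import Counter

def clean_tweet(text):
    """Lowercase a tweet, drop URLs and @mentions, and split it into tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove @mentions
    text = re.sub(r"[^a-z0-9#\s]", " ", text)  # keep letters, digits, '#' and whitespace
    return text.split()

def build_vocab(token_lists):
    """Assign each distinct token a column index (alphabetical order)."""
    words = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: i for i, tok in enumerate(words)}

def bag_of_words(tokens, vocab):
    """Count how often each vocabulary word occurs; unknown tokens are ignored."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec

# Made-up example tweets, for illustration only.
tweets = ["Forest FIRE near La Ronge! http://t.co/abc", "@bob I love fruits"]
tokens = [clean_tweet(t) for t in tweets]
vocab = build_vocab(tokens)
vectors = [bag_of_words(t, vocab) for t in tokens]
print(tokens[0])   # ['forest', 'fire', 'near', 'la', 'ronge']
print(vectors[0])  # [1, 1, 0, 0, 1, 0, 1, 1]
```

Each tweet becomes a vector with one column per vocabulary word. That fixed-length numeric form is what a classifier expects as input.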

We provide code for three different word embeddings (i.e. ways to turn text into vectors of numbers) and two different classification algorithms. Your job is to select an embedding and a model, train the model on the data provided (you may want to clean the data further first), and then test the accuracy of your trained model. The team with the most accurate model wins!
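Whichever embedding and model you choose, measure accuracy on tweets the model did not see during training. The sketch below is a hypothetical, standard-library-only harness and not part of the workshop code. It holds out a fraction of the labelled examples and scores a majority-class baseline, a model that always predicts the most common training label. Any trained classifier should beat this baseline. The `fit`/`predict` method names are an assumption that mirrors the scikit-learn convention.

```python
import random
from collections import Counter

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle (vector, label) pairs and hold out a fraction of them for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predictions, labels):
    """Fraction of predictions that equal the true labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty predictions and labels")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

class MajorityBaseline:
    """Hypothetical baseline model: always predicts the most common training label."""

    def fit(self, vectors, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, vectors):
        return [self.label_ for _ in vectors]

# Usage with made-up data: ten one-dimensional "vectors" and two labels.
data = [([i], "relevant" if i % 3 else "irrelevant") for i in range(10)]
train, test = train_test_split(data)
model = MajorityBaseline().fit([x for x, _ in train], [y for _, y in train])
print(accuracy(model.predict([x for x, _ in test]), [y for _, y in test]))
```

To compare teams fairly, score every model on the same held-out split. Using a fixed `seed` makes that split reproducible.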

Environment Setup

If you are using Anaconda, set up a new conda environment as follows:

conda env create -f env.yml

This creates a new conda environment called nlpw. To add it to Jupyter as a kernel, activate the environment (conda activate nlpw) and then run the following command:

python -m ipykernel install --user --name nlpw --display-name "Python (nlpw)"

Download pre-trained Word2Vec model

You can download a pre-trained Word2Vec model that was trained on a large corpus of Google News articles here: https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit?usp=sharing
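A common way to turn pre-trained word vectors into a single vector for a whole tweet is to average the vectors of its words. This is a sketch of that general technique and is not necessarily what the notebook does. It assumes the model behaves like a mapping from word to vector. A toy dictionary stands in for the real 300-dimensional Google News vectors. With gensim installed, the downloaded file could be loaded with `KeyedVectors.load_word2vec_format(path, binary=True)`, which supports the same `in` and `[]` lookups. The file name shown in the comment is an assumption.

```python
def average_embedding(tokens, vectors, dim):
    """Average the vectors of tokens present in `vectors`; return zeros if none are."""
    total = [0.0] * dim
    found = 0
    for tok in tokens:
        if tok in vectors:  # out-of-vocabulary words are skipped
            for i, x in enumerate(vectors[tok]):
                total[i] += x
            found += 1
    if found == 0:
        return total
    return [x / found for x in total]

# Toy stand-in for the pre-trained model. With gensim, something like
#   from gensim.models import KeyedVectors
#   vectors = KeyedVectors.load_word2vec_format(
#       "GoogleNews-vectors-negative300.bin.gz", binary=True)  # assumed file name
# would give a real 300-dimensional lookup (pass dim=300).
toy_vectors = {"fire": [1.0, 0.0, 2.0], "forest": [3.0, 2.0, 0.0]}
print(average_embedding(["forest", "fire", "near"], toy_vectors, dim=3))  # [2.0, 1.0, 1.0]
```

Averaging discards word order, but it does produce a fixed-length vector regardless of tweet length. Any of the classifiers can then consume that vector.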

Note for macOS

Use env-MacOS.yml instead of env.yml to create the conda environment on macOS.

By default, conda installs MKL builds of libraries such as numpy and tensorflow. However, these builds appear to cause problems on macOS.

To work around this, we include the 'nomkl' package, which forces conda to install non-MKL builds of these libraries instead.

In addition, TensorFlow on macOS appears to have problems with protobuf versions later than 3.8, so we downgrade protobuf to version 3.8.
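To confirm that the protobuf downgrade took effect in the active environment, you could run a small check like the one below. This is a sketch. `protobuf_version_ok` is a hypothetical helper that compares only the major and minor version numbers against 3.8. The installed version is read from `google.protobuf.__version__`.

```python
import re

def protobuf_version_ok(version, max_version=(3, 8)):
    """Return True if the major.minor part of `version` is no later than max_version."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    return major_minor <= max_version

try:
    import google.protobuf  # provided by the protobuf package
    installed = google.protobuf.__version__
    status = "OK" if protobuf_version_ok(installed) else "newer than 3.8"
    print(f"protobuf {installed}: {status}")
except ImportError:
    print("protobuf is not installed in this environment")
```

The check accepts any 3.8.x patch release. It also compares version numbers numerically, so a release such as 3.20 is correctly reported as newer than 3.8 rather than being compared as text.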



