
ssampang/im2latex: Tensorflow port of im2markup project


Open-source project name:

ssampang/im2latex

Open-source project URL:

https://github.com/ssampang/im2latex

Open-source programming language:

Python 92.5%

Open-source project introduction:

NOTE: This project is not actively maintained and may not run on recent versions of TensorFlow

im2latex

This is a TensorFlow port of harvardnlp's im2markup project. The model accepts images of mathematical equations typeset in LaTeX and outputs the markup used to generate those images. The repository includes the im2markup project as a git submodule so that it can reuse its preprocessing scripts.

Model

Briefly, the encoder is a convolutional network followed by a bidirectional RNN (Bi-RNN) row encoder, and the decoder is an LSTM with an attention mechanism.
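To make the attention step more concrete, here is a minimal sketch in plain Python of how a decoder state can be used to weight the encoder's feature grid. It is an illustration only, not the code in im2latex.py: the function name attend, the toy vectors, and the use of plain dot-product scoring are all assumptions, and the actual project may score encoder cells differently.

```python
import math

def attend(encoder_cells, decoder_state):
    """Dot-product attention over a flat list of encoder feature vectors.

    encoder_cells: list of feature vectors, one per cell of the encoder grid
                   (the grid is assumed to be flattened row by row).
    decoder_state: the decoder's current hidden state, same length as a cell.
    Returns (context, weights).
    """
    # Score each cell by its similarity to the decoder state.
    scores = [sum(c * s for c, s in zip(cell, decoder_state))
              for cell in encoder_cells]
    # Softmax, shifted by the maximum score for numerical stability.
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted average of the cells.
    depth = len(decoder_state)
    context = [sum(w * cell[d] for w, cell in zip(weights, encoder_cells))
               for d in range(depth)]
    return context, weights

# Toy 2x2 feature grid, flattened; each cell is a 2-dimensional vector.
cells = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
state = [2.0, 0.0]
context, weights = attend(cells, state)
print(weights)
print(context)
```

In a full model, the context vector would be combined with the decoder LSTM's state at every step to predict the next markup token.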

Run

To run, clone this repository, run the prep.sh script in the data directory (it installs Node.js if it is not already installed), and then run python im2latex.py.

Results

Unfortunately, as TensorFlow requires more memory than Torch, this project was not able to use the complete dataset used by the im2markup project. Assuming a batch size of 20, about one fourth of the dataset consisted of images that were too large to fit on a Titan X GPU. Still, over a period of 100 epochs, each taking approximately 100 minutes, the model seemed to saturate at 64.0% validation accuracy at around epoch 60. Test accuracy was only measured after 100 epochs, and was 64.9%. The im2markup project made use of beam search to measure test accuracy, which has not been implemented in this project yet. This, and the inability to use images from the full dataset, may help to explain the difference in performance relative to the im2markup project, which achieved 75% test accuracy on the full dataset.
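For readers unfamiliar with beam search, the decoding strategy mentioned above as missing from this port, the following is a generic sketch in plain Python. It is not taken from im2markup or from this project: the names beam_search, step_fn, TOY, and toy_step, the default beam width, and the toy probabilities are all hypothetical, and a real decoder would call the trained model inside step_fn. The sketch ranks hypotheses by raw cumulative log-probability, with no length normalization.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=5, max_len=150):
    """Return the highest-scoring token sequence found by beam search.

    step_fn(prefix) must return a dict mapping each candidate next token
    to its log-probability given the prefix (a list of tokens).
    """
    beams = [(0.0, [start_token])]   # (cumulative log-prob, tokens)
    finished = []
    for _ in range(max_len):
        # Expand every active hypothesis by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, logp in step_fn(tokens).items():
                candidates.append((score + logp, tokens + [token]))
        if not candidates:
            break
        # Keep only the beam_width best expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_width]:
            if tokens[-1] == end_token:
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    # Fall back to unfinished hypotheses if max_len was reached.
    finished.extend(beams)
    return max(finished, key=lambda c: c[0])[1]

# Hypothetical toy model: next-token probabilities depend on the last token.
TOY = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"z": 0.9, "y": 0.1},
    "x": {"</s>": 1.0},
    "y": {"</s>": 1.0},
    "z": {"</s>": 1.0},
}

def toy_step(prefix):
    return {tok: math.log(p) for tok, p in TOY[prefix[-1]].items()}

greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(greedy)
print(wider)
```

With a beam width of 1 the search is greedy and commits to the locally best first token, while a width of 2 keeps the alternative alive long enough to find a sequence with higher overall probability. This is the kind of gain that beam search can provide at test time.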

Future Work

I hope to look into implementing beam search, and running this model on AWS, where I can distribute the model across multiple GPUs and hopefully run it on the full dataset.



