Open-source project: GravityLabs/goose
Repository: https://github.com/GravityLabs/goose
Language: Scala 100.0%

#Goose - Article Extractor

##Intro

Goose was originally an article extractor written in Java and was most recently (August 2011) converted to a Scala project.

Its mission is to take any news article or article-type web page and extract not only the main body of the article but also all metadata and the most probable image candidate.

The extraction goal is to get the purest extraction from the beginning of the article, to serve Flipboard/Pulse-type applications that need to show the first snippet of a web article along with an image.

Goose will try to extract the following information:
The wiki has the full details on how to use Goose: https://github.com/jiminoc/goose/wiki

Goose was open sourced by Gravity.com in 2011.

Lead Programmer: Jim Plush (Gravity.com)

Contributors: Robbie Coleman (Gravity.com)

Try it out online! http://jimplush.com/blog/goose

##Licensing

If you find Goose useful or have issues, please drop me a line. I'd love to hear how you're using it or what features should be improved.

Goose is licensed by Gravity.com under the Apache 2.0 license; see the LICENSE file for more details.

##Take it for a spin

To use Goose from the command line:
##Regarding the port from Java to Scala

Here are some of the reasons for the port to Scala:
##Issues

It was a pretty fast Java to Scala port, so lots of the niceties of the Scala language aren't in the codebase yet, but those will come over the coming months as we rewrite a lot of the internal methods to be more Scala-esque. We made sure it is still nice and operable from Java as well, so if you're using Goose from Java you should still be able to use it with a few changes to the method signatures.