• 设为首页
  • 点击收藏
  • 手机版
    手机扫一扫访问
    迪恩网络手机版
  • 关注官方公众号
    微信扫一扫关注
    迪恩网络公众号

egonSchiele/HandsomeSoup: Easy HTML parsing for Haskell

原作者: [db:作者] 来自: 网络 收藏 邀请

开源软件名称(OpenSource Name):

egonSchiele/HandsomeSoup

开源软件地址(OpenSource Url):

https://github.com/egonSchiele/HandsomeSoup

开源编程语言(OpenSource Language):

Haskell 94.1%

开源软件介绍(OpenSource Introduction):

HandsomeSoup

Current Status: Usable and stable. Needs GHC 7.6. Please file bugs!

HandsomeSoup is the library I wish I had when I started parsing HTML in Haskell.

It is built on top of HXT and adds a few functions that make it easier to work with HTML.

Most importantly, it adds CSS selectors to HXT. The goal of HandsomeSoup is to be a complete CSS2 selector parser for HXT.

Install

cabal install HandsomeSoup

Example

Nokogiri, the HTML parser for Ruby, has an example showing how to scrape Google search results. This is easy in HandsomeSoup:

import Text.XML.HXT.Core
import Text.HandsomeSoup

main = do
    let doc = fromUrl "http://www.google.com/search?q=egon+schiele"
    links <- runX $ doc >>> css "h3.r a" ! "href"
    mapM_ putStrLn links

What can HandsomeSoup do for you?

Easily parse an online page using fromUrl

let doc = fromUrl "http://example.com"

Or a local page using parseHtml

contents <- readFile [filename]
let doc = parseHtml contents

Easily extract elements using css

Here are some valid selectors:

doc <<< css "a"
doc <<< css "*"
doc <<< css "a#link1"
doc <<< css "a.foo"
doc <<< css "p > a"
doc <<< css "p strong"
doc <<< css "#container h1"
doc <<< css "img[width]"
doc <<< css "img[width=400]"
doc <<< css "a[class~=bar]"
doc <<< css "a:first-child"

Easily get attributes using (!)

doc <<< css "img" ! "src"
doc <<< css "a" ! "href"

Docs

Find Haddock docs on Hackage.

I also wrote The Complete Guide To Parsing HXT With Haskell.

Credits

Made by Adit.




鲜花

握手

雷人

路过

鸡蛋
该文章已有0人参与评论

请发表评论

全部评论

专题导读
热门推荐
阅读排行榜

扫描微信二维码

查看手机版网站

随时了解更新最新资讯

139-2527-9053

在线客服(服务时间 9:00~18:00)

在线QQ客服
地址:深圳市南山区西丽大学城创智工业园
电邮:jeky_zhao#qq.com
移动电话:139-2527-9053

Powered by 互联科技 X3.4© 2001-2213 极客世界.|Sitemap