jgm/cheapskate: Experimental markdown processor in Haskell


Open-source name:

jgm/cheapskate

Repository URL:

https://github.com/jgm/cheapskate

Language:

HTML 50.9%

Introduction:

Cheapskate

Note: This library is unmaintained (by me anyway). I recommend using commonmark.

This is an experimental Markdown processor in pure Haskell. (A cheapskate is always in search of the best markdown.) It aims to process Markdown efficiently and in the most forgiving possible way. It is about seven times faster than pandoc and uses a fifth of the memory. It is also faster, and considerably more accurate, than the markdown package on Hackage.

There is no such thing as an invalid Markdown document. Any string of characters is valid Markdown. So the processor should finish efficiently no matter what input it gets. Garbage in should not cause an error or exponential slowdowns. This processor has been tested on many large inputs consisting of random strings of characters, with performance that is consistently linear with the input size. (Try make fuzztest.)

Installing

To build, get the Haskell Platform, then:

cabal update && cabal install

This will install both the cheapskate executable and the Haskell library. A man page can be found in man/man1 in the source.

Usage

As an executable:

cheapskate [FILE*]

As a library:

import Cheapskate
import Data.Text (Text)
import Text.Blaze.Html

-- Parse with the default options and render the result as blaze Html.
toMarkdown :: Text -> Html
toMarkdown = toHtml . markdown def

If the markdown input you are converting comes from an untrusted source (e.g. a web form), you should always set sanitize to True. This causes the generated HTML to be filtered through xss-sanitize's sanitizeBalance function. Otherwise you risk an XSS attack from raw HTML or from a markdown link or image attribute.

You may also wish to prevent users from entering raw HTML for aesthetic rather than security reasons. In that case, set allowRawHtml to False, but let sanitize stay True, since it still affects attributes coming from markdown links and images.

Manipulating the parsed document

You can manipulate the parsed document before rendering using the walk and walkM functions. For example, you might want to highlight code blocks using highlighting-kate:

import qualified Data.Text as T
import Data.Text (Text)
import qualified Data.Text.Lazy as TL
import Cheapskate
import Text.Blaze.Html
import Text.Blaze.Html.Renderer.Text (renderHtml)
import Text.Highlighting.Kate

-- Parse, rewrite every Block with addHighlighting, then render.
markdownWithHighlighting :: Text -> Html
markdownWithHighlighting = toHtml . walk addHighlighting . markdown def

-- Replace each code block with a raw HTML block containing the
-- highlighted code; every other block passes through unchanged.
addHighlighting :: Block -> Block
addHighlighting (CodeBlock (CodeAttr lang _) t) =
  HtmlBlock (TL.toStrict
             $ renderHtml
             $ formatHtmlBlock defaultFormatOpts
             $ highlightAs (T.unpack lang) (T.unpack t))
addHighlighting x = x

Extensions

This processor adds the following Markdown extensions:

Hyperlinked URLs

All absolute URLs are automatically made into hyperlinks, whether inside <> or not.
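A rough sketch of the autolinking rule, assuming a small hypothetical list of recognised schemes (cheapskate's real scheme list and tokenisation may differ). It works word by word and treats a URL the same way whether or not it is wrapped in angle brackets; the names `schemes`, `isAbsoluteUrl`, `linkWord` and `autolink` are illustrative only.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, abbreviated list of schemes treated as absolute URLs.
schemes :: [String]
schemes = ["http://", "https://", "ftp://", "mailto:"]

-- In this sketch, an absolute URL is any token starting with a known scheme.
isAbsoluteUrl :: String -> Bool
isAbsoluteUrl w = any (`isPrefixOf` w) schemes

-- Turn one token into a link if it is a URL, with or without <...> around it.
linkWord :: String -> String
linkWord w
  | isAbsoluteUrl url = "<a href=\"" ++ url ++ "\">" ++ url ++ "</a>"
  | otherwise = w
  where
    url = case w of
      '<' : rest | not (null rest) && last rest == '>' -> init rest
      _ -> w

-- Note: words/unwords collapses runs of whitespace, which is fine for a sketch.
autolink :: String -> String
autolink = unwords . map linkWord . words
```

For example, `autolink "see https://example.com now"` and `autolink "<https://example.com>"` both produce an `<a href=...>` element for the URL, while a tag such as `<b>` is left untouched.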

Fenced code blocks

Fenced code blocks with attributes are allowed. These begin with a line of three or more backticks or tildes, followed by an optional language name and possibly other metadata. They end with a line of backticks or tildes (the same character as started the code block) of at least the length of the starting line.
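The opening and closing rules can be sketched as two small predicates. This is a minimal illustration that ignores leading indentation, not cheapskate's parser; `openingFence`, `fenceLanguage` and `closesFence` are hypothetical names.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Recognise an opening fence: returns the fence character, the fence
-- length, and the trimmed info string (language name plus any metadata).
openingFence :: String -> Maybe (Char, Int, String)
openingFence line@(c : _)
  | c `elem` "`~" && n >= 3 = Just (c, n, trim (drop n line))
  where
    n = length (takeWhile (== c) line)
    trim = dropWhileEnd isSpace . dropWhile isSpace
openingFence _ = Nothing

-- | The language name is the first word of the info string.
fenceLanguage :: String -> String
fenceLanguage = takeWhile (not . isSpace)

-- | A closing fence uses the same character, is at least as long as the
-- opening fence, and has nothing but whitespace after it.
closesFence :: (Char, Int) -> String -> Bool
closesFence (c, n) line = run >= n && all isSpace (drop run line)
  where
    run = length (takeWhile (== c) line)
```

With an opening line of three backticks, a closing line of four backticks ends the block, but two backticks or three tildes do not.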

Explicit hard line breaks

A hard line break can be indicated with a backslash before a newline. The standard method of two spaces before a newline also works, but this gives a more "visible" alternative.
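Both hard-break forms can be recognised by looking at how a line ends. A minimal sketch, assuming the lines of a paragraph are already known and that `<br />` is the output form (the function names are hypothetical, and an escaped trailing backslash is not handled):

```haskell
import Data.List (dropWhileEnd, isSuffixOf)

data LineEnd = HardBreak | SoftBreak
  deriving (Eq, Show)

-- | Classify how a non-final paragraph line ends and strip the marker
-- that produced a hard break.
lineEnding :: String -> (String, LineEnd)
lineEnding l
  | "\\" `isSuffixOf` l = (init l, HardBreak)                  -- backslash before newline
  | "  " `isSuffixOf` l = (dropWhileEnd (== ' ') l, HardBreak) -- two trailing spaces
  | otherwise = (l, SoftBreak)

-- | Join paragraph lines, emitting <br /> for hard breaks. The last line's
-- ending is irrelevant because the paragraph ends there.
joinLines :: [String] -> String
joinLines [] = ""
joinLines [l] = l
joinLines (l : ls) = case lineEnding l of
  (t, HardBreak) -> t ++ "<br />\n" ++ joinLines ls
  (t, SoftBreak) -> t ++ "\n" ++ joinLines ls
```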

Backslash escapes

All ASCII symbols and punctuation marks can be backslash-escaped, not just those with a use in Markdown.
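In other words, a backslash before any of the 32 ASCII punctuation and symbol characters yields that character literally, and a backslash before anything else stays a literal backslash. A minimal sketch of that rule (the names are hypothetical, not cheapskate's):

```haskell
import Data.Char (isAscii, isPunctuation, isSymbol)

-- | ASCII punctuation and symbols, i.e. !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
isEscapable :: Char -> Bool
isEscapable c = isAscii c && (isPunctuation c || isSymbol c)

-- | Drop the backslash in front of any escapable character; other
-- backslashes are kept literally.
unescape :: String -> String
unescape ('\\' : c : rest)
  | isEscapable c = c : unescape rest
unescape (c : rest) = c : unescape rest
unescape [] = []
```

So `\%` becomes `%` even though `%` has no special meaning in Markdown, while `\a` is left as is.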

Revisions

It departs from the markdown syntax document in the following ways:

Intraword emphasis

Underscores cannot be used for word-internal emphasis. This prevents common mistakes with filenames, usernames, and identifiers. Asterisks can still be used if word-internal emphasis is needed.

The exact rule is this: an underscore that appears directly after an alphanumeric character does not begin an emphasized span. (However, an underscore directly before an alphanumeric can end an emphasized span.)
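The opening half of this rule depends only on the character before the underscore. A minimal sketch that lists which underscores may open emphasis (closing is not modelled; `underscoreOpeners` is a hypothetical name):

```haskell
import Data.Char (isAlphaNum)

-- | Indices of underscores that are allowed to open an emphasized span:
-- an underscore directly after an alphanumeric character cannot.
underscoreOpeners :: String -> [Int]
underscoreOpeners s =
  [ i
  | (i, prev, c) <- zip3 [0 ..] (Nothing : map Just s) s
  , c == '_'
  , maybe True (not . isAlphaNum) prev
  ]
```

In `snake_case _word_`, only the underscore before `word` (index 11) can open emphasis; the one in `snake_case` cannot, and the final one can only close.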

Ordered lists

The starting number of an ordered list is now significant. Other numbers are ignored, so you can still use 1. for each list item.

In addition to the 1. form, you can use 1) in your ordered lists. A new list starts if you change the form of the delimiter. So, the following is two lists:

1. one
2. two
1) one
2) two

Bullet lists

A new bullet list starts if you change the bullet marker. So, the following is two consecutive bullet lists:

+ one
+ two
- one
- two
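The ordered-list and bullet-list rules can be modelled by comparing item markers: consecutive items stay in one list only while the bullet character, or the ordered-list delimiter, is unchanged, and only the first number matters. A minimal sketch that assumes its input consists of item lines only (`Marker`, `parseMarker`, `splitLists` and `startNumber` are hypothetical names):

```haskell
import Data.Char (isDigit)
import Data.List (groupBy)
import Data.Maybe (mapMaybe)

-- | A list marker: a bullet character, or a number plus its delimiter.
data Marker = Bullet Char | Ordered Int Char
  deriving (Eq, Show)

-- | Parse the marker at the start of a list item line, if any.
parseMarker :: String -> Maybe Marker
parseMarker (c : ' ' : _) | c `elem` "-+*" = Just (Bullet c)
parseMarker line = case span isDigit line of
  (ds@(_ : _), d : ' ' : _) | d `elem` ".)" -> Just (Ordered (read ds) d)
  _ -> Nothing

-- | Items belong to the same list only if the bullet or delimiter agrees.
sameList :: Marker -> Marker -> Bool
sameList (Bullet a) (Bullet b) = a == b
sameList (Ordered _ a) (Ordered _ b) = a == b
sameList _ _ = False

-- | Split a run of item lines into consecutive lists.
splitLists :: [String] -> [[Marker]]
splitLists = groupBy sameList . mapMaybe parseMarker

-- | Only the first item's number is significant.
startNumber :: [Marker] -> Maybe Int
startNumber (Ordered n _ : _) = Just n
startNumber _ = Nothing
```

Both four-line examples above split into two lists of two items each, and a list whose first item is `3.` starts at 3 regardless of the later numbers.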

List separation

Two blank lines break out of a list. This allows you to have consecutive lists:

- one

- two


- one (new list)

The blank lines break out of a list no matter how deeply it is nested:

- one
  - two
    - three


  - new top-level list
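One way to picture this rule is as a pre-pass that cuts the input wherever two blank lines occur, so that each chunk begins a fresh top-level context. A minimal sketch, assuming no fenced code blocks (where blank lines must be preserved); `splitOnDoubleBlank` is a hypothetical name, not part of cheapskate:

```haskell
import Data.Char (isSpace)

isBlank :: String -> Bool
isBlank = all isSpace

-- | Split lines wherever two consecutive blank lines occur; each resulting
-- chunk starts outside any list, however deeply the previous one was nested.
splitOnDoubleBlank :: [String] -> [[String]]
splitOnDoubleBlank = go []
  where
    go acc (a : b : rest)
      | isBlank a && isBlank b = reverse acc : go [] (dropWhile isBlank rest)
    go acc (l : rest) = go (l : acc) rest
    go acc [] = [reverse acc]
```

Applied to the nested example above, the chunk after the two blank lines contains only `  - new top-level list`, which is then parsed as the start of a new top-level list.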

Indentation of list continuations

Block elements inside list items need not be indented four spaces. If they are indented beyond the bullet or numerical list marker, they will be considered additional blocks inside the list item. So, the following is a list item with two paragraphs:

- one

 two

The amount of indentation required for an indented code block inside a list item depends on the first line of the list item. Generally speaking, code must be indented four spaces past the first non-space character after the list marker. Thus:

 -   My code

         {code here}

 - My code

       {code here}

The following diagram shows how the first line of a list item divides the following lines into three regions:

 -   My code
  |     |
  +-----+

Content to the left of the marked region will not be part of the list item. Content to the right of the marked region will be indented code under the list item. Regular blocks that belong under the list item should start inside the marked region.

When the first line itself contains indented code, this code and subsequent indented code blocks should be indented five spaces past the list marker:

 -     { code }

       { more code }
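The three regions in the diagram can be computed from the first line of a bullet item. A minimal sketch for bullet markers only, assuming lines are indented with spaces (no tabs) and blank lines are handled elsewhere; `itemColumns`, `classify` and `Region` are hypothetical names:

```haskell
-- | Where a later line falls relative to a list item's first line.
data Region = Outside | Continuation | IndentedCode
  deriving (Eq, Show)

-- | From an item's first line, return (marker column, content column).
-- If five or more spaces follow the marker, the first line is itself
-- indented code, so content is taken to start one space past the marker.
itemColumns :: String -> Maybe (Int, Int)
itemColumns line
  | (m : after) <- rest, m `elem` "-+*", spaces after >= 1 =
      let n = spaces after
          contentCol = if n >= 5 then markerCol + 2 else markerCol + 1 + n
       in Just (markerCol, contentCol)
  | otherwise = Nothing
  where
    (lead, rest) = span (== ' ') line
    markerCol = length lead
    spaces = length . takeWhile (== ' ')

-- | Classify a later non-blank line by its indentation: at or left of the
-- marker it leaves the item; four or more columns past the content column
-- it is indented code; anything in between continues the item.
classify :: (Int, Int) -> String -> Region
classify (markerCol, contentCol) line
  | indent <= markerCol = Outside
  | indent >= contentCol + 4 = IndentedCode
  | otherwise = Continuation
  where
    indent = length (takeWhile (== ' ') line)
```

For ` -   My code` this gives a content column of 5, so the marked region spans columns 2 to 8 and a line indented nine spaces is code, matching the diagram.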

Raw HTML blocks

Raw HTML blocks work a bit differently than in Markdown.pl. A raw HTML block starts with a block-level HTML tag (opening or closing), or a comment start (<!--), and goes until the next blank line. The whole block is included as raw HTML. No attempt is made to parse balanced tags. This means that in the following, the asterisks are literal asterisks:

<div>
*hello*
</div>

while in the following, the asterisks are interpreted as markdown emphasis:

<div>

*hello*

</div>

In the first example, we have a single raw HTML block; in the second, we have two raw HTML blocks with an intervening paragraph. This system gives authors the flexibility to enclose markdown sections in HTML block-level tags if they wish, while also allowing them to include verbatim HTML blocks (taking care that they don't include any blank lines).

As a consequence of this rule, HTML blocks may not contain blank lines.
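The block-splitting rule can be sketched by cutting the input at blank lines and checking only the first line of each chunk. This is an illustration under stated assumptions: the tag list below is a hypothetical, abbreviated one, leading indentation is ignored, and every non-HTML chunk is simply called a paragraph.

```haskell
import Data.Char (isAlphaNum, isSpace, toLower)
import Data.List (isPrefixOf)

data Block = RawHtml [String] | Para [String]
  deriving (Eq, Show)

-- Hypothetical, abbreviated list of block-level tags.
blockTags :: [String]
blockTags =
  [ "address", "blockquote", "div", "dl", "fieldset", "form", "h1", "h2"
  , "h3", "h4", "h5", "h6", "hr", "ol", "p", "pre", "table", "ul" ]

-- | Does this line open a raw HTML block? (a block-level opening or closing
-- tag, or the start of a comment)
startsHtmlBlock :: String -> Bool
startsHtmlBlock l
  | "<!--" `isPrefixOf` l = True
  | '<' : '/' : s <- l = isBlockTag s
  | '<' : s <- l = isBlockTag s
  | otherwise = False
  where
    isBlockTag s = map toLower (takeWhile isAlphaNum s) `elem` blockTags

isBlank :: String -> Bool
isBlank = all isSpace

-- | Cut the input at blank lines; a chunk whose first line starts an HTML
-- block is kept verbatim, no matter what the following lines contain.
splitBlocks :: [String] -> [Block]
splitBlocks ls = case dropWhile isBlank ls of
  [] -> []
  rest@(first : _) ->
    let (chunk, after) = break isBlank rest
        block = if startsHtmlBlock first then RawHtml chunk else Para chunk
     in block : splitBlocks after
```

The first `<div>` example becomes a single `RawHtml` chunk containing the literal `*hello*`, while the second becomes `RawHtml`, `Para`, `RawHtml`, so its asterisks go through normal inline parsing.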

Clarifications

This implementation resolves the following issues left vague in the markdown syntax document:

Tight vs. loose lists

A list is considered "tight" if (a) it has only one item or there is no blank space between any two consecutive items, and (b) no item has blank lines as its immediate children. If a list is "tight," then list items consisting of a single paragraph or a paragraph followed by a sublist will be rendered without <p> tags.
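Conditions (a) and (b) translate directly into a predicate over per-item facts. A minimal sketch, assuming a hypothetical `ItemInfo` record that the parser would fill in, and an illustrative HTML rendering for a single-paragraph item:

```haskell
-- | What the tightness check needs to know about each list item
-- (a hypothetical representation, not cheapskate's types).
data ItemInfo = ItemInfo
  { blankBeforeNext :: Bool -- ^ a blank line separates this item from the next
  , blankLineChild :: Bool  -- ^ a blank line is an immediate child of the item
  }

-- | (a) no blank space between consecutive items (trivially true for a
-- single item), and (b) no item has a blank line as an immediate child.
isTight :: [ItemInfo] -> Bool
isTight items =
  not (any (blankBeforeNext . fst) (zip items (drop 1 items)))
    && not (any blankLineChild items)

-- | In a tight list a single-paragraph item is rendered without <p> tags.
renderItem :: Bool -> String -> String
renderItem tight para
  | tight = "<li>" ++ para ++ "</li>"
  | otherwise = "<li><p>" ++ para ++ "</p></li>"
```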

Sublists

Sublists work like other block elements inside list items; they must be indented past the bullet or numerical list marker (but no more than three spaces past, or they will be interpreted as indented code).

ATX headers

ATX headers must have a space after the initial ###s.
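A minimal sketch of this requirement. It assumes the usual limit of six levels and does not strip optional closing `#`s; `atxHeader` is a hypothetical name.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- | Parse an ATX header line into (level, text). The run of # characters
-- must be followed by a space, so "#hashtag" is not a header.
atxHeader :: String -> Maybe (Int, String)
atxHeader line
  | level >= 1, level <= 6, ' ' : rest <- afterHashes =
      Just (level, dropWhileEnd isSpace (dropWhile isSpace rest))
  | otherwise = Nothing
  where
    (hashes, afterHashes) = span (== '#') line
    level = length hashes
```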

Separation of block quotes

A blank line will end a blockquote. So, the following is a single blockquote:

> hi
>
> there

But this is two blockquotes:

> hi

> there

Blank lines are not required before horizontal rules, blockquotes, lists, code blocks, or headers. They are not required after, either, though in many cases "laziness" will effectively require a blank line after. For example, in

Hello there.
> A quote.
Still a quote.

the "Still a quote." is part of the block quote, because of laziness (the ability to leave off the > from the beginning of subsequent lines). Laziness also affects lists. However, we can have a code block, ATX header, or horizontal rule between two paragraphs without any blank lines.
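The block-quote rules above (a blank line ends the quote; a `>` line with nothing after it does not; lines without `>` are lazy continuations) can be sketched as follows. This simplified illustration treats every non-blank line as a lazy continuation, so it ignores the fact that a code block, ATX header, or horizontal rule would interrupt; `chunkQuotes` and `Chunk` are hypothetical names.

```haskell
import Data.Char (isSpace)

data Chunk = Quote [String] | Other String
  deriving (Eq, Show)

isBlank :: String -> Bool
isBlank = all isSpace

isQuoteLine :: String -> Bool
isQuoteLine l = case dropWhile (== ' ') l of
  '>' : _ -> True
  _ -> False

-- | Remove a leading "> " or ">" if present; lazy lines are left alone.
stripMarker :: String -> String
stripMarker l = case dropWhile (== ' ') l of
  '>' : ' ' : r -> r
  '>' : r -> r
  _ -> l

-- | A quote starts at a line beginning with '>' and runs until the next
-- blank line; non-blank lines without '>' are lazy continuations.
chunkQuotes :: [String] -> [Chunk]
chunkQuotes [] = []
chunkQuotes (l : ls)
  | isQuoteLine l =
      let (body, rest) = break isBlank ls
       in Quote (map stripMarker (l : body)) : chunkQuotes rest
  | isBlank l = chunkQuotes ls
  | otherwise = Other l : chunkQuotes ls
```

On the three examples in this section, the sketch yields one quote, two quotes, and a paragraph line followed by a quote containing "Still a quote.", respectively.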

Link references

Link references may occur anywhere in the document, even in nested list contexts. They need not be at the outer level.
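Because definitions may appear at any depth, a simple way to model this is to collect them from every line regardless of indentation before resolving links. A minimal sketch, assuming case-insensitive labels (as in Markdown.pl) and handling neither titles nor definitions inside block quotes; `refDef`, `collectRefs` and `resolveRef` are hypothetical names.

```haskell
import Data.Char (isSpace, toLower)
import Data.Maybe (mapMaybe)

-- | Recognise "[label]: url" after any indentation, so definitions nested
-- inside list items are found too.
refDef :: String -> Maybe (String, String)
refDef line
  | '[' : r <- dropWhile isSpace line
  , (label, ']' : ':' : rest) <- break (== ']') r
  , not (null label)
  , url@(_ : _) <- takeWhile (not . isSpace) (dropWhile isSpace rest) =
      Just (map toLower label, url)
  | otherwise = Nothing

-- | Collect every reference definition in the document, at any depth.
collectRefs :: [String] -> [(String, String)]
collectRefs = mapMaybe refDef

-- | Resolve a reference label against the collected definitions.
resolveRef :: [(String, String)] -> String -> Maybe String
resolveRef refs label = lookup (map toLower label) refs
```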

Tests

The tests subdirectory contains an extensive suite of tests, including all of John Gruber's original Markdown tests, plus many of the tests from Michel Fortin's mdtest suite. Each test consists of two files with the same basename: a markdown source and the expected HTML output.

To run the test suite, do

make test

To run only tests that match a regex pattern, do

PATT=Orig make test

Setting the environment variable TIDY=1 will run the expected and actual output through tidy before comparing them. You can run this test suite on another markdown processor by doing

PROG=myothermarkdown make test

Benchmarks

To run a crude benchmark comparing cheapskate to pandoc, do make bench. Set the BENCHPROGS environment variable to compare to other implementations.

License

The library is released under the BSD license; see LICENSE for terms.

Some of the test cases are borrowed from Michel Fortin's mdtest suite and John Gruber's original markdown test suite.
