Open source project: parrt/bookish
URL: https://github.com/parrt/bookish
Language: Java 90.3%
Introduction:

Bookish

Bookish is an XML-ish format, with some Markdown mixed in, for books and articles; it can convert documents to HTML and LaTeX. I used it to generate this article: The Matrix Calculus You Need For Deep Learning.

You can use Python directly in the document, like a notebook, to compute and print values, display data frames, and even show matplotlib graphs.

As shown below, it also does some really fancy magic to convert full LaTeX equations (or even LaTeX chunks) to SVG images for inline display (it is tricky to get the vertical alignment correct).

Meta-language

Bookish is mostly XML-like but uses Markdown for the more common things like italics and code fonts. (Note that the XML tags do not always have an end tag or even the trailing />.)

Bookish requires a root document that is kind of like a metadata file:

<book title="A simple book" author="T. Parr">
<include file=chap1.xml>
<include file=chap2.xml>

Then the chapter files look like:
Cheatsheet

Here are the tags that contain attributes, not all of which are required:
Origins in math-infested markup

Jeremy Howard and I wrote up a nice mathy LaTeX document called "The Matrix Calculus You Need For Deep Learning" that has over 600 equations. We wanted to post it to the web in HTML or Markdown but quickly ran into a problem trying to get the equations rendered. In the end, we converted the source document to Markdown and built a translator that generates HTML using SVG for equations, and PDF from native LaTeX equations. It does a pretty good job with HTML, as you can see:

All of those equations, even the ones inline in the text paragraph, are SVG images. Here is the raw matrix-calculus.md.

What's so hard about rendering equations?

If you're doing Markdown or HTML, people tend to use MathJax or its faster cousin KaTeX. MathJax is just too slow when you have 600 equations. KaTeX is much better but it (and MathJax) requires every ...

There's another problem. Is all of that JavaScript gonna work in epub formats? What about the Kindle? Because I'm hoping to write a book on machine learning, I'm leery of relying on full-blown JavaScript to render equations.

I tried pandoc and a few other tools like multimarkdown, but not everything came through correctly in the translated output and I got tired of chasing all of this down. As the ANTLR guy, I ain't afeared of building a language translator and so, following my motto "Why program by hand in five days what you can spend five years of your life automating", I decided to simply solve this problem by building my own Markdown translator.

How to typeset and display math via SVG

If you can't use JavaScript, you have to use images. If you have to use images, you want scalable graphics, which means SVG files. So, the translator must extract equations and replace them with references to SVG images.

The really tricky bit is the vertical alignment of equations within a line of HTML text.
Check out this sentence with embedded equations:

(I had to take a snapshot and show that instead of giving raw HTML plus equations; GitHub's Markdown processor didn't handle it properly. Haha.)

What does it mean to properly align an equation's image? It's painful. We need to convince LaTeX to give us metrics on how far the typeset image drops below the baseline. (LaTeX calls this the depth.) It took a while, but I figured out not only how to compute the depth below the baseline but also how to get it back into this Java program via the LaTeX log file. You can see how all of this is done here: Translator.visitEqn(). Here is the LaTeX incantation to extract the height and depth of the rendered equation:

\begin{document}
\thispagestyle{empty}
<body>
\setbox0=\vbox{<body>}
\typeout{// bookish metrics: \the\ht0, \the\dp0}
\end{document}

where <body> is the equation being typeset. Oh, and to get the font to look less anemic, you need to set the math fonts:

\DeclareSymbolFont{operators} {OT1}{ztmcm}{m}{n}
\DeclareSymbolFont{letters} {OML}{ztmcm}{m}{it}
\DeclareSymbolFont{symbols} {OMS}{ztmcm}{m}{n}
\DeclareSymbolFont{largesymbols}{OMX}{ztmcm}{m}{n}
\DeclareSymbolFont{bold} {OT1}{ptm}{bx}{n}
\DeclareSymbolFont{italic} {OT1}{ptm}{m}{it}

One last little tidbit. Image file names are based upon the MD5 digest hash of the equation. There are two benefits: (1) repeated equations share the same file, and (2) LaTeX is slow, about 1 second per equation, but the hashed file name lets us cache all of the images and know when we must refresh an image because the equation changed.

It's safe to stop reading here. You can learn everything you need to know about doing this yourself from this description and the source code. This repository is just getting started and is a work in progress, so don't expect a tool you can use yourself, at least at the moment.

Implementation

You will also notice that I have built this program as if it were a programming language translator. The strategy I use is to construct a model of the document from the parse tree using a visitor. Then I use a fiendishly clever bit of code to automatically convert that representation of the document into a tree of string templates. Of course, the set of templates you use determines what output you get. Change the templates and you change the target language. For example, here are the HTML templates.
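The equation pipeline described in the SVG section has three mechanical pieces: naming the image after the MD5 digest of the equation, reading the height and depth back from the \typeout line in the LaTeX log, and using the depth to shift the image below the text baseline. The Java sketch below illustrates those pieces under stated assumptions; it is not the code in Translator.visitEqn(). The class name EquationImages, the images/eqn-<hash>.svg naming scheme, the sample log values, and the use of the CSS vertical-align property are all hypothetical choices made for illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are not taken from the Bookish source.
class EquationImages {
    // Matches the line emitted by \typeout, e.g. "// bookish metrics: 6.94444pt, 1.94444pt"
    private static final Pattern METRICS =
        Pattern.compile("// bookish metrics: (-?[0-9.]+)pt, (-?[0-9.]+)pt");

    /** Returns the 32-character hex MD5 digest of the equation text. */
    static String md5Hex(String equation) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(equation.getBytes(StandardCharsets.UTF_8));
            // Pad to 32 hex digits so leading zero bytes are not dropped.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Returns {height, depth} in points, or null if the log has no metrics line. */
    static double[] parseMetrics(String latexLog) {
        Matcher m = METRICS.matcher(latexLog);
        if (!m.find()) return null;
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    /** Builds an img tag lowered by the depth (assumed file layout and CSS). */
    static String imgTag(String equation, double depthPt) {
        return String.format(Locale.ROOT,
            "<img src=\"images/eqn-%s.svg\" style=\"vertical-align: %.3fpt;\">",
            md5Hex(equation), -depthPt);
    }

    public static void main(String[] args) {
        // Made-up log excerpt; real values depend on the equation and fonts.
        String log = "This is pdfTeX\n// bookish metrics: 6.94444pt, 1.94444pt\n";
        double[] metrics = parseMetrics(log);
        System.out.println("height=" + metrics[0] + "pt depth=" + metrics[1] + "pt");
        System.out.println(imgTag("abc", metrics[1]));
    }
}
```

Because the file name depends only on the equation text, a translator built this way could skip running LaTeX whenever a file with that name already exists, and a changed equation would naturally map to a new file.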