Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


javascript - Remove formatting tags from string body of email

How do you remove all formatting tags when calling:

GmailApp.getInboxThreads()[0].getMessages()[0].getBody()

so that only the human-readable text remains?

The formatting can be discarded; the body text only needs to be parsed, but tags such as:

"&" 
<br>

and possibly others, need to be removed.


1 Reply


Even though there's no DOM in Apps Script, you can parse out HTML and get the plain text this way:

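// Parse the HTML leniently and return the text content of the resulting tree.
function getTextFromHtml(html) {
  return getTextFromNode(Xml.parse(html, true).getElement());
}

// Recursively collect text: text nodes return their content, elements join their children's text.
function getTextFromNode(x) {
  switch(x.toString()) {
    case 'XmlText': return x.toXmlString();
    case 'XmlElement': return x.getNodes().map(getTextFromNode).join('');
    default: return '';  // any other node type contributes no text
  }
}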

Calling

getTextFromHtml("hello <div>foo</div>&amp; world <br /><div>bar</div>!");

will return

"hello foo& world bar!".

To explain: passing true as the second argument to Xml.parse makes it parse the input leniently, as an HTML page. The parser patches up the document (adding missing HTML and BODY elements, etc.) so that it becomes a valid XHTML page. We then walk that document, returning the content of each text node and recursing into every other element.
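
To make the walk concrete, here is a minimal sketch of the same recursion run on a hand-built tree. The plain objects with type, value and children fields are hypothetical stand-ins chosen for illustration; they are not the classes that Xml.parse returns, and collectText is an assumed name.

```javascript
// Hypothetical node shape: { type: 'text', value } or { type: 'element', children }.
// These plain objects only mimic a parsed document; they are not Apps Script Xml objects.
function collectText(node) {
  switch (node.type) {
    case 'text':
      return node.value; // a text node yields its own text
    case 'element':
      // an element yields the text of its children, in document order
      return node.children.map(function (child) { return collectText(child); }).join('');
    default:
      return ''; // anything else (e.g. a comment) contributes nothing
  }
}

// Roughly the tree for: hello <div>foo</div>&amp; world <br /><div>bar</div>!
var tree = {
  type: 'element', children: [
    { type: 'text', value: 'hello ' },
    { type: 'element', children: [{ type: 'text', value: 'foo' }] },
    { type: 'text', value: '& world ' },
    { type: 'element', children: [] }, // <br /> has no children, so no text
    { type: 'element', children: [{ type: 'text', value: 'bar' }] },
    { type: 'text', value: '!' }
  ]
};

var text = collectText(tree); // "hello foo& world bar!"
```

Empty elements such as the br simply disappear from the output, which is why there is no line break between "world " and "bar" in the result.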

This is admittedly poorly documented; I wrote this by playing around with the Xml object and logging intermediate results until I got it to work. We need to document the Xml stuff better.
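
If the Xml object is not available in your project, a rougher fallback is to strip tags and decode a few entities with regular expressions. This is a different technique from the parser-based one above, not a replacement for it: it is a sketch that assumes reasonably well-formed markup, knows only the handful of named entities listed in the code, and can misfire on a bare "<" in the text. The name stripHtmlWithRegex is made up for this example. For Gmail specifically, GmailMessage also has a getPlainBody() method that returns the body without HTML formatting, which may be all you need.

```javascript
// Regex-based sketch: NOT a real HTML parser, just tag removal plus entity decoding.
function stripHtmlWithRegex(html) {
  // Small, deliberately incomplete entity table; nbsp is mapped to a plain space
  // because the goal here is text that is easy to parse.
  var named = { amp: '&', lt: '<', gt: '>', quot: '"', apos: "'", nbsp: ' ' };
  return html
    .replace(/<[^>]*>/g, '') // drop anything that looks like a tag
    .replace(/&(#\d+|#x[0-9a-f]+|\w+);/gi, function (match, body) {
      if (body.charAt(0) === '#') {
        // Numeric reference, decimal (&#65;) or hex (&#x41;); BMP code points only.
        var isHex = body.charAt(1).toLowerCase() === 'x';
        return String.fromCharCode(parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10));
      }
      // Named entity: decode if known, otherwise leave the original text untouched.
      return Object.prototype.hasOwnProperty.call(named, body) ? named[body] : match;
    });
}

var plain = stripHtmlWithRegex("hello <div>foo</div>&amp; world <br /><div>bar</div>!");
// plain is "hello foo& world bar!"
```

Entities are decoded in a single pass after the tags are gone, so escaped markup such as &lt;b&gt; ends up as literal text rather than being stripped.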

