php - How to saveHTML of DOMDocument without HTML wrapper?

Question

Welcome To Ask or Share your Answers For Others

php - How to saveHTML of DOMDocument without HTML wrapper?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

php - How to saveHTML of DOMDocument without HTML wrapper?

I'm the function below, I'm struggling to output the DOMDocument without it appending the XML, HTML, body and p tag wrappers before the output of the content. The suggested fix:

$postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));

Only works when the content has no block level elements inside it. However, when it does, as in the example below with the h1 element, the resulting output from saveXML is truncated to...

<p>If you like</p>

I've been pointed to this post as a possible workaround, but I can't understand how to implement it into this solution (see commented out attempts below).

Any suggestions?

function rseo_decorate_keyword($postarray) {
    global $post;
    $keyword = "Jasmine Tea"
    $content = "If you like <h1>jasmine tea</h1> you will really like it with Jasmine Tea flavors. This is the last ocurrence of the phrase jasmine tea within the content. If there are other instances of the keyword jasmine tea within the text what happens to jasmine tea."
    $d = new DOMDocument();
    @$d->loadHTML($content);
    $x = new DOMXpath($d);
    $count = $x->evaluate("count(//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$keyword') and (ancestor::b or ancestor::strong)])");
    if ($count > 0) return $postarray;
    $nodes = $x->query("//text()[contains(translate(., 'ABCDEFGHJIKLMNOPQRSTUVWXYZ', 'abcdefghjiklmnopqrstuvwxyz'), '$keyword') and not(ancestor::h1) and not(ancestor::h2) and not(ancestor::h3) and not(ancestor::h4) and not(ancestor::h5) and not(ancestor::h6) and not(ancestor::b) and not(ancestor::strong)]");
    if ($nodes && $nodes->length) {
        $node = $nodes->item(0);
        // Split just before the keyword
        $keynode = $node->splitText(strpos($node->textContent, $keyword));
        // Split after the keyword
        $node->nextSibling->splitText(strlen($keyword));
        // Replace keyword with <b>keyword</b>
        $replacement = $d->createElement('strong', $keynode->textContent);
        $keynode->parentNode->replaceChild($replacement, $keynode);
    }
$postarray['post_content'] = $d->saveXML($d->getElementsByTagName('p')->item(0));
//  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->item(1));
//  $postarray['post_content'] = $d->saveXML($d->getElementsByTagName('body')->childNodes);
return $postarray;
}

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-16T21:22:15+0000

All of these answers are now wrong, because as of PHP 5.4 and Libxml 2.6 loadHTML now has a $option parameter which instructs Libxml about how it should parse the content.

Therefore, if we load the HTML with these options

$html->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

when doing saveHTML() there will be no doctype, no <html>, and no <body>.

LIBXML_HTML_NOIMPLIED turns off the automatic adding of implied html/body elements LIBXML_HTML_NODEFDTD prevents a default doctype being added when one is not found.

Full documentation about Libxml parameters is here

(Note that loadHTML docs say that Libxml 2.6 is needed, but LIBXML_HTML_NODEFDTD is only available in Libxml 2.7.8 and LIBXML_HTML_NOIMPLIED is available in Libxml 2.7.7)

Categories

php - How to saveHTML of DOMDocument without HTML wrapper?

php - How to saveHTML of DOMDocument without HTML wrapper?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags