Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

xml parsing - XSLT: analyze-string and retain child nodes

I'm trying to find statements that refer to other statements by matching their text against a regular expression. It works when the matching text sits in a single text node, but I am struggling with text that is inside a child element or split across several nodes. I also want to ignore any text inside a del tag.

Starting with a document like this:

<doc>
    <sectionA>
        <statement id="1">
            <title>Titlle A</title>
            <statementtext id="a">This is referring to statement 2 about the stuff</statementtext>
            <!-- This is referring to statement <ref statementNumber="2">2</ref> about the stuff -->
        </statement>
        <statement id="2">
            <title>Title B</title>
            <statementtext id="b">This is <b>my</b> statement <b>1</b> referring to something else</statementtext>
            <!-- This is <b>my</b> statement <ref statementNumber="1"><b>1</b></ref> referring to something else -->
        </statement>
        <statement id="3">
            <title>Title 3</title>
            <statementtext id="c">This is another statement <b>1</b><i>2</i> about the stuff</statementtext>
            <!-- This is another statement <ref statementNumber="12"><b>1</b><i>2</i></ref> about the stuff -->
        </statement>
        <statement id="4">
            <title>Title 4</title>
            <statementtext id="d">This is corrected statement <del>1</del><ins>2</ins> about the stuff</statementtext>
            <!-- This is corrected statement <ref statementNumber="2"><del>1</del><ins>2</ins></ref> about the stuff -->
        </statement>        
        <statement id="5">
            <title>Title 5</title>
            <statementtext id="e">This is partially corrected statement 1<del>1</del><ins>5</ins> about the stuff</statementtext>
            <!-- This is partially corrected statement <ref statementNumber="15">1<del>1</del><ins>5</ins></ref> about the stuff -->
        </statement>
        <statement id="6">
            <title>Title 6</title>
            <statementtext  id="f">This is another
            <statementtext  id="g"> that contains a nested satementtext for statement <b>1</b><i>3</i> about </statementtext>
            the stuff</statementtext>
            <!-- This is another <statementtext id="g"> that contains a nested satementtext for statement <ref statementNumber="13"><b>1</b><i>3</i></ref> about </statementtext> -->
        </statement>
        <statement id="7">
            <title>Title 7</title>
            <statementtext id="h">This is <i>statement</i> <b>1</b> referring to something else</statementtext>
            <!-- This is my <i>statement</i> <ref statementNumber="1"><b>1</b></ref> referring to something else -->
        </statement>
        <statement id="8">
            <title>Title 8</title>
            <statementtext id="i">This is has no reference to another statement</statementtext>
            <!-- his is has no reference to another statement -->
        </statement>        
    </sectionA>     
</doc>

Using my current template:

  <xsl:template match="statementtext">
    <statementtext>
      <xsl:copy-of select="./@*"/>
      <xsl:variable name="thisText">
        <xsl:value-of select="./descendant-or-self::text()"/>
      </xsl:variable>

      <xsl:variable name="thisTextFiltered">
        <xsl:value-of select="./descendant-or-self::text()[not(descendant-or-self::del and comment())]"/>
      </xsl:variable>

      <xsl:choose>
        <xsl:when test="matches($thisTextFiltered, '(statement\s*)(\d+)', 'i')">
          <xsl:analyze-string select="$thisTextFiltered"
                              regex="(statement\s*)(\d+)"
                              flags="ix">
            <xsl:matching-substring>
              <xsl:value-of select="regex-group(1)"/>
              <xsl:variable name="statementNumber">
                <xsl:value-of select="regex-group(2)"/>
              </xsl:variable>
              <ref>
                <xsl:attribute name="statementNumber">
                  <xsl:value-of select="$statementNumber"/>
                </xsl:attribute>
                <xsl:value-of select="regex-group(2)"/>
              </ref>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates/>
        </xsl:otherwise>
      </xsl:choose>
    </statementtext>
  </xsl:template>

  <xsl:template match="@*|*|processing-instruction()|comment()">
    <xsl:copy>
      <xsl:apply-templates select="*|@*|text()|processing-instruction()|comment()" mode="#current"/>
    </xsl:copy>
  </xsl:template>

This is my output:

<!DOCTYPE HTML>
<doc>
   <sectionA>
      <statement id="1"><title>Titlle A</title><statementtext id="a">This is referring to statement 
            <ref statementNumber="2">2</ref> about the stuff
         </statementtext>
         <!-- This is referring to statement <ref statementNumber="2">2</ref> about the stuff -->
      </statement>
      <statement id="2"><title>Title B</title><statementtext id="b">This is my statement 
            <ref statementNumber="1">1</ref> referring to something else
         </statementtext>
         <!-- This is <b>my</b> statement <b><ref statementNumber="1">1</ref></b> referring to something else -->
      </statement>
      <statement id="3"><title>Title 3</title><statementtext id="c">This is another statement 
            <ref statementNumber="12">12</ref> about the stuff
         </statementtext>
         <!-- This is another statement <ref statementNumber="12"><b>1</b><i>2</i></ref> about the stuff -->
      </statement>
      <statement id="4"><title>Title 4</title><statementtext id="d">This is corrected statement 
            <ref statementNumber="12">12</ref> about the stuff
         </statementtext>
         <!-- This is corrected statement <ref statementNumber="2"><del>1</del><ins>2</ins></ref> about the stuff -->
      </statement>
      <statement id="5"><title>Title 5</title><statementtext id="e">This is partially corrected statement 
            <ref statementNumber="115">115</ref> about the stuff
         </statementtext>
         <!-- This is partially corrected statement <ref statementNumber="15">1<del>1</del><ins>5</ins></ref> about the stuff -->
      </statement>
      <statement id="6"><title>Title 6</title><statementtext id="f">This is another
                         that contains a nested satementtext for statement 
            <ref statementNumber="13">13</ref> about 
                        the stuff
         </statementtext>
         <!-- This is another <statementtext id="g"> that contains a nested satementtext for statement <ref statementNumber="13"><b>1</b><i>3</i></ref> about </statementtext> -->
      </statement>
      <statement id="7"><title>Title 7</title><statementtext id="h">This is statement
            <ref statementNumber="1">1</ref> referring to something else
         </statementtext>
         <!-- This is my <i>statement</i> <b><ref statementNumber="1">1</ref></b> referring to something else -->
      </statement>
      <statement id="8"><title>Title 8</title><statementtext id="i">This is has no reference to another statement</statementtext>
         <!-- his is has no reference to another statement -->
      </statement>
   </sectionA>
</doc>

Am I close, or do I need to completely change my approach?
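
A side note on the del filtering, offered as a minimal sketch rather than a fix for the whole template: a text node has no element descendants, so for a text node the test descendant-or-self::del in the predicate above is always empty and nothing is ever excluded. That is consistent with the 12 and 115 in the output. Testing ancestor::del instead is one way to drop text inside del. The stylesheet below assumes an XSLT 3.0 processor (for example Saxon, started with its -it option); the sample fragment and the result, all and filtered element names are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">

  <xsl:output omit-xml-declaration="yes"/>

  <!-- Hypothetical sample, modelled on statement 5 from the question -->
  <xsl:variable name="sample">
    <statementtext>statement 1<del>1</del><ins>5</ins> about</statementtext>
  </xsl:variable>

  <!-- Entry point: no source document is needed -->
  <xsl:template name="xsl:initial-template">
    <result>
      <!-- Every descendant text node, including the one inside del -->
      <all>{string-join($sample//text())}</all>
      <!-- Only text nodes that do not sit inside a del element -->
      <filtered>{string-join($sample//text()[not(ancestor::del)])}</filtered>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Run this way, the all element should contain "statement 115 about" and the filtered element "statement 15 about". This only addresses the del part; flattening the content to a string still discards the b, i, del and ins elements.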

Question from: https://stackoverflow.com/questions/65885997/xslt-analyze-string-and-retain-child-nodes

1 Reply

I tried a preprocessing step that wraps the numbers, followed by a mix of group-starting-with and group-adjacent grouping. I think it now covers all the samples you have given, but the grouping code is rather convoluted and deeply nested:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>
  
  <xsl:template match="text()" mode="analyze">
      <xsl:apply-templates select="analyze-string(., 'statement\s*([0-9]+)')" mode="wrap"/>
  </xsl:template>
  
  <xsl:mode name="analyze" on-no-match="shallow-copy"/>
  
  <xsl:template match="fn:group[@nr = 1]" mode="wrap">
      <n>{.}</n>
  </xsl:template>

  <xsl:template match="statementtext">
      <xsl:copy>
          <xsl:variable name="wrapped" as="node()*">
              <xsl:apply-templates mode="analyze"/>
          </xsl:variable>
 
          <xsl:for-each-group select="$wrapped" group-starting-with="node()[matches(., 'statement\s*$', 'i')]">
              <xsl:choose>
                  <xsl:when test="matches(., 'statement\s*$', 'i')">
                      <xsl:apply-templates select="."/>
                      <xsl:for-each-group select="tail(current-group())" group-adjacent="matches(., '^[0-9 ]+$')">
                          <xsl:choose>
                              <xsl:when test="current-grouping-key() and position() = 1 and matches(., '^\s+$')">
                                  <xsl:apply-templates select="."/>
                                  <ref statementNumber="{string-join(tail(current-group())[not(self::del)])}">
                                      <xsl:apply-templates select="tail(current-group())"/>
                                  </ref>
                              </xsl:when>
                              <xsl:when test="current-grouping-key() and position() = 1">
                                  <ref statementNumber="{string-join(current-group()[not(self::del)])}">
                                      <xsl:apply-templates select="current-group()"/>
                                  </ref>
                              </xsl:when>
                              <xsl:otherwise>
                                  <xsl:apply-templates select="current-group()"/>
                              </xsl:otherwise>
                          </xsl:choose>
                      </xsl:for-each-group>
                  </xsl:when>
                  <xsl:otherwise>
                      <xsl:apply-templates select="current-group()"/>
                  </xsl:otherwise>
              </xsl:choose>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>
  
  <xsl:template match="n">
      <xsl:apply-templates/>
  </xsl:template>
  
</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/bEJbVrL
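
To make the two building blocks easier to follow in isolation, here is a minimal sketch, assuming an XSLT 3.0 processor that can start from the initial template (for example Saxon with -it). The first part shows how the tree returned by analyze-string is reduced to plain text plus an n wrapper around the number. The second part shows how the statementNumber attribute is computed from a group of nodes while skipping del. The demo and wrapped element names, the literal input string and the hand-built $group variable are made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    exclude-result-prefixes="#all"
    expand-text="yes"
    version="3.0">

  <xsl:output omit-xml-declaration="yes"/>

  <!-- Entry point: no source document is needed -->
  <xsl:template name="xsl:initial-template">
    <demo>
      <!-- Step 1: analyze-string returns fn:match / fn:non-match / fn:group
           elements; the built-in rules of mode "wrap" keep only their text,
           and the template below wraps capture group 1 in an n element -->
      <wrapped>
        <xsl:apply-templates
            select="analyze-string('see statement 12 here', 'statement\s*([0-9]+)')"
            mode="wrap"/>
      </wrapped>

      <!-- Step 2: a hand-built stand-in for one group of adjacent "number" nodes -->
      <xsl:variable name="group" as="node()*">
        <xsl:text>1</xsl:text>
        <del>1</del>
        <ins>5</ins>
      </xsl:variable>
      <!-- The attribute ignores del; the content keeps every node -->
      <ref statementNumber="{string-join($group[not(self::del)])}">
        <xsl:sequence select="$group"/>
      </ref>
    </demo>
  </xsl:template>

  <xsl:template match="fn:group[@nr = 1]" mode="wrap">
    <n>{.}</n>
  </xsl:template>

</xsl:stylesheet>
```

The expected result is a wrapped element containing "see statement <n>12</n> here", followed by <ref statementNumber="15">1<del>1</del><ins>5</ins></ref>, which matches the target given for statement 5 in the question.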

