Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
636 views
in Technique[技术] by (71.8m points)

excel - VBA href Crawl on Browser's Source Code

I did update on my question since i know more clearly on the technicality i am trying to address.

A. If we take the resulting URL from a search on a data agency's site we get this

    https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000010795&type=10-K&dateb=&owner=exclude&count=20

B. By entering the URL of Step A into a Browser and going to the source code we see at line No. 100 (I use Google Chrome) this charming line which is also a clickable link:

    href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm"

the line is contained in this code snippet of our source code:

    <tr>
<td nowrap="nowrap">10-K</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm" id="documentsbutton">&nbsp;Documents</a>&nbsp; <a href="/cgi-bin/viewer?action=view&amp;cik=10795&amp;accession_number=0001193125-13-456802&amp;xbrl_type=v" id="interactiveDataBtn">&nbsp;Interactive Data</a></td>
<td class="small" >Annual report [Section 13 and 15(d), not S-K Item 405]<br />Acc-no: 0001193125-13-456802&nbsp;(34 Act)&nbsp; Size: 15 MB            </td>
            <td>2013-11-27</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=001-04802&amp;owner=exclude&amp;count=20">001-04802</a><br>131247478         </td>
         </tr>

C. If we click on line 100 the link of step A, we go to the next page and the link of step A now becomes part of the URL! So what we get is a new page assigned to this URL:

https://www.sec.gov/Archives/edgar/data/10795/000119312513456802/0001193125-13-456802-index.htm

D. With use of the same methodology we meet in line No. 182 this line of code

href="/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

if we click the line we get to the strXMLSite which is on the macro below. Once you take a look at the macro and run it, you will understand that it is a logical conclusion that the String could populated with the desired URL on runtime, if we could integrate a relevant procedure into our macro. That is the nucleus of the question.


We have activated the needed for the macro Microsoft XML Core Services (MSXML) (Excel --> VBE --> Tools --> References --> Microsoft XML, v6.0) needed for the procedure.

How can we make VBA Crawl from the URL which is on Step A through the source-code to the URL that is now on strXMLSite String by adding statements to the procedure? Do we need to activate a library from Tools--> References? Can you show me a code block using such a methodology? What is the line of approach on this point?

For reasons of completeness allow me to provide the macro courtesy of @user2140261

Sub GetNode()
Dim strXMLSite As String
Dim objXMLHTTP As MSXML2.XMLHTTP
Dim objXMLDoc As MSXML2.DOMDocument
Dim objXMLNodexbrl As MSXML2.IXMLDOMNode
Dim objXMLNodeDIIRSP As MSXML2.IXMLDOMNode

Set objXMLHTTP = New MSXML2.XMLHTTP
Set objXMLDoc = New MSXML2.DOMDocument

strXMLSite = "http://www.sec.gov/Archives/edgar/data/10795/000119312513456802/bdx-20130930.xml"

objXMLHTTP.Open "POST", strXMLSite, False
objXMLHTTP.send
objXMLDoc.LoadXML (objXMLHTTP.responseText)

Set objXMLNodexbrl = objXMLDoc.SelectSingleNode("xbrl")

Set objXMLNodeDIIRSP = objXMLNodexbrl.SelectSingleNode("us-gaap:DebtInstrumentInterestRateStatedPercentage")

Worksheets("Sheet1").Range("A1").Value = objXMLNodeDIIRSP.Text
End Sub

thank you for watching my question

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Add a reference to "Microsoft Internet controls". This will get you to the point where you can get the individual xml links.

Sub Tester()

    Dim IE As New InternetExplorer
    Dim els, el, colDocLinks As New Collection
    Dim lnk

    IE.Visible = True
    Loadpage IE, "https://www.sec.gov/cgi-bin/browse-edgar?" & _
                  "action=getcompany&CIK=0000010795&type=10-K" & _
                  "&dateb=&owner=exclude&count=20"

    'collect all the "Document" links on the page
    Set els = IE.Document.getelementsbytagname("a")
    For Each el In els
        If Trim(el.innerText) = "Documents" Then
            'Debug.Print el.innerText, el.href
            colDocLinks.Add el.href
        End If
    Next el

    'loop through the "document" links and check each page for xml links
    For Each lnk In colDocLinks
        Loadpage IE, CStr(lnk)
        For Each el In IE.Document.getelementsbytagname("a")
            If el.href Like "*.xml" Then
                Debug.Print el.innerText, el.href
                'work with the document from this link
            End If
        Next el
    Next lnk

End Sub

Sub Loadpage(IE As Object, URL As String)
    IE.navigate URL
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
End Sub

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...