Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
879 views
in Technique[技术] by (71.8m points)

vba - Use getElementById on HTMLElement instead of HTMLDocument

I've been playing around with scraping data from web pages using VBS/VBA.

If it were Javascript I'd be away as its easy, but it doesn't seem to be quite as straight forward in VBS/VBA.

This is an example I made for an answer, it works but I had planned on accessing the child nodes using getElementByTagName but I could not figure out how to use them! The HTMLElement object does not have those methods.

Sub Scrape()
Dim Browser As InternetExplorer
Dim Document As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement

Set Browser = New InternetExplorer

Browser.navigate "http://www.hsbc.com/about-hsbc/leadership"

Do While Browser.Busy And Not Browser.readyState = READYSTATE_COMPLETE
    DoEvents
Loop

Set Document = Browser.Document

Set Elements = Document.getElementsByClassName("profile-col1")

For Each Element in Elements
    Debug.Print "[  name] " & Trim(Element.Children(1).Children(0).innerText)
    Debug.Print "[ title] " & Trim(Element.Children(1).Children(1).innerText)
Next Element

Set Document = Nothing
Set Browser = Nothing
End Sub

I have been looking at the HTMLElement.document property, seeing if it is like a fragment of the document but its either difficult to work with or just isnt what I think

Dim Fragment As HTMLDocument
Set Element = Document.getElementById("example") ' This works
Set Fragment = Element.document ' This doesn't

This also seems a long winded way to do it (although thats usually the way for vba imo). Anyone know if there is a simpler way to chain functions?

Document.getElementById("target").getElementsByTagName("tr") would be awesome...

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As htmlDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer
    Browser.Visible = True
    Browser.navigate "http://www.stackoverflow.com"

    Do While Browser.Busy And Not Browser.readyState = READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document

    Set Elements = Document.getElementById("hmenus").getElementsByTagName("li")
    For Each Element In Elements
        Debug.Print Element.innerText
        'Questions
        'Tags
        'Users
        'Badges
        'Unanswered
        'Ask Question
    Next Element

    Set Document = Nothing
    Set Browser = Nothing
End Sub

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

56.8k users

...