Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
756 views
in Technique[技术] by (71.8m points)

excel - Scraping website data with options in combo box VBA

Hi I am trying to scrape the Product name (Cohiba Robusto), Product Size (Single Cigar, Pack of 3, Box of 25) and prices (£33.65, £90, £730) from this website: https://www.jjfox.co.uk/cohiba-robusto-621.html

I am trying to get something like this:

enter image description here

I am using the code below, which gives an error ("Object variable or with variable not set").

Will appreciate any help with this.

Sub getproducts()

Sheets("JJFox").Select

Dim oHtml       As HTMLDocument
Dim oElement    As Object

Dim Elements As IHTMLElementCollection
Dim Document As HTMLDocument

Set oHtml = New HTMLDocument


'Cells(1, 6) = Time()
lastrow = ActiveSheet.Cells(Rows.Count, "A").End(xlUp).Row
cnt = lastrow + 1

counter1 = cnt

Dim gg As String

gg = "https://www.jjfox.co.uk/cohiba-robusto-621.html"


Dim objHTTP As New WinHttp.WinHttpRequest
url = gg
    objHTTP.Open "POST", url, False
    objHTTP.setRequestHeader "Content-Type", "application/json"
    objHTTP.send ("{""key"":null,""from"":""[email protected]"",""to"":null,""cc"":null,""bcc"":null,""date"":null,""subject"":""My Subject"",""body"":null,""attachments"":null}")
   
 oHtml.body.innerHTML = objHTTP.responseText
 'Cells(rw, 2) = oHtml.getElementsByTagName("description").innerText

   '    If Not .Document.querySelector("button[aria-label='Close']") Is Nothing Then
     '       .Document.querySelector("button[aria-label='Close']").Click
      '  End If
      
      
    txttitle = oHtml.getElementsByClassName("productcart")(0).innerText
txttitlehtml = oHtml.getElementsByClassName("packsize")(0).innerHTML


txttitle = Mid(txttitle, 1, InStr(1, txttitle, Chr(10)))
'Debug.Print txttitlehtml
'txttitle2 = oHtml.getElementsByClassName("price")(0).innerText

Dim Text As String
Text = GetHTML(gg)


starts = InStr(1, Text, "spConfig =")
endS = InStr(starts + 1, Text, "spConfig")

If starts = 0 Then


    Cells(counter1, 1) = txttitle
    Cells(counter1, 2) = "Single"
    starts = InStr(starts + 1, Text, "productPrice")
    endl = InStr(starts + 1, Text, ",")
    Cells(counter1, 3) = Val(Mid(Text, starts + 14, endl - (starts + 14)))
    Cells(counter1, 4) = "JJFox"
    Cells(counter1, 5) = Now()
     
    Cells(counter1, 7) = gg ' link to the page
    counter1 = counter1 + 1
   
Else


Text = Mid(Text, starts, endS - starts)
'Debug.Print Text
'find how many pack options are avaialble

myTxt = Text
countTxt = "label"

bb = (Len(myTxt) - Len(replace(myTxt, countTxt, ""))) / Len(countTxt) - 1
'End find////////////////////////////////////

varlabel = "class=" & Chr(34) & "label" & Chr(34)


starts = InStr(1, Text, "label") + 1
Text = Mid(Text, starts, Len(Text))
        
        For i = 1 To bb
        
        
        starts = InStr(1, Text, "label")
        
        If InStr(starts, Text, "label") Then
        
        'Show the element's properties
           
        
                Cells(counter1, 1) = txttitle
                Cells(counter1, 2) = Mid(Text, starts + 8, InStr(starts, Text, " ") - (starts + 8))
                
                                       
                        starts = InStr(starts + 1, Text, "oldPrice")
                        endl = InStr(starts + 1, Text, ",")
                        

                Cells(counter1, 3).FormulaR1C1 = Val(Mid(Text, starts + 11, endl - (starts + 11)))
                'Debug.Print Val(Mid(Text, startS + chrs, 6))
                Cells(counter1, 4) = "JJFox"
                Cells(counter1, 5) = Now()
                starts = starts + 1
                Text = Mid(Text, starts, Len(Text))
                Cells(counter1, 7) = gg ' link to the page
                counter1 = counter1 + 1
           End If
        
        Next i
            
End If
'Cells(2, 6) = Time()
End Sub



Function GetHTML(url As String) As String
     With CreateObject("MSXML2.ServerXMLHTTP.6.0")
        .Open "GET", url, False
        .send
        GetHTML = .responseText
    End With
End Function
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

The prices and labels are pulled dynamically from a script tag who content you can parse as json with a json parser. You need to grab the name from the html however.

With a little knowledge of html and css, it is easy enough to define a css pattern to target the script node of interest with:

.fieldset [type='text/x-magento-init']

That looks for a child script with type attribute having attribute value text/x-magento-init, and a parent with class fieldset.

I have used a tiny bit less efficient (you won't notice):

For i = 1 To optionsCollection.Count

Simply because I know the collection is small and to allow me to index into two variables with a single loop.


Json library:

I use jsonconverter.bas. Download raw code from here and add to standard module called JsonConverter . Remove the top Attribute line from the copied code.

You then need to go:

VBE > Tools > References > Add references to:

Microsoft Scripting Runtime
Microsoft HTML Object Library
Microsoft XML Library. 

In VBA for json the [] denotes a collection and the {} represents a dictionary.


Option Explicit

Public Sub GetCigarData()
    '<  VBE > Tools > References:
    'Microsoft Scripting Runtime
    'Microsoft HTML Object Library
    'Microsoft XML Library
    
    Dim json As Object, html As MSHTML.HTMLDocument, xhr As MSXML2.XMLHTTP60, ws As Worksheet

    Set ws = ThisWorkbook.Worksheets("Sheet1")
    Set xhr = New MSXML2.XMLHTTP60
    Set html = New MSHTML.HTMLDocument
    
    With xhr
        .Open "GET", "https://www.jjfox.co.uk/cohiba-robusto-621.html", False
        .setRequestHeader "User-Agent", "Mozilla/5.0"
        .send
         html.body.innerHTML = .responseText
    End With

    Set json = jsonConverter.ParseJson(html.querySelector(".fieldset [type='text/x-magento-init']").innerHTML)("#product_addtocart_form")("configurable")("spConfig")
     
    Dim prices As Scripting.Dictionary, options As Scripting.Dictionary, optionsCollection As Collection
    
    Set prices = json("optionPrices")
    Set options = json("attributes")
    Set optionsCollection = options(options.Keys(0))("options")
    
    Dim results() As Variant, headers() As Variant, i As Long, name As String
    ReDim results(1 To optionsCollection.Count, 1 To 3)
    
    name = html.querySelector(".base").innerText

    For i = 1 To optionsCollection.Count
         results(i, 1) = name
         results(i, 2) = optionsCollection.item(i)("label")
         results(i, 3) = prices(prices.Keys(i - 1))("finalPrice")("amount")
    Next
    
    headers = Array("Name", "Size", "Price")
    
    With ws
        .Cells(1, 1).Resize(1, UBound(headers) + 1) = headers
        .Cells(2, 1).Resize(UBound(results, 1), UBound(results, 2)) = results 
    End With
   
End Sub

Read about css selectors:

  1. https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Selectors

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...