The error occurs at the line `geo_info <- read_html(geo_links[k])`. The issue is that `geo_links` is empty, so `1:length(geo_links)` evaluates to `1:0`, which is the vector `c(1, 0)`, and the `for` loop runs anyway. Then `geo_info <- read_html(geo_links[k])` tries to access the first element of `geo_links`; since the vector is empty, indexing it returns `NA`. When `read_html` tries to read that "url" it throws the error message you see (I think it is trying to read a file literally named "NA" in the working directory). So you should test the length of `geo_links` and only enter the `for` loop if `length(geo_links) > 0`:
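The pitfall is easy to reproduce in a plain R session, with no scraping involved (base R only):

```r
geo_links <- character(0)    # an empty character vector, like your failed scrape

# 1:length(x) does NOT give an empty sequence when x is empty:
1:length(geo_links)          # c(1, 0) -- so the loop body runs twice

# Indexing an empty vector returns NA instead of raising an error:
geo_links[1]                 # NA

# seq_along() gives an empty sequence instead, so a loop over it never runs:
seq_along(geo_links)         # integer(0)
```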
```r
if (length(geo_links) > 0) {
  for (k in 1:length(geo_links)) {
    geo_info <- read_html(geo_links[k])
    lat <- geo_info %>%
      html_node(xpath = '//span[@class="latitude"]') %>%
      html_text()
    long <- geo_info %>%
      html_node(xpath = '//*[@class="longitude"]') %>%
      html_text()
    long_lat_list[[k]] <- list(latitude = lat, longitude = long)
  }
  sample$latitude <- lapply(long_lat_list, "[[", 1)
  sample$longitude <- lapply(long_lat_list, "[[", 2)
}
```
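An alternative to the explicit length check is to write the loop header as `for (k in seq_along(geo_links))`: `seq_along()` yields an empty sequence for an empty vector, so the loop body simply never executes and the `if` guard becomes unnecessary. A minimal sketch of just the loop skeleton, using a dummy empty vector in place of your scraped links:

```r
geo_links <- character(0)      # stand-in for an empty scrape result
long_lat_list <- list()

for (k in seq_along(geo_links)) {
  # never entered when geo_links is empty, so read_html()
  # is never called with an NA "url"
  long_lat_list[[k]] <- geo_links[k]
}

length(long_lat_list)          # 0
```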
The reason you are getting empty lists for some of these links is that the tables are not exactly the same across the different pages. You look for the geolocation data in nodes with the tag `small`. That works on the first two pages, but not on the third: the third page has no `small` node, and the geolocation data is tagged differently there.
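If you want the loop to tolerate pages whose markup differs, note that a first-match XPath query returns a missing node when nothing matches, and extracting its text gives `NA`, so you can detect the mismatch instead of crashing. A hedged sketch (assumes the xml2 package is installed; the inline HTML below is a made-up stand-in for the third page, not the real markup):

```r
library(xml2)

# A made-up fragment with no "latitude" node, like the third link
doc <- read_html("<html><body><p>no geodata here</p></body></html>")

lat <- xml_text(xml_find_first(doc, '//*[@class="latitude"]'))
if (is.na(lat)) {
  # fall back to whatever selector the third page actually uses,
  # or record the coordinate as missing
  lat <- NA_character_
}
```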