I'm trying to get the "last-modified" date for a set of URLs using Scrapy. However, I'm getting an error that states: KeyError: 'last-modified'. Specifically, the following:
  File "C:spider.py", line 460, in fetch_dates
    url_time = r.headers['last-modified']
  File "C:structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'last-modified'
The code I'm using for this is:
def fetch_dates(self, response):
    url = response.url
    r = requests.head(response.url)
    url_time = r.headers['last-modified']
    url_date = parsedate(url_time)
    for url in url_date:
        if os.path.exists('1url-to-date.csv'):
            append_write = 'a'
        else:
            append_write = 'w'
        with open('1url-to-date.csv', append_write) as url_f:
            url_f.write(url_time + "&,&" + url + "\n")
    return Item()
The code is also not generating my CSV file or returning the information I need. Any suggestions? Thank you!
EDIT: I modified the code to the following:
def fetch_dates(self, response):
    url = response.url
    r = requests.head(response.url)
    url_time = r.headers.get("last-modified", str(time.time()))
    url_date = parsedate(url_time)
    for url in url_date:
        if os.path.exists('1url-to-date.csv'):
            append_write = 'a'
        else:
            append_write = 'w'
        with open('1url-to-date.csv', append_write) as url_f:
            url_f.write(url_time + "&,&" + url + "\n")
    return Item()
But now I'm getting a new error: "ValueError: year 1610477971 is out of range". Any suggestions would be very helpful. Thanks!
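For reference, here is a rough sketch of one direction that might avoid both errors. It rests on a few assumptions. Not every server sends a Last-Modified header, so the lookup has to tolerate a missing key. The fallback str(time.time()) is a Unix timestamp rather than an HTTP date string, which would explain a parser reading 1610477971 as a year. The helper names parse_last_modified and append_row are made up for illustration, and the sketch uses the standard library's email.utils.parsedate_to_datetime, which may not be the same parsedate as in the code above. The lookup key is lowercase because a plain dict is case-sensitive, whereas the headers object from requests is not.

```python
import csv
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified header as a datetime, or None.

    Hypothetical helper: returns None when the header is missing or
    cannot be parsed, instead of substituting a made-up date.
    """
    raw = headers.get("last-modified")
    if raw is None:
        return None
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on unparseable input,
        # newer ones raise ValueError; treat both as "no usable date".
        return None


def append_row(path: str, url: str, modified: Optional[datetime]) -> None:
    """Append one (date, url) row to a CSV file.

    Hypothetical helper: mode 'a' creates the file when it does not
    exist, so no os.path.exists check is needed.
    """
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([modified.isoformat() if modified else "", url])
```

Inside the spider callback this could be used as append_row('1url-to-date.csv', response.url, parse_last_modified(r.headers)), with r being the result of requests.head(response.url) as in the code above. Writing one row per response, rather than looping over the parsed date, keeps url bound to the actual URL string.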