Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Crawler's "shouldCrawl" event requires boolean returned from axios async function, and can't get them in sync

The shouldCrawl option belongs to js-crawler's config object. Its value is a callback that must return a boolean telling the crawler whether or not to crawl the URL it receives as an argument.

I'm using axios with the HEAD method to retrieve the resource's headers. The callback should return true to shouldCrawl only when content-type contains text/html, so that the crawler doesn't download files and other garbage.
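The header check described above can be isolated into a small pure function. This is only a sketch: isHtmlContentType is a hypothetical name, not part of axios or js-crawler, and it assumes the header value arrives as a string or is missing altogether.

```javascript
// Hypothetical helper (an assumption, not an axios or js-crawler API):
// returns true when a Content-Type header value denotes an HTML document.
function isHtmlContentType(contentType) {
    // Some responses carry no Content-Type header at all.
    if (typeof contentType !== 'string') return false;
    // Values often include a suffix such as "; charset=utf-8".
    return contentType.toLowerCase().includes('text/html');
}

console.log(isHtmlContentType('text/html; charset=utf-8')); // true
console.log(isHtmlContentType('application/pdf'));          // false
console.log(isHtmlContentType(undefined));                  // false
```

Guarding against a missing header avoids relying on a thrown TypeError being swallowed by a catch handler.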

My code:

this.crawler = new Crawler().configure({
    shouldCrawl: async (sUrl) => {
        const crawlWhenHtml = async () => { //return false;
            // HEAD request: fetch only the headers, not the body
            return axios({
                url: sUrl,
                method: 'head'
            }).then(res => {
                return res.headers['content-type'].indexOf('text/html') >= 0;
            }).catch(error => {
                return false;
            });
        };

        return await crawlWhenHtml();
    }
});

I can't get shouldCrawl and crawlWhenHtml in sync.

If I make the callback return false (see the commented-out statement), shouldCrawl ignores it and the URL is crawled anyway. This has been happening ever since I made the callback async.
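A likely explanation, assuming the crawler calls shouldCrawl synchronously and only tests the truthiness of whatever comes back (the source does not confirm this): an async function never returns a bare boolean, it returns a Promise, and a Promise object is always truthy. The snippet below is a standalone illustration with a hypothetical alwaysFalse function; it does not use js-crawler.

```javascript
// An async function always returns a Promise, even when its body returns false.
const alwaysFalse = async () => false;

const result = alwaysFalse();

console.log(result instanceof Promise); // true: the caller receives a Promise
console.log(Boolean(result));           // true: Promise objects are truthy

// The actual boolean only becomes available once the Promise settles.
result.then((value) => console.log(value)); // false
```

Any caller that does something like `if (shouldCrawl(url))` without awaiting would therefore treat every URL as crawlable.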

But without making it async, I can't wait for axios to complete the request before returning a boolean to shouldCrawl.

How can I resolve this?
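One possible direction, sketched under the assumption that shouldCrawl has to answer synchronously: decide from the URL alone, without a network request. This swaps the HEAD-based content-type check for a file-extension heuristic, so it is less accurate. The names looksLikeHtml and NON_HTML_EXTENSIONS are hypothetical, and the extension list is only an example.

```javascript
// Hypothetical list of extensions treated as "not an HTML page" (an assumption).
const NON_HTML_EXTENSIONS = new Set([
    '.pdf', '.zip', '.jpg', '.jpeg', '.png', '.gif', '.mp4', '.css', '.js'
]);

// Hypothetical helper: guesses synchronously whether a URL points to an HTML page.
function looksLikeHtml(sUrl) {
    let pathname;
    try {
        // The pathname excludes the query string and fragment.
        pathname = new URL(sUrl).pathname.toLowerCase();
    } catch (error) {
        return false; // not a parseable absolute URL
    }
    const lastSegment = pathname.slice(pathname.lastIndexOf('/') + 1);
    const dotIndex = lastSegment.lastIndexOf('.');
    if (dotIndex < 0) return true; // no extension: assume it is a page
    return !NON_HTML_EXTENSIONS.has(lastSegment.slice(dotIndex));
}

// A plain (non-async) callback, so the return value is a real boolean.
const config = {
    shouldCrawl: (sUrl) => looksLikeHtml(sUrl)
};

console.log(config.shouldCrawl('https://example.com/report.pdf')); // false
console.log(config.shouldCrawl('https://example.com/docs/'));      // true
```

A heuristic like this can misjudge URLs without a telling extension, so checking the content-type of each fetched response afterwards may still be worthwhile.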

question from:https://stackoverflow.com/questions/65848240/crawlers-shouldcrawl-event-requires-boolean-returned-from-axios-async-functio


