Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
520 views
in Technique[技术] by (71.8m points)

javascript - Casperjs scraping dynamic content

I'm trying to scrape this page using Casperjs. The main function to my code works just fine, but the content is loaded dynamically and I can't figure out how to trigger that.

This is what I'm doing right now:

casper.waitFor(function() {

    this.scrollToBottom();

    var count = this.evaluate(function() {
        var match = document.querySelectorAll('.loading-msg');
        return match.length;
    });

    if (count <= 1) {
        return true;
    }
    else {
        return false
    };

}, function() { // do stuff });

The wait timeout just expires, even though I've increased it to 20s, and the new content never gets loaded. I've tried adapting this function to my case:

function tryAndScroll(casper) {
  casper.waitFor(function() {
    this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };
    return true;
  }, function() {
    var info = this.getElementInfo('p[loading-spinner="!loading"]');
    if (info["visible"] == true) {
      this.waitWhileVisible('p[loading-spinner="!loading"]', function () {
        this.emit('results.loaded');
      }, function () {
        this.echo('next results not loaded');
      }, 5000);
    }
  }, function() {
    this.echo("Scrolling failed. Sorry.").exit();
  }, 500);
}

But I couldn't figure it out and I'm not even sure it's relevant here. Any ideas?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I've looked to the page. It has such a behvior that it doesn't load the middle images when you jump to the end.

When the page is loaded the first couple of rows are completely loaded and some more are not completely loaded (image missing denoted by '.loading-msg' element). When you jump to the end with this.scrollToBottom(); there is no continous scroll. It jumps to the end and the page JavaScript doesn't detect that the middle images were in the viewport, however briefly. The page goes on to load the next rows, but not the missing images of the jumped over rows.

You have to reduce the distance of the jump in both of your snippets.

The first one can be changed like this:

var pos = 0, 
    height = casper.page.viewportSize.height;
casper.waitFor(function() {
    this.scrollTo(0, pos * height);
    return !this.exists('.loading-msg');
}, function() { // do stuff }, 20000);

The second one might work by changing

this.page.scrollPosition = { top: this.page.scrollPosition["top"] + 4000, left: 0 };

to

var height = casper.page.viewportSize.height;
this.page.scrollPosition = { top: this.page.scrollPosition.top + height, left: 0 };

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...