PHP Crawler类代码示例

OGeek|极客世界-中国程序员成长平台 › 门户 › 编程› PHP›PHP编程经验

原作者: [db:作者] 来自: [db:来源] 收藏邀请

本文整理汇总了PHP中Crawler类的典型用法代码示例。如果您正苦于以下问题：PHP Crawler类的具体用法？PHP Crawler怎么用？PHP Crawler使用的例子？那么恭喜您, 这里精选的类代码示例或许可以为您提供帮助。

在下文中一共展示了Crawler类的20个代码示例，这些例子默认根据受欢迎程度排序。您可以为喜欢或者感觉有用的代码点赞，您的评价将有助于我们的系统推荐出更棒的PHP代码示例。

示例1: go

 public function go()
 {
     $start_url = $this->url;
     $c = new Crawler($start_url);
     $c->go2linewhere('<p><a href="');
     $c->close();
     $ledak = explode('<a href="', $c->curline);
     for ($i = 1; $i < count($ledak); ++$i) {
         $aurl = Crawler::cutuntil($ledak[$i], '"');
         $aurl = str_replace('http://hentaifromhell.net/redirect.html?', '', $aurl);
         echo "<a href='{$aurl}'>{$aurl}</a><br />\n";
         flush();
         /*
         $basename = Crawler::cutuntillast($aurl, '/');
         if (!in_array($basename, $this->blacklist)) {
         	$c = new Crawler($aurl);
         	$c->go2linewhere('id="thepic"');
         	$imgurl = $c->getbetween('SRC="', '"');
         	$c->close();
         	echo "<a href='$basename/$imgurl'>".Crawler::n($i,3).".jpg</a><br />\n";
         	flush();
         } else {
         	echo "$i blacklisted server<br/>";flush();
         }
         */
     }
 }

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:27，代码来源:spider_hfhgallery3.php

示例2: getHotSpots

 public function getHotSpots()
 {
     $crawler = new Crawler($this);
     $outlines = new CrawlerOutlineCollection();
     $size = $this->image->size();
     for ($x = 0; $x < $size[0]; $x++) {
         for ($y = 0; $y < $size[1]; $y++) {
             $pixel = $this->pixel($x, $y);
             // Skip white pixels
             if ($pixel->color()->compare(ImageColor::white(), 5)) {
                 continue;
             }
             // Skip crawled areas
             if ($outlines->contains($pixel)) {
                 continue;
             }
             // Start crawling
             $outline = $crawler->crawl($x, $y);
             $outlines->push($outline);
         }
     }
     $hotspots = new ImageCollection();
     foreach ($outlines as $outline) {
         $hotspots->push($this->image->sliceByOutline($outline));
     }
     return array($hotspots, $outlines);
 }

开发者ID:passbolt，项目名称:passbolt_selenium，代码行数:27，代码来源:imagepixelmatrix.php

示例3: crawl_1_page

function crawl_1_page($url)
{
    echo "URL2 {$url} <br/>\n";
    flush();
    $dirname = html_entity_decode(Crawler::cutfromlast1(substr($url, 0, strlen($url) - 1), '/'));
    $hasil = array();
    $c = new Crawler($url);
    $c->go_to('<div class="entry">');
    while ($line = $c->readline()) {
        if (Crawler::is_there($line, "href='")) {
            $img = Crawler::extract($line, "href='", "'");
            echo "<a href='{$img}'>{$dirname}</a><br/>\n";
            flush();
        } else {
            if (Crawler::is_there($line, 'href="')) {
                $img = Crawler::extract($line, 'href="', '"');
                echo "<a href='{$img}'>{$dirname}</a><br/>\n";
                flush();
            } else {
                if (Crawler::is_there($line, '</div>')) {
                    break;
                }
            }
        }
    }
    $c->close();
}

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:27，代码来源:reallycuteasians.php

示例4: login

 /**
  * Log the user
  * @param Crawler $crawler
  */
 private function login($crawler)
 {
     $form = $crawler->selectButton('_submit')->form();
     // définit certaines valeurs
     $form['_username'] = 'kwizer';
     $form['_password'] = 'sslover';
     return $this->client->submit($form);
 }

开发者ID:jlm-entreprise，项目名称:fee-bundle，代码行数:12，代码来源:FeeControllerTest.php

示例5: testCrawl

 public function testCrawl()
 {
     $account = Account::login("searchzen.org", "test");
     $c = $account->collections[0];
     $crawler = new Crawler($c);
     $crawler->pageLimit = 10;
     $crawler->start();
 }

开发者ID:jacobandresen，项目名称:now，代码行数:8，代码来源:CrawlerTest.php

示例6: crawlForNews

 /**
  * Start the crawler to retrieve pages from a given news website
  * @param type $nrOfDaysBack The nr of days the crawler should go back (counting from today)
  * @param type $newsSiteUrl The root URL of the news site (the seed of the crawler)
  * @return type
  */
 public function crawlForNews($nrOfDaysBack, $newsSiteUrl, $timeToLive, $startDate = null)
 {
     $crawler = new Crawler($newsSiteUrl, $timeToLive);
     if ($startDate) {
         $crawler->crawl($nrOfDaysBack, $startDate);
     } else {
         $crawler->crawl($nrOfDaysBack);
     }
     return count($crawler->getCrawled());
 }

开发者ID:Bram9205，项目名称:WebInfo，代码行数:16，代码来源:Main.php

示例7: run

 static function run()
 {
     if (isset($_GET['site_url']) && isset($_GET['sitemap_url']) && CODOF\Access\CSRF::valid($_GET['CSRF_token'])) {
         $sitemapObject = new Crawler($_GET['site_url']);
         $sitemapPath = ABSPATH . 'sitemap.xml';
         $sitemapFile = $sitemapObject->createSitemap($sitemapPath);
         // session_write_close();
         // ob_end_flush();
         exit;
     }
 }

开发者ID:KopjeKoffie，项目名称:codoforum-sitemap-generator，代码行数:11，代码来源:sitemap.php

示例8: crawl_indowebster

function crawl_indowebster($url)
{
    //echo "'$url'";
    $craw = new Crawler($url);
    $craw->go2lineregexor('/(<\\/div><\\/a><\\/div><\\/div>)/', 1, 'href="#idws7"');
    $setring = $craw->getbetween('location.href=\'', '\'');
    $path = Crawler::extract($setring, 'path=', '&');
    $file_orig = Crawler::cutafter($setring, 'file_orig=');
    $craw->close();
    return '<a href="' . dirname($setring) . '/' . $path . '">' . rawurldecode($file_orig) . '</a>';
}

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:11，代码来源:indowebster.php

示例9: mangareader_1_page

 public function mangareader_1_page($fil, $url, $prefix, $chapter)
 {
     $chapter = Crawler::pad($chapter, 3);
     $c = new Crawler($fil);
     $c->go_to('width="800"');
     $img = $c->getbetween('src="', '"');
     preg_match('/(\\d+\\.\\w+)$/', basename($img), $m);
     $iname = $m[1];
     $c->close();
     $name = $prefix . '-' . $chapter . '-' . $iname;
     return array($name => $img);
 }

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:12，代码来源:Mangareader_Crawler.php

示例10: addCrawlMap

 public function addCrawlMap($src, $patterns)
 {
     if (!empty($src)) {
         $root = $this->_src_root . '/' . $src;
     } else {
         $root = $this->_src_root;
     }
     $crawler = new Crawler($root, $patterns);
     $paths = $crawler->getPaths();
     foreach ($paths as $path) {
         $this->addMap($src . '/' . $path, str_replace('site/', '', $path));
     }
 }

开发者ID:walteraries，项目名称:anahita，代码行数:13，代码来源:class.php

示例11: mangareader_1_page

 public function mangareader_1_page($fil, $url, $chapter)
 {
     $prefix = $this->prefix;
     $chapter = Crawler::pad($chapter, 3);
     $c = new Crawler($fil);
     $c->go_to('width="800"');
     $img = $c->getbetween('src="', '"');
     // if (@$_GET['show_url']) echo "<a href='$url'>URL</a> ";
     preg_match('/(\\d+\\.\\w+)$/', basename($img), $m);
     $iname = $m[1];
     echo '<li><a href="' . $img . '">' . $prefix . '-' . $chapter . '-' . $iname . '</a>' . "</li>\n";
     $c->close();
 }

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:13，代码来源:mangareader.php

示例12: handleCrawling

 public function handleCrawling()
 {
     $view = new CrawlerResultViewSwe();
     if (isset($_SESSION['url'])) {
         $url = $_SESSION['url'];
         $curl = new Curl();
         $crawler = new Crawler($curl, $url);
     }
     //If user wants to book
     if ($view->bookParamExists() && isset($crawler)) {
         $group1 = $view->getBookInfo();
         $reservationInfo = $crawler->getReservations($group1);
         $view->outPutReservationResult($reservationInfo);
         //destroy session
         $_SESSION = array();
         session_destroy();
         session_unset();
     } else {
         if ($view->timeParamExists() && $view->dayParamExists() && $view->movieParamExists() && isset($crawler)) {
             $day = $view->getDay();
             $time = $view->getTime();
             $movie = $view->getMovie();
             $dinnerInfo = $crawler->getDinnerInfo($day, $time);
             $view->outPutDinnerAlts($dinnerInfo, $time, $movie);
         } else {
             if ($view->userHasSubmittedURL()) {
                 //$url ="http://localhost:8080/";
                 $url = $view->getURL();
                 $curl = new Curl();
                 $crawler = new Crawler($curl, $url);
                 $crawler->getLinks();
                 $dates = $crawler->getCalendarInfo();
                 $comparer = new compareData();
                 $matchingDay = $comparer->compareCommonDates($dates);
                 if (!is_null($matchingDay)) {
                     $filmDates = $crawler->getFilmInfo($matchingDay);
                     $view->outPutMovieResult($filmDates, $matchingDay);
                 } else {
                     $view->noMatchingDays();
                 }
                 $_SESSION['url'] = $url;
             } else {
                 $view = new formView();
             }
         }
     }
     return $view;
 }

开发者ID:henceee，项目名称:hg222dv-1DV449-Webteknik-ii，代码行数:48，代码来源:webBookingController.php

示例13: testFlickrCrawl

    public function  testFlickrCrawl() {
        $builders = $this->buildData();

        $crawler = Crawler::getInstance();
        $config = Config::getInstance();

        //use fake Flickr API key
        $plugin_builder = FixtureBuilder::build('plugins', array('id'=>'2', 'folder_name'=>'flickrthumbnails'));
        $option_builder = FixtureBuilder::build('options', array(
            'namespace' => OptionDAO::PLUGIN_OPTIONS . '-2',
            'option_name' => 'flickr_api_key',
            'option_value' => 'dummykey') );
        //$config->setValue('flickr_api_key', 'dummykey');

        $this->simulateLogin('[email protected]', true);
        $crawler->crawl();

        $ldao = DAOFactory::getDAO('LinkDAO');

        $link = $ldao->getLinkById(43);
        $this->assertEqual($link->expanded_url, 'http://farm3.static.flickr.com/2755/4488149974_04d9558212_m.jpg');
        $this->assertEqual($link->error, '');

        $link = $ldao->getLinkById(42);
        $this->assertEqual($link->expanded_url, '');
        $this->assertEqual($link->error, 'No response from Flickr API');

        $link = $ldao->getLinkById(41);
        $this->assertEqual($link->expanded_url, '');
        $this->assertEqual($link->error, 'No response from Flickr API');
    }

开发者ID:rkabir，项目名称:ThinkUp，代码行数:31，代码来源:TestOfFlickrThumbnailsPlugin.php

示例14: tearDown

 /**
  * Destroy Config, Webapp, $_SESSION, $_POST, $_GET, $_REQUEST
  */
 public function tearDown()
 {
     Config::destroyInstance();
     Webapp::destroyInstance();
     Crawler::destroyInstance();
     if (isset($_SESSION)) {
         $this->unsetArray($_SESSION);
     }
     $this->unsetArray($_POST);
     $this->unsetArray($_GET);
     $this->unsetArray($_REQUEST);
     $this->unsetArray($_SERVER);
     $this->unsetArray($_FILES);
     Loader::unregister();
     $backup_dir = FileDataManager::getBackupPath();
     if (file_exists($backup_dir)) {
         try {
             @exec('cd ' . $backup_dir . '; rm -rf *');
             rmdir($backup_dir);
             // won't delete if has files
         } catch (Exception $e) {
         }
     }
     $data_dir = FileDataManager::getDataPath();
     if (file_exists($data_dir . 'compiled_view')) {
         try {
             @exec('cd ' . $data_dir . '; rm -rf compiled_view');
         } catch (Exception $e) {
         }
     }
     parent::tearDown();
 }

开发者ID:ravi-modria，项目名称:ThinkUp，代码行数:35，代码来源:class.ThinkUpBasicUnitTestCase.php

示例15: control

 public function control()
 {
     $output = "";
     $authorized = false;
     if (isset($this->argc) && $this->argc > 1) {
         // check for CLI credentials
         $session = new Session();
         $username = $this->argv[1];
         if ($this->argc > 2) {
             $pw = $this->argv[2];
         } else {
             $pw = getenv('THINKUP_PASSWORD');
         }
         $owner_dao = DAOFactory::getDAO('OwnerDAO');
         $owner = $owner_dao->getByEmail($username);
         if ($owner_dao->isOwnerAuthorized($username, $pw)) {
             $authorized = true;
             Session::completeLogin($owner);
         } else {
             $output = "ERROR: Incorrect username and password.";
         }
     } else {
         // check user is logged in on the web
         if ($this->isLoggedIn()) {
             $authorized = true;
         } else {
             $output = "ERROR: Invalid or missing username and password.";
         }
     }
     if ($authorized) {
         $crawler = Crawler::getInstance();
         $crawler->crawl();
     }
     return $output;
 }

开发者ID:rgroves，项目名称:ThinkUp，代码行数:35，代码来源:class.CrawlerAuthController.php

示例16: perform

 function perform()
 {
     $ps = DB::prepare('INSERT INTO listings SET scraped=FALSE, code=:code, title=:title, link=:link, date=:date, price=:price, neighborhood=:neighborhood');
     array_map(function ($url) use($ps) {
         $code = substr($url, 30, 3);
         // SUPER brittle obvs
         $crawler = new Crawler(Guzzle::get($url)->getBody());
         $crawler->filter('.row > .txt')->each(function ($node) use($ps, $code) {
             try {
                 $a = $node->filter('.pl > a.hdrlnk');
                 $ps->execute([':code' => $code, ':title' => $a->text(), ':link' => $a->attr('href'), ':date' => strftime('%Y-%m-%d', strtotime($node->filter('.pl > .date')->text())), ':price' => ($n = $node->filter('.l2 > .price')) && $n->count() ? preg_replace('/\\D/', '', $n->text()) : null, ':neighborhood' => ($n = $node->filter('.l2 > .pnr > small')) && $n->count() ? $n->text() : null]);
             } catch (Exception $e) {
                 Logger::error($e->getMessage(), $ps->errorinfo());
             }
         });
     }, ['http://newyork.craigslist.org/nfa/', 'http://newyork.craigslist.org/roo/', 'http://newyork.craigslist.org/sub/']);
 }

开发者ID:dthtvwls，项目名称:padscraper，代码行数:17，代码来源:Crawl.php

示例17: testDevices

 public function testDevices()
 {
     $lines = file(__DIR__ . '/devices.txt');
     foreach ($lines as $line) {
         $test = Crawler::isCrawler($line);
         $this->assertEquals($test, false, $line);
     }
 }

开发者ID:halenharper，项目名称:Laravel-Crawler-Detect，代码行数:8，代码来源:UATests.php

示例18: testDevices

 public function testDevices()
 {
     $lines = file('https://raw.githubusercontent.com/JayBizzle/Crawler-Detect/master/tests/devices.txt');
     foreach ($lines as $line) {
         $test = Crawler::isCrawler($line);
         $this->assertEquals($test, false, $line);
     }
 }

开发者ID:jaybizzle，项目名称:laravel-crawler-detect，代码行数:8，代码来源:UATests.php

示例19: crawl_1_chapter

function crawl_1_chapter($url, $chapter)
{
    global $sitename;
    global $prefix;
    $c = new Crawler($url);
    $c->go_to('name="pagejump"');
    $pages = array();
    while ($line = $c->readline()) {
        if (Crawler::is_there($line, '<option')) {
            $pages[] = Crawler::extract($line, 'value="', '"');
        } else {
            if (Crawler::is_there($line, '</select>')) {
                break;
            }
        }
    }
    $c->go_to('id="nextpage"');
    $c->readline();
    $img = $c->getbetween('src="', '"');
    $c->close();
    $img_base = dirname($img);
    $ext = '.jpg';
    $chapter = Crawler::pad($chapter, 3);
    foreach ($pages as $page) {
        echo "<a href='{$img_base}/{$page}{$ext}'>{$prefix}-{$chapter}-{$page}{$ext}</a><br/>\n";
        flush();
    }
    //print_r($pages);flush();
}

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:29，代码来源:mangashare.php

示例20: crawl_page

 public function crawl_page($url)
 {
     // crawl_page
     $c = new Crawler($url);
     // get title
     $c->go_to('<title>');
     $title = Crawler::extract($c->curline, 'PHD Comics: ', '</title>');
     $title = preg_replace('/\\W/', '_', $title);
     // get the date
     $c->go_to('date_left.gif');
     $c->readline(2);
     $line = $c->curline;
     preg_match('/([0-9]+)\\/([0-9]+)\\/([0-9]+)/mi', $line, $matches);
     //print_r($matches);flush();
     list($full, $month, $date, $year) = $matches;
     if (strlen($date) < 2) {
         $date = '0' . $date;
     }
     if (strlen($month) < 2) {
         $month = '0' . $month;
     }
     $fileprefix = "{$year}_{$month}_{$date}_{$title}";
     // get the img url
     $c->go2linewhere('<td bgcolor=#FFFFFF');
     $line = $c->curline;
     preg_match('/<img src=["\']?([^ ]+)["\']?/i', $line, $matches);
     $img = $matches[1];
     $filename = basename($img);
     $ext = substr($filename, strrpos($filename, '.'));
     echo "<a href='{$img}'>" . $fileprefix . $ext . "</a><br/>";
     flush();
     $c->close();
     unset($c);
 }

开发者ID:JerryMaheswara，项目名称:crawler，代码行数:34，代码来源:phdcomics.php

注：本文中的Crawler类示例整理自Github/MSDocs等源码及文档管理平台，相关代码片段筛选自各路编程大神贡献的开源项目，源码版权归原作者所有，传播和使用请参考对应项目的License；未经允许，请勿转载。

鲜花

握手

雷人

路过

鸡蛋

该文章已有0人参与评论

请发表评论

全部评论

专题导读

More+

10-27 六六分期app的软件客服如何联系？(六六分期

11-06 可心卡盟:win10系统火狐flash插件崩溃怎么

11-06 亲亲特价:怎么删除回收站图标

11-06 济南大学虚拟社区:鲁大师节能降温的具体办

11-06 xlueops.exe:无线网络安装向导

11-06 女斗合众国:win7系统cf与主机连接不稳定怎

11-06 0xc000022-[cf烟雾头]cf怎么调烟雾头

11-06 qizideyouhuo:应用程序无法正常启动0xc0000

11-06 ipz-185:win7系统vcf文件怎么打开

11-06 傻哥蹦迪:win10系统s4怎么打开usb调试

11-06 八神浩树gtaste:回收站清空了怎么恢复

11-06 妖尾之黑色守护:win10系统电脑没有1440x900

11-06 校园至尊魔王小说:win7系统浏览网页时字体

11-06 女斗合众国:win10系统访问共享文件夹提示请

11-06 tokyo hot n0654:恢复win7系统默认字体一招

11-06 雨酷仙境:设置win7系统转移临时文件夹腾出

11-06 阿穆纳伊之杖:win7系统开始菜单在右边还原

11-06 tunespotting:win10系统火狐flash插件总是

11-06 甘尔葛分析师：计谋网站seo关键词暴涨有什

11-06 蔡贵霖: 计谋网站seo关键词暴涨有什么秘密

11-06 博益网首页:ao3网页版进入不了解决方法

11-06 漏斗子专栏: 网站数据分析小白易懂精华篇

11-06 见证双虹怎么做:win7系统开启telnet命令的

11-06 颾狐蝶蜋:系统资源不足无法完成请求的服务

11-06 国光中学校歌:提交网站到alexa查询详细步骤

11-06 西安有情天:静态网页和动态网页的区别

11-06 红木雅尚斋:外部链接构造对网站的好处

11-06 前官礼遇：防止域名劫持–增强域安全性的10

11-06 密传二转答案: 中文分词算法有哪些

11-06 金泉家园邮编:百度快照劫持的表现及应对方

PHP CreateDocx类代码示例发布时间：2022-05-23

PHP Craft类代码示例发布时间：2022-05-23

bradtraversy/iweather: Ionic 3 mobile we

1 ejarnutowski/laravel-api-key: API Key Au

ejarnutowski/laravel-api-key: API Key Authorization for Laravel

阅读：823|2022-08-13

2 PacktPublishing/Python-Machine-Learning-

PacktPublishing/Python-Machine-Learning-Second-Edition: Python Machine Learning

阅读：937|2022-08-18

3 鲁东大学一米网:Win7系统USB驱动器RAM的操

win7系统电脑使用过程中有不少朋友表示遇到过win7系统USB驱动器RAM的状况，当出现win7

阅读：841|2022-11-06

4 VazkiiMods/AutoRegLib: Automatic registr

VazkiiMods/AutoRegLib: Automatic registry/loading library for minecraft mods.

阅读：452|2022-08-17

5 diehl/Incremental-SVM-Learning-in-MATLAB

diehl/Incremental-SVM-Learning-in-MATLAB: This MATLAB package implements the met

阅读：415|2022-08-17

6 lit-uriy/qt-material-stylesheet: QtWidge

lit-uriy/qt-material-stylesheet: QtWidgets stylesheet based on Google Material d

阅读：374|2022-08-17

7 urbanairship/java-library: Java client l

urbanairship/java-library: Java client library for the Urban Airship API

阅读：962|2022-08-15

8 elipapa/markdown-cv: a simple template t

elipapa/markdown-cv: a simple template to write your CV in a readable markdown f

阅读：491|2022-08-17

9 tboronczyk/localization-middleware: PSR-

tboronczyk/localization-middleware: PSR-15 middleware to assist primarily with l

阅读：501|2022-08-16

10 CVE-2022-36909

A missing permission check in Jenkins OpenShift Deployer Plugin 1.2.0 and earlie

阅读：687|2022-07-29

客服电话

电子邮件

PHP Crawler类代码示例

示例1: go

示例2: getHotSpots

示例3: crawl_1_page

示例4: login

示例5: testCrawl

示例6: crawlForNews

示例7: run

示例8: crawl_indowebster

示例9: mangareader_1_page

示例10: addCrawlMap

示例11: mangareader_1_page

示例12: handleCrawling

示例13: testFlickrCrawl

示例14: tearDown

示例15: control

示例16: perform

示例17: testDevices

示例18: testDevices

示例19: crawl_1_chapter

示例20: crawl_page

请发表评论

全部评论

上一篇：

下一篇：

HuangCongQing/UCAS_Course_2019: 中国科学

bradtraversy/iweather: Ionic 3 mobile we

joaomh/curso-de-matlab

断牙刷新位置时间（断牙属性及刷新位置介绍

rugk/mastodon-simplified-federation: Sim

bradtraversy/iweather: Ionic 3 mobile we

joaomh/curso-de-matlab

adafruit/Adafruit_VS1053_Library: This i

断牙刷新位置时间（断牙属性及刷新位置介绍

rugk/mastodon-simplified-federation: Sim

Tangshitao/Dense-Scene-Matching: Learnin

关于我们

产品与服务

解决方案

139-2527-9053