Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

html - Web scraping in PHP

I'm looking for a way, in PHP, to build a small preview of another page from a URL supplied by the user.

I'd like to retrieve only the title of the page, an image (like the logo of the website) and a bit of text or a description if it's available. Is there any simple way to do this without any external libraries/classes? Thanks

So far I've tried using the DOMDocument class, loading the HTML and displaying it on the screen, but I don't think that's the proper way to do it.

1 Reply

I recommend you consider simple_html_dom for this. It will make it very easy.

Here is a working example that pulls the page title and the first image.

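<?php
// Requires the third-party simple_html_dom library to be available locally.
require 'simple_html_dom.php';

// file_get_html() returns false on failure, so check before using the result.
$html = file_get_html('http://www.google.com/');
if ($html === false) {
    exit('Could not load the page.');
}

$title = $html->find('title', 0); // first <title> element, or null if absent
$image = $html->find('img', 0);   // first <img> element, or null if absent

echo ($title ? $title->plaintext : '') . "<br>\n";
echo $image ? $image->src : '';
?>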

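Here is a second example that does the same without an external library. I should note that using regex on HTML is NOT a good idea: the patterns below only match simple markup and will miss, for example, a <title> tag that carries attributes or an unquoted src value.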

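<?php
// file_get_contents() returns false on failure, so check before matching.
$data = file_get_contents('http://www.google.com/');
if ($data === false) {
    exit('Could not load the page.');
}

// The "/" in </title> is escaped because "/" is also the pattern delimiter.
$title = preg_match('/<title>([^<]+)<\/title>/i', $data, $matches) ? $matches[1] : '';

// Quotes inside the character classes are escaped for the single-quoted PHP string.
$img = preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', $data, $matches) ? $matches[1] : '';

echo $title . "<br>\n";
echo $img;
?>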
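
Since the question mentions DOMDocument and asks for the title, an image and a description without external libraries, a third option is to parse the markup with PHP's built-in DOMDocument and DOMXPath classes. The following is only a minimal sketch under some assumptions: extractPreview() is a hypothetical helper name, the inline $sample string stands in for a page fetched with file_get_contents(), and the choice of meta tags (name="description", og:description, og:image) is a guess at what a typical page provides rather than something taken from the question.

```php
<?php
// Sketch: build a page preview (title, description, image) with DOMDocument.
// extractPreview() is a hypothetical helper name, not a PHP built-in.
function extractPreview(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Returns the trimmed value of the first node matching $query, or null if none.
    $first = function (string $query) use ($xpath): ?string {
        $nodes = $xpath->query($query);
        if ($nodes === false || $nodes->length === 0) {
            return null;
        }
        $value = trim((string) $nodes->item(0)->nodeValue);
        return $value === '' ? null : $value;
    };

    return [
        'title'       => $first('//title'),
        // Prefer the standard meta description, then fall back to og:description.
        'description' => $first('//meta[@name="description"]/@content')
                      ?? $first('//meta[@property="og:description"]/@content'),
        // Prefer og:image, then fall back to the first <img> on the page.
        'image'       => $first('//meta[@property="og:image"]/@content')
                      ?? $first('(//img/@src)[1]'),
    ];
}

// Stand-in for HTML fetched with file_get_contents($url).
$sample = '<html><head><title>Example page</title>'
        . '<meta name="description" content="A short description.">'
        . '</head><body><img src="/logo.png" alt="logo"></body></html>';

$preview = extractPreview($sample);
echo $preview['title'], "\n";
echo $preview['description'], "\n";
echo $preview['image'], "\n";
```

Any field that cannot be found comes back as null, so the caller can decide what to show instead. The image value may be a relative URL such as /logo.png, in which case it still has to be resolved against the page's own URL before it can be displayed.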


...