Open-source project: wikimedia/html-metadata
Repository: https://github.com/wikimedia/html-metadata
Language: JavaScript 69.1%
Introduction:

html-metadata
The aim of this library is to be a comprehensive source for extracting all metadata embedded in HTML. It currently supports Schema.org microdata through a third-party library; native implementations of BEPress, Dublin Core, Highwire Press, JSON-LD, Open Graph, Twitter, EPrints, PRISM, and COinS; and some general metadata that does not belong to a particular standard (for instance, the content of the title tag or of meta description tags). Support is planned for RDFa, AGLS, and other metadata types not yet covered. Contributions and requests for other metadata types are welcome!

Install

    npm install html-metadata
Usage

Promise-based:

    var scrape = require('html-metadata');
    var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

    // Fetch the page and run every available parser on it
    scrape(url).then(function(metadata){
        console.log(metadata);
    });

Callback-based:

    var scrape = require('html-metadata');
    var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

    scrape(url, function(error, metadata){
        console.log(metadata);
    });

The scrape method used here invokes the parseAll() method, which runs all the available parsers registered in metadataFunctions(). These parsers are also available for use separately, for example:

Promise-based:

    var cheerio = require('cheerio');
    var preq = require('preq'); // Promisified request library
    var parseDublinCore = require('html-metadata').parseDublinCore;
    var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

    preq(url).then(function(response){
        // Declare $ locally rather than leaking an implicit global
        var $ = cheerio.load(response.body);
        return parseDublinCore($).then(function(metadata){
            console.log(metadata);
        });
    });

Callback-based:

    var cheerio = require('cheerio');
    var request = require('request');
    var parseDublinCore = require('html-metadata').parseDublinCore;
    var url = "http://blog.woorank.com/2013/04/dublin-core-metadata-for-seo-and-usability/";

    request(url, function(error, response, html){
        // Declare $ locally rather than leaking an implicit global
        var $ = cheerio.load(html);
        parseDublinCore($, function(error, metadata){
            console.log(metadata);
        });
    });

Options object:

You can also pass an options object as the first argument, containing extra parameters. Some websites require the user-agent or cookies to be set in order to return a response.
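As a minimal sketch of such an options object: the helper name buildScrapeOptions is hypothetical, and the key names (url, headers) are an assumption that the options are forwarded to the underlying request library and follow its conventions. The user-agent string is only a placeholder.

```javascript
// Hypothetical helper: builds an options object to pass to scrape()
// in place of a plain URL string.
// Assumption: the keys follow the request library's option names.
function buildScrapeOptions(url, userAgent) {
    return {
        url: url,
        headers: {
            'User-Agent': userAgent // some sites refuse requests without one
        }
    };
}

var options = buildScrapeOptions('http://example.com', 'webscraper');
console.log(options);

// Usage, assuming html-metadata is installed:
// var scrape = require('html-metadata');
// scrape(options).then(function(metadata){
//     console.log(metadata);
// });
```

If a site also needs cookies, the same object would presumably be the place to add them, again assuming the request library's conventions apply.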
The method parseGeneral obtains the following general metadata:

    <link rel="apple-touch-icon" href="" sizes="" type="">
    <link rel="icon" href="" sizes="" type="">
    <meta name="author" content="">
    <link rel="author" href="">
    <link rel="canonical" href="">
    <meta name="description" content="">
    <link rel="publisher" href="">
    <meta name="robots" content="">
    <link rel="shortlink" href="">
    <title></title>
    <html lang="en">
    <html dir="rtl">

Tests
Contributing

Contributions welcome! All contributions should use bluebird promises instead of callbacks and be .nodeify()-ed in index.js, so that the functions can be used with either callbacks or Promises.
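The dual callback/Promise pattern described here can be sketched roughly as follows. This is an illustration only, not the library's code: it uses native Promises as a stand-in for bluebird, the nodeify helper is a hypothetical imitation of bluebird's promise.nodeify(callback), and parseTitle is a made-up parser.

```javascript
// Hypothetical stand-in for bluebird's promise.nodeify(callback).
// With a callback, the result is delivered Node-style as (error, result);
// without one, the caller simply uses the returned promise.
function nodeify(promise, callback) {
    if (typeof callback === 'function') {
        promise.then(
            function(result){ callback(null, result); },
            function(error){ callback(error); }
        );
    }
    return promise;
}

// Made-up parser written in the promise-first style the guideline asks for.
function parseTitle(html, callback) {
    var promise = new Promise(function(resolve, reject){
        var match = /<title>([^<]*)<\/title>/i.exec(html);
        if (match) {
            resolve({ title: match[1] });
        } else {
            reject(new Error('No title found'));
        }
    });
    return nodeify(promise, callback);
}

// Promise style
parseTitle('<title>Example</title>').then(function(metadata){
    console.log(metadata.title);
});

// Callback style
parseTitle('<title>Example</title>', function(error, metadata){
    console.log(error, metadata.title);
});
```

Writing the logic once against promises and converting at the boundary means callers can use either style without the logic being implemented twice.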