
Corion/www-mechanize-firefox: The API of WWW::Mechanize, combined with the Javas ...


Open-source software name:

Corion/www-mechanize-firefox

Open-source software URL:

https://github.com/Corion/www-mechanize-firefox

Programming language:

Perl 67.9%

Description:


NAME

WWW::Mechanize::Firefox - use Firefox as if it were WWW::Mechanize

SYNOPSIS

use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');

$mech->eval_in_page('alert("Hello Firefox")');
my $png = $mech->content_as_png();

This module lets you automate Firefox through the Mozrepl plugin, which must be installed in your Firefox.

For more examples see WWW::Mechanize::Firefox::Examples.

IMPORTANT NOTICE

The Mozrepl plugin that this module uses no longer works, because key technologies it depends on were retired from the Mozilla platform in November 2017.

According to the GitHub repository https://github.com/bard/mozrepl, the last known compatible version is Firefox 54.

Therefore, this module cannot be used with Firefox versions greater than 54.
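If a script may run on machines with different Firefox installations, it can help to refuse to start on versions newer than 54. The following is a minimal sketch in plain Perl; the helper name mozrepl_compatible and the assumed version-string format (as printed by firefox --version, e.g. "Mozilla Firefox 54.0.1") are illustrative and not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize::Firefox): returns true if
# the Firefox major version in $version_string is 54 or lower, the last
# version known to work with Mozrepl.
sub mozrepl_compatible {
    my ($version_string) = @_;
    # Assumes a string such as "Mozilla Firefox 54.0.1": the first run of
    # digits followed by a dot is taken as the major version.
    my ($major) = $version_string =~ /(\d+)\./
        or return 0;    # unknown format: treat as incompatible
    return $major <= 54;
}

# Assumed to be captured from `firefox --version` in a real script
my $version = 'Mozilla Firefox 54.0.1';
warn "Firefox too new for Mozrepl: $version\n"
    unless mozrepl_compatible($version);
```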

CONSTRUCTOR and CONFIGURATION

WWW::Mechanize::Firefox->new( %args )

use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();

Creates a new instance and connects it to Firefox.

Note that Firefox must have the mozrepl extension installed and enabled.

The following options are recognized:

  • tab - regex for the title of the tab to reuse. If no matching tab is found, the constructor dies.

    If you pass in the string current, the currently active tab will be used instead.

    If you pass in a MozRepl::RemoteObject instance, this will be used as the new tab. This is convenient if you have an existing tab in Firefox as object already, for example created through Firefox::Application->addTab().

  • create - will create a new tab if no existing tab matching the criteria given in tab can be found.

  • activate - make the tab the active tab

  • launch - name of the program to launch if we can't connect to it on the first try.

  • frames - an array reference of selectors for the subframes to include when searching for elements on a page.

    If you want to always search through all frames, just pass 1. This is the default.

    To prevent searching through frames, pass

            frames => 0
    

    To whitelist frames to be searched, pass the list of frame selectors:

            frames => ['#content_frame']
    
  • autodie - whether web failures are converted into fatal Perl errors. See the autodie accessor. True by default to make error checking easier.

    To make errors non-fatal, pass

      autodie => 0
    

    in the constructor.

  • agent - the name of the User Agent to use. This overrides how Firefox identifies itself.

  • log - array reference to log levels, passed through to MozRepl::RemoteObject

  • bufsize - Net::Telnet buffer size, if the default of 1MB is not enough

  • events - the set of default Javascript events to listen for while waiting for a reply. In fact, WWW::Mechanize::Firefox will almost always wait until a 'DOMContentLoaded' or 'load' event. 'pagehide' events tell it which frames to wait for.

    The default set is

      'DOMContentLoaded','load',
      'pageshow',
      'pagehide',
      'error','abort','stop',
    
  • app - a premade Firefox::Application

  • repl - a premade MozRepl::RemoteObject instance or a connection string suitable for initializing one

  • use_queue - whether to use the command queueing of MozRepl::RemoteObject. Default is 1.

  • js_JSON - whether to use Firefox's native JSON encoder

      js_JSON => 'native', # force using the native JSON encoder
    

    The default is to autodetect whether a native JSON encoder is available and whether the transport is UTF-8 safe.

  • pre_events - the events that are sent to an input field before its value is changed. By default this is [focus].

  • post_events - the events that are sent to an input field after its value is changed. By default this is [blur, change].
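The interplay of the tab and create options can be summarized in a few lines of plain Perl. This is a hypothetical model for illustration only: it works on a list of tab titles instead of real Firefox tabs, the helper pick_tab does not exist in WWW::Mechanize::Firefox, and the 'current' and MozRepl::RemoteObject forms of tab are not modeled.

```perl
use strict;
use warnings;

# Hypothetical model of how the `tab` and `create` constructor options
# interact. Tabs are represented by their titles only.
sub pick_tab {
    my ($titles, $tab_re, $create) = @_;
    for my $i (0 .. $#$titles) {
        # Reuse the first tab whose title matches the regex
        return $i if $titles->[$i] =~ $tab_re;
    }
    # No match: with `create`, a new tab would be opened (signalled here by
    # returning 'new'); without it, the constructor dies.
    return 'new' if $create;
    die "No tab matching $tab_re found\n";
}

my @titles = ('Google', 'Perl documentation');
my $reused  = pick_tab(\@titles, qr/Perl/);      # 1
my $created = pick_tab(\@titles, qr/CPAN/, 1);   # 'new'
print "$reused $created\n";
```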

$mech->agent( $product_id );

$mech->agent('wonderbot/JS 1.0');

Set the product token that is used to identify the user agent on the network. The agent value is sent as the "User-Agent" header in the requests. The default is whatever Firefox uses.

To reset the user agent to the Firefox default, pass an empty string:

$mech->agent('');

$mech->autodie( [$state] )

$mech->autodie(0);

Accessor to get/set whether warnings become fatal.

$mech->events()

$mech->events( ['load'] );

Sets or gets the set of Javascript events that WWW::Mechanize::Firefox will wait for after requesting a new page. Returns an array reference.

Changing the set of events will most likely make WWW::Mechanize::Firefox stall while waiting for a response.

This method is special to WWW::Mechanize::Firefox.

$mech->on_event()

$mech->on_event(1); # prints every page load event

# or give it a callback
$mech->on_event(sub { my ($ev) = @_; warn "Page loaded with $ev->{name} event" });

Gets/sets the notification handler for the Javascript event that finished a page load. Set it to 1 to output via warn, or a code reference to call it with the event.

This method is special to WWW::Mechanize::Firefox.

$mech->cookies()

my $cookie_jar = $mech->cookies();

Returns an HTTP::Cookies object that was initialized from the live Firefox instance.

Note: ->set_cookie is not yet implemented, and neither is saving the cookie jar.

JAVASCRIPT METHODS

$mech->allow( %options )

Enables or disables browser features for the current tab. The following options are recognized:

  • plugins - whether to allow plugin execution.
  • javascript - whether to allow Javascript execution.
  • metaredirects - whether to allow refresh-based redirects.
  • frames, subframes - whether to allow subframes (framesets/iframes).
  • images - whether to load images.

Options not listed remain unchanged.

To disable Javascript:

$mech->allow( javascript => 0 );

$mech->js_errors()

print $_->{message}
    for $mech->js_errors();

An interface to the Javascript Error Console (JEC).

Returns the list of errors in the JEC.

Maybe this should be called js_messages or js_console_messages instead.

$mech->clear_js_errors()

$mech->clear_js_errors();

Clears all Javascript messages from the console

$mech->eval_in_page( $str [, $env [, $document]] )

$mech->eval( $str [, $env [, $document]] )

my ($value, $type) = $mech->eval( '2+2' );

Evaluates the given Javascript fragment in the context of the web page. Returns a pair of value and Javascript type.

This allows access to variables and functions declared "globally" on the web page.

The returned result needs to be treated with extreme care because it might lead to Javascript execution in the context of your application instead of the context of the web page. This should be evident for functions and complex data structures like objects. When working with results from untrusted sources, you can only safely use simple types like strings.

If you want to modify the environment the code is run under, pass in a hash reference as the second parameter. All keys will be inserted into the Javascript this object as well as into this.window. Complex data structures are only supported if they contain no objects. If you need finer control, you'll have to write the Javascript yourself.

This method is special to WWW::Mechanize::Firefox.

Also, using this method opens a potential security risk as the returned values can be objects and using these objects can execute malicious code in the context of the Firefox application.
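To make the $env parameter of eval() more concrete, here is one possible way such a hash could be serialized into Javascript assignments on this and this.window. It is a hypothetical sketch using the core JSON::PP module, not the module's actual implementation; the helper name env_to_js is made up. It also shows why objects are a problem: JSON::PP refuses to encode blessed references by default.

```perl
use strict;
use warnings;
use JSON::PP ();

# Hypothetical illustration: turn an $env hash into Javascript statements
# that set each key on both `this` and `this.window`.
sub env_to_js {
    my ($env) = @_;
    my $json = JSON::PP->new->canonical->allow_nonref;
    my @js;
    for my $key (sort keys %$env) {
        # encode() dies on blessed objects, mirroring the restriction that
        # complex data structures must not contain objects
        my $value = $json->encode($env->{$key});
        my $name  = $json->encode($key);    # quoted, escaped property name
        push @js, "this[$name] = $value;", "this.window[$name] = $value;";
    }
    return join "\n", @js;
}

print env_to_js({ answer => 42, names => ['a', 'b'] }), "\n";
```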

$mech->unsafe_page_property_access( ELEMENT )

Allows you unsafe access to properties of the current page. Using such properties is an incredibly bad idea.

This is why the function dies. If you really want to use this function, edit the source code.

UI METHODS

See also Firefox::Application for how to add more than one tab and how to manipulate windows and tabs.

$mech->application()

my $ff = $mech->application();

Returns the Firefox::Application object for manipulating more parts of the Firefox UI and application.

$mech->autoclose_tab

$mech->autoclose_tab( 0 ); # keep tab open after program end

Sets whether the tab associated with this instance is closed when the instance goes away, for example at program end.

$mech->tab()

Gets the object that represents the Firefox tab used by WWW::Mechanize::Firefox.

This method is special to WWW::Mechanize::Firefox.

$mech->make_progress_listener( %callbacks )

my $eventlistener = $mech->make_progress_listener(
    onStateChange => \&onStateChange,
);

Creates an unconnected nsIWebProgressListener interface which calls the Perl subroutines you pass in.

Returns a handle. Once the handle gets released, all callbacks will get stopped. Also, all Perl callbacks will get deregistered from the Javascript bridge, so make sure not to use the same callback in different progress listeners at the same time. The sender may still call your callbacks.

$mech->progress_listener( $source, %callbacks )

my $eventlistener = $mech->progress_listener(
    $browser,
    onLocationChange => \&onLocationChange,
);

Sets up the callbacks for the nsIWebProgressListener interface to be the Perl subroutines you pass in.

$source needs to support .addProgressListener and .removeProgressListener.

Returns a handle. Once the handle gets released, all callbacks will get stopped. Also, all Perl callbacks will get deregistered from the Javascript bridge, so make sure not to use the same callback in different progress listeners at the same time.

$mech->repl()

my ($value,$type) = $mech->repl->expr('2+2');

Gets the MozRepl::RemoteObject instance that is used.

This method is special to WWW::Mechanize::Firefox.

$mech->highlight_node( @nodes )

my @links = $mech->selector('a');
$mech->highlight_node(@links);

Convenience method that marks all nodes in the arguments with

background: red;
border: solid black 1px;
display: block; /* if the element was display: none before */

This is convenient if you need visual verification that you've got the right nodes.

There currently is no way to restore the nodes to their original visual state except reloading the page.

NAVIGATION METHODS

$mech->get( $url, %options )

$mech->get( $url, ':content_file' => $tempfile );

Retrieves the URL $url into the tab.

It returns a faked HTTP::Response object for interface compatibility with WWW::Mechanize.

Recognized options:

  • :content_file - filename to store the data in

  • no_cache - if true, bypass the browser cache

  • synchronize - wait until all elements have loaded

    The default is to wait until all elements have loaded. You can switch this off by passing

      synchronize => 0
    

    for example if you want to manually poll for an element that appears fairly early during the load of a complex page.

$mech->get_local( $filename , %options )

$mech->get_local('test.html');

Shorthand method to construct the appropriate file:// URI and load it into Firefox. Relative paths will be interpreted as relative to $0.

This method accepts the same options as ->get().

This method is special to WWW::Mechanize::Firefox but could also exist in WWW::Mechanize through a plugin.

Options:

  • basedir - a reference directory to use instead of dirname($0)
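The path handling described for get_local() can be sketched with the core modules File::Basename and File::Spec. The helper local_file_uri below is hypothetical and only mirrors the documented behavior (relative to dirname($0), or to basedir if given); the real module may build the URI differently, and this sketch does not percent-encode spaces or other special characters.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: resolve $filename relative to dirname($0) (or an
# explicit basedir) and turn the absolute path into a file:// URI.
sub local_file_uri {
    my ($filename, %options) = @_;
    my $basedir = $options{basedir} // dirname($0);
    my $path = File::Spec->rel2abs($filename, File::Spec->rel2abs($basedir));
    $path =~ tr{\\}{/};                      # Windows separators to slashes
    $path = "/$path" unless $path =~ m{^/};  # C:/x becomes /C:/x
    # Note: no percent-encoding of spaces or other special characters
    return "file://$path";
}

# On Unix-like systems this prints file:///tmp/site/test.html
print local_file_uri('test.html', basedir => '/tmp/site'), "\n";
```

For production code, URI::file->new_abs from the URI distribution handles escaping and platform differences.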

$mech->post( $url, %options )

$mech->post( 'http://example.com',
    params => { param => "Hello World" },
    headers => {
      "Content-Type" => 'application/x-www-form-urlencoded',
    },
    charset => 'utf-8',
);

Sends a POST request to $url.

A Content-Length header will be automatically calculated if it is not given.

The following options are recognized:

  • headers - a hash of HTTP headers to send. If not given, the content type will be generated automatically.
  • data - the raw data to send, if you've encoded it already.
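To show what "encoded it already" and an automatically calculated Content-Length mean for the params/charset example above, here is a sketch of building an application/x-www-form-urlencoded body in plain Perl with the core Encode module. The helper names are hypothetical; this is not how WWW::Mechanize::Firefox encodes data internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode one name or value as bytes in $charset and
# escape everything except RFC 3986 unreserved characters; space becomes '+'.
sub escape_component {
    my ($text, $charset) = @_;
    my $bytes = encode($charset, $text);
    $bytes =~ s/([^A-Za-z0-9\-._~ ])/sprintf('%%%02X', ord $1)/ge;
    $bytes =~ tr/ /+/;
    return $bytes;
}

# Build the body from a params hash (sorted for a stable result)
sub form_urlencode {
    my ($params, $charset) = @_;
    $charset //= 'utf-8';
    return join '&',
        map { escape_component($_, $charset) . '='
              . escape_component($params->{$_}, $charset) }
        sort keys %$params;
}

my $body = form_urlencode({ param => 'Hello World' }, 'utf-8');
# The body is already a byte string, so length() gives the Content-Length
my $content_length = length $body;
print "$body ($content_length bytes)\n";   # param=Hello+World (17 bytes)
```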

$mech->add_header( $name => $value, ... )

$mech->add_header(
    'X-WWW-Mechanize-Firefox' => "I'm using it",
    Encoding => 'text/klingon',
);

This method sets up custom headers that will be sent with every HTTP(S) request that Firefox makes.

Using multiple WWW::Mechanize::Firefox instances with the same application while changing request headers will most likely have unpredictable effects, so avoid doing that.

Note that currently, we only support one value per header.

Some versions of Firefox don't work with the method that is used to set the custom headers. Please see t/60-mech-custom-headers.t for the exact versions where the implemented mechanism doesn't work. Roughly, this is for versions 17 to 24 of Firefox.

$mech->delete_header( $name , $name2... )

$mech->delete_header( 'User-Agent' );

Removes HTTP headers from the agent's list of special headers. Note that Firefox may still send a header with its default value.

$mech->reset_headers

$mech->reset_headers();

Removes all custom headers and makes Firefox send its defaults again.
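The bookkeeping implied by add_header(), delete_header() and reset_headers(), including "only one value per header", can be modeled with a plain hash. The HeaderSet class below is hypothetical and only tracks which headers would be overridden; it does not talk to Firefox, and it treats header names case-sensitively for simplicity.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the custom-header list.
package HeaderSet;

sub new { return bless { headers => {} }, shift }

# One value per header: adding a header again replaces the old value
sub add_header {
    my ($self, %pairs) = @_;
    @{ $self->{headers} }{ keys %pairs } = values %pairs;
    return $self;
}

sub delete_header {
    my ($self, @names) = @_;
    delete @{ $self->{headers} }{@names};
    return $self;
}

# Forget all custom headers; the browser would send its defaults again
sub reset_headers {
    my ($self) = @_;
    %{ $self->{headers} } = ();
    return $self;
}

sub header { return $_[0]{headers}{ $_[1] } }

package main;

my $h = HeaderSet->new;
$h->add_header('X-Test' => 'one')->add_header('X-Test' => 'two');
print $h->header('X-Test'), "\n";   # two
```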

$mech->synchronize( $event, $callback )

Wraps a synchronization semaphore around the callback and waits until the event $event fires on the browser. If you want to wait for one of multiple events to occur, pass an array reference as the first parameter.

Usually, you want to use it like this:

my $l = $mech->xpath('//a[@onclick]', single => 1);
$mech->synchronize('DOMFrameContentLoaded', sub {
    $mech->click( $l );
});

It is necessary to synchronize with the browser whenever a click performs an action that takes longer and fires an event on the browser object.

The DOMFrameContentLoaded event is fired by Firefox when the whole DOM and all iframes have been loaded. If your document doesn't have frames, use the DOMContentLoaded event instead.

If you leave out $event, the value of ->events() will be used instead.
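The semaphore idea behind synchronize() can be modeled without a browser: run the callback, then consume events until one of the wanted events arrives. In this hypothetical sketch the helper synchronize_on and the event queue are stand-ins; the real method listens to browser events over Mozrepl. A real implementation must start listening before running the callback so that fast events are not missed; the queue here hides that detail.

```perl
use strict;
use warnings;

# Hypothetical model: $wanted is an event name or an array reference of
# names, $callback triggers the action, $next_event returns the next event
# name (or undef when no more events will come).
sub synchronize_on {
    my ($wanted, $callback, $next_event) = @_;
    my @wanted = ref $wanted ? @$wanted : ($wanted);
    my %wanted = map { $_ => 1 } @wanted;
    my @result = $callback->();    # e.g. the click that starts loading
    while (defined(my $ev = $next_event->())) {
        return ($ev, @result) if $wanted{$ev};   # semaphore released
    }
    die "Event source ran dry before any wanted event fired\n";
}

my @queue = ('pagehide', 'DOMContentLoaded', 'load');
my ($fired) = synchronize_on(['load', 'DOMContentLoaded'],
                             sub { 'clicked' },
                             sub { shift @queue });
print "$fired\n";   # DOMContentLoaded
```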

$mech->res() / $mech->response(%options)

my $response = $mech->response(headers => 0);

Returns the current response as an HTTP::Response object.

The headers option tells the module whether to fetch the headers from Firefox or not. This is mainly an internal optimization hack.

$mech->success()

$mech->get('http://google.com');
print "Yay"
    if $mech->success();

Returns a boolean telling whether the last request was successful. If there hasn't been an operation yet, returns false.

This is a convenience function that wraps $mech->res->is_success.

$mech->status()

$mech->get('http://google.com');
print $mech->status();
# 200

Returns the HTTP status code of the response. This is a 3-digit number like 200 for OK, 404 for not found, and so on.
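Since success() is documented as wrapping $mech->res->is_success, the relationship between the two methods comes down to the 2xx range that HTTP::Response's is_success checks. The hypothetical helper below writes that check out by hand to avoid dependencies, and treats a missing status (no request made yet) as failure.

```perl
use strict;
use warnings;

# Hypothetical helper: true for 2xx status codes, false otherwise,
# including when there is no status yet.
sub looks_successful {
    my ($status) = @_;
    return defined $status && $status >= 200 && $status < 300;
}

for my $status (200, 204, 301, 404) {
    printf "%d => %s\n", $status,
        looks_successful($status) ? 'success' : 'no success';
}
```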

$mech->reload( [$bypass_cache] )

$mech->reload();

Reloads the current page. If $bypass_cache is a true value, the browser is not allowed to use a cached page. This is the difference between pressing F5 (cached) and shift-F5 (uncached).

Returns the (new) response.

$mech->back( [$synchronize] )

$mech->back();

Goes one page back in the page history.

Returns the (new) response.

$mech->forward( [$synchronize] )

$mech->forward();

Goes one page forward in the page history.

Returns the (new) response.

$mech->uri()

print "We are at " . $mech->uri;

Returns the current document URI.

CONTENT METHODS

$mech->document()

Returns the DOM document object.

This is WWW::Mechanize::Firefox specific.

$mech->docshell()

my $ds = $mech->docshell;

Returns the docShell Javascript object associated with the tab.

This is WWW::Mechanize::Firefox specific.

$mech->content( %options )

print $mech->content;
print $mech->content( format => 'html' ); # default
print $mech->content( format => 'text' ); # identical to ->text

This always returns the content as a Unicode string. It tries to decode the raw content according to its input encoding. This currently only works for HTML pages, not for images etc.

Recognized options:

  • document - the document to use.

    Default is $self->document.

  • format - the format of the returned content

    The allowed values are html and text. The default is html.

$mech->text()

Returns the text of the current HTML content. If the content isn't HTML, $mech will die.

$mech->content_encoding()

print "The content is encoded as ", $mech->content_encoding;

Returns the encoding that the content is in. This can be used to convert the content from UTF-8 back to its native encoding.
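In Perl terms, converting back to the native encoding is a call to Encode::encode. A minimal sketch, with literal stand-ins for the values $mech->content and $mech->content_encoding would return (the sample text and the iso-8859-1 encoding are assumptions):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# content() returns a Unicode (character) string; content_encoding() names
# the page's native encoding. Both values below are stand-ins.
my $content  = "caf\x{e9}";       # stand-in for $mech->content
my $encoding = 'iso-8859-1';      # stand-in for $mech->content_encoding

# Turn the characters back into bytes in the native encoding
my $native_bytes = encode($encoding, $content);
printf "%d characters, %d bytes in %s\n",
    length($content), length($native_bytes), $encoding;
```

In UTF-8 the same text would take 5 bytes, because the é is encoded as two bytes.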

$mech->update_html( $html )

$mech->update_html($html);

Writes $html into the current document. This is mostly implemented as a convenience method for HTML::Display::MozRepl.

$mech->save_content( $localname [, $resource_directory] [, %options ] )

$mech->get('http://google.com');
$mech->save_content('google search page','google search page files');

Saves the

