An IPFS plugin to export additional metrics and enable real-time analysis of Bitswap traffic.
Building
You must build both the plugin and the host go-ipfs from the same sources, using the same compiler.
The recommended way of building this plugin with a matching go-ipfs version is within a Docker build environment.
Alternatively, see below for background information on the process and manual build instructions.
Docker
You can build this project together with a matching go-ipfs executable within Docker.
This is nice, because you get reproducible, matching binaries, compiled with Go 1.16 on Debian bullseye.
Building on bullseye links the binaries against a slightly older libc.
This keeps them compatible with slightly older systems (e.g. Ubuntu LTS releases), at no loss of functionality.
The builder Dockerfile implements a builder stage.
The resulting binaries are placed in /usr/local/bin/ipfs/ inside the image.
The build-in-docker.sh script executes the builder and copies the produced binaries to the out/ directory of the project.
Manually
Due to a bug in the Go compiler it is not possible to build plugins
correctly using Go 1.17.
You need to use Go 1.16 to build both this plugin and the IPFS binary.
This is an internal plugin, which needs to be built against the sources that produced the ipfs binary this plugin will
plug into.
The ipfs binary and this plugin must be built from/against the same IPFS sources, using the same version of the Go
compiler.
We build and run against go-ipfs v0.12.0, using Go 1.16 due to the aforementioned bug in Go 1.17.
You can build against either
the official, online go-ipfs source (and recompile IPFS with Go 1.16) or
a local fork, in which case you need to add a replace directive to the go.mod file (see the sketch below).
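For the local-fork case, this is a minimal sketch of such a directive, assuming the fork is checked out at ../go-ipfs (the path is an assumption; adjust it to your checkout):

// Point the go-ipfs dependency at a local checkout instead of the online sources.
replace github.com/ipfs/go-ipfs => ../go-ipfs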
There is a Makefile which does a lot of this for you.
It respects the IPFS_PATH and IPFS_VERSION variables, which are otherwise set to sensible defaults.
Use make build to build the plugin and make install to copy the compiled plugin to the
IPFS plugin directory.
Alternatively, building manually with go build -buildmode=plugin -o mexport.so should also work.
This will produce a mexport.so library which needs to be placed in the IPFS plugin directory, which is
$IPFS_PATH/plugins by default.
Configuration
This plugin can be configured using the usual IPFS configuration.
The plugin periodically goes through the node's peer store, open connections, and open streams to collect various metrics about them.
These metrics are then pushed to Prometheus.
A configuration value controls how often this is done, specified in seconds.
Prometheus itself only scrapes (by default) every 15 seconds, so very small values are probably not useful.
The default is ten seconds.
AgentVersionCutOff
The plugin collects metrics about the agent versions of connected peers.
This value configures a cutoff for how many agent version strings should be reported to Prometheus.
The remainder (everything that does not fit within the cutoff) is summed and reported as "others" to Prometheus.
TCPServerConfig
This configures the TCP server used to export a pubsub mechanism for Bitswap monitoring in real time.
If this section is missing or null, the TCP server will not be started.
Bitswap monitoring is performed regardless.
See below on how the TCP server works.
The ListenAddresses field configures the endpoints on which to listen.
HTTPServerConfig
This configures the HTTP server used to expose an RPC API.
The API is located at /metric_plugin/v1 and returns JSON-encoded messages.
See below for a list of methods.
The ListenAddresses field configures the endpoints on which to listen.
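Putting the above together, here is an illustrative sketch of the plugin's entry, nested under the Plugins section of the IPFS configuration file. The plugin key name and the name of the interval field are assumptions and may differ; the listen addresses are placeholders:

"metric-export-plugin": {
    "Config": {
        "PopulatePrometheusInterval": 10,
        "AgentVersionCutOff": 20,
        "TCPServerConfig": {
            "ListenAddresses": ["localhost:8181"]
        },
        "HTTPServerConfig": {
            "ListenAddresses": ["localhost:8432"]
        }
    }
}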
Running
In order to run with the plugin, you need to
1. copy the compiled plugin to the IPFS plugin directory (which may need to be created),
2. edit the IPFS configuration to configure the plugin, and
3. launch the matching go-ipfs in daemon mode.
If you plan on running a large monitoring node as per this paper, it is recommended to increase the file descriptor limit for the IPFS daemon.
The IPFS daemon attempts to do this on its own, but only up to 8192 FDs by default.
This is controllable via the IPFS_FD_MAX environment variable.
Setting IPFS_FD_MAX="100000" should be sufficient.
Logs
This plugin uses the usual IPFS logging API.
To see logs produced by this plugin, either set an appropriate global log level (the default is error):
IPFS_LOGGING="info" ipfs daemon | grep -a "metric-export"
Or, after having started the daemon, configure just this component to emit logs at a certain level or above:
ipfs daemon&
ipfs log level metric-export info
ipfs log tail # or something else?
The TCP Server for Real-Time Bitswap Monitoring
This plugin comes with a TCP server that pushes Bitswap monitoring messages to clients.
The protocol consists of framed, gzipped, JSON-encoded messages, where each frame is prefixed with its size as a 4-byte big-endian integer.
Thus, on the wire, each message looks like this:
<size of following message in bytes, 4 bytes big endian><gzipped JSON-encoded message>
Each message is gzipped individually, i.e., the state of the encoder is reset for each message.
There is a client implementation in Rust which works with this.
A connection starts with an uncompressed handshake, during which both sides send a version message of this form:
// A version message, exchanged between client and server once, immediately
// after the connection is established.
type versionMessage struct {
    Version int `json:"version"`
}
This is JSON-encoded and framed.
Both sides verify that the version matches.
If there is a mismatch, the connection is closed.
The current version of the API, as described here, is 3.
After the handshake succeeds, clients are automatically subscribed to all Bitswap monitoring messages.
There is a backpressure mechanism: Slow clients will not receive all messages.
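To illustrate the framing and handshake, here is a minimal client sketch in Go, assuming the server listens on localhost:8181 (the address is an assumption; use whatever is configured in TCPServerConfig). It performs the uncompressed version handshake and then prints each decompressed JSON message:

package main

import (
    "bytes"
    "compress/gzip"
    "encoding/binary"
    "encoding/json"
    "fmt"
    "io"
    "net"
)

// readFrame reads a 4-byte big-endian size prefix and then the payload it announces.
func readFrame(r io.Reader) ([]byte, error) {
    var size uint32
    if err := binary.Read(r, binary.BigEndian, &size); err != nil {
        return nil, err
    }
    buf := make([]byte, size)
    _, err := io.ReadFull(r, buf)
    return buf, err
}

// writeFrame writes the payload prefixed with its size as a 4-byte big-endian integer.
func writeFrame(w io.Writer, payload []byte) error {
    if err := binary.Write(w, binary.BigEndian, uint32(len(payload))); err != nil {
        return err
    }
    _, err := w.Write(payload)
    return err
}

func main() {
    conn, err := net.Dial("tcp", "localhost:8181") // assumed listen address
    if err != nil {
        panic(err)
    }
    defer conn.Close()

    // Handshake: both sides send an uncompressed, framed version message and
    // verify that the versions match.
    ours, _ := json.Marshal(struct {
        Version int `json:"version"`
    }{Version: 3})
    if err := writeFrame(conn, ours); err != nil {
        panic(err)
    }
    raw, err := readFrame(conn)
    if err != nil {
        panic(err)
    }
    var theirs struct {
        Version int `json:"version"`
    }
    if err := json.Unmarshal(raw, &theirs); err != nil || theirs.Version != 3 {
        panic("version mismatch")
    }

    // After the handshake, every frame carries one individually-gzipped,
    // JSON-encoded message.
    for {
        frame, err := readFrame(conn)
        if err != nil {
            panic(err)
        }
        zr, err := gzip.NewReader(bytes.NewReader(frame))
        if err != nil {
            panic(err)
        }
        msg, err := io.ReadAll(zr)
        zr.Close()
        if err != nil {
            panic(err)
        }
        fmt.Println(string(msg))
    }
}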
Changelog
Version 1 is the initial version of the format.
It contains the framed messages, requests, and responses.
Version 2 introduces block presences (see the Bitswap spec) to pushed Bitswap messages.
Version 3 introduces gzipping of individual messages and removes all API functionality.
Clients are now automatically subscribed.
Messages from Plugin -> Client
Messages originating from this plugin have the following format:
// The type of messages sent out via TCP.
type outgoingTCPMessage struct {
    // If Event is not nil, this message is a pushed event.
    Event *event `json:"event,omitempty"`
}
A client is, by default, subscribed to events emitted by this plugin.
Events sent by this plugin are of this format:
// The type sent via TCP for pushed events.
type event struct {
    // The timestamp at which the event was recorded.
    // This defines an ordering for events.
    Timestamp time.Time `json:"timestamp"`

    // Peer is a base58-encoded string representation of the peer ID.
    Peer string `json:"peer"`

    // BitswapMessage is not nil if this event is a bitswap message.
    BitswapMessage *BitswapMessage `json:"bitswap_message,omitempty"`

    // ConnectionEvent is not nil if this event is a connection event.
    ConnectionEvent *ConnectionEvent `json:"connection_event,omitempty"`
}
The BitswapMessage and ConnectionEvent structs are specified in metricplugin/api.go.
The HTTP Server
The HTTP server exposes an RPC-like API via HTTP.
Successful responses are always JSON-encoded and returned with HTTP code 200.
Unsuccessful requests are indicated by HTTP status codes other than 2xx and may return an error message.
The response format looks like this:
// A JSONResponse is the format for every response returned by the HTTP server.
type JSONResponse struct {
    Status int         `json:"status"`
    Result interface{} `json:"result,omitempty"`
    Err    *string     `json:"error,omitempty"`
}
Methods
Methods are identified by their HTTP path, which always begins with /metric_plugin/v1.
The following methods are implemented:
GET /ping
This is a no-op which returns an empty struct.
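As an illustration of the response format, here is a minimal sketch of calling this method from Go, assuming the HTTP server listens on localhost:8432 (the address is an assumption; see HTTPServerConfig):

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// jsonResponse mirrors the JSONResponse format described above; the result is
// kept as raw JSON since its shape depends on the method that was called.
type jsonResponse struct {
    Status int             `json:"status"`
    Result json.RawMessage `json:"result,omitempty"`
    Err    *string         `json:"error,omitempty"`
}

func main() {
    resp, err := http.Get("http://localhost:8432/metric_plugin/v1/ping")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var decoded jsonResponse
    if err := json.NewDecoder(resp.Body).Decode(&decoded); err != nil {
        panic(err)
    }
    fmt.Printf("HTTP %d, status %d, result: %s\n", resp.StatusCode, decoded.Status, decoded.Result)
}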
GET /monitoring_addresses
Returns a list of TCP endpoints on which the plugin is listening for Bitswap monitoring subscriptions.
If the TCP server is not enabled, this returns an empty list.
The result is a struct containing this list of addresses.
Two further methods initiate a WANT or CANCEL broadcast to be sent via Bitswap, respectively.
They each expect a list of CIDs, JSON-encoded in the request body.
Another method performs a Bitswap WANT broadcast followed, after a given number of seconds, by a CANCEL broadcast to the same set of peers.
This is useful because each broadcast individually takes a while, which makes it difficult for an API client to enforce the correct timing between WANT and CANCEL broadcasts.