By default, counters, histograms, and summaries export an additional series
suffixed with _created and a value of the unix timestamp for when the metric
was created. If this information is not helpful, it can be disabled by setting
the environment variable PROMETHEUS_DISABLE_CREATED_SERIES=true.
gauge.setToCurrentTime(); // Set to current unixtime.
As an advanced use case, a Gauge can also take its value from a callback by using the setChild() method.
Keep in mind that the default inc(), dec() and set() methods on Gauge take care of thread safety, so when using this approach ensure the value you are reporting accounts for concurrency.
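The concurrency caveat can be illustrated with a minimal, library-free sketch. Here a java.util.function.DoubleSupplier stands in for the callback (it is not the client's Gauge.Child type), and the class and method names are hypothetical. The point is only that the value the callback reads lives in an AtomicInteger, so a scrape thread can read it safely while worker threads update it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.DoubleSupplier;

// Hypothetical example class; DoubleSupplier is a stand-in for a gauge callback.
class QueueDepthCallback {
    // Atomic state: safe to update from workers and read from a scrape thread.
    private final AtomicInteger depth = new AtomicInteger();

    void enqueue() { depth.incrementAndGet(); }

    void dequeue() { depth.decrementAndGet(); }

    /** Returns a callback that reports the current depth whenever it is invoked. */
    DoubleSupplier callback() {
        return () -> depth.get();
    }

    public static void main(String[] args) {
        QueueDepthCallback queue = new QueueDepthCallback();
        queue.enqueue();
        queue.enqueue();
        queue.dequeue();
        System.out.println(queue.callback().getAsDouble()); // prints 1.0
    }
}
```

If the reported value were a plain non-volatile field updated by several threads, the callback could observe stale or inconsistent data, which is the situation the warning above is about.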
Summary
Summaries and Histograms can both be used to monitor distributions, like latencies or request sizes.
The Summary class provides several utility methods for observing values, such as observe(double), startTimer() paired with timer.observeDuration(), and time(Callable).
By default, Summary metrics provide the count and the sum. For example, if you measure latencies of a REST service, the count will tell you how often the REST service was called, and the sum will tell you the total aggregated response time. You can calculate the average response time using a Prometheus query dividing sum / count.
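The relationship between count, sum, and the average can be sketched in plain Java. This is an illustration with hypothetical names, not the Summary implementation; on the Prometheus side the same division is done in a query over the exported sum and count.

```java
import java.util.concurrent.atomic.DoubleAdder;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in that tracks only what a Summary reports by default.
class CountAndSum {
    private final LongAdder count = new LongAdder();
    private final DoubleAdder sum = new DoubleAdder();

    /** Records one observation, e.g. the duration of one request in seconds. */
    void observe(double seconds) {
        count.increment();
        sum.add(seconds);
    }

    long count() { return count.sum(); }

    double sum() { return sum.sum(); }

    /** Average = sum / count; NaN while nothing has been observed. */
    double average() {
        long n = count();
        return n == 0 ? Double.NaN : sum() / n;
    }

    public static void main(String[] args) {
        CountAndSum latency = new CountAndSum();
        latency.observe(0.25);
        latency.observe(0.5);
        latency.observe(0.75);
        // 3 calls, 1.5 seconds in total, 0.5 seconds on average
        System.out.println(latency.count() + " " + latency.sum() + " " + latency.average());
    }
}
```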
In addition to count and sum, you can configure a Summary to provide quantiles:
Summary requestLatency = Summary.build()
    .name("requests_latency_seconds")
    .help("Request latency in seconds.")
    .quantile(0.5, 0.01)   // 0.5 quantile (median) with 0.01 allowed error
    .quantile(0.95, 0.005) // 0.95 quantile with 0.005 allowed error
    // ...
    .register();
As an example, a 0.95 quantile of 120ms tells you that 95% of the calls were faster than 120ms, and 5% of the calls were slower than 120ms.
Tracking exact quantiles requires a large amount of memory, because all observations need to be stored in a sorted list. Therefore, we allow an error to significantly reduce memory usage.
In the example, the allowed error of 0.005 means that you will not get the exact 0.95 quantile, but anything between the 0.945 quantile and the 0.955 quantile.
Experiments show that the Summary typically needs to keep fewer than 100 samples to provide that precision, even if you have hundreds of millions of observations.
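To make the error band concrete, the following minimal sketch sorts a set of observations and reports the range of values that would be an acceptable answer for a target quantile with a given allowed error. It works on exact, fully stored data purely to illustrate the definition; the class and method names are hypothetical, and this is not how the Summary computes quantiles internally.

```java
import java.util.Arrays;

// Hypothetical illustration of "quantile with allowed error" on exact data.
class QuantileErrorBand {
    /** Value at quantile q (0..1) of sorted data, using the nearest-rank rule. */
    static double valueAt(double[] sorted, double q) {
        // Small tolerance so floating-point noise does not bump the rank by one.
        int rank = (int) Math.ceil(q * sorted.length - 1e-9);
        int index = Math.min(Math.max(rank, 1), sorted.length) - 1;
        return sorted[index];
    }

    /** Returns {low, high}: any value in this range satisfies the allowed error. */
    static double[] acceptableRange(double[] observations, double quantile, double allowedError) {
        double[] sorted = observations.clone();
        Arrays.sort(sorted);
        double low = valueAt(sorted, Math.max(0.0, quantile - allowedError));
        double high = valueAt(sorted, Math.min(1.0, quantile + allowedError));
        return new double[] {low, high};
    }

    public static void main(String[] args) {
        double[] latenciesMs = new double[1000];
        for (int i = 0; i < latenciesMs.length; i++) {
            latenciesMs[i] = i + 1; // 1 ms, 2 ms, ..., 1000 ms
        }
        double[] range = acceptableRange(latenciesMs, 0.95, 0.005);
        System.out.println(range[0] + " .. " + range[1]); // prints 945.0 .. 955.0
    }
}
```

With 1000 evenly spread latencies, a 0.95 quantile with 0.005 allowed error may therefore be reported as anything from 945 ms to 955 ms.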
There are a few special cases:
You can set an allowed error of 0, but then the Summary will keep all observations in memory.
You can track the minimum value with .quantile(0, 0). This special case will not use additional memory even though the allowed error is 0.
You can track the maximum value with .quantile(1, 0). This special case will not use additional memory even though the allowed error is 0.
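As intuition for why tracking only the minimum or maximum is cheap, note that each can be maintained with a single stored value, no matter how many observations arrive. The sketch below uses hypothetical names, ignores thread safety, and is not the Summary's internal algorithm.

```java
// Hypothetical illustration: min and max need constant memory.
class RunningMinMax {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    /** Updates both extremes; nothing else about the observation is kept. */
    void observe(double value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    double min() { return min; }

    double max() { return max; }

    public static void main(String[] args) {
        RunningMinMax latency = new RunningMinMax();
        for (double v : new double[] {0.2, 0.05, 1.5, 0.4}) {
            latency.observe(v);
        }
        System.out.println(latency.min() + " " + latency.max()); // prints 0.05 1.5
    }
}
```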
Typically, you don't want a Summary to represent the entire runtime of the application; you want to look at a reasonable time interval. Summary metrics implement a configurable sliding time window.
Histogram
The default buckets are intended to cover a typical web/RPC request from milliseconds to seconds.
They can be overridden with the buckets() method on the Histogram.Builder.
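When choosing custom buckets, it helps to remember that Prometheus histogram buckets are cumulative: each bucket counts every observation less than or equal to its upper bound, and a final +Inf bucket counts all observations. The following sketch, with hypothetical bounds and class names, shows how a set of latencies would fall into such buckets. It does not use the client library.

```java
import java.util.Arrays;

// Hypothetical illustration of cumulative histogram buckets.
class CumulativeBuckets {
    /**
     * counts[i] is the number of observations <= upperBounds[i];
     * the extra last slot is the +Inf bucket, which counts everything.
     */
    static long[] count(double[] upperBounds, double[] observations) {
        long[] counts = new long[upperBounds.length + 1];
        for (double value : observations) {
            for (int i = 0; i < upperBounds.length; i++) {
                if (value <= upperBounds[i]) {
                    counts[i]++;
                }
            }
            counts[upperBounds.length]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] bounds = {0.1, 0.5, 1.0, 5.0}; // assumed bucket bounds, in seconds
        double[] latencies = {0.03, 0.2, 0.2, 0.7, 3.0, 12.0};
        System.out.println(Arrays.toString(count(bounds, latencies))); // prints [1, 3, 4, 5, 6]
    }
}
```

If most observations land in the +Inf bucket or in the first bucket, the chosen bounds say little about the distribution and are worth adjusting.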
There are utilities for timing code:
import io.prometheus.client.Histogram;

class YourClass {
  static final Histogram requestLatency = Histogram.build()
      .name("requests_latency_seconds").help("Request latency in seconds.").register();

  // Request is a placeholder for your own request type.
  void processRequest(Request req) {
    requestLatency.time(new Runnable() {
      @Override
      public void run() {
        // Your code here.
      }
    });

    // Or the Java 8 lambda equivalent
    requestLatency.time(() -> {
      // Your code here.
    });
  }
}
Labels
All metrics can have labels, allowing grouping of related time series.
Using the default registry with static variables is ideal, since registering a metric with the same name twice
is not allowed and the default registry is itself static. You can think of registering a metric as
registering a definition (as in the TYPE and HELP sections). The metric definition internally holds the samples
that are reported and pulled out by Prometheus. Here is an example of registering a metric that has no labels.
To create time series with labels, include labelNames() with the builder. The labels() method looks up or creates
the corresponding labelled time series. You might also consider storing the labelled time series as an instance variable if it is
appropriate. It is thread safe and can be used multiple times, which can help performance.
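The look-up-or-create behaviour, and the reason caching the result can help, can be sketched without the client library. The class below is a hypothetical stand-in for a labelled counter: each distinct combination of label values maps to one child, and holding on to the child skips the map lookup on later increments. It is not the library's implementation.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for a counter with labels.
class LabelledCounterSketch {
    private final ConcurrentHashMap<List<String>, LongAdder> children = new ConcurrentHashMap<>();

    /** Looks up the child for these label values, creating it on first use. */
    LongAdder labels(String... labelValues) {
        return children.computeIfAbsent(List.of(labelValues), key -> new LongAdder());
    }

    /** Number of distinct labelled time series created so far. */
    int seriesCount() {
        return children.size();
    }

    public static void main(String[] args) {
        LabelledCounterSketch requests = new LabelledCounterSketch();

        // Lookup on every call: simple, but pays for the map access each time.
        requests.labels("get").increment();

        // Cached child: look it up once, then reuse the thread-safe reference.
        LongAdder postRequests = requests.labels("post");
        postRequests.increment();
        postRequests.increment();

        System.out.println(requests.labels("get").sum() + " "
            + postRequests.sum() + " " + requests.seriesCount()); // prints 1 2 2
    }
}
```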
Exemplars
Exemplars are a feature of the OpenMetrics format that allows applications to link metrics to example traces.
To see exemplars, set the Accept header of the scrape request to ask for the OpenMetrics format.
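As a sketch, the request below is built with the JDK's java.net.http API (Java 11 or later). The Accept value is assumed to be the OpenMetrics 1.0.0 media type; the URL and class name are placeholders, and the request is only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that builds a scrape request asking for OpenMetrics output.
class OpenMetricsRequestExample {
    // Assumed media type for OpenMetrics 1.0.0.
    static final String OPENMETRICS_ACCEPT =
        "application/openmetrics-text; version=1.0.0; charset=utf-8";

    static HttpRequest scrapeRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .header("Accept", OPENMETRICS_ACCEPT)
            .GET()
            .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint; adjust host, port, and path to your application.
        HttpRequest request = scrapeRequest("http://localhost:8080/metrics");
        System.out.println(request.headers().firstValue("Accept").orElse("<none>"));
    }
}
```

Sending the request with java.net.http.HttpClient and printing the body should then show exemplars on the series that have them, provided the endpoint supports the OpenMetrics format.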
The metric builders for Counter and Histogram have methods for setting the exemplar sampler for that individual metric. This takes precedence over the global setting in ExemplarConfig.
The following calls enable the default exemplar sampler for individual metrics. This is useful if you disabled the exemplar sampler globally with ExemplarConfig.disableExemplars().
All methods for observing and incrementing values have ...withExemplar equivalents. There are versions taking the exemplar labels as a String... as shown in the example, as well as versions taking the exemplar labels as a Map<String, String>.
Built-in Support for Tracing Systems
The DefaultExemplarSampler detects if a tracing library is found on startup, and provides exemplars for that tracing library by default. Currently, only OpenTelemetry tracing is supported.
If you are a tracing vendor, feel free to open a PR and add support for your tracing library.
Documentation of the individual tracer integrations:
The Java client includes collectors for garbage collection, memory pools, classloading, and thread counts.
These can be registered individually, or all at once with DefaultExports:
DefaultExports.initialize();
Logging
There are logging collectors for log4j, log4j2 and logback.
The Logback collector can be registered by adding it to the root level, like so:
If you are using Hibernate in a JPA environment and only have access to the EntityManager
or EntityManagerFactory, you can use this code to access the underlying SessionFactory: