The metrics handler collects API runtime information and reports it to InfluxDB periodically (every 5 to 15 minutes, depending on the volume of the API). A Grafana instance is connected to InfluxDB to display the metrics on dashboards from two different perspectives:
Client oriented - client-centric info showing how many APIs a client calls and the runtime info for each API.
API oriented - API-centric info showing how many clients are calling this API.
Here is an example of the configuration.
# Metrics handler configuration
# If metrics handler is enabled or not
# If metrics handler is enabled for JVM MBean or not
# influxdb protocol, which can be http or https
# influxdb hostname
# influxdb port number
# influxdb database name
# influxdb user
# influxdb password
# report and reset metrics in minutes.
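The connection settings described by the comments above can be pictured as a small validated value object. The following is only a minimal sketch: the class name and every field name except reportInMinutes are hypothetical and are not taken from the actual configuration file, and the validation rules are assumptions based solely on the comments (protocol is http or https, the port is a port number, the report period is given in minutes).

```java
import java.net.URI;

/** Sketch only: a validated holder for the InfluxDB connection settings. */
final class MetricsConfigSketch {
    // Hypothetical field names; only reportInMinutes is named in this document.
    final String influxdbProtocol;
    final String influxdbHost;
    final int influxdbPort;
    final int reportInMinutes;

    MetricsConfigSketch(String protocol, String host, int port, int reportInMinutes) {
        // The configuration comments allow only http or https.
        if (!"http".equals(protocol) && !"https".equals(protocol)) {
            throw new IllegalArgumentException("protocol must be http or https: " + protocol);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        if (reportInMinutes < 1) {
            throw new IllegalArgumentException("reportInMinutes must be at least 1");
        }
        this.influxdbProtocol = protocol;
        this.influxdbHost = host;
        this.influxdbPort = port;
        this.reportInMinutes = reportInMinutes;
    }

    /** Builds the base address of the InfluxDB server from protocol, host and port. */
    URI influxdbUri() {
        return URI.create(influxdbProtocol + "://" + influxdbHost + ":" + influxdbPort);
    }
}
```

Validating the values once at startup means a misconfigured protocol or port fails fast instead of surfacing later as a failed report.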
As you can see, there are two perspectives for collecting metrics info. Measurements whose names start with a clientId are client-centric, and measurements whose names start with an API name are API-centric.
Currently, five metrics are collected for each perspective.
This is the request count for client f7d42348-c647-4efb-a52d-4c5787421e72.
The first column is the timestamp and the value column is the value for this time series. The other columns are tags: endpoint, hostname/container id, ipAddress and version.
This is the request count for API swagger_petstore.
The first column is the timestamp and the value column is the value for this time series. The other columns are tags: endpoint, hostname/container id, ipAddress and version.
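To make the shape of these time series concrete, the sketch below formats one point in InfluxDB line protocol (measurement, comma-separated tags, a field, then a timestamp). It is an illustration, not the handler's actual reporter code. The ".request.count" measurement suffix, the tag key "hostname", the sample tag values and the choice of an integer "value" field are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: format one metrics point in InfluxDB line protocol. */
final class LineProtocolSketch {

    // Tag keys and values must escape commas, equals signs and spaces.
    static String escapeTag(String s) {
        return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ");
    }

    // Measurement names must escape commas and spaces.
    static String escapeMeasurement(String s) {
        return s.replace(",", "\\,").replace(" ", "\\ ");
    }

    static String toLine(String measurement, Map<String, String> tags,
                         long value, long timestampNanos) {
        StringBuilder sb = new StringBuilder(escapeMeasurement(measurement));
        for (Map.Entry<String, String> e : tags.entrySet()) {
            sb.append(',').append(escapeTag(e.getKey()))
              .append('=').append(escapeTag(e.getValue()));
        }
        // The trailing "i" marks the field value as an integer.
        sb.append(" value=").append(value).append('i');
        sb.append(' ').append(timestampNanos);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical tag values, chosen only for illustration.
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("endpoint", "/v2/pet@get");
        tags.put("hostname", "container-1");
        tags.put("ipAddress", "10.0.0.5");
        tags.put("version", "1.0.0");
        long ts = 1500000000000000000L;
        // Client-centric point: the measurement name starts with the clientId.
        System.out.println(toLine(
                "f7d42348-c647-4efb-a52d-4c5787421e72.request.count", tags, 12, ts));
        // API-centric point: the measurement name starts with the API name.
        System.out.println(toLine("swagger_petstore.request.count", tags, 12, ts));
    }
}
```

Both perspectives share the same tags, so the same request can be counted once under the clientId measurement and once under the API name measurement.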
The default implementation is based on InfluxDB and Grafana, a popular combination for Docker
containers. However, the database and dashboard can easily be replaced by providing another
reporter implementation. To replace it, change the MetricsHandler to use another reporter instead of
In metrics.yml, you can set reportInMinutes. This is the period during which the system caches and calculates statistical info about requests and responses before reporting it to InfluxDB. After each report, the data set is reset so that collection starts from scratch again. Depending on the performance requirements of your services and the capacity of your InfluxDB instance, you may choose a longer or shorter report period. A short period such as 5 minutes generates more traffic and data for InfluxDB but reduces the amount of data that can be lost when the server restarts. A long period such as 15 minutes reduces the load on InfluxDB and lets you keep more days of data online to study the long-term trends of your services.
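The report-and-reset cycle described above can be sketched with standard JDK classes. This is a simplified illustration under stated assumptions, not the handler's actual implementation: the class and method names are hypothetical, only request counts are tracked, and the reporter is an arbitrary callback standing in for whatever sends data to InfluxDB.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Consumer;

/** Sketch only: cache counts in memory, then report and reset them periodically. */
final class ReportAndResetSketch {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Called on each request; cheap enough for the request path. */
    void record(String measurement) {
        counters.computeIfAbsent(measurement, k -> new LongAdder()).increment();
    }

    /**
     * Returns the counts accumulated in the current period and resets them to zero.
     * Note: sumThenReset() is not atomic, so an increment racing with the reset
     * may not be reflected in the returned value.
     */
    Map<String, Long> snapshotAndReset() {
        Map<String, Long> snapshot = new TreeMap<>();
        counters.forEach((name, adder) -> snapshot.put(name, adder.sumThenReset()));
        return snapshot;
    }

    /** Schedules a report every reportInMinutes; the caller shuts the scheduler down. */
    ScheduledExecutorService start(long reportInMinutes, Consumer<Map<String, Long>> reporter) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                reporter.accept(snapshotAndReset());
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so swallow it here.
                // The counts for this period are lost in that case.
            }
        }, reportInMinutes, reportInMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The trade-off from the paragraph shows up directly: everything recorded since the last snapshotAndReset() call lives only in memory, so a restart loses at most one period of data, and a shorter period shrinks that window at the cost of more frequent writes.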