Performance instrumentation¶
Overview¶
Archivematica includes features to collect metrics related to processing and performance over time. When configured to do so, Archivematica and Archivematica Storage Service will export metrics over HTTP, in a format that can be scraped by tools like Prometheus for storage and analysis. This data can then be visualized using a tool like Grafana.
Instrumentation provides short term, near real time insight into how an Archivematica pipeline is performing.
On this page:
Prometheus¶
Prometheus is an open source tool for scraping, storing, and querying time series data. Each metric has a name and a series of values over time.
One example of a metric commonly used for Archivematica is AIPs Stored
Count, or the number of AIPs stored over time. The value would be 0
when
the pipeline is first created, increasing to 1
after the first AIP is
stored.
There are a few things to be aware of when using Prometheus with Archivematica:
- The Prometheus web interface is not authenticated; by default it runs on port
9090 and is open to anyone. Access should be restricted from the public
internet via firewall. Likewise, the metrics endpoints on Archivematica are
unauthenticated. If your Archivematica instance is publicly available over the
internet, you should restrict access to
/metrics
URLs via firewall. - Prometheus scrapes data from Archivematica at a configured interval, and doesn’t know when value changes have occurred between scrapes. If the scrape interval is set to 15 seconds, that will be the value you will see in the data, even if the previous scrape interval was 60 seconds.
- Prometheus is intended for short term data storage. By default it will store
metrics for 15 days. To change this, pass a larger value (e.g.
60d
) to--storage.tsdb.retention.time
when starting Prometheus.
For more information about configuring Prometheus, please see the Prometheus documentation.
Grafana¶
Grafana is an open source tool for visualising data. It is highly configurable and provides many options to visualise the data stored in Prometheus in useful ways. Grafana can be configured nearly any way you like; experimenting with new dashboards can be done via the web interface and is encouraged. The default Archivematica dashboard we provide is only intended as a starting point.
Available metrics¶
Archivematica makes use of various Prometheus metric types. Counters are numeric values that only ever increase. Gauges are numeric values that can increase or decrease. Histograms and summaries provide both a count and a sum of values, and can be broken out over quantiles (often thought of as percentiles, so a 0.95 quantile is the 95th percentile).
MCP Client metrics¶
mcpclient_job_total
(Counter): number of jobs processed, labeled by script.mcpclient_job_success_timestamp
(Gauge): timestamp of most recent job processed, labeled by script.mcpclient_job_error_total
(Counter): number of failures processing jobs, labeled by script.mcpclient_job_error_timestamp
(Gauge): timestamp of most recent job failure, labeled by script.mcpclient_task_execution_time_seconds
(Histogram): histogram of worker task execution times in seconds, labeled by script.mcpclient_gearman_sleep_time_seconds
(Counter): total worker sleep after gearman error times in seconds.mcpclient_transfer_started_total
(Counter): number of Transfers started, by transfer type.mcpclient_transfer_started_timestamp
(Gauge): timestamp of most recent transfer started, by transfer type.mcpclient_transfer_completed_total
(Counter): number of Transfers completed, by transfer type.mcpclient_transfer_completed_timestamp
(Gauge): timestamp of most recent transfer completed, by transfer type.mcpclient_transfer_error_total
(Counter): number of transfer failures, by transfer type, error type.mcpclient_transfer_error_timestamp
(Gauge): timestamp of most recent transfer failure, by transfer type, error type.mcpclient_transfer_files
(Histogram): histogram of number of files included in transfers, by transfer type.mcpclient_transfer_size_bytes
(Histogram): histogram of number of bytes in transfers, by transfer type.mcpclient_sip_started_total
(Counter): number of SIPs started.mcpclient_sip_started_timestamp
(Gauge): timestamp of most recent SIP started.mcpclient_sip_error_total
(Counter): number of SIP failures, by error type.mcpclient_sip_error_timestamp
(Gauge): timestamp of most recent SIP failure, by error type.mcpclient_aips_stored_total
(Counter): number of AIPs stored.mcpclient_aips_stored_timestamp
(Gauge): timestamp of most recent AIP stored.mcpclient_dips_stored_total
(Counter): number of DIPs stored.mcpclient_dips_stored_timestamp
(Gauge): timestamp of most recent DIP stored.mcpclient_aip_processing_seconds
(Histogram): histogram of AIP processing time, from first file recorded in DB to storage in SS.mcpclient_dip_processing_seconds
(Histogram): histogram of DIP processing time, from first file recorded in DB to storage in SS.mcpclient_aip_files_stored
(Histogram): histogram of number of files stored in AIPs. Note, this includes metadata, derivatives, etc.mcpclient_dip_files_stored
(Histogram): histogram of number of files stored in DIPs.mcpclient_aip_size_bytes
(Histogram): histogram of number of bytes stored in AIPs. Note, this includes metadata, derivatives, etc.mcpclient_dip_size_bytes
(Histogram): histogram of number of bytes stored in DIPs. Note, this includes metadata, derivatives, etc.mcpclient_aip_files_stored_by_file_group_and_format_total
(Counter): number of original files stored in AIPs labeled by file group, format name. Note: format labels are intentionally not zeroed, so be aware of that when querying.mcpclient_aip_original_file_timestamps
(Histogram): histogram of modification times for files stored in AIPs, bucketed by year.archivematica_info
(Info): Archivematica version information.environment_info
(Info): environment variable (configuration) information.
MCP Server metrics¶
mcpserver_gearman_active_jobs
(Gauge): Number of gearman jobs currently being processed.mcpserver_gearman_pending_jobs
(Gauge): Number of gearman jobs pending submission.mcpserver_task_total
(Counter): Number of tasks processed, labeled by task group, task name.mcpserver_task_error_total
(Counter): Number of failures processing tasks, labeled by task group, task name.mcpserver_task_success_timestamp
(Gauge): Most recent successfully processed task, labeled by task group, task name.mcpserver_task_error_timestamp
(Gauge): Most recent failure when processing a task, labeled by task group, task name.mcpserver_task_duration_seconds
(Histogram): Histogram of task processing durations in seconds, labeled by task group, task name, script name.archivematica_info
(Info): Archivematica version information.environment_info
(Info): Environment variable (configuration) information.
Dashboard metrics¶
The dashboard includes a large number of metrics provided by the
django-prometheus package. These break down the number of HTTP requests, as
well as request size, latency, errors, etc in a number of ways. They are
primarily useful for developers working on the dashboard, although a number
(e.g. django_http_requests_latency_including_middlewares_seconds
for request
latency or django_http_responses_total_by_status
for number of HTTP
responses) are useful for general reporting.
django_http_responses_before_middlewares_total
(Counter): Total count of requests before middlewares rundjango_http_requests_before_middlewares_total
(Counter): Total count of responses before middlewares rundjango_http_requests_latency_including_middlewares_seconds
(Histogram): Histogram of requests processing time (including middleware processing time)django_http_requests_unknown_latency_including_middlewares_total
(Counter): Count of requests for which the latency was unknown (when computing * django_http_requests_latency_including_middlewares_seconds)django_http_requests_latency_seconds_by_view_method
(Histogram): Histogram of request processing time labelled by viewdjango_http_requests_unknown_latency_total
(Counter): Count of requests for which the latency was unknowndjango_http_ajax_requests_total
(Counter): Count of AJAX requestsdjango_http_requests_total_by_method
(Counter): Count of requests by methoddjango_http_requests_total_by_transport
(Counter): Count of requests by transportdjango_http_requests_total_by_view_transport_method
(Counter): Count of requests by view, transport, methoddjango_http_requests_body_total_bytes
(Histogram): Histogram of requests by body sizedjango_http_responses_total_by_templatename
(Counter): Count of responses by template namedjango_http_responses_total_by_status
(Counter): Count of responses by statusdjango_http_responses_total_by_status_view_method
(Counter): Count of responses by status, view, methoddjango_http_responses_body_total_bytes
(Histogram): Histogram of responses by body sizedjango_http_responses_total_by_charset
(Counter): Count of responses by charsetdjango_http_responses_streaming_total
(Counter): Count of streaming responsesdjango_http_exceptions_total_by_type
(Counter): Count of exceptions by object typedjango_http_exceptions_total_by_view
(Counter): Count of exceptions by viewdjango_model_save_total
(Counter): Total model save calls labeled by model classdjango_model_delete_total
(Counter): Total model save calls labeled by model class
The dashboard also includes additional metrics: common_db_retry_time_seconds (Counter): Total time waiting to retry database transactions in seconds, labeled by operation description common_ss_api_request_duration_seconds (Counter): Total time waiting on the Storage Service API in seconds, labeled by function
Storage Service metrics¶
The Storage Service includes all the same Django-based metrics as the Dashboard.
Configuration¶
The following environment variables will turn on metrics capturing and export in Archivematica (provided with example values):
ARCHIVEMATICA_MCPSERVER_MCPSERVER_PROMETHEUS_BIND_PORT=7998
ARCHIVEMATICA_MCPSERVER_MCPSERVER_PROMETHEUS_BIND_ADDRESS=0.0.0.0
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_PROMETHEUS_BIND_PORT=7999
ARCHIVEMATICA_MCPCLIENT_MCPCLIENT_PROMETHEUS_BIND_ADDRESS=0.0.0.0
ARCHIVEMATICA_DASHBOARD_DASHBOARD_PROMETHEUS_ENABLED=true
SS_PROMETHEUS_ENABLED=true
For more documentation on these, see the README files for the Dashboard, MCPClient, MCPServer, and Storage Service (make sure to select the correct branch from the branch dropdown menu if you are not using Archivematica qa/1.x / Storage Service qa/0.x). Note that you don’t need to export metrics for all services.
If you are running multiple MCPClients, each of them must have a different port set.
Once the environment variables have been set, restart the Archivematica
services and confirm that you can access the /metrics
path on each of the
ports given (e.g. http://<am-hostname>:62080/metrics
). It should return a
block of text that looks like this:
# HELP django_http_requests_latency_seconds_by_view_method Histogram of request processing time labelled by view.
# TYPE django_http_requests_latency_seconds_by_view_method histogram
django_http_requests_latency_seconds_by_view_method_bucket{le="0.005",method="POST",view="components.api.views.wrapper"} 0.0
django_http_requests_latency_seconds_by_view_method_bucket{le="0.01",method="POST",view="components.api.views.wrapper"} 0.0
Configure Prometheus¶
Once Archivematica is configured to export variables, you can set up Prometheus to scrape data from it.
First, download and install Prometheus.
Create a Prometheus configuration file (in this example, it’s a YAML file called
prometheus.yml
). If you are running all Archivematica components on a single
host, a good starting point is:
scrape_configs:
- job_name: archivematica
scrape_interval: 5s
static_configs:
- targets:
- localhost:7998
- localhost:7999
- localhost:62080
- localhost:62081
Start Prometheus, passing in the path to the config file: ./prometheus
--config.file=prometheus.yml
.
You should be able to view the Prometheus web interface at
http://<hostname>:9090
. Under Status -> Targets in the menu bar, you’ll be
able to see the status of the various Archivematica services.
Configure Grafana¶
Once you have Prometheus storing metrics, Grafana can be set up so that you can view them.
First, download and install Grafana.
After following the installation guide for your platform, you can login to the
web interface (http://<hostname>:3000/
, username: admin, password: admin). Note
that the Grafana web interface can be exposed to the internet (after changing
the default password), although you may wish to limit access.
Once you have logged into the Grafana interface, add Prometheus as a data source. After clicking Save & Test, you should see a “Data source is working” message.
Now you need to import a dashboard. There is a basic Archivematica dashboard available in Archivematica’s Docker compose environment, or you can download the dashboard JSON file directly from GitHub. In Grafana, under Dashboards -> Manage, choose Import, then either paste the JSON into the text box or save the dashboard as a file and upload it.