Using Grafana with Prometheus on wolf

Reference in daos src tree

Use daos admin node to install and run Grafana

Before running Grafana, make sure each daos server has the telemetry port (9191) defined in daos_server.yml.
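
A quick way to confirm the setting is to read it back from the server config file. The sketch below assumes the key is spelled `telemetry_port` and that it sits on its own uncommented line; the function name `telemetry_port` is just an illustrative helper, not part of DAOS.

```shell
# Print the telemetry port found in a daos_server.yml, or fail if none is set.
# Assumption: the key is spelled "telemetry_port" (check your DAOS version).
telemetry_port() {
    conf="$1"
    # Match an uncommented "telemetry_port: <number>" line and print the number.
    port=$(sed -n 's/^[[:space:]]*telemetry_port:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$conf" | head -n 1)
    if [ -z "$port" ]; then
        echo "telemetry_port not set in $conf" >&2
        return 1
    fi
    echo "$port"
}
```

For example, `telemetry_port /etc/daos/daos_server.yml` (path is an assumption; use wherever your server config lives) should print 9191 on a server set up as described here.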

Install Prometheus on wolf daos admin node

mkdir ~/prometheus
dmg telemetry config -i /home/mjean/prometheus
		Downloading and installing Prometheus...
		fetching prometheus/prometheus v2.28.1
		Installed prometheus to /home/mjean/prometheus/prometheus
		Configuring Prometheus for DAOS monitoring...
		Wrote DAOS monitoring config to /home/mjean/.prometheus.yml)

To collect data from the server nodes, add them as targets in ~/.prometheus.yml:


  - targets:
    - localhost:9191
  - targets:
    - wolf-118:9191
    - wolf-119:9191
    - wolf-120:9191
    - wolf-121:9191
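
When there are many server nodes, the target lines can be generated instead of typed by hand. This is only a sketch: `daos_targets` is a hypothetical helper, and its output still has to be pasted into the right place in ~/.prometheus.yml yourself.

```shell
# Emit a Prometheus "targets" list for the given hosts on one port.
# Usage: daos_targets PORT HOST...
daos_targets() {
    port="$1"
    shift
    echo "  - targets:"
    for host in "$@"; do
        echo "    - ${host}:${port}"
    done
}

# Host names taken from the example above.
daos_targets 9191 wolf-118 wolf-119 wolf-120 wolf-121
```

The indentation matches the fragment shown above; adjust it if your config file nests the list differently.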

(mjmac) It's worth noting here that if you add the hosts to your ~/.daos_control.yml or some other control config file that you use with dmg -o /path/to/config.yml, they will automatically be added to your prometheus configuration.

Starting Prometheus

Due to DAOS-8104, Prometheus currently has to be started manually from the install directory:

./prometheus --config.file=$HOME/.prometheus.yml
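
Once Prometheus is running, its readiness endpoint can be polled to confirm it is accepting requests. The sketch below assumes Prometheus listens on its default port 9090 and uses the standard `/-/ready` endpoint; the helper names are illustrative only.

```shell
# Build the readiness URL for a Prometheus instance.
# Defaults (localhost, 9090) are assumptions; pass host and port to override.
prometheus_ready_url() {
    host="${1:-localhost}"
    port="${2:-9090}"
    echo "http://${host}:${port}/-/ready"
}

# Return success once Prometheus answers its readiness endpoint with HTTP 2xx.
prometheus_is_ready() {
    curl --silent --fail --output /dev/null "$(prometheus_ready_url "$@")"
}
```

Running `prometheus_is_ready && echo ready` on the admin node is a simple check before moving on to Grafana.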

Only use the command below once DAOS-8104 is fixed:

dmg telemetry run -i /home/mjean/prometheus
		Downloading and installing Prometheus...
		fetching prometheus/prometheus v2.28.1
		Installed prometheus to /home/mjean/prometheus/prometheus
		Configuring Prometheus for DAOS monitoring...
		Wrote DAOS monitoring config to /home/mjean/.prometheus.yml)
		Starting Prometheus monitoring...
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:449 host_details="(Linux 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021 x86_64"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:450 fd_limits="(soft=1048576, hard=1048576)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
		level=info ts=2021-07-16T12:59:06.775Z caller=web.go:541 component=web msg="Start listening for connections" address=
		level=info ts=2021-07-16T12:59:06.776Z caller=main.go:824 msg="Starting TSDB ..."
		level=info ts=2021-07-16T12:59:06.777Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
		level=info ts=2021-07-16T12:59:06.778Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626298627558 maxt=1626350400000 ulid=01FANHRSJ125WPM5XQP3JN7548
		level=info ts=2021-07-16T12:59:06.779Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626415202422 maxt=1626422400000 ulid=01FAQ8PQVY9T7Q6YHB3Q6T2N3H
		level=info ts=2021-07-16T12:59:06.780Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626350400000 maxt=1626415200000 ulid=01FAQ8PR2JC5TWCEXPH51DKRNA
		level=info ts=2021-07-16T12:59:06.781Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626422402421 maxt=1626429600000 ulid=01FAQFJF3YM6BH9GX6DM87M53F
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=49.128µs
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
		level=info ts=2021-07-16T12:59:06.809Z caller=head.go:826 component=tsdb msg="WAL checkpoint loaded"
		level=info ts=2021-07-16T12:59:06.888Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=18 maxSegment=23
		level=info ts=2021-07-16T12:59:06.971Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=19 maxSegment=23
		level=info ts=2021-07-16T12:59:07.047Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=20 maxSegment=23
		level=info ts=2021-07-16T12:59:07.108Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=21 maxSegment=23
		level=info ts=2021-07-16T12:59:07.119Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=22 maxSegment=23
		level=info ts=2021-07-16T12:59:07.120Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=23 maxSegment=23
		level=info ts=2021-07-16T12:59:07.120Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=3.35109ms wal_replay_duration=311.254676ms total_replay_duration=314.690735ms
		level=warn ts=2021-07-16T12:59:07.122Z caller=main.go:849 fs_type=NFS_SUPER_MAGIC msg="This filesystem is not supported and may lead to data corruption and data loss. Please carefully read to learn more about supported filesystems."
		level=info ts=2021-07-16T12:59:07.122Z caller=main.go:854 msg="TSDB started"
		level=info ts=2021-07-16T12:59:07.122Z caller=main.go:981 msg="Loading configuration file" filename=/home/mjean/.prometheus.yml
		level=info ts=2021-07-16T12:59:07.124Z caller=main.go:1012 msg="Completed loading of configuration file" filename=/home/mjean/.prometheus.yml totalDuration=2.234767ms remote_storage=5.111µs web_handler=456ns query_engine=1.803µs scrape=1.283008ms scrape_sd=36.14µs notify=1.113µs notify_sd=2.39µs rules=2.893µs
		level=info ts=2021-07-16T12:59:07.124Z caller=main.go:796 msg="Server is ready to receive web requests."
		level=info ts=2021-07-16T13:00:02.439Z caller=compact.go:518 component=tsdb msg="write block" mint=1626429600000 maxt=1626436800000 ulid=01FAQPE1FGXW6QWCK6C40X2GQG duration=22.37061ms
		level=info ts=2021-07-16T13:00:02.450Z caller=head.go:967 component=tsdb msg="Head GC completed" duration=2.069077ms
		level=info ts=2021-07-16T13:00:02.452Z caller=checkpoint.go:97 component=tsdb msg="Creating checkpoint" from_segment=18 to_segment=20 mint=1626436800000
		level=info ts=2021-07-16T13:00:02.464Z caller=head.go:1064 component=tsdb msg="WAL checkpoint complete" first=18 last=20 duration=13.152763ms

Install Grafana on wolf daos admin node

Download package from the grafana web site

	For CentOS:
	sudo yum install grafana-8.0.6-1.x86_64.rpm
	sudo rpm -i --nodeps grafana-8.0.6-1.x86_64.rpm

After it is installed, start the Grafana service on the daos admin node:

	sudo systemctl daemon-reload
	sudo systemctl start grafana-server
	sudo systemctl status grafana-server
	Configure the Grafana server to start at boot:
	sudo systemctl enable grafana-server 

Prometheus does not start on boot, so it will need to be restarted manually after a reboot.
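
One way to make the manual restart less fragile is to launch Prometheus in the background with nohup so it keeps running after you log out. This is a sketch under assumptions: `start_prometheus` is a hypothetical helper, and the log and pid file names are arbitrary choices, not something dmg or Prometheus creates.

```shell
# Start Prometheus in the background so it survives logout.
# Usage: start_prometheus INSTALL_DIR CONFIG_FILE
start_prometheus() {
    dir="$1"
    config="$2"
    (
        # Run from the install directory, as with the manual ./prometheus start.
        cd "$dir" || exit 1
        # nohup keeps the process alive after the shell exits; output goes to a log file.
        nohup ./prometheus --config.file="$config" > prometheus.log 2>&1 &
        # Record the process id so the server can be stopped later with kill.
        echo "$!" > prometheus.pid
    )
}
```

With the paths used on this page, the call would look like `start_prometheus "$HOME/prometheus" "$HOME/.prometheus.yml"`.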

Starting Grafana on a wolf daos admin node:

To monitor metrics on wolf while running the Grafana UI from a Windows machine, you will need to set up port forwarding. Install Firefox on the Windows machine and set up a manual proxy in its network settings. The proxy port must match the forwarded port in the PuTTY setup below.

Use PuTTY to set up port forwarding (a tunnel)

Open the connection to wolf and log in as you normally would
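
For anyone with an OpenSSH client instead of PuTTY, a roughly equivalent tunnel can be opened from the command line. This sketch assumes the PuTTY tunnel is a dynamic (SOCKS) forward, which is what a manual browser proxy setting usually pairs with; the helper name, port and host are placeholders.

```shell
# Open a SOCKS proxy on a local port through the wolf login host.
# -N: run no remote command, -D: dynamic (SOCKS) port forwarding.
# Usage: open_socks_tunnel LOCAL_PORT [USER@]LOGIN_HOST
open_socks_tunnel() {
    local_port="$1"
    login_host="$2"
    ssh -N -D "$local_port" "$login_host"
}
```

The browser's manual SOCKS proxy would then point at localhost and the same LOCAL_PORT.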

Open a connection to the wolf node running the Grafana service using the Firefox browser

Start up the Grafana dashboard using Firefox


Log in with admin/admin. Grafana will prompt you to change the admin password.

Before adding the daos dashboard to Grafana, you will need to add a Prometheus data source. (Grafana should prompt you to add the source initially.)

 NOTE: Prometheus uses wolf-xx:9090

Add the Prometheus data source
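
The data source can also be added without the UI, through Grafana's HTTP API (`POST /api/datasources`). The sketch below is one possible way to do it; the helper names are hypothetical, and the Grafana URL, credentials and Prometheus URL are all values you supply.

```shell
# Build the JSON body describing a Prometheus data source.
datasource_json() {
    prom_url="$1"
    printf '{"name":"Prometheus","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$prom_url"
}

# Register the data source with Grafana.
# Usage: add_prometheus_datasource GRAFANA_URL USER:PASSWORD PROMETHEUS_URL
add_prometheus_datasource() {
    grafana_url="$1"
    credentials="$2"
    prom_url="$3"
    curl --silent --fail -X POST \
        -H 'Content-Type: application/json' \
        -u "$credentials" \
        -d "$(datasource_json "$prom_url")" \
        "$grafana_url/api/datasources"
}
```

Grafana listens on port 3000 by default, so on the admin node the first argument would typically be `http://localhost:3000`.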

Import Grafana metrics in the Prometheus dashboard

Import DAOS-Grafana-Dashboard.json from GitHub

Open the daos dashboard. The monitor should start collecting metrics once the daos servers have started.
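
If the dashboard stays empty, it can help to ask Prometheus directly which scrape targets it can reach. This sketch queries the built-in `up` metric through the Prometheus HTTP API; `query_targets_up` is an illustrative helper and the default URL is an assumption.

```shell
# Ask Prometheus which scrape targets are up (value 1) or down (value 0).
# Usage: query_targets_up [PROMETHEUS_URL]
query_targets_up() {
    prom_url="${1:-http://localhost:9090}"
    curl --silent --fail --get --data-urlencode 'query=up' "$prom_url/api/v1/query"
}
```

The response is JSON with one entry per target; a server node reported with value 0 usually means its telemetry port is not reachable from the admin node.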

This is a snapshot of daos metrics while soak was running on a shared cluster with 8 servers (4 nodes) and 5 clients