Using Grafana with Prometheus on wolf


Reference: README.md in the DAOS source tree, https://github.com/daos-stack/daos/tree/master/utils/grafana

Use the DAOS admin node to install and run Grafana.

For Grafana to have metrics to display, the DAOS server must have a telemetry port defined in daos_server.yml (9191).
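
A quick way to confirm the setting is to look for it in the server config. The sketch below assumes the parameter is a top-level line named `telemetry_port`; that name and the demo file contents are assumptions, so check them against your own daos_server.yml.

```shell
#!/bin/sh
# Print the telemetry port found in a daos_server.yml-style file.
# Assumes a top-level line of the form "telemetry_port: 9191".
telemetry_port() {
    sed -n 's/^telemetry_port:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1"
}

# Self-contained demo on a throwaway file; pass the real config path instead.
demo_cfg=$(mktemp)
printf 'name: daos_server\ntelemetry_port: 9191\n' > "$demo_cfg"
telemetry_port "$demo_cfg"
rm -f "$demo_cfg"
```

If the function prints nothing, the port is not set in that file and the server will not export metrics for Prometheus to scrape.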


Install Prometheus on wolf daos admin node


mkdir ~/prometheus
dmg telemetry config -i /home/mjean/prometheus
		Downloading and installing Prometheus...                                                                                                                                                                                                 
		fetching prometheus/prometheus v2.28.1                                                                                                                                                                                                   
		Installed prometheus to /home/mjean/prometheus/prometheus                                                                                                                                                                                
		Configuring Prometheus for DAOS monitoring...                                                                                                                                                                                            
		Wrote DAOS monitoring config to /home/mjean/.prometheus.yml)



To collect data from the server nodes, add them to ~/.prometheus.yml. For example, change

  - targets:
    - localhost:9191

to

  - targets:
    - wolf-118:9191
    - wolf-119:9191
    - wolf-120:9191
    - wolf-121:9191


(mjmac) It's worth noting here that if you add the hosts to your ~/.daos_control.yml or some other control config file that you use with dmg -o /path/to/config.yml, they will automatically be added to your prometheus configuration.
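
If you edit the file by hand, a small helper can generate the target lines for a list of hosts. This is only a sketch: the function name `prom_targets` is made up for illustration, and the indentation is copied from the example above, so check it against the surrounding lines in your ~/.prometheus.yml before pasting.

```shell
#!/bin/sh
# Emit a Prometheus "targets" list for the given hosts.
# PORT defaults to 9191, the telemetry port used on this page.
prom_targets() {
    port=${PORT:-9191}
    printf '  - targets:\n'
    for host in "$@"; do
        printf '    - %s:%s\n' "$host" "$port"
    done
}

prom_targets wolf-118 wolf-119 wolf-120 wolf-121
```

Set PORT in the environment if your servers use a different telemetry port, for example `PORT=9292 prom_targets wolf-118`.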

Starting Prometheus


Due to DAOS-8104, Prometheus currently has to be started manually from the install directory (~/prometheus):

./prometheus --config.file=$HOME/.prometheus.yml
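
Once Prometheus is running, it can help to check that it is ready and that each DAOS server is actually exporting metrics. The sketch below assumes Prometheus listens on its default port 9090 and that the DAOS telemetry endpoint is served at `/metrics` on port 9191; the helper names are hypothetical.

```shell
#!/bin/sh
# Readiness URL for a Prometheus instance (default host localhost, port 9090).
prom_ready_url() {
    printf 'http://%s:%s/-/ready\n' "${1:-localhost}" "${2:-9090}"
}

# Metrics URL for one DAOS server (default telemetry port 9191).
# The /metrics path is an assumption; adjust if your setup differs.
daos_metrics_url() {
    printf 'http://%s:%s/metrics\n' "$1" "${2:-9191}"
}

# Probe a URL; fails on connection errors, timeouts (5 s), or HTTP 4xx/5xx.
probe() {
    curl --silent --fail --max-time 5 --output /dev/null "$1"
}

# Example (not run here):
#   probe "$(prom_ready_url)" && echo "Prometheus is ready"
#   probe "$(daos_metrics_url wolf-118)" && echo "wolf-118 is exporting metrics"
```

The URL builders are kept separate from `probe` so the same check can be looped over every host in the targets list.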


Use the command below only after DAOS-8104 is fixed:

dmg telemetry run -i /home/mjean/prometheus
		Downloading and installing Prometheus...                                                                                                                                                                                              
		fetching prometheus/prometheus v2.28.1                                                                                                                                                                                                
		Installed prometheus to /home/mjean/prometheus/prometheus
		Configuring Prometheus for DAOS monitoring...
		Wrote DAOS monitoring config to /home/mjean/.prometheus.yml)
		Starting Prometheus monitoring...
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:449 host_details="(Linux 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021 x86_64 wolf-80.wolf.hpdd.intel.com wolf.hpdd.intel.com)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:450 fd_limits="(soft=1048576, hard=1048576)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
		level=info ts=2021-07-16T12:59:06.775Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
		level=info ts=2021-07-16T12:59:06.776Z caller=main.go:824 msg="Starting TSDB ..."
		level=info ts=2021-07-16T12:59:06.777Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
		level=info ts=2021-07-16T12:59:06.778Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626298627558 maxt=1626350400000 ulid=01FANHRSJ125WPM5XQP3JN7548
		level=info ts=2021-07-16T12:59:06.779Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626415202422 maxt=1626422400000 ulid=01FAQ8PQVY9T7Q6YHB3Q6T2N3H
		level=info ts=2021-07-16T12:59:06.780Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626350400000 maxt=1626415200000 ulid=01FAQ8PR2JC5TWCEXPH51DKRNA
		level=info ts=2021-07-16T12:59:06.781Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626422402421 maxt=1626429600000 ulid=01FAQFJF3YM6BH9GX6DM87M53F
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=49.128µs
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
		level=info ts=2021-07-16T12:59:06.809Z caller=head.go:826 component=tsdb msg="WAL checkpoint loaded"
		level=info ts=2021-07-16T12:59:06.888Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=18 maxSegment=23
		level=info ts=2021-07-16T12:59:06.971Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=19 maxSegment=23
		level=info ts=2021-07-16T12:59:07.047Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=20 maxSegment=23
		level=info ts=2021-07-16T12:59:07.108Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=21 maxSegment=23
		level=info ts=2021-07-16T12:59:07.119Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=22 maxSegment=23
		level=info ts=2021-07-16T12:59:07.120Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=23 maxSegment=23
		level=info ts=2021-07-16T12:59:07.120Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=3.35109ms wal_replay_duration=311.254676ms total_replay_duration=314.690735ms
		level=warn ts=2021-07-16T12:59:07.122Z caller=main.go:849 fs_type=NFS_SUPER_MAGIC msg="This filesystem is not supported and may lead to data corruption and data loss. Please carefully read https://prometheus.io/docs/prometheus/latest/storage/ to learn more about supported filesystems."
		level=info ts=2021-07-16T12:59:07.122Z caller=main.go:854 msg="TSDB started"
		level=info ts=2021-07-16T12:59:07.122Z caller=main.go:981 msg="Loading configuration file" filename=/home/mjean/.prometheus.yml
		level=info ts=2021-07-16T12:59:07.124Z caller=main.go:1012 msg="Completed loading of configuration file" filename=/home/mjean/.prometheus.yml totalDuration=2.234767ms remote_storage=5.111µs web_handler=456ns query_engine=1.803µs scrape=1.283008ms scrape_sd=36.14µs notify=1.113µs notify_sd=2.39µs rules=2.893µs
		level=info ts=2021-07-16T12:59:07.124Z caller=main.go:796 msg="Server is ready to receive web requests."
		level=info ts=2021-07-16T13:00:02.439Z caller=compact.go:518 component=tsdb msg="write block" mint=1626429600000 maxt=1626436800000 ulid=01FAQPE1FGXW6QWCK6C40X2GQG duration=22.37061ms
		level=info ts=2021-07-16T13:00:02.450Z caller=head.go:967 component=tsdb msg="Head GC completed" duration=2.069077ms
		level=info ts=2021-07-16T13:00:02.452Z caller=checkpoint.go:97 component=tsdb msg="Creating checkpoint" from_segment=18 to_segment=20 mint=1626436800000
		level=info ts=2021-07-16T13:00:02.464Z caller=head.go:1064 component=tsdb msg="WAL checkpoint complete" first=18 last=20 duration=13.152763ms
		


Install Grafana on the wolf DAOS admin node

https://grafana.com/docs/grafana/latest/installation/


Download the package from the Grafana website


	Downloads:
	https://grafana.com/grafana/download
	
	For CentOS:
	
	wget https://dl.grafana.com/oss/release/grafana-8.0.6-1.x86_64.rpm
	sudo yum install grafana-8.0.6-1.x86_64.rpm
	
	For SLES:
	
	wget https://dl.grafana.com/oss/release/grafana-8.0.6-1.x86_64.rpm
	sudo rpm -i --nodeps grafana-8.0.6-1.x86_64.rpm


After it is installed, start the Grafana service on the DAOS admin node:


	sudo systemctl daemon-reload
	sudo systemctl start grafana-server
	sudo systemctl status grafana-server
	
	Configure the Grafana server to start at boot:
	sudo systemctl enable grafana-server 

Prometheus does not start on boot, so it must be restarted manually after a reboot.
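
One way to make the manual restart less fragile is to launch Prometheus in the background with `nohup` so it keeps running after you log out. This is a minimal sketch, not the method used on wolf: the function name, the log file name, and the default paths are assumptions, and `--config.file` is the Prometheus flag for the configuration file.

```shell
#!/bin/sh
# Start Prometheus in the background so it survives logout.
# $1: install directory (default ~/prometheus)
# $2: config file (default ~/.prometheus.yml)
# Prints the PID of the started process.
start_prometheus() {
    prom_dir=${1:-$HOME/prometheus}
    cfg=${2:-$HOME/.prometheus.yml}
    (
        # Run from the install directory so the default data/ directory is reused.
        cd "$prom_dir" || exit 1
        nohup ./prometheus --config.file="$cfg" > prometheus.log 2>&1 &
        echo $!
    )
}

# Example (not run here):
#   start_prometheus "$HOME/prometheus" "$HOME/.prometheus.yml"
```

Output goes to prometheus.log in the install directory; stop the process later with `kill` and the printed PID.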


Starting Grafana on a wolf DAOS admin node:


To monitor metrics on wolf while running the Grafana UI from a Windows machine, you need to set up port forwarding. Install Firefox on the Windows machine and configure a manual proxy in its network settings. The proxy port must match the forwarded port in the PuTTY setup below.



Use PuTTY to set up the port forwarding/tunnel
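
On a machine with an OpenSSH client instead of PuTTY, a dynamic (SOCKS) forward is the rough equivalent. This sketch assumes the PuTTY tunnel is a dynamic forward, which matches the manual proxy setting in Firefox but is not stated on this page; `user@wolf-login` and port 1080 are placeholders.

```shell
#!/bin/sh
# Print the ssh command for a dynamic (SOCKS) forward.
# $1: login target, $2: local proxy port (default 1080).
# -N: do not run a remote command; -D: open a local SOCKS proxy port.
socks_tunnel_cmd() {
    printf 'ssh -N -D %s %s\n' "${2:-1080}" "$1"
}

# Placeholder login target; replace with your own wolf login.
socks_tunnel_cmd user@wolf-login 1080
```

Run the printed command in a terminal, then point the Firefox manual proxy at SOCKS host localhost with the same port.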




Open the connection to wolf and log in as you normally would.


Open a connection to the wolf node running the Grafana service using the Firefox browser


Start up the Grafana dashboard in Firefox


 http://wolf-xx:3000

 Log in with admin/admin. Grafana will prompt you to change the admin password.


Before adding the DAOS dashboard to Grafana, you need to add a Prometheus data source. (Grafana should prompt you to add the source initially.)


 NOTE: Prometheus uses wolf-xx:9090


Add the Prometheus data source
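
The data source can also be created without the UI through Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request body; the data source name "Prometheus" and the `isDefault` setting are choices made for illustration, and wolf-xx stands for the node running the services.

```shell
#!/bin/sh
# Build the JSON body for Grafana's POST /api/datasources.
# $1: Prometheus URL, $2: data source name (default "Prometheus").
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
        "${2:-Prometheus}" "$1"
}

datasource_payload http://wolf-xx:9090

# To send it (not run here; replace PASSWORD with the admin password you set):
#   datasource_payload http://wolf-xx:9090 |
#     curl --silent --fail -u admin:PASSWORD \
#          -H 'Content-Type: application/json' \
#          -X POST --data @- http://wolf-xx:3000/api/datasources
```

With `access` set to `proxy`, the Grafana server itself queries Prometheus, so the browser only needs to reach port 3000.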




Import Grafana metrics in the Prometheus dashboard






 Import DAOS-Grafana-Dashboard.json from GitHub: https://github.com/daos-stack/daos/tree/master/utils/grafana





Open the DAOS dashboard. It should start showing metrics once the DAOS servers have started.

This is a snapshot of DAOS metrics while soak was running on a shared cluster with 8 servers (4 nodes) and 5 clients.







