Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 2 Next »

Reference README.md in daos src tree  https://github.com/daos-stack/daos/tree/master/utils/grafana

Use daos admin node to install and run Grafana

In order to run Grafana,  daos server must have telemetry port defined in daos_server.yml  (9191)


Install Prometheus


mkdir ~/prometheus
dmg telemetry config -i /home/mjean/prometheus
		Downloading and installing Prometheus...                                                                                                                                                                                                 
		fetching prometheus/prometheus v2.28.1                                                                                                                                                                                                   
		Installed prometheus to /home/mjean/prometheus/prometheus                                                                                                                                                                                
		Configuring Prometheus for DAOS monitoring...                                                                                                                                                                                            
		Wrote DAOS monitoring config to /home/mjean/.prometheus.yml)



To collect data from the server nodes; they will need to be added to the ~/.prometheus.yml

ex: 

  - targets:
    - localhost:9191
 to 
  - targets:
    - wolf-118:9191
    - wolf-119:9191
    - wolf-120:9191
    - wolf-121:9191

due to DAOS-8104 ; prometheus should be manually started with


./prometheus --config_file=~/.prometheus.yml;


Only use the cmd below after DAOS-8104 is fixed

dmg telemetry run -i /home/mjean/prometheus
		Downloading and installing Prometheus...                                                                                                                                                                                              
		fetching prometheus/prometheus v2.28.1                                                                                                                                                                                                
		Installed prometheus to /home/mjean/prometheus/prometheus
		Configuring Prometheus for DAOS monitoring...
		Wrote DAOS monitoring config to /home/mjean/.prometheus.yml)
		Starting Prometheus monitoring...
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:449 host_details="(Linux 3.10.0-1160.24.1.el7.x86_64 #1 SMP Thu Apr 8 19:51:47 UTC 2021 x86_64 wolf-80.wolf.hpdd.intel.com wolf.hpdd.intel.com)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:450 fd_limits="(soft=1048576, hard=1048576)"
		level=info ts=2021-07-16T12:59:06.771Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
		level=info ts=2021-07-16T12:59:06.775Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
		level=info ts=2021-07-16T12:59:06.776Z caller=main.go:824 msg="Starting TSDB ..."
		level=info ts=2021-07-16T12:59:06.777Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
		level=info ts=2021-07-16T12:59:06.778Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626298627558 maxt=1626350400000 ulid=01FANHRSJ125WPM5XQP3JN7548
		level=info ts=2021-07-16T12:59:06.779Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626415202422 maxt=1626422400000 ulid=01FAQ8PQVY9T7Q6YHB3Q6T2N3H
		level=info ts=2021-07-16T12:59:06.780Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626350400000 maxt=1626415200000 ulid=01FAQ8PR2JC5TWCEXPH51DKRNA
		level=info ts=2021-07-16T12:59:06.781Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1626422402421 maxt=1626429600000 ulid=01FAQFJF3YM6BH9GX6DM87M53F
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=49.128µs
		level=info ts=2021-07-16T12:59:06.805Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
		level=info ts=2021-07-16T12:59:06.809Z caller=head.go:826 component=tsdb msg="WAL checkpoint loaded"
		level=info ts=2021-07-16T12:59:06.888Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=18 maxSegment=23
		level=info ts=2021-07-16T12:59:06.971Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=19 maxSegment=23
		level=info ts=2021-07-16T12:59:07.047Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=20 maxSegment=23
		level=info ts=2021-07-16T12:59:07.108Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=21 maxSegment=23
		level=info ts=2021-07-16T12:59:07.119Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=22 maxSegment=23
		level=info ts=2021-07-16T12:59:07.120Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=23 maxSegment=23
		level=info ts=2021-07-16T12:59:07.120Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=3.35109ms wal_replay_duration=311.254676ms total_replay_duration=314.690735ms
		level=warn ts=2021-07-16T12:59:07.122Z caller=main.go:849 fs_type=NFS_SUPER_MAGIC msg="This filesystem is not supported and may lead to data corruption and data loss. Please carefully read https://prometheus.io/docs/prometheus/latest/storage/ to learn more about supported filesystems."
		level=info ts=2021-07-16T12:59:07.122Z caller=main.go:854 msg="TSDB started"
		level=info ts=2021-07-16T12:59:07.122Z caller=main.go:981 msg="Loading configuration file" filename=/home/mjean/.prometheus.yml
		level=info ts=2021-07-16T12:59:07.124Z caller=main.go:1012 msg="Completed loading of configuration file" filename=/home/mjean/.prometheus.yml totalDuration=2.234767ms remote_storage=5.111µs web_handler=456ns query_engine=1.803µs scrape=1.283008ms scrape_sd=36.14µs notify=1.113µs notify_sd=2.39µs rules=2.893µs
		level=info ts=2021-07-16T12:59:07.124Z caller=main.go:796 msg="Server is ready to receive web requests."
		level=info ts=2021-07-16T13:00:02.439Z caller=compact.go:518 component=tsdb msg="write block" mint=1626429600000 maxt=1626436800000 ulid=01FAQPE1FGXW6QWCK6C40X2GQG duration=22.37061ms
		level=info ts=2021-07-16T13:00:02.450Z caller=head.go:967 component=tsdb msg="Head GC completed" duration=2.069077ms
		level=info ts=2021-07-16T13:00:02.452Z caller=checkpoint.go:97 component=tsdb msg="Creating checkpoint" from_segment=18 to_segment=20 mint=1626436800000
		level=info ts=2021-07-16T13:00:02.464Z caller=head.go:1064 component=tsdb msg="WAL checkpoint complete" first=18 last=20 duration=13.152763ms
		





  • No labels