普罗米修斯
参考文章
部署prometheus(master节点上)
安装prometheus主程序
wget https://github.com/prometheus/prometheus/releases/download/v2.52.0/prometheus-2.52.0.linux-amd64.tar.gz
tar xf prometheus-2.52.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/prometheus-2.52.0.linux-amd64.tar.gz
执行
./prometheus --config.file=prometheus.yml &
然后就可以通过浏览器测试了,建议通过vscode将服务器的端口映射到本地,然后在自己电脑上通过浏览器访问,具体可以查看上述参考文章
监控一个远端业务机器(worker节点上)
安装监控客户端
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.1/node_exporter-1.8.1.linux-amd64.tar.gz
tar xf node_exporter-1.8.1.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/node_exporter-1.8.1.linux-amd64/
后台运行
nohup /usr/local/node_exporter-1.8.1.linux-amd64/node_exporter & [1] 7281
是否运行正确也可以参考上述文章
然后需要在master节点上修改prometheus的配置文件
master节点上运行
nano prometheus.yml
具体内容如下
# my global config
global:
scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
# scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
alertmanagers:
- static_configs:
- targets:
# - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
- job_name: "prometheus"
# metrics_path defaults to '/metrics'
# scheme defaults to 'http'.
static_configs:
- targets: ["localhost:9090"]
- job_name: "workers" #定义名称
static_configs:
- targets: ["192.168.1.14:9100"] //两个workers节点,都要执行前面安装node_exporter的过程
- targets: ["192.168.1.15:9100"]
在master节点上将prometheus部署成服务
nano /usr/lib/systemd/system/prometheus.service
内容如下,具体的路径配置请自行修改
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/prometheus-2.52.0.linux-amd64/prometheus \
--config.file=/usr/local/prometheus-2.52.0.linux-amd64/prometheus.yml \
--storage.tsdb.path=/usr/local/prometheus-2.52.0.linux-amd64/data/ \
--storage.tsdb.retention=15d \
--web.enable-lifecycle
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
systemctl daemon-reload
systemctl start prometheus
systemctl reload prometheus
systemctl status prometheus
如果服务启动失败,可以查看一下是否配置路径写错,或者是9090端口已被占用
lsof -i :9090
如果被占用,通过kill指令杀死相应进程,并重新执行上述指令
上述做完后,也可以在浏览器中查看是否成功,具体可以参考前面的参考文章
nerdctl run -d –net flannel –name prometheus ubuntu:20.04 /bin/bash