Monitoring Nginx with Prometheus: Resource Optimization and Alert Configuration

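1. Monitoring Deployment and Data Collection

1.1 Enabling the Nginx Monitoring Module

Nginx can expose performance metrics in two ways: through nginx-module-vts, a third-party module that must be compiled into Nginx, or through nginx-prometheus-exporter, the official exporter, which runs as a separate process and reads the built-in stub_status page. The official nginx-prometheus-exporter is recommended. Deployment takes the following steps: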

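Compile the module into Nginx (required only for nginx-module-vts; nginx-prometheus-exporter needs only the built-in stub_status module, enabled at build time with --with-http_stub_status_module):

./configure --add-module=nginx-module-vts && make && make install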

Enable the status page in the Nginx configuration:

http {
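    vhost_traffic_status_zone; # requires nginx-module-vts; omit when only stub_status is used
    server {
        location /status {
            stub_status;
            allow 127.0.0.1; # restrict access to localhost
            deny all;        # reject all other clients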
        }
    }
}
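
To make the status page less abstract, here is a minimal Python sketch of how the plain-text stub_status output could be turned into named values, which is roughly the job an exporter performs. The sample text follows the format shown in the ngx_http_stub_status_module documentation; the function name `parse_stub_status` and the dictionary keys are illustrative assumptions, not part of any exporter.

```python
import re

# Sample page body in the stub_status format (numbers are illustrative).
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text):
    """Parse the plain-text stub_status page into a dict of integers."""
    active = re.search(r"Active connections:\s*(\d+)", text)
    # The three cumulative counters sit alone on one line.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(v) for v in totals.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


stats = parse_stub_status(SAMPLE)
print(stats)
```

A gap between `accepts` and `handled` would indicate dropped connections, which is one reason these counters are worth graphing.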

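1.2 Prometheus Scrape Configuration

nginx-prometheus-exporter converts the status page into Prometheus metrics, which Prometheus then scrapes:

# Example prometheus.yml configuration
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['192.168.1.100:9113'] # default exporter port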

The start command must point at the Nginx status page:

./nginx-prometheus-exporter --nginx.scrape-uri=http://localhost/status
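
Once the exporter is running, it serves metrics in the Prometheus text exposition format. The sketch below shows, under simplifying assumptions, how such a response could be read in Python to check a value like nginx_connections_active by hand. The sample text is illustrative, and the hypothetical `parse_metrics` helper handles only unlabelled samples without timestamps.

```python
# Illustrative excerpt of an exporter response (values are made up).
SAMPLE_METRICS = """# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 291
# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
"""


def parse_metrics(text):
    """Return {metric_name: value} for simple 'name value' sample lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the # HELP / # TYPE metadata lines.
        if not line or line.startswith("#"):
            continue
        name, value = line.rsplit(" ", 1)
        samples[name] = float(value)
    return samples


metrics = parse_metrics(SAMPLE_METRICS)
print(metrics["nginx_connections_active"])
```

For anything beyond a quick check, a maintained parser is a better choice than this hand-rolled one.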

2. Resource Optimization Strategies

2.1 Toolchain for Bottleneck Analysis

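Tool type	Recommended tools	Example metrics
System layer	Node Exporter + top/htop	CPU utilization, memory usage
Network layer	iptables/iftop	Connection count, bandwidth utilization
Application layer	Nginx Exporter	Request rate and connection counts; request latency and 5xx error rate where the exporter exposes them
Visualization	Grafana Dashboard (ID: 11199)	Real-time request traffic heatmap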

2.2 Dynamic Tuning in Practice

Connection tuning
Adjust the worker_connections parameter according to the nginx_connections_active metric:
```
events {
    worker_connections 2048; # suggested: about 75% of the per-process open-file limit (ulimit -n or worker_rlimit_nofile)
}
```
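
The sizing rule above is simple arithmetic, sketched here in Python. The function names and the example limit of 4096 open files are assumptions for illustration. Note that worker_connections counts every connection a worker holds, including connections to proxied upstreams, so a reverse proxy may use two connections per client request.

```python
def suggested_worker_connections(open_file_limit, fraction=0.75):
    """Apply the 75% rule of thumb to a per-process open-file limit."""
    return int(open_file_limit * fraction)


def connection_utilization(active, worker_processes, worker_connections):
    """Share of the theoretical connection capacity currently in use."""
    capacity = worker_processes * worker_connections
    return active / capacity


# Hypothetical host: open-file limit of 4096, 4 workers, 1024 active connections.
limit = suggested_worker_connections(4096)
usage = connection_utilization(1024, worker_processes=4, worker_connections=2048)
print(limit, usage)
```

Feeding the live nginx_connections_active value into `connection_utilization` gives a quick signal of how close the server is to its configured ceiling.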

Cache acceleration
Enable response caching to reduce load on the backend:

http {
    proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=mycache:10m;
    server {
        location / {
            proxy_cache mycache;
            proxy_pass http://backend;
        }
    }
}
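
Two numbers help when judging whether this cache is sized and working as intended: how many keys the keys_zone can hold, and the observed hit ratio. The Python sketch below estimates both. The figure of about 8,000 keys per megabyte comes from the Nginx proxy_cache_path documentation; the function names and the sample status list are hypothetical, and the statuses are assumed to come from logging the $upstream_cache_status variable.

```python
KEYS_PER_MEGABYTE = 8000  # approximate figure from the nginx documentation


def keys_zone_capacity(size_mb):
    """Approximate number of cache keys a keys_zone of size_mb can store."""
    return size_mb * KEYS_PER_MEGABYTE


def cache_hit_ratio(statuses):
    """Fraction of requests served from cache, given $upstream_cache_status values."""
    if not statuses:
        return 0.0
    hits = sum(1 for s in statuses if s == "HIT")
    return hits / len(statuses)


capacity = keys_zone_capacity(10)  # keys_zone=mycache:10m
ratio = cache_hit_ratio(["HIT", "MISS", "HIT", "HIT", "EXPIRED"])
print(capacity, ratio)
```

A persistently low hit ratio suggests reviewing cache validity settings before increasing the zone size.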

3. Alert Rule Configuration

3.1 Core Alerting Rules

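# Example alert_rules.yml
groups:
- name: nginx-alerts
  rules:
  - alert: HighErrorRate
    # Ratio of 5xx requests to all requests. This needs a status label, which the
    # stub_status-based exporter does not provide; use a source that exposes it.
    expr: |
      sum by (instance) (rate(nginx_http_requests_total{status=~"5.."}[5m]))
        / sum by (instance) (rate(nginx_http_requests_total[5m])) > 0.05
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "Nginx 5xx error rate above 5%"
      description: "Instance {{ $labels.instance }} current error rate: {{ $value | humanizePercentage }}"

  - alert: SlowResponse
    # Fraction of requests completed within 1s, computed from histogram bucket rates
    expr: |
      sum by (instance) (rate(nginx_http_request_duration_seconds_bucket{le="1"}[5m]))
        / sum by (instance) (rate(nginx_http_request_duration_seconds_count[5m])) < 0.9
    for: 15m
    labels:
      severity: warning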
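
An error-rate threshold such as 5% is meaningful only as a ratio of two rates: 5xx requests per second divided by all requests per second. The Python sketch below works through that arithmetic on two hypothetical counter samples taken 300 seconds apart. It is a simplification of what PromQL's rate() does, since it ignores counter resets and extrapolation.

```python
def per_second_rate(first, last, window_seconds):
    """Average per-second increase of a counter over the window."""
    return (last - first) / window_seconds


def error_ratio(errors_first, errors_last, total_first, total_last, window_seconds):
    """Share of requests that failed during the window."""
    error_rate = per_second_rate(errors_first, errors_last, window_seconds)
    total_rate = per_second_rate(total_first, total_last, window_seconds)
    if total_rate == 0:
        return 0.0
    return error_rate / total_rate


# Hypothetical samples: 180 new 5xx responses out of 3000 new requests in 5 minutes.
ratio = error_ratio(500, 680, 100_000, 103_000, window_seconds=300)
should_fire = ratio > 0.05
print(round(ratio, 4), should_fire)
```

In this example the raw 5xx rate is 0.6 requests per second, which would exceed a 0.05 threshold on any busy server regardless of traffic volume; dividing by the total rate is what makes the threshold a percentage.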

3.2 Certificate and Resource Alerts

# probe_ssl_earliest_cert_expiry is exported by blackbox_exporter, not by the Nginx exporter
- alert: SSLCertExpiry
  expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 30
  labels:
    severity: warning
  annotations:
    summary: "SSL certificate expires within 30 days"

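# nginx_workers_* are not among the stub_status metrics; substitute the metrics your exporter provides
- alert: WorkerOverload
  expr: nginx_workers_active / nginx_workers_total > 0.8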
  for: 5m
  labels:
    severity: critical
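
The certificate rule compares the seconds remaining until expiry against 86400 * 30, that is, 30 days expressed in seconds. A minimal Python sketch of the same check, using hypothetical Unix timestamps and an illustrative function name:

```python
SECONDS_PER_DAY = 86400
THIRTY_DAYS = SECONDS_PER_DAY * 30  # same constant as in the alert expression


def cert_expiring_soon(expiry_ts, now_ts, threshold=THIRTY_DAYS):
    """True when the certificate expires in less than `threshold` seconds."""
    return expiry_ts - now_ts < threshold


now = 1_700_000_000  # hypothetical current time (Unix seconds)
soon = cert_expiring_soon(now + 29 * SECONDS_PER_DAY, now)
later = cert_expiring_soon(now + 31 * SECONDS_PER_DAY, now)
print(soon, later)
```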

4. Visualization and Advanced Analysis

4.1 Grafana Dashboard Configuration

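The following prebuilt templates are recommended:

Basic monitoring: Dashboard ID 11199
Covers basic metrics such as request volume, response code distribution, and connection states
In-depth analysis: Dashboard ID 7587
Provides advanced features such as an upstream/downstream service dependency graph and JVM thread analysis (the latter applies to Java backends, not to Nginx itself)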

4.2 Log Correlation Analysis

Integrate Loki to link logs with metrics:

# promtail configuration snippet
scrape_configs:
  - job_name: nginx
    static_configs:
      - targets: [localhost]
        labels:
          job: nginx-logs
          __path__: /var/log/nginx/*.log
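
To show what linking logs with metrics can mean in practice, the Python sketch below derives status-class counts from access log lines, the kind of aggregation a log pipeline could perform on the files collected above. It assumes Nginx's default combined log format; the sample lines, the regular expression, and the function name are illustrative.

```python
import re
from collections import Counter

# Matches the leading fields of the default "combined" log format.
LOG_PATTERN = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+)'
)

SAMPLE_LINES = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0.1"',
    '203.0.113.7 - - [10/Oct/2023:13:55:37 +0000] "GET /api/items HTTP/1.1" 502 157 "-" "curl/8.0.1"',
    '198.51.100.4 - - [10/Oct/2023:13:55:38 +0000] "POST /login HTTP/1.1" 200 45 "-" "Mozilla/5.0"',
]


def count_status_classes(lines):
    """Count responses per status class (2xx, 4xx, 5xx, ...), skipping unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("status")[0] + "xx"] += 1
    return counts


classes = count_status_classes(SAMPLE_LINES)
print(dict(classes))
```

When a 5xx alert fires, the matching log lines identify which requests failed, which the aggregate metric alone cannot show.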