Grafana Series (14): Installing Loki with Helm


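Preface

Having written or translated so many Loki articles, I realized I had never covered how to install it 😓

This article describes how to install Loki with Helm.

Prerequisites

Helm is installed, and the official Grafana chart repository has been added: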

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
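
Before writing a values file, it can help to confirm that the chart is visible and to dump its defaults. The sketch below wraps two standard Helm commands in shell functions; the function names and the output file name default-values.yaml are illustrative choices, not part of the original setup.

```shell
# Chart used throughout this article.
CHART=grafana/loki-stack

# List the chart versions available from the Grafana repository.
list_chart_versions() {
  helm search repo "$CHART" --versions
}

# Save the chart's default values as a starting point for a custom values.yaml.
# The output file name is an arbitrary choice (hypothetical default).
dump_default_values() {
  helm show values "$CHART" > "${1:-default-values.yaml}"
}
```

Calling list_chart_versions right after helm repo update is a quick way to tell whether the repository is actually reachable from your network.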

🐾 Warning:

Access to the repository may be restricted on some networks; make sure your network can reach it.

Deployment

Architecture

Promtail (collection) + Loki (storage and processing) + Grafana (visualization)

Figure: Loki architecture diagram

Promtail

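  1. Enable the Prometheus Operator ServiceMonitor for monitoring.
  2. Add the external label cluster to identify which K8S cluster the logs come from.
  3. Change pipeline_stages to cri so that CRI-format logs are parsed (my cluster's container runtime writes logs in the CRI format, while the Loki Helm chart defaults to docker).
  4. Also collect systemd-journal logs: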
promtail:
  config:
    snippets:
      pipelineStages:
        - cri: {}

  extraArgs: 
    - -client.external-labels=cluster=ctyun
  # Extra configuration for systemd-journal:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'

  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true
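
Once Promtail is shipping logs, the labels configured above (cluster from the external label, job and unit from the journal scrape config) can be used to select streams. The following is a minimal sketch that queries Loki's HTTP API with curl. It assumes Loki is reachable at LOKI_URL, for example through kubectl port-forward to the loki Service on port 3100; the function names and the kubelet.service unit in the usage note are illustrative.

```shell
# Base URL of Loki; assumes a local port-forward unless overridden.
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Build a LogQL stream selector for one systemd unit in one cluster.
# $1 = cluster label value, $2 = systemd unit name
journal_selector() {
  printf '{cluster="%s", job="systemd-journal", unit="%s"}' "$1" "$2"
}

# Fetch recent journal lines for a unit via the query_range endpoint.
# $3 = maximum number of lines to return (defaults to 20)
query_journal() {
  curl -s -G "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$(journal_selector "$1" "$2")" \
    --data-urlencode "limit=${3:-20}"
}
```

For example, query_journal ctyun kubelet.service 50 would request up to 50 recent kubelet journal lines from the cluster labelled ctyun. The same selector can be pasted into Grafana Explore.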

Loki

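  1. Enable persistent storage.
  2. Enable the Prometheus Operator ServiceMonitor for monitoring.
    1. Also configure Loki-related PrometheusRule alerts.
  3. Because my personal cluster has a low log volume, raise the ingester settings (chunk_idle_period, max_chunk_age) accordingly.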

Grafana

  1. Enable persistent storage.
  2. Enable the Prometheus Operator ServiceMonitor for monitoring.
  3. Enable all sidecars so that dashboards, datasources, plugins, and notifiers can be updated dynamically.

Installing with Helm

Install with the following command:

helm upgrade --install loki --namespace=loki --create-namespace grafana/loki-stack -f values.yaml

The custom values.yaml is as follows:

loki:
  enabled: true
  persistence:
    enabled: true
    storageClassName: local-path
    size: 20Gi
  serviceScheme: https
  user: admin
  password: changit!
  config:
    ingester:
      chunk_idle_period: 1h
      max_chunk_age: 4h
    compactor:
      retention_enabled: true
  serviceMonitor:
    enabled: true
    prometheusRule:
      enabled: true
      rules:
        #  Some examples from https://awesome-prometheus-alerts.grep.to/rules.html#loki
        - alert: LokiProcessTooManyRestarts
          expr: changes(process_start_time_seconds{job=~"loki"}[15m]) > 2
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: Loki process too many restarts (instance {{ $labels.instance }})
            description: "A loki process had too many restarts (target {{ $labels.instance }})\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestErrors
          expr: 100 * sum(rate(loki_request_duration_seconds_count{status_code=~"5.."}[1m])) by (namespace, job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (namespace, job, route) > 10
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: Loki request errors (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} and {{ $labels.route }} are experiencing errors\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestPanic
          expr: sum(increase(loki_panic_total[10m])) by (namespace, job) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request panic (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} is experiencing {{ printf \"%.2f\" $value }}% increase of panics\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
        - alert: LokiRequestLatency
          expr: (histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route!~"(?i).*tail.*"}[5m])) by (le)))  > 1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: Loki request latency (instance {{ $labels.instance }})
            description: "The {{ $labels.job }} {{ $labels.route }} is experiencing {{ printf \"%.2f\" $value }}s 99th percentile latency\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

promtail:
  enabled: true
  config:
    snippets:
      pipelineStages:
        - cri: {}  
  extraArgs:
    - -client.external-labels=cluster=ctyun        
  serviceMonitor:
    # -- If enabled, ServiceMonitor resources for Prometheus Operator are created
    enabled: true

  # Extra configuration for systemd-journal:
  # Add additional scrape config
  extraScrapeConfigs:
    - job_name: journal
      journal:
        path: /var/log/journal
        max_age: 12h
        labels:
          job: systemd-journal
      relabel_configs:
        - source_labels: ['__journal__systemd_unit']
          target_label: 'unit'
        - source_labels: ['__journal__hostname']
          target_label: 'hostname'

  # Mount journal directory into Promtail pods
  extraVolumes:
    - name: journal
      hostPath:
        path: /var/log/journal

  extraVolumeMounts:
    - name: journal
      mountPath: /var/log/journal
      readOnly: true

fluent-bit:
  enabled: false

grafana:
  enabled: true
  adminUser: caseycui
  adminPassword: changit!
  ## Sidecars that collect the ConfigMaps with the specified label and store the included files in the respective folders
  ## Requires at least Grafana 5 to work and can't be used together with parameters dashboardProviders, datasources and dashboards
  sidecar:
    image:
      repository: quay.io/kiwigrid/k8s-sidecar
      tag: 1.15.6
      sha: ''
    dashboards:
      enabled: true
      SCProvider: true
      label: grafana_dashboard
    datasources:
      enabled: true
      # label that the configmaps with datasources are marked with
      label: grafana_datasource
    plugins:
      enabled: true
      # label that the configmaps with plugins are marked with
      label: grafana_plugin
    notifiers:
      enabled: true
      # label that the configmaps with notifiers are marked with
      label: grafana_notifier
  image:
    tag: 8.3.5
  persistence:
    enabled: true
    size: 2Gi
    storageClassName: local-path
  serviceMonitor:
    enabled: true
  imageRenderer:
    enabled: false

filebeat:
  enabled: false

logstash:
  enabled: false
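
A values file of this size is easy to get subtly wrong, so it may be worth rendering the chart locally before installing and checking the release afterwards. The sketch below uses standard helm and kubectl commands; the release name and namespace mirror the install command above, while the function names are illustrative.

```shell
# Release name and namespace, matching the install command above.
RELEASE=loki
NAMESPACE=loki

# Render the chart locally with the custom values.
# Fails on YAML or template errors without touching the cluster.
# $1 = values file (defaults to values.yaml)
preview_release() {
  helm template "$RELEASE" grafana/loki-stack --namespace "$NAMESPACE" -f "${1:-values.yaml}"
}

# After installation: show the release status, then the pods in the namespace.
check_release() {
  helm status "$RELEASE" --namespace "$NAMESPACE" &&
  kubectl get pods --namespace "$NAMESPACE"
}
```

Piping the output of preview_release through a pager or grep is a convenient way to confirm that settings such as the ServiceMonitor and the journal volume mounts actually end up in the rendered manifests.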

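The resource topology after installation is as follows:

Figure: Loki K8S resource topology

Day 2 configuration (as needed)

Adding Dashboards to Grafana

In the same namespace, create a ConfigMap like the following (any ConfigMap carrying the grafana_dashboard label is imported automatically by the Grafana sidecar):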

apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-grafana-dashboard
  labels:
    grafana_dashboard: "1"
data:
  k8s-dashboard.json: |-
    [...]
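
Embedding a large dashboard JSON in a hand-written manifest is error-prone because every line has to be indented under the block scalar. An alternative is to let kubectl build the ConfigMap from the JSON file and then add the label. This is a sketch under the assumption that the release lives in the loki namespace used by the install command; the function name is illustrative.

```shell
# Namespace of the loki-stack release (taken from the install command).
NAMESPACE=loki

# Create a ConfigMap from a dashboard JSON file, then label it so the
# Grafana sidecar picks it up.
# $1 = ConfigMap name, $2 = path to the dashboard JSON file
import_dashboard() {
  kubectl create configmap "$1" --namespace "$NAMESPACE" --from-file="$2" &&
  kubectl label configmap "$1" --namespace "$NAMESPACE" grafana_dashboard=1
}
```

For example, import_dashboard sample-grafana-dashboard k8s-dashboard.json produces a ConfigMap equivalent to the manifest above, with the file name as the data key.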

Adding a DataSource to Grafana

In the same namespace, create a ConfigMap like the following (any ConfigMap carrying the grafana_datasource label is imported automatically by the Grafana sidecar):

apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-loki-stack
  labels:
    grafana_datasource: '1'
data:
  loki-stack-datasource.yaml: |-
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100
      version: 1
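
To check that the sidecar has actually provisioned the datasource, one option is to list the datasources through Grafana's HTTP API. The sketch below assumes Grafana is reachable at GRAFANA_URL, for example through kubectl port-forward to the loki-grafana Service, and that you pass the admin credentials from values.yaml; the function name is illustrative.

```shell
# Base URL of Grafana; assumes a local port-forward unless overridden.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# List the datasources known to Grafana, using HTTP basic auth.
# $1 = Grafana user, $2 = Grafana password
list_datasources() {
  curl -s -u "$1:$2" "$GRAFANA_URL/api/datasources"
}
```

If the import worked, the JSON response should contain an entry named Loki whose url is http://loki:3100.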

Configuring a Grafana IngressRoute in Traefik

Because I use Traefik 2, Ingress is configured through the IngressRoute CRD, as follows:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - kind: Rule
      match: Host(`grafana.ewhisper.cn`)
      middlewares:
        - name: hsts-header
          namespace: kube-system
        - name: redirectshttps
          namespace: kube-system
      services:
        - name: loki-grafana
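          namespace: monitoring  # namespace of the loki-grafana Service; the install command above deploys into loki, so adjust to match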
          port: 80
  tls: {}

Final result

As shown below:

Figure: Grafana Explore Logs

🎉🎉🎉

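📚️ References

helm-charts/charts at main · grafana/helm-charts (github.com)

Grafana series articles

"When three walk together, one of them can be my teacher"; knowledge shared belongs to all. This article was written by the 东风微鸣 tech blog, EWhisper.cn.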
