2019 IT邦幫忙鐵人賽 (Ironman Challenge)

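Hello everyone, I'm Che-Chia Chang; in the community I usually go by David Chang. I'm a software engineer specializing in backend development, DevOps, containerized applications, and Kubernetes development and administration. I'm currently an organizer of the Golang Taiwan Meetup.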

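Friends invited me (read: talked me into it) to join the 2020 IT邦幫忙鐵人賽, a challenge to publish one technical article a day for 30 days. On one hand, I want to share with the community the problems I've run into at work and how I solved them; on the other, it puts a little pressure on me to grow and lets me distill what I've learned over this period. That is how this series came about.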

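The series has three main focuses:

  1. Every solution comes with step-by-step instructions, so that readers can reproduce the full procedure and either use it directly or adapt it.

  2. An emphasis on Google Cloud Platform, especially its two major services, Google Compute Engine (GCE) and Google Kubernetes Engine (GKE). This is also the platform I know best, so I'll promote it along the way and share some pitfalls.

  3. Debugging, problem analysis, and stability improvements from an operations perspective.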

The planned topics are listed below (they may be adjusted slightly as the writing progresses).

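The articles are published on the Ironman challenge page and backed up on this site. If you spot any mistakes, please reach out through the contact information below <3 I'd be very grateful.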


Features

  • step-by-step guides for deployment: guarantee a running deployment on GCP
  • basic configuration, usage, monitoring, and networking on GKE
  • debugging and stability analysis from a DevOps perspective

Topics

  • ELK stack(8)
    • Deploy self-hosted ELK stack on GCE instance
    • Secure ELK stack with SSL and role-based authentication
    • Monitoring services on Kubernetes with ELK beats
    • Monitoring services on GCE instances
    • Logstash pipelines and debugging walkthrough
    • Elasticsearch operations: housekeeping, tuning, permanent storage
    • Elasticsearch maintenance, troubleshooting
    • Getting started with Elastic Cloud SaaS
  • General operations on Kubernetes(4)
    • Kubernetes Debug SOP
    • Kubectl cheat sheet
    • Secure services with SSL by cert-manager
    • Speed up container updating with operator
      • My operator example
  • Deploy Kafka HA on Kubernetes(4)
    • Deploy kafka-ha on Kubernetes with Helm
    • In-cluster networking configuration for high availability
    • Basic app-side usage, performance tuning
    • Operate Kafka: update config, upgrade version, migrate data
  • Prometheus / Grafana(5)
    • Deploy Prometheus / Grafana stack on GCE instance
    • Monitoring services on Kubernetes with exporters
    • Export Kubernetes metrics to Prometheus
    • Export Redis-ha metrics to Prometheus
    • Export Kafka metrics to Prometheus
  • GCP networking(4)
    • Basic firewall concepts for private networks with GCE instances & Kubernetes
    • Load balancers for Kubernetes services & ingress
    • DNS on GCP, from kube-dns to the GCP DNS service
  • GCP log management(3)
    • Basic usage of GCP logging & GCP Error Reporting
    • Stackdriver, metrics, alerts
    • Logging on GKE, from gcp-fluentd to Stackdriver
張哲嘉
Site Reliability Engineer

My areas of interest include site reliability engineering, DevOps, containers, and Kubernetes.