從零開始的 Infrastructure as Code: Terraform - 01
Infrastucture as Code: introduce Terraform from scratch
This article is part of 從零開始的 Infrastructu as Code: Terraform
- Get-started examples / SOP on Github
- Introducation to Terraform Iac: Speaker transcript
- Presentation
Check my website chechia.net for other blog. Follow my page to get notification. Like my page if you really like it :)
Outlline
- our story: issues, steps, & results
- basics IaC, terraform
- benefits
- risks and 坑
- to be or not to be
experience oriented
Our stories
- 100+ devs, many teams
- 25+ projects
- 50+ GKEs
- 80+ SQLs
- IAMs, redis, VPCs, load-balancers, …
Issues
- Ops manually create resources through GUI by SOP.
- We have many isolated, separeated resources, VPCs. It’s our culture, and we (devops) want to change.
- Some projects have short life-cycle. Rapid resources created & destroy.
Our user story
As a devops, I would like to introduce terraform (IaC) so that I can
- review all existing resources
- minimize error from manual operation
- ASAP!!
As a devops, I would like to fully enforce terraform (IaC) so that I can
- minimize efforts to operate infra
- delegate infra operations to junior team members
- minimize IAM privilges
Introduction
- import existing resources
- review existing resources code
- plan best practice resource templates
- create new resources with templates
- introduce git workflow, plan, commit, PR, and review
- add wrapper handler
- automation pipeline
- repeat 2-4
IaC
- Programatic way to operate infra
- declarative (functional) vs. imperative (procedural)
- Perfect for public cloud, cloud native, virtualized resources
- Benefits: cost (reduction), speed (faster execution) and risk (remove errors and security violations)
Terraform
- Declarative (functional) IaC
- Invoke API delegation
- State management
- providers: azure / aws / gcp /alicloud / …
Demo
https://github.com/chechiachang/terraform-playground
Scope
-
Compute Instances
-
Kubernetes
-
Databases
-
IAM
-
Networking
-
Load Balancer
Expected benefits
-
Minimize manual operation.
-
Zero manual operation error
- Standarized infra. Infra as a (stable) product.
-
fast, really fast to duplicate envs
-
Infra workflow with infra review
- Easy to create identical dev, staging, prod envs
- Reviewed infra. Better workflow. Code needs reviews, so do infra.
-
Fully automized infra pipeline.
Other Benefits
- Don’t afraid to change prod sites anymore
- We made a massive infra migration in this quater!!
- Better readability to GUI. Allow comment everywhere.
Risks
- Incorrect usage could cause massive destruction.
- 如果看見 destroy 的提示,請雙手離開鍵盤。 ~ first line in our SOP
- If see “destroy”, cancel operation & call for help.
- State management
- A little latency between infra version and terraform provider version
Reduce Risks
- Sufficient understanding to infra & terraform
- Sufficient training to juniors
- Minimize IAM privilege: remove update / delete permissions
Git-flow
- edit tf
- push new branch commit
- PR, review & discussion
- merge & apply
- revert to previous tag if necessary
(Utility) Provide template
- wrap resources for
- better accesibility
- lower operation risks
- uniform naming convention
- best practice
- suggested default value
About introducing new tool
- The hardest part is always people
- Focus on critical issues (痛點) instead of tool itself. “We introduce tool to solve…”
- Put result into statistics “The outage due to misconfig is reduced by…”
Overall, my IaC experience is GREAT!
- IaC to automation.
- Comment (for infra) is important. You have to write doc anyway. Why not put in IaC?
Q&A
- Full transcript
- Presentation file
- Source Code on Github
- chechia.net <- full contents
- Follow my page to get notification
- Like it if you really like it :)
Appendix.I more about terraform
terraform validate terraform import terraform module terraform cloud & state management
Appendix.I understand State conflict
- Shared but synced
- watch out for state conflicts when colaborating
- state diff. could cause terraform mis-plan
- Solution: synced state lock
- Colatorative edit (git branch & PR), synchronized terraform plan & apply
- or better: automation
Appendix.II understand resources from API aspect
GCP Load Balancer
GCP Load Balancing
-
understand resources from API aspect
- how terraform work with GCP API
-
internal
- regional
- pass-through: tcp / udp -> internal TCP/UDP
- proxy: http / https -> internal HTTP(S)
- regional
-
external
- regional
- pass-through: tcp / udp -> tcp/udp network
- global / effective regional
- proxy
- tcp -> TCP Proxy
- ssl -> SSL Proxy
- http / https -> External HTTP(S)
- proxy
- regional
Terraform Resource
-
forwarding_rule
- forwarding_rule: tcp & http
- global_forwarding_rule: only http
-
backend_service
- backend_service
- health_check
- http_health_check
- https_health_check
- region_backend_service
- region_health_check
- region_http_health_check
- region_https_health_check
- backend_service
Some ways to do IaC
- Cloud Formation
- bash script with API / client
引言 Infrastructure as Code
從字面上解釋,IaC 就是用程式碼描述 infrastructure。那為何會出現這個概念?
如果不 IaC 是什麼狀況?我們還是可以透過 GUI 或是 API 操作。隨叫隨用
雲端運算風行,工程師可以很在 GUI 介面上,很輕易的部署資料中心的架構。輸入基本資訊,滑鼠點個一兩下,就可以在遠端啟用運算機器,啟用資料庫,設置虛擬網路與路由,幾分鐘就可以完成架設服務的基礎建設(infrastructure),開始運行服務。
然而隨著
- 雲平台提供更多新的(複雜的)服務
- 服務彼此可能是有相依性(dependency),服務需要仰賴其他服務
- 或是動態耦合,更改服務會連動其他服務,一髮動全身
- 需要縝密的存取控管(access control)
- 防火牆,路由規則
- 雲平台上,團隊成員的存取權限
- 專案的規模與複雜度增加
- 多環境的部署
- 多個備援副本設定
- 大量機器形成的集群
IaC 的實際需求
以下這些對話是不是很耳熟?
IaC 的實際需求
沒有需求,就不需要找尋新的解決方案。
有看上面目錄的朋友,應該知道這系列文章的後面,我會實際分享於公司內部導入 Terraform 與 IaC 方法的過程。
各位讀者會找到這篇文,大概都是因為實際搜尋了 Terraform 或是 IaC 的關鍵字才找到這篇。
如果沒有需求,自己因為覺得有趣而拉下來研究,
如果沒有明確需求,就貿然導入 無謂增加亂度
IaC 的實現工具
為何選擇 Terraform
建議
- 如果不熟,從 import 現有最好的資源開始。把 70 分保住,再向 80 90 邁進。
- 善用 module 封裝,只露出會用到的參數。