
2023-06-18 00:27:09


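Preface

If the cluster's power supply is unstable, or nodes crash every now and then, make sure you take backups: etcd snapshot files are easily damaged under those conditions.
I only came to appreciate the importance of backups after resetting the cluster many times.
This post covers:
etcd operations basics
A backup and restore demo for an etcd cluster running as static Pods
Writing a scheduled backup job
A backup and restore demo for a binary-deployed etcd cluster
My understanding is limited, so corrections are welcome.

"I wanted only to try to live in accord with the promptings which came from my true self. Why was that so very difficult?" (Hermann Hesse, Demian)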

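etcd overview

etcd is an open-source project started by the CoreOS team in June 2013. Its goal is to provide a highly available distributed key-value store.

etcd uses the Raft protocol as its consensus algorithm and is implemented in Go.

Fully replicated: every node in the cluster holds the complete data set
Highly available: etcd is designed to avoid single points of failure caused by hardware or network problems
Consistent: every read returns the latest write across all hosts
Simple: a well-defined, user-facing API (gRPC)
Secure: automatic TLS with optional client certificate authentication
Fast: benchmarked at 10,000 writes per second
Reliable: the store is kept strongly consistent and highly available through the Raft algorithm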

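Basic knowledge for operating an etcd cluster:

The client (read/write) port is 2379; the peer (data synchronization) port is 2380.
An etcd cluster is a distributed system that uses the Raft protocol to keep the state of all nodes consistent.
A node is in one of three states: Leader, Follower, or Candidate.
When the cluster initializes, every node starts as a Follower. Once a Leader is elected, it keeps the Followers in sync through heartbeats and log replication.
Clients can connect to any member. Writes are always committed through the Leader (a Follower forwards them). Reads can be answered by a Follower, although a default (linearizable) read still confirms its position with the Leader first.
When a Follower receives no heartbeat from the Leader within a certain time, it changes its role to Candidate and starts a leader election.
Configure an etcd cluster with an odd number of nodes rather than an even number. The recommended sizes are 3, 5, or 7 nodes.
Use etcd's built-in backup/restore tooling to back up data from the source deployment and restore it into a new deployment. The data directory must be cleaned before a restore.
snap under the data directory: stores snapshot data. etcd takes snapshots so that WAL files do not grow without bound; a snapshot stores the etcd data state.
wal under the data directory: stores the write-ahead log, whose main purpose is to record the full history of data changes. In etcd, every modification is written to the WAL before it is committed.
An etcd cluster should probably not exceed seven nodes, because write performance suffers; five nodes is the suggested size. A 5-member cluster tolerates two member failures, and a 3-member cluster tolerates one.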
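
The sizing advice above follows from Raft's majority rule. The sketch below is only an illustration of that arithmetic (the function names are made up for this example): a cluster of n voting members needs floor(n/2)+1 of them to agree, so it survives the loss of the rest.

```shell
#!/bin/sh
# Quorum arithmetic for an etcd (Raft) cluster with N voting members.
# Function names are hypothetical, chosen for this illustration.

# quorum N: number of members that must agree, floor(N/2) + 1
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: members that may fail while a quorum still exists
fault_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5 6 7; do
    echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, which is why even-sized clusters add cost without adding resilience.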

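Common configuration parameters:

ETCD_NAME: node name; the default is default
ETCD_DATA_DIR: path where the service stores its runtime data
ETCD_LISTEN_PEER_URLS: addresses to listen on for peer traffic, for example http://ip:2380; separate multiple addresses with commas. All nodes must be able to reach it, so do not use localhost
ETCD_LISTEN_CLIENT_URLS: addresses to listen on for client traffic
ETCD_ADVERTISE_CLIENT_URLS: this node's client addresses as advertised to the outside; the value is shared with the other nodes in the cluster
ETCD_INITIAL_ADVERTISE_PEER_URLS: this node's peer addresses as advertised to the outside; the value is shared with the other nodes in the cluster
ETCD_INITIAL_CLUSTER: information about all nodes in the cluster
ETCD_INITIAL_CLUSTER_STATE: new when creating a new cluster; existing when joining a cluster that already exists
ETCD_INITIAL_CLUSTER_TOKEN: the cluster's ID; when several clusters exist, each cluster's ID must be unique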
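
To make the relationship between the peer-related parameters concrete, here is a minimal sketch that derives ETCD_INITIAL_CLUSTER from a list of NAME=IP pairs and prints the settings for one member. The http scheme, port 2380, and the etcd-100/101/102 names mirror the binary cluster used later in this post; build_initial_cluster and member_settings are hypothetical helper names, not part of etcd.

```shell
#!/bin/sh
# Sketch: derive peer settings from NAME=IP pairs (helper names are hypothetical).

# build_initial_cluster NAME=IP...: join members as NAME=http://IP:2380
build_initial_cluster() (
    out=""
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first '='
        ip=${pair#*=}      # text after the first '='
        out="${out:+$out,}${name}=http://${ip}:2380"
    done
    echo "$out"
)

# member_settings NAME IP STATE NAME=IP...: print the settings for one member
member_settings() (
    name=$1
    ip=$2
    state=$3
    shift 3
    echo "ETCD_NAME=\"$name\""
    echo "ETCD_INITIAL_ADVERTISE_PEER_URLS=\"http://$ip:2380\""
    echo "ETCD_INITIAL_CLUSTER=\"$(build_initial_cluster "$@")\""
    echo "ETCD_INITIAL_CLUSTER_STATE=\"$state\""
)

member_settings etcd-100 192.168.26.100 new \
    etcd-100=192.168.26.100 etcd-101=192.168.26.101 etcd-102=192.168.26.102
```

Every member receives the same ETCD_INITIAL_CLUSTER value; only the name and the advertised peer URL differ per node.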

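Backup and restore for a static Pod cluster

Single-node etcd backup and restore

If etcd is deployed as a single node, you can take a physical backup by copying the data directory. To restore, copy the backed-up etcd data directory back to the directory etcd is configured to use. After the restore, put the etcd.yaml file back into /etc/kubernetes/manifests in its original state.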
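
A physical backup of that kind can be sketched as a small tar wrapper. This is an illustration only: physical_backup and the archive naming are assumptions of this example, and it is safest to run it while etcd is stopped so the copied files are consistent.

```shell
#!/bin/sh
# Sketch: archive an etcd data directory (hypothetical helper, not an etcd tool).

# physical_backup DATA_DIR BACKUP_ROOT: write a timestamped tar.gz, print its path
physical_backup() (
    data_dir=${1%/}
    backup_root=$2
    # An etcd data directory contains a member/ subdirectory (snap and wal).
    [ -d "$data_dir/member" ] || { echo "no member/ under $data_dir" >&2; exit 1; }
    mkdir -p "$backup_root" || exit 1
    archive="$backup_root/etcd-data-$(date +%Y%m%d%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$data_dir")" "$(basename "$data_dir")" || exit 1
    echo "$archive"
)

# Example call (not executed here):
#   physical_backup /var/lib/etcd /backup_20230127
```

Restoring is the reverse: extract the archive so that the member directory ends up under the configured data directory again.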

You can also back up from a snapshot.

Backup command

┌──[root@vms81.liruilongs.github.io]-[/backup_20230127]
└─$ETCDCTL_API=3 etcdctl --endpoints="https://127.0.0.1:2379" \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" snapshot save snap-$(date +%Y%m%d%H%M).db
Snapshot saved at snap-202301272133.db

Restore command

┌──[root@vms81.liruilongs.github.io]-[/backup_20230127]
└─$ETCDCTL_API=3 etcdctl snapshot restore ./snap-202301272133.db \
 --name vms81.liruilongs.github.io \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
 --initial-advertise-peer-urls=https://192.168.26.81:2380 \
 --initial-cluster="vms81.liruilongs.github.io=https://192.168.26.81:2380" \
 --data-dir=/var/lib/etcd
2023-01-27 21:40:01.193420 I | mvcc: restore compact to 484325
2023-01-27 21:40:01.199682 I | etcdserver/membership: added member cbf506fa2d16c7 [https://192.168.26.81:2380] to cluster 46c9df5da345274b

The concrete parameter values can be read from the yaml of the etcd static Pod.

┌──[root@vms81.liruilongs.github.io]-[/var/lib/etcd/member]
└─$kubectl describe  pods etcd-vms81.liruilongs.github.io | grep -e "--"
      --advertise-client-urls=https://192.168.26.81:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --initial-advertise-peer-urls=https://192.168.26.81:2380
      --initial-cluster=vms81.liruilongs.github.io=https://192.168.26.81:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.81:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.168.26.81:2380
      --name=vms81.liruilongs.github.io
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
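
When the API server is down, kubectl describe is not available, but the same flags can be read directly from the manifest file. The helper below is a sketch under the assumption that the manifest lists each flag as a "- --flag=value" item, as kubeadm-generated manifests typically do; manifest_flag is a hypothetical name.

```shell
#!/bin/sh
# Sketch: read an etcd flag value straight from a static Pod manifest.
# Assumes lines of the form "    - --flag=value"; the helper name is hypothetical.

# manifest_flag FILE FLAG: print the value of the first --FLAG=... entry
manifest_flag() {
    sed -n "s/^[[:space:]]*-[[:space:]]*--$2=\(.*\)/\1/p" "$1" | head -n 1
}

# Example calls (not executed here):
#   manifest_flag /etc/kubernetes/manifests/etcd.yaml data-dir
#   manifest_flag /etc/kubernetes/manifests/etcd.yaml initial-cluster
```

The printed values can then be passed to etcdctl snapshot restore as --data-dir, --name, --initial-cluster, and so on.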

Cluster etcd backup and restore

Cluster member status

┌──[root@vms100.liruilongs.github.io]-[~/ansible/helm]
└─$ETCDCTL_API=3 etcdctl  --endpoints https://127.0.0.1:2379  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" member list -w table
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
|        ID        | STATUS  |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+
|  ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
| 11486647d7f3a17b | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
| e00e3877df8f76f4 | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
+------------------+---------+-----------------------------+-----------------------------+-----------------------------+

Version and leader information.

┌──[root@vms100.liruilongs.github.io]-[~/ansible/kubescape]
└─$ETCDCTL_API=3 etcdctl  --endpoints https://127.0.0.1:2379  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" endpoint status --cluster  -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |   37 MB |     false |       100 |    3152364 |
| https://192.168.26.102:2379 | 11486647d7f3a17b |   3.5.4 |   36 MB |     false |       100 |    3152364 |
| https://192.168.26.101:2379 | e00e3877df8f76f4 |   3.5.4 |   36 MB |      true |       100 |    3152364 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+

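In a cluster, a backup taken on a single node is enough: as mentioned earlier, an etcd cluster is fully replicated, so any one member holds the complete data.

The node has no etcdctl tool, so either install etcd or copy the binary from somewhere else. Here we install it and then copy etcdctl to the other cluster nodes.

┌──[root@vms100.liruilongs.github.io]-[~]
└─$yum -y install etcd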

Backup

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ENDPOINT=https://127.0.0.1:2379
┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT  --cert="/etc/kubernetes/pki/etcd/server.crt" --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" snapshot save snapshot.db
Snapshot saved at snapshot.db

Check the snapshot's hash

┌──[root@vms100.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl --write-out=table snapshot status snapshot.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 46aa26ed |   217504 |       2711 |      27 MB |
+----------+----------+------------+------------+

Restore

The etcd cluster here uses the stacked topology: etcd runs as a static Pod on every control-plane node.

Always take a backup first. Before restoring, back up and then clean out the old data files, and make sure etcd and kube-apiserver are stopped. Collect the required parameters:

┌──[root@vms100.liruilongs.github.io]-[~]
└─$kubectl describe pod etcd-vms100.liruilongs.github.io -n kube-system  | grep -e '--'
      --advertise-client-urls=https://192.168.26.100:2379
      --cert-file=/etc/kubernetes/pki/etcd/server.crt
      --client-cert-auth=true
      --data-dir=/var/lib/etcd
      --experimental-initial-corrupt-check=true
      --experimental-watch-progress-notify-interval=5s
      --initial-advertise-peer-urls=https://192.168.26.100:2380
      --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380
      --key-file=/etc/kubernetes/pki/etcd/server.key
      --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379
      --listen-metrics-urls=http://127.0.0.1:2381
      --listen-peer-urls=https://192.168.26.100:2380
      --name=vms100.liruilongs.github.io
      --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
      --peer-client-cert-auth=true
      --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
      --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
      --snapshot-count=10000
      --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

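When restoring, stop the kube-apiserver and etcd static Pods on all master nodes. The kubelet scans the /etc/kubernetes/manifests directory every 20s to detect changes to static Pods, so moving the yaml files out of that directory is enough to stop them.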
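
The move-out and move-back steps can be wrapped in a pair of helpers so that the same file list is used in both directions. This is a minimal sketch; park_manifests, restore_manifests, and the holding directory are assumptions of this example.

```shell
#!/bin/sh
# Sketch: stop and restart the etcd and kube-apiserver static Pods by moving
# their manifests. Helper names and the holding directory are hypothetical.

# park_manifests MANIFEST_DIR HOLD_DIR: move the two manifests out of the way
park_manifests() (
    manifest_dir=$1
    hold_dir=$2
    mkdir -p "$hold_dir" || exit 1
    for f in etcd.yaml kube-apiserver.yaml; do
        if [ -f "$manifest_dir/$f" ]; then
            mv "$manifest_dir/$f" "$hold_dir/$f" || exit 1
        fi
    done
)

# restore_manifests MANIFEST_DIR HOLD_DIR: move the two manifests back
restore_manifests() (
    manifest_dir=$1
    hold_dir=$2
    for f in etcd.yaml kube-apiserver.yaml; do
        if [ -f "$hold_dir/$f" ]; then
            mv "$hold_dir/$f" "$manifest_dir/$f" || exit 1
        fi
    done
)

# Example calls (not executed here):
#   park_manifests /etc/kubernetes/manifests /tmp
#   restore_manifests /etc/kubernetes/manifests /tmp
```

After parking the manifests, wait longer than the kubelet scan interval before touching the data directory, so that the Pods have actually stopped.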

Here Ansible is used to run the command on all master nodes.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv  /etc/kubernetes/manifests/etcd.yaml  /tmp/ " -i host.yaml
192.168.26.102 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv   /etc/kubernetes/manifests/kube-apiserver.yaml  /tmp/ " -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>

Confirm that the static Pod yaml files have been moved.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "ls /etc/kubernetes/manifests/" -i host.yaml
192.168.26.102 | CHANGED | rc=0 >>
haproxy.yaml
keepalived.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.100 | CHANGED | rc=0 >>
haproxy.yaml
keepalived.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.101 | CHANGED | rc=0 >>
haproxy.yaml
keepalived.yaml
kube-controller-manager.yaml
kube-scheduler.yaml

Clear the etcd data directory on all cluster nodes.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "rm -rf /var/lib/etcd/" -i host.yaml
[WARNING]: Consider using the file module with state=absent rather than running 'rm'.  If you need to use command because file is insufficient you can add 'warn:false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>
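
Since the old data should be backed up before it is cleared, a more cautious variant is to rename the data directory instead of deleting it. The sketch below shows one way to do that; set_aside_data_dir and the .bak-<timestamp> suffix are assumptions of this example.

```shell
#!/bin/sh
# Sketch: keep the old etcd data directory under a new name instead of
# deleting it. Helper name and suffix are hypothetical.

# set_aside_data_dir DATA_DIR: rename DATA_DIR to DATA_DIR.bak-<timestamp>
# and print the new path; do nothing if DATA_DIR does not exist.
set_aside_data_dir() (
    data_dir=${1%/}                     # drop a trailing slash, if any
    [ -d "$data_dir" ] || exit 0
    target="$data_dir.bak-$(date +%Y%m%d%H%M%S)"
    mv "$data_dir" "$target" || exit 1
    echo "$target"
)

# Example call (not executed here):
#   set_aside_data_dir /var/lib/etcd/
```

The restore then creates a fresh data directory, and the renamed copy can be deleted once the cluster is confirmed healthy.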

Copy the snapshot backup file to all cluster nodes.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master  -m copy -a "src=snap-202302070000.db dest=/root/" -i host.yaml

Restore on vms100.liruilongs.github.io

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ETCDCTL_API=3 etcdctl snapshot restore snap-202302070000.db \
 --name vms100.liruilongs.github.io \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
 --endpoints="https://127.0.0.1:2379" \
 --initial-advertise-peer-urls="https://192.168.26.100:2380" \
 --initial-cluster="vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380" \
 --data-dir=/var/lib/etcd
2023-02-08 12:50:27.598250 I | mvcc: restore compact to 2837993
2023-02-08 12:50:27.609440 I | etcdserver/membership: added member ee392e5273e89e2 [https://192.168.26.100:2380] to cluster 4816f346663d82a7
2023-02-08 12:50:27.609480 I | etcdserver/membership: added member 70059e836d19883d [https://192.168.26.101:2380] to cluster 4816f346663d82a7
2023-02-08 12:50:27.609487 I | etcdserver/membership: added member b8cb9f66c2e63b91 [https://192.168.26.102:2380] to cluster 4816f346663d82a7

Restore on vms101.liruilongs.github.io

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ssh 192.168.26.101
Last login: Wed Feb  8 12:48:31 2023 from 192.168.26.100
┌──[root@vms101.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl snapshot restore snap-202302070000.db \
 --name vms101.liruilongs.github.io \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
 --endpoints="https://127.0.0.1:2379" \
 --initial-advertise-peer-urls="https://192.168.26.101:2380" \
 --initial-cluster="vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380" \
 --data-dir=/var/lib/etcd
2023-02-08 12:52:21.976748 I | mvcc: restore compact to 2837993
2023-02-08 12:52:21.991588 I | etcdserver/membership: added member ee392e5273e89e2 [https://192.168.26.100:2380] to cluster 4816f346663d82a7
2023-02-08 12:52:21.991622 I | etcdserver/membership: added member 70059e836d19883d [https://192.168.26.101:2380] to cluster 4816f346663d82a7
2023-02-08 12:52:21.991629 I | etcdserver/membership: added member b8cb9f66c2e63b91 [https://192.168.26.102:2380] to cluster 4816f346663d82a7

Restore on vms102.liruilongs.github.io

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ssh 192.168.26.102
Last login: Wed Feb  8 12:48:31 2023 from 192.168.26.100
┌──[root@vms102.liruilongs.github.io]-[~]
└─$ETCDCTL_API=3 etcdctl snapshot restore snap-202302070000.db \
 --name vms102.liruilongs.github.io \
 --cert="/etc/kubernetes/pki/etcd/server.crt" \
 --key="/etc/kubernetes/pki/etcd/server.key" \
 --cacert="/etc/kubernetes/pki/etcd/ca.crt" \
 --endpoints="https://127.0.0.1:2379" \
 --initial-advertise-peer-urls="https://192.168.26.102:2380" \
 --initial-cluster="vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380" \
 --data-dir=/var/lib/etcd
2023-02-08 12:53:32.338663 I | mvcc: restore compact to 2837993
2023-02-08 12:53:32.354619 I | etcdserver/membership: added member ee392e5273e89e2 [https://192.168.26.100:2380] to cluster 4816f346663d82a7
2023-02-08 12:53:32.354782 I | etcdserver/membership: added member 70059e836d19883d [https://192.168.26.101:2380] to cluster 4816f346663d82a7
2023-02-08 12:53:32.354790 I | etcdserver/membership: added member b8cb9f66c2e63b91 [https://192.168.26.102:2380] to cluster 4816f346663d82a7
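
The three restore commands differ only in --name and --initial-advertise-peer-urls, so they can be generated from one template to avoid copy-paste mistakes. The sketch below only prints the commands for review and does not run them; restore_command is a hypothetical helper. It leaves out the --cert, --key, --cacert, and --endpoints flags on the assumption that snapshot restore works on the local snapshot file and does not contact a server.

```shell
#!/bin/sh
# Sketch: print one `etcdctl snapshot restore` command per member.
# restore_command is a hypothetical helper; nothing is executed against etcd.

# restore_command NAME PEER_IP SNAPSHOT INITIAL_CLUSTER
restore_command() (
    name=$1
    peer_ip=$2
    snapshot=$3
    initial_cluster=$4
    printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --name %s --initial-advertise-peer-urls=https://%s:2380 --initial-cluster="%s" --data-dir=/var/lib/etcd\n' \
        "$snapshot" "$name" "$peer_ip" "$initial_cluster"
)

DOMAIN=liruilongs.github.io
CLUSTER="vms100.$DOMAIN=https://192.168.26.100:2380,vms101.$DOMAIN=https://192.168.26.101:2380,vms102.$DOMAIN=https://192.168.26.102:2380"

for n in 100 101 102; do
    restore_command "vms$n.$DOMAIN" "192.168.26.$n" snap-202302070000.db "$CLUSTER"
done
```

Each printed line is meant to be run on the node it names, after the snapshot file has been copied there.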

After the restore, move the etcd and kube-apiserver static Pod manifests back.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv /tmp/kube-apiserver.yaml  /etc/kubernetes/manifests/ " -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "mv /tmp/etcd.yaml  /etc/kubernetes/manifests/etcd.yaml " -i host.yaml
192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>

Confirm that the files were moved back.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ansible k8s_master -m command -a "ls /etc/kubernetes/manifests/" -i host.yaml
192.168.26.100 | CHANGED | rc=0 >>
etcd.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.101 | CHANGED | rc=0 >>
etcd.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml
192.168.26.102 | CHANGED | rc=0 >>
etcd.yaml
haproxy.yaml
keepalived.yaml
kube-apiserver.yaml
kube-controller-manager.yaml
kube-scheduler.yaml

Check the etcd cluster information from any node; the restore succeeded. In the session below the first kubectl call is refused, presumably because kube-apiserver had not finished restarting yet, while etcd itself already reports three healthy endpoints.

┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$kubectl get pods
The connection to the server 192.168.26.99:30033 was refused - did you specify the right host or port?
┌──[root@vms100.liruilongs.github.io]-[~/ansible]
└─$ETCDCTL_API=3 etcdctl  --endpoints https://127.0.0.1:2379  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" endpoint status --cluster  -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |   37 MB |     false |         2 |        146 |
| https://192.168.26.101:2379 | 70059e836d19883d |   3.5.4 |   37 MB |      true |         2 |        146 |
| https://192.168.26.102:2379 | b8cb9f66c2e63b91 |   3.5.4 |   37 MB |     false |         2 |        146 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+

Problems encountered:

If a node reports the error below, or a member fails to join the cluster (for example, only two members show up), repeat the steps above.

Handling the error: panic: tocommit(258) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?

In the output below only two of the three members are listed:

┌──[root@vms100.liruilongs.github.io]-[~/back]
└─$ETCDCTL_API=3 etcdctl  --endpoints https://127.0.0.1:2379  --cert="/etc/kubernetes/pki/etcd/server.crt"  --key="/etc/kubernetes/pki/etcd/server.key"  --cacert="/etc/kubernetes/pki/etcd/ca.crt" endpoint status --cluster  -w table
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |   37 MB |      true |         2 |      85951 |
| https://192.168.26.101:2379 | 70059e836d19883d |   3.5.4 |   37 MB |     false |         2 |      85951 |
+-----------------------------+------------------+---------+---------+-----------+-----------+------------+
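
A missing member like this is easy to overlook, so it can help to count the rows of the status table automatically after a restore. The following is a rough sketch that parses the table format shown in this post; count_endpoints and check_member_count are hypothetical helpers, and the parsing would need adjusting for other output formats.

```shell
#!/bin/sh
# Sketch: count members in `etcdctl endpoint status --cluster -w table` output.
# Helper names are hypothetical; the table is read from standard input.

# count_endpoints: number of table rows that start with an http(s) endpoint
count_endpoints() {
    grep -c '^| *https\{0,1\}://'
}

# check_member_count EXPECTED: succeed only if the table lists EXPECTED members
check_member_count() (
    expected=$1
    actual=$(count_endpoints)
    if [ "$actual" -eq "$expected" ]; then
        echo "ok: $actual members"
    else
        echo "mismatch: expected $expected members, got $actual" >&2
        exit 1
    fi
)

# Example call (not executed here):
#   etcdctl ... endpoint status --cluster -w table | check_member_count 3
```

A non-zero exit status is the signal to repeat the restore steps on the node that is missing.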

Writing the scheduled backup job

The scheduled backup is implemented with a systemd service and a systemd timer. They run the etcd_back.sh backup script on a schedule and are enabled at boot.

The setup is simple and needs little explanation.

┌──[root@vms81.liruilongs.github.io]-[~/back]
└─$systemctl cat etcd-backup
# /usr/lib/systemd/system/etcd-backup.service
[Unit]
Description= "ETCD 备份"
After=network-online.target

[Service]
Type=oneshot
Environment=ETCDCTL_API=3
ExecStart=/usr/bin/bash /usr/lib/systemd/system/etcd_back.sh

[Install]
WantedBy=multi-user.target

The timer runs once a day at midnight; OnBootSec=3s additionally triggers a run 3 seconds after boot.

┌──[root@vms81.liruilongs.github.io]-[~/back]
└─$systemctl cat etcd-backup.timer
# /usr/lib/systemd/system/etcd-backup.timer
[Unit]
Description="每天备份一次 ETCD"

[Timer]
OnBootSec=3s
OnCalendar=*-*-* 00:00:00
Unit=etcd-backup.service

[Install]
WantedBy=multi-user.target

Backup script

┌──[root@vms100.liruilongs.github.io]-[~/ansible/backup]
└─$cat etcd_back.sh
#!/bin/bash

#@File    :   etcd_back.sh
#@Time    :   2023/01/27 23:00:27
#@Author  :   Li Ruilong
#@Version :   1.0
#@Desc    :   ETCD backup
#@Contact :   1224965096@qq.com

# Create the backup directory if it does not exist yet
if [ ! -d /root/back/ ];then
   mkdir -p /root/back/
fi
STR_DATE=$(date +%Y%m%d%H%M)

# Save a snapshot named after the current minute
ETCDCTL_API=3 etcdctl \
--endpoints="https://127.0.0.1:2379"  \
--cert="/etc/kubernetes/pki/etcd/server.crt"  \
--key="/etc/kubernetes/pki/etcd/server.key"  \
--cacert="/etc/kubernetes/pki/etcd/ca.crt"   \
snapshot save /root/back/snap-${STR_DATE}.db

# Print the snapshot's hash, revision, key count and size
ETCDCTL_API=3 etcdctl --write-out=table snapshot status /root/back/snap-${STR_DATE}.db

# Make the snapshot read-only
sudo chmod  o-w,u-w,g-w  /root/back/snap-${STR_DATE}.db

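Deploying the backup service and timer

┌──[root@vms100.liruilongs.github.io]-[~/ansible/backup]
└─$cat deply.sh
#!/bin/bash

#@File    :   deply.sh
#@Time    :   2023/01/27 23:00:27
#@Author  :   Li Ruilong
#@Version :   1.0
#@Desc    :   ETCD backup deployment
#@Contact :   1224965096@qq.com

# Install the unit files and the backup script
cp ./* /usr/lib/systemd/system/
# Enable and start the timer, then enable and run the service once
systemctl enable etcd-backup.timer --now
systemctl enable etcd-backup.service --now
# List the snapshots produced so far
ls /root/back/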

Viewing the logs

┌──[root@vms100.liruilongs.github.io]-[~/ansible/backup]
└─$journalctl -u etcd-backup.service -o cat
...................
Starting "ETCD 备份"...
Snapshot saved at /root/back/snap-202301290120.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 74323316 |   640319 |       2250 |      27 MB |
+----------+----------+------------+------------+
Started "ETCD 备份".
Starting "ETCD 备份"...
Snapshot saved at /root/back/snap-202301290120.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| e75a16bf |   640325 |       2255 |      27 MB |
+----------+----------+------------+------------+
Started "ETCD 备份".
Starting "ETCD 备份"...
Snapshot saved at /root/back/snap-202301290121.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| eb5e9e86 |   640388 |       2318 |      27 MB |
+----------+----------+------------+------------+
Started "ETCD 备份".
Starting "ETCD 备份"...
Snapshot saved at /root/back/snap-202301290121.db
+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 30a91bb6 |   640402 |       2333 |      27 MB |
+----------+----------+------------+------------+
Started "ETCD 备份".
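
The log shows two runs in the same minute writing to the same file name (snap-202301290120.db), because the script's timestamp has minute resolution. If that matters, one possible adjustment is to include seconds in the name. The sketch below only illustrates the naming; snapshot_path is a hypothetical helper and not part of the original script.

```shell
#!/bin/sh
# Sketch: build a snapshot file name with second resolution so that two runs
# within the same minute do not share a name. snapshot_path is hypothetical.

# snapshot_path DIR: print DIR/snap-<YYYYmmddHHMMSS>.db
snapshot_path() {
    echo "${1%/}/snap-$(date +%Y%m%d%H%M%S).db"
}

snapshot_path /root/back/
```

In the backup script, the printed path would take the place of /root/back/snap-${STR_DATE}.db in the snapshot save, snapshot status, and chmod lines.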

Backup and restore for a binary cluster

Backing up and restoring a binary-deployed cluster is essentially the same as for the static Pod deployment.

The difference is that the restore below first restores two nodes so that they form a cluster, and then adds the third node to that cluster. Current cluster information:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -m shell -a "etcdctl member list"
192.168.26.101 | CHANGED | rc=0 >>
2fd4f9ba70a04579: name=etcd-102 peerURLs=http://192.168.26.102:2380 clientURLs=http://192.168.26.102:2379,http://localhost:2379 isLeader=false
6f2038a018db1103: name=etcd-100 peerURLs=http://192.168.26.100:2380 clientURLs=http://192.168.26.100:2379,http://localhost:2379 isLeader=false
bd330576bb637f25: name=etcd-101 peerURLs=http://192.168.26.101:2380 clientURLs=http://192.168.26.101:2379,http://localhost:2379 isLeader=true
192.168.26.102 | CHANGED | rc=0 >>
2fd4f9ba70a04579: name=etcd-102 peerURLs=http://192.168.26.102:2380 clientURLs=http://192.168.26.102:2379,http://localhost:2379 isLeader=false
6f2038a018db1103: name=etcd-100 peerURLs=http://192.168.26.100:2380 clientURLs=http://192.168.26.100:2379,http://localhost:2379 isLeader=false
bd330576bb637f25: name=etcd-101 peerURLs=http://192.168.26.101:2380 clientURLs=http://192.168.26.101:2379,http://localhost:2379 isLeader=true
192.168.26.100 | CHANGED | rc=0 >>
2fd4f9ba70a04579: name=etcd-102 peerURLs=http://192.168.26.102:2380 clientURLs=http://192.168.26.102:2379,http://localhost:2379 isLeader=false
6f2038a018db1103: name=etcd-100 peerURLs=http://192.168.26.100:2380 clientURLs=http://192.168.26.100:2379,http://localhost:2379 isLeader=false
bd330576bb637f25: name=etcd-101 peerURLs=http://192.168.26.101:2380 clientURLs=http://192.168.26.101:2379,http://localhost:2379 isLeader=true

Prepare test data

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.100 -a  "etcdctl put name liruilong"
192.168.26.100 | CHANGED | rc=0 >>
OK
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "etcdctl get name"
192.168.26.102 | CHANGED | rc=0 >>
name
liruilong
192.168.26.100 | CHANGED | rc=0 >>
name
liruilong
192.168.26.101 | CHANGED | rc=0 >>
name
liruilong

Take an etcd snapshot on any one host

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -a "etcdctl snapshot save snap20211010.db"
192.168.26.101 | CHANGED | rc=0 >>
Snapshot saved at snap20211010.db

This snapshot contains the data just written, name=liruilong. Next, copy the snapshot file to all nodes.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -a "scp /root/snap20211010.db root@192.168.26.100:/root/"
192.168.26.101 | CHANGED | rc=0 >>

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -a "scp /root/snap20211010.db root@192.168.26.102:/root/"
192.168.26.101 | CHANGED | rc=0 >>

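Delete the test data to simulate data loss. The key is removed by the first node that runs the command (it reports 1 deleted key); the other nodes report 0 because the key is already gone cluster-wide.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "etcdctl del name"
192.168.26.101 | CHANGED | rc=0 >>
1
192.168.26.102 | CHANGED | rc=0 >>
0
192.168.26.100 | CHANGED | rc=0 >>
0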

Stop etcd on all nodes and delete everything under /var/lib/etcd/:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "systemctl stop etcd"
192.168.26.100 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -m shell -a "rm -rf /var/lib/etcd/*"
[WARNING]: Consider using the file module with state=absent rather than running 'rm'.  If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.102 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

On all nodes, set the owner and group of the snapshot file to etcd:

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "chown etcd.etcd /root/snap20211010.db"
[WARNING]: Consider using the file module with owner rather than running 'chown'.  If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.100 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

Restore the data on nodes 100 and 101. The script is first run against node 100, then lines 6 and 7 are switched from 100 to 101 with sed and the script is run against node 101.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.100 -m script  -a "./snapshot_restore.sh"
192.168.26.100 | CHANGED => {
    "changed": true,
    "rc": 0,
    "stderr": "Shared connection to 192.168.26.100 closed.\r\n",
    "stderr_lines": [
        "Shared connection to 192.168.26.100 closed."
    ],
    "stdout": "2021-10-10 12:14:30.726021 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792\r\n2021-10-10 12:14:30.726234 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792\r\n",
    "stdout_lines": [
        "2021-10-10 12:14:30.726021 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792",
        "2021-10-10 12:14:30.726234 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792"
    ]
}
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat -n ./snapshot_restore.sh
     1  #!/bin/bash
     2
     3  # Restore the snapshot on each node
     4
     5  etcdctl snapshot restore /root/snap20211010.db \
     6  --name etcd-100 \
     7  --initial-advertise-peer-urls="http://192.168.26.100:2380" \
     8  --initial-cluster="etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380" \
     9  --data-dir="/var/lib/etcd/cluster.etcd"
    10
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$sed '6,7s/100/101/g' ./snapshot_restore.sh
#!/bin/bash

# Restore the snapshot on each node

etcdctl snapshot restore /root/snap20211010.db \
--name etcd-101 \
--initial-advertise-peer-urls="http://192.168.26.101:2380" \
--initial-cluster="etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380" \
--data-dir="/var/lib/etcd/cluster.etcd"

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$sed -i '6,7s/100/101/g' ./snapshot_restore.sh
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$cat ./snapshot_restore.sh
#!/bin/bash

# Restore the snapshot on each node

etcdctl snapshot restore /root/snap20211010.db \
--name etcd-101 \
--initial-advertise-peer-urls="http://192.168.26.101:2380" \
--initial-cluster="etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380" \
--data-dir="/var/lib/etcd/cluster.etcd"

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.101 -m script  -a "./snapshot_restore.sh"
192.168.26.101 | CHANGED => {
    "changed": true,
    "rc": 0,
    "stderr": "Shared connection to 192.168.26.101 closed.\r\n",
    "stderr_lines": [
        "Shared connection to 192.168.26.101 closed."
    ],
    "stdout": "2021-10-10 12:20:26.032754 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792\r\n2021-10-10 12:20:26.032930 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792\r\n",
    "stdout_lines": [
        "2021-10-10 12:20:26.032754 I | etcdserver/membership: added member 6f2038a018db1103 [http://192.168.26.100:2380] to cluster af623437f584d792",
        "2021-10-10 12:20:26.032930 I | etcdserver/membership: added member bd330576bb637f25 [http://192.168.26.101:2380] to cluster af623437f584d792"
    ]
}

On all nodes, change the owner and group of /var/lib/etcd and everything in it to etcd:etcd, then start etcd on each node. In the output below, etcd fails to start on node 102: that node has not been restored and is added to the cluster in the next step.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "chown -R etcd.etcd /var/lib/etcd/"
[WARNING]: Consider using the file module with owner rather than running 'chown'.  If you need to use command because file is insufficient you can add 'warn: false' to this command task or set 'command_warnings=False' in ansible.cfg to get rid of this message.
192.168.26.100 | CHANGED | rc=0 >>

192.168.26.101 | CHANGED | rc=0 >>

192.168.26.102 | CHANGED | rc=0 >>

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "systemctl start etcd"
192.168.26.102 | FAILED | rc=1 >>
Job for etcd.service failed because the control process exited with error code. See "systemctl status etcd.service" and "journalctl -xe" for details.non-zero return code
192.168.26.101 | CHANGED | rc=0 >>

192.168.26.100 | CHANGED | rc=0 >>

Add the remaining node, 102, to the cluster

# etcdctl member add etcd_name --peer-urls="https://peerURLs"
[root@vms100 cluster.etcd]# etcdctl member add etcd-102 --peer-urls="http://192.168.26.102:2380"
Member fbd8a96cbf1c004d added to cluster af623437f584d792

ETCD_NAME="etcd-102"
ETCD_INITIAL_CLUSTER="etcd-100=http://192.168.26.100:2380,etcd-101=http://192.168.26.101:2380,etcd-102=http://192.168.26.102:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.26.102:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
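
The ETCD_* lines printed by member add are exactly the settings the new member needs in its configuration. As a small sketch, they can be filtered out of the command output so they can be compared with, or merged into, the etcd.conf prepared for node 102; member_env is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: keep only the ETCD_* assignment lines from `etcdctl member add`
# output read on standard input. member_env is a hypothetical helper.

member_env() {
    grep '^ETCD_[A-Z_]*='
}

# Example call (not executed here):
#   etcdctl member add etcd-102 --peer-urls="http://192.168.26.102:2380" | member_env
```

Note that ETCD_INITIAL_CLUSTER_STATE must be existing on the joining node, as the command output indicates.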

Test the restore result. Copy the configuration file to node 102, start etcd there, then check the member list and read back the test key on every node.

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.102 -m copy -a "src=./etcd.conf dest=/etc/etcd/etcd.conf force=yes"
192.168.26.102 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "checksum": "2d8fa163150e32da563f5e591134b38cc356d237",
    "dest": "/etc/etcd/etcd.conf",
    "gid": 0,
    "group": "root",
    "mode": "0644",
    "owner": "root",
    "path": "/etc/etcd/etcd.conf",
    "size": 574,
    "state": "file",
    "uid": 0
}
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible 192.168.26.102 -m shell -a "systemctl enable etcd --now"
192.168.26.102 | CHANGED | rc=0 >>

┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -m shell -a "etcdctl member list"
192.168.26.101 | CHANGED | rc=0 >>
6f2038a018db1103, started, etcd-100, http://192.168.26.100:2380, http://192.168.26.100:2379,http://localhost:2379
bd330576bb637f25, started, etcd-101, http://192.168.26.101:2380, http://192.168.26.101:2379,http://localhost:2379
fbd8a96cbf1c004d, started, etcd-102, http://192.168.26.102:2380, http://192.168.26.102:2379,http://localhost:2379
192.168.26.100 | CHANGED | rc=0 >>
6f2038a018db1103, started, etcd-100, http://192.168.26.100:2380, http://192.168.26.100:2379,http://localhost:2379
bd330576bb637f25, started, etcd-101, http://192.168.26.101:2380, http://192.168.26.101:2379,http://localhost:2379
fbd8a96cbf1c004d, started, etcd-102, http://192.168.26.102:2380, http://192.168.26.102:2379,http://localhost:2379
192.168.26.102 | CHANGED | rc=0 >>
6f2038a018db1103, started, etcd-100, http://192.168.26.100:2380, http://192.168.26.100:2379,http://localhost:2379
bd330576bb637f25, started, etcd-101, http://192.168.26.101:2380, http://192.168.26.101:2379,http://localhost:2379
fbd8a96cbf1c004d, started, etcd-102, http://192.168.26.102:2380, http://192.168.26.102:2379,http://localhost:2379
┌──[root@vms81.liruilongs.github.io]-[~/ansible]
└─$ansible etcd -a "etcdctl get name"
192.168.26.102 | CHANGED | rc=0 >>
name
liruilong
192.168.26.101 | CHANGED | rc=0 >>
name
liruilong
192.168.26.100 | CHANGED | rc=0 >>
name
liruilong