
Kubernetes Cluster Setup, Usage, and Troubleshooting Notes (Part 1)

2021/08/03

This post builds a single-master cluster (a multi-master setup will follow) and tests that it works. The overall process is not complicated; a few errors along the way just cost some time, so every problem I hit during the build is recorded here as well.

Operating system: CentOS 7; Kubernetes version: v1.21.3; Docker version: 20.10.7.

Installing a Single-Master Kubernetes Cluster

Overall Process

  • Step 1: install three virtual machines (this post uses CentOS 7 minimal installs)
  • Configure each machine (disable SELinux, disable the firewall, disable swap, set up hosts entries, configure a static IP, configure package sources)
  • Install docker, kubelet, kubeadm, and kubectl on all three nodes and configure them (pointing the docker and kubernetes repos at Aliyun mirrors)
  • Initialize the master node with kubeadm and run the follow-up steps
  • Join each worker node to the cluster
  • Set up the network component (this post uses flannel)
  • Test the cluster by creating an nginx pod

Preparation

Base Setup

First install three Linux machines; nothing special is required here (mine run CentOS 7). Then run the following on every machine:

# 1. Switch the package mirror
## Back up the original repo file
cp -a /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
## Point it at the Huawei Cloud mirror
sed -i "s/#baseurl/baseurl/g" /etc/yum.repos.d/CentOS-Base.repo
sed -i "s/mirrorlist=http/#mirrorlist=http/g" /etc/yum.repos.d/CentOS-Base.repo
sed -i "s@http://mirror.centos.org@https://repo.huaweicloud.com@g" /etc/yum.repos.d/CentOS-Base.repo
## Clear the cache and rebuild it
yum clean all
yum makecache

# 2. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# 3. Permanently disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0

# 4. Disable swap
swapoff -a  # takes effect immediately
sed -ri 's/.*swap.*/#&/' /etc/fstab  # persists across reboots

# 5. Set the hostname
hostnamectl set-hostname <hostname>

# 6. Add hosts entries on every machine; in my environment /etc/hosts gets:
10.10.10.101 k8s-master01
10.10.10.102 k8s-node01
10.10.10.103 k8s-node02

# 7. Pass bridged IPv4 traffic to iptables
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Apply the settings (if this complains about missing keys, the br_netfilter
# kernel module may need to be loaded first: modprobe br_netfilter)
sysctl --system

# 8. Sync the clock
yum install ntpdate -y
ntpdate time.windows.com

# 9. Install Docker
## Install required system tools
yum install -y yum-utils device-mapper-persistent-data lvm2
## Add the repo (Aliyun mirror)
wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
## Install Docker, start it, and enable it at boot
yum install -y docker-ce
systemctl start docker && systemctl enable docker

# 10. Install Kubernetes
## Add the repo (Aliyun mirror)
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
## Install kubelet, kubeadm, and kubectl, and enable kubelet at boot
## (no need to start it yet)
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet

Configuration

The main task is configuring a Docker registry mirror; I recommend registering an Aliyun account and looking up your personal mirror accelerator address in the console. Note that Docker's cgroup driver also needs to be changed to systemd to match Kubernetes (otherwise later steps will throw errors).

cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://xxx.mirror.aliyuncs.com"]
}
EOF
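
For the new daemon.json to take effect, restart Docker afterwards (a step worth making explicit):

systemctl daemon-reload
systemctl restart docker
# Verify the cgroup driver actually changed to systemd
docker info | grep -i cgroup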

Initializing the Master Node

My init command is below. Note that this runs on the master machine only; worker nodes do not need init and will simply join later.

kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24

The parameters mean the following:

  • apiserver-advertise-address: this node's IP
  • image-repository: the image registry; the Aliyun one works fine
  • kubernetes-version: the Kubernetes version; you can see it with kubelet --version
  • service-cidr: the CIDR range for Services
  • pod-network-cidr: the CIDR range pods use to reach each other
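
Incidentally, the same flags can also be captured in a config file and passed with kubeadm init --config kubeadm-init.yaml. A minimal sketch of the equivalent file (my own assumption, untested here, using the v1beta2 kubeadm API that ships with v1.21):

# kubeadm-init.yaml - hypothetical equivalent of the flags above
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.10.101
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
kubernetesVersion: v1.21.3
networking:
  serviceSubnet: 10.10.11.0/24
  podSubnet: 10.10.12.0/24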

The terminal transcript:

[root@k8s-master01 ~]# kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24
[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0: output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.0 not found: manifest unknown: manifest unknown
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

# Manually pull the image that failed
[root@k8s-master01 ~]# docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
1.8.0: Pulling from google_containers/coredns
c6568d217a00: Pull complete
5984b6d55edf: Pull complete
Digest: sha256:cc8fb77bc2a0541949d1d9320a641b82fd392b0d3d8145469ca4709ae769980e
Status: Downloaded newer image for registry.aliyuncs.com/google_containers/coredns:1.8.0
registry.aliyuncs.com/google_containers/coredns:1.8.0

[root@k8s-master01 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.3 3d174f00aa39 2 weeks ago 126MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.3 6be0dc1302e3 2 weeks ago 50.6MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.3 bc2bb319a703 2 weeks ago 120MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.3 adb2816ea823 2 weeks ago 103MB
registry.aliyuncs.com/google_containers/pause 3.4.1 0f8457a4c2ec 6 months ago 683kB
registry.aliyuncs.com/google_containers/coredns 1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/etcd 3.4.13-0 0369cf4303ff 11 months ago 253MB

# Retag the image
[root@k8s-master01 ~]# docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
[root@k8s-master01 ~]# docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
Untagged: registry.aliyuncs.com/google_containers/coredns:1.8.0

# Re-run the initialization
[root@k8s-master01 ~]# kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24
[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.10.11.1 10.10.10.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master01 localhost] and IPs [10.10.10.101 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master01 localhost] and IPs [10.10.10.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 28.122070 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 42umpk.lqtmjnryhec3oj4f
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.10.10.101:6443 --token 42umpk.lqtmjnryhec3oj4f \
--discovery-token-ca-cert-hash sha256:2a05b1260021049e27aa798119b78582f02f0ce4dc80652b71fa361d69ed56d7

The last lines show the token for joining this cluster; copy that command and run it on each worker node.
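
One caveat: bootstrap tokens expire after 24 hours by default, so if you add a node later and the token above has gone stale, generate a fresh join command on the master:

kubeadm token create --print-join-command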

Joining the Worker Nodes

Run the command above directly; the terminal output is as follows:

[root@k8s-node01 ~]# kubeadm join 10.10.10.101:6443 --token 42umpk.lqtmjnryhec3oj4f --discovery-token-ca-cert-hash sha256:2a05b1260021049e27aa798119b78582f02f0ce4dc80652b71fa361d69ed56d7
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

The result after both nodes have joined:

[screenshot: kubectl get nodes after both nodes joined]

Installing the Network Plugin (Flannel)

You can simply run kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml. However, if your --pod-network-cidr at init time was not flannel's default of 10.244.0.0/16, I recommend downloading the file, editing the network settings by hand, and then running apply -f kube-flannel.yml. The result:
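
Concretely, the piece to change is the net-conf.json entry in the ConfigMap inside kube-flannel.yml. With my pod CIDR substituted in, it would look roughly like this (structure as in the upstream manifest; only the Network value is mine):

  net-conf.json: |
    {
      "Network": "10.10.12.0/24",
      "Backend": {
        "Type": "vxlan"
      }
    }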

[screenshots: flannel applied and its pods running]

Creating a Pod to Test the Cluster

First create an nginx pod with the following commands:

# Create the pod
kubectl create deployment nginx --image=nginx
# Once the pod is running, expose its port
kubectl expose deployment nginx --port=80 --type=NodePort
# Check the service status; results in the figure below
kubectl get pods,svc -o wide
[screenshot: kubectl get pods,svc -o wide output]

You can see the nginx pod was scheduled onto k8s-node01 (this machine is 10.10.10.101; k8s-node01 is 10.10.10.102) and exposed externally on port 32700, so we can just access it:

[screenshot: nginx welcome page in the browser]
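
Since a NodePort service listens on every node, a quick command-line check against the addresses above should also work:

curl http://10.10.10.102:32700
# Any node IP serves the same NodePort:
curl http://10.10.10.101:32700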

Problems Encountered

kubeadm init - failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0

Symptom: the coredns image cannot be pulled.

This is arguably a small bug: we are using the Aliyun registry, and Aliyun tagged this image 1.8.0 rather than v1.8.0, hence the failed pull. So we manually pull coredns from Aliyun (tag 1.8.0), retag it as v1.8.0, and then the init succeeds on the next run.

# Pull the needed image
docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
# Retag it
docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
# Remove the old tag
docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
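
Before re-running kubeadm init, you can confirm the retag took effect:

docker images | grep coredns
# Expect to see the v1.8.0 tag listed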

kubectl get nodes reports: The connection to the server localhost:8080 was refused - did you specify the right host or port?

Symptom: kubectl cannot be used.

Solution: first of all, this command is meant to run on the master node, not on worker nodes. Second, using it requires a .kube directory in your home directory; run it as the same user who originally created .kube. Recall the hint printed right after the master was initialized:

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

For example, if the .kube directory was created as the user float, run the command as that user. If you are root, you can instead run export KUBECONFIG=/etc/kubernetes/admin.conf, or simply create .kube as root in the first place.
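
Note that a bare export only lives for the current shell session; to make it permanent for root, one option is to append it to the shell profile:

echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc
source ~/.bashrc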

Flannel network plugin - images fail to pull during installation

After loading flannel with kubectl apply -f kube-flannel.yml, the pods never come up, possibly with errors such as ImagePullBackOff.

The problem, as pictured:

[screenshot: flannel pod stuck in ImagePullBackOff]

This happens because the default image source quay.io is unreachable, and Aliyun does not currently mirror these Kubernetes components, so the only option is to download the image manually from GitHub and docker load it; flannel lives at https://github.com/flannel-io/flannel#flannel. The import result:
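
The release assets ship the image as a .docker tarball that docker load can import directly. The exact filename depends on the version you download; v0.14.0 below is only an example:

# Load the image tarball downloaded from the flannel releases page
# (asset name is illustrative; check the actual file you downloaded)
docker load -i flanneld-v0.14.0-amd64.docker
# Confirm the image is now available locally
docker images | grep flannel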

[screenshot: docker load import result]

Then you can manually delete the pods whose image pulls failed: kubectl delete pod -n kube-system xxxx. They will be recreated automatically; just wait for initialization to finish.

Flannel network plugin - Error registering network: failed to acquire lease: node "xxx" pod cidr not assigned

Symptom: worker nodes cannot obtain a pod CIDR.

The problem, as pictured:

[screenshot: flannel pod log showing "pod cidr not assigned"]

This happens when a worker node never receives its podCIDR. To troubleshoot it, first make sure of two things yourself:

  • kubeadm init was run with the --pod-network-cidr flag, and that range does not clash with the LAN your machines sit on
  • when installing the flannel plugin, the subnet in the manifest's network config was changed to match, as pictured below. If you apply the upstream raw.githubusercontent.com manifest directly, its subnet is 10.244.0.0/16, so either set your --pod-network-cidr to 10.244.0.0/16, or download the yml and edit it to your own range, as I did.
[screenshot: edited net-conf.json subnet in kube-flannel.yml]

In practice, I had done both of the above and still hit the problem. If you do too, the following command works around it:

kubectl patch node k8s-node01 -p '{"spec":{"podCIDR":"10.10.12.0/24"}}'. A successful run looks like this:

[screenshot: node patched successfully]
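
To see which podCIDR (if any) each node actually holds, before or after patching, a quick check with standard kubectl output formatting:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR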

If all is well, every node should show output like the one below. But this is only an after-the-fact workaround; I don't yet know the root cause. (One plausible explanation: the controller manager carves a per-node subnet, /24 by default, out of --pod-network-cidr, so a /24 cluster CIDR like mine has exactly one /24 to hand out, and every node after the first gets nothing; a wider range such as a /16 would avoid this.)

[screenshot: flannel healthy on every node]

References

Environment setup

k8s教程由浅入深-尚硅谷_哔哩哔哩_bilibili

Troubleshooting

初始化 Kubernetes 主节点 failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0_a749227859的博客-CSDN博客

kubernetes - Flannel is crashing for Slave node - Stack Overflow

k8s集群flannel部署错误异常排查:pod cidr not assigned | 滩之南 (hyhblog.cn)
