
Kubernetes Cluster Setup, Usage, and Troubleshooting Notes (Part 1)

2021/08/03

This post builds a single-master cluster (a multi-master setup will follow) and tests that it works. The overall process is not complicated; a few errors along the way just cost some time, so every problem I hit during the build is recorded here as well.

Operating system: CentOS 7; Kubernetes version: v1.21.3; Docker version: 20.10.7.

Installing a Single-Master Kubernetes Cluster

Overall Process

  • Step 1: install three virtual machines (this post uses CentOS 7 minimal installs)
  • Configure each machine (disable SELinux, disable the firewall, disable swap, set up hosts entries, configure a static IP, configure package sources)
  • Install docker, kubelet, kubeadm, and kubectl on all three nodes and configure them (pointing the docker and kubernetes repos at Aliyun mirrors)
  • Initialize the master node with kubeadm and run the follow-up steps
  • Join each worker node to the cluster
  • Set up the network component (this post uses flannel)
  • Test the cluster by creating an nginx pod

Preparation

Base Setup

First install three Linux machines; nothing special is required here (mine run CentOS 7). Then run the following on every machine:

# 1. Switch the package mirror
## Back up the original repo file
cp -a /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
## Point it at the Huawei Cloud mirror
sed -i "s/#baseurl/baseurl/g" /etc/yum.repos.d/CentOS-Base.repo
sed -i "s/mirrorlist=http/#mirrorlist=http/g" /etc/yum.repos.d/CentOS-Base.repo
sed -i "s@http://mirror.centos.org@https://repo.huaweicloud.com@g" /etc/yum.repos.d/CentOS-Base.repo
## Clear the cache and rebuild it
yum clean all
yum makecache

# 2. Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# 3. Permanently disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0

# 4. Disable swap
swapoff -a  # takes effect immediately
sed -ri 's/.*swap.*/#&/' /etc/fstab  # persists across reboots

# 5. Set the hostname
hostnamectl set-hostname <hostname>

# 6. Add hosts entries on every machine; in my environment /etc/hosts gets:
10.10.10.101 k8s-master01
10.10.10.102 k8s-node01
10.10.10.103 k8s-node02

# 7. Pass bridged IPv4 traffic to iptables
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Apply the settings (if this complains about missing keys, the br_netfilter
# kernel module may need to be loaded first: modprobe br_netfilter)
sysctl --system

# 8. Sync the clock
yum install ntpdate -y
ntpdate time.windows.com

# 9. Install Docker
## Install required system tools
yum install -y yum-utils device-mapper-persistent-data lvm2
## Add the repo (Aliyun mirror)
wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
## Install Docker, start it, and enable it at boot
yum install -y docker-ce
systemctl start docker && systemctl enable docker

# 10. Install Kubernetes
## Add the repo (Aliyun mirror)
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
## Install kubelet, kubeadm, and kubectl, and enable kubelet at boot
## (no need to start it yet)
yum install -y kubelet kubeadm kubectl
systemctl enable kubelet

Configuration

The main task is configuring a Docker registry mirror; I recommend registering an Aliyun account and looking up your personal mirror accelerator address in the console. Note that Docker's cgroup driver also needs to be changed to systemd to match Kubernetes (otherwise later steps will throw errors).

cat > /etc/docker/daemon.json << EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "registry-mirrors": ["https://xxx.mirror.aliyuncs.com"]
}
EOF
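
For the new daemon.json to take effect, restart Docker afterwards (a step worth making explicit):

systemctl daemon-reload
systemctl restart docker
# Verify the cgroup driver actually changed to systemd
docker info | grep -i cgroup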

Initializing the Master Node

My init command is below. Note that this runs on the master machine only; worker nodes do not need init and will simply join later.

kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24

The parameters mean the following:

  • apiserver-advertise-address: this node's IP
  • image-repository: the image registry; the Aliyun one works fine
  • kubernetes-version: the Kubernetes version; you can see it with kubelet --version
  • service-cidr: the CIDR range for Services
  • pod-network-cidr: the CIDR range pods use to reach each other
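
Incidentally, the same flags can also be captured in a config file and passed with kubeadm init --config kubeadm-init.yaml. A minimal sketch of the equivalent file (my own assumption, untested here, using the v1beta2 kubeadm API that ships with v1.21):

# kubeadm-init.yaml - hypothetical equivalent of the flags above
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.10.101
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
kubernetesVersion: v1.21.3
networking:
  serviceSubnet: 10.10.11.0/24
  podSubnet: 10.10.12.0/24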

The terminal transcript:

[root@k8s-master01 ~]# kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24
[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0: output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.0 not found: manifest unknown: manifest unknown
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

# Manually pull the image that failed
[root@k8s-master01 ~]# docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
1.8.0: Pulling from google_containers/coredns
c6568d217a00: Pull complete
5984b6d55edf: Pull complete
Digest: sha256:cc8fb77bc2a0541949d1d9320a641b82fd392b0d3d8145469ca4709ae769980e
Status: Downloaded newer image for registry.aliyuncs.com/google_containers/coredns:1.8.0
registry.aliyuncs.com/google_containers/coredns:1.8.0

[root@k8s-master01 ~]# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.aliyuncs.com/google_containers/kube-apiserver v1.21.3 3d174f00aa39 2 weeks ago 126MB
registry.aliyuncs.com/google_containers/kube-scheduler v1.21.3 6be0dc1302e3 2 weeks ago 50.6MB
registry.aliyuncs.com/google_containers/kube-controller-manager v1.21.3 bc2bb319a703 2 weeks ago 120MB
registry.aliyuncs.com/google_containers/kube-proxy v1.21.3 adb2816ea823 2 weeks ago 103MB
registry.aliyuncs.com/google_containers/pause 3.4.1 0f8457a4c2ec 6 months ago 683kB
registry.aliyuncs.com/google_containers/coredns 1.8.0 296a6d5035e2 9 months ago 42.5MB
registry.aliyuncs.com/google_containers/etcd 3.4.13-0 0369cf4303ff 11 months ago 253MB

# Retag the image
[root@k8s-master01 ~]# docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
[root@k8s-master01 ~]# docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
Untagged: registry.aliyuncs.com/google_containers/coredns:1.8.0

# Re-run the initialization
[root@k8s-master01 ~]# kubeadm init --apiserver-advertise-address=10.10.10.101 --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.21.3 --service-cidr=10.10.11.0/24 --pod-network-cidr=10.10.12.0/24
[init] Using Kubernetes version: v1.21.3
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master01 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.10.11.1 10.10.10.101]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master01 localhost] and IPs [10.10.10.101 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master01 localhost] and IPs [10.10.10.101 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 28.122070 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master01 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 42umpk.lqtmjnryhec3oj4f
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 10.10.10.101:6443 --token 42umpk.lqtmjnryhec3oj4f \
--discovery-token-ca-cert-hash sha256:2a05b1260021049e27aa798119b78582f02f0ce4dc80652b71fa361d69ed56d7

The last lines show the token for joining this cluster; copy that command and run it on each worker node.
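
One caveat: bootstrap tokens expire after 24 hours by default, so if you add a node later and the token above has gone stale, generate a fresh join command on the master:

kubeadm token create --print-join-command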

Joining the Worker Nodes

Run the command above directly; the terminal output is as follows:

[root@k8s-node01 ~]# kubeadm join 10.10.10.101:6443 --token 42umpk.lqtmjnryhec3oj4f --discovery-token-ca-cert-hash sha256:2a05b1260021049e27aa798119b78582f02f0ce4dc80652b71fa361d69ed56d7
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

The result after both nodes have joined:

[screenshot: kubectl get nodes after both nodes joined]

Installing the Network Plugin (Flannel)

You can simply run kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml. However, if your --pod-network-cidr at init time was not flannel's default of 10.244.0.0/16, I recommend downloading the file, editing the network settings by hand, and then running apply -f kube-flannel.yml. The result:
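
Concretely, the piece to change is the net-conf.json entry in the ConfigMap inside kube-flannel.yml. With my pod CIDR substituted in, it would look roughly like this (structure as in the upstream manifest; only the Network value is mine):

  net-conf.json: |
    {
      "Network": "10.10.12.0/24",
      "Backend": {
        "Type": "vxlan"
      }
    }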

[screenshots: flannel applied and its pods running]

Creating a Pod to Test the Cluster

First create an nginx pod with the following commands:

# Create the pod
kubectl create deployment nginx --image=nginx
# Once the pod is running, expose its port
kubectl expose deployment nginx --port=80 --type=NodePort
# Check the service status; results in the figure below
kubectl get pods,svc -o wide
[screenshot: kubectl get pods,svc -o wide output]

You can see the nginx pod was scheduled onto k8s-node01 (this machine is 10.10.10.101; k8s-node01 is 10.10.10.102) and exposed externally on port 32700, so we can just access it:

[screenshot: nginx welcome page in the browser]
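
Since a NodePort service listens on every node, a quick command-line check against the addresses above should also work:

curl http://10.10.10.102:32700
# Any node IP serves the same NodePort:
curl http://10.10.10.101:32700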

Problems Encountered

kubeadm init - failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0

Symptom: the coredns image cannot be pulled.

This is arguably a small bug: we are using the Aliyun registry, and Aliyun tagged this image 1.8.0 rather than v1.8.0, hence the failed pull. So we manually pull coredns from Aliyun (tag 1.8.0), retag it as v1.8.0, and then the init succeeds on the next run.

# Pull the needed image
docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0
# Retag it
docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
# Remove the old tag
docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
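
Before re-running kubeadm init, you can confirm the retag took effect:

docker images | grep coredns
# Expect to see the v1.8.0 tag listed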

kubectl get nodes reports: The connection to the server localhost:8080 was refused - did you specify the right host or port?

Symptom: kubectl cannot be used.

Solution: first of all, this command is meant to run on the master node, not on worker nodes. Second, using it requires a .kube directory in your home directory; run it as the same user who originally created .kube. Recall the hint printed right after the master was initialized:

To start using your cluster, you need to run the following as a regular user:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

For example, if the .kube directory was created as the user float, run the command as that user. If you are root, you can instead run export KUBECONFIG=/etc/kubernetes/admin.conf, or simply create .kube as root in the first place.
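
Note that a bare export only lives for the current shell session; to make it permanent for root, one option is to append it to the shell profile:

echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc
source ~/.bashrc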

Flannel network plugin - images fail to pull during installation

After loading flannel with kubectl apply -f kube-flannel.yml, the pods never come up, possibly with errors such as ImagePullBackOff.

The problem, as pictured:

[screenshot: flannel pod stuck in ImagePullBackOff]

This happens because the default image source quay.io is unreachable, and Aliyun does not currently mirror these Kubernetes components, so the only option is to download the image manually from GitHub and docker load it; flannel lives at https://github.com/flannel-io/flannel#flannel. The import result:
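
The release assets ship the image as a .docker tarball that docker load can import directly. The exact filename depends on the version you download; v0.14.0 below is only an example:

# Load the image tarball downloaded from the flannel releases page
# (asset name is illustrative; check the actual file you downloaded)
docker load -i flanneld-v0.14.0-amd64.docker
# Confirm the image is now available locally
docker images | grep flannel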

[screenshot: docker load import result]

Then you can manually delete the pods whose image pulls failed: kubectl delete pod -n kube-system xxxx. They will be recreated automatically; just wait for initialization to finish.

Flannel network plugin - Error registering network: failed to acquire lease: node "xxx" pod cidr not assigned

Symptom: worker nodes cannot obtain a pod CIDR.

The problem, as pictured:

[screenshot: flannel pod log showing "pod cidr not assigned"]

This happens when a worker node never receives its podCIDR. To troubleshoot it, first make sure of two things yourself:

  • kubeadm init was run with the --pod-network-cidr flag, and that range does not clash with the LAN your machines sit on
  • when installing the flannel plugin, the subnet in the manifest's network config was changed to match, as pictured below. If you apply the upstream raw.githubusercontent.com manifest directly, its subnet is 10.244.0.0/16, so either set your --pod-network-cidr to 10.244.0.0/16, or download the yml and edit it to your own range, as I did.
[screenshot: edited net-conf.json subnet in kube-flannel.yml]

In practice, I had done both of the above and still hit the problem. If you do too, the following command works around it:

kubectl patch node k8s-node01 -p '{"spec":{"podCIDR":"10.10.12.0/24"}}'. A successful run looks like this:

[screenshot: node patched successfully]
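
To see which podCIDR (if any) each node actually holds, before or after patching, a quick check with standard kubectl output formatting:

kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR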

If all is well, every node should show output like the one below. But this is only an after-the-fact workaround; I don't yet know the root cause. (One plausible explanation: the controller manager carves a per-node subnet, /24 by default, out of --pod-network-cidr, so a /24 cluster CIDR like mine has exactly one /24 to hand out, and every node after the first gets nothing; a wider range such as a /16 would avoid this.)

[screenshot: flannel healthy on every node]

References

Environment setup

k8s教程由浅入深-尚硅谷_哔哩哔哩_bilibili

Troubleshooting

初始化 Kubernetes 主节点 failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0_a749227859的博客-CSDN博客

kubernetes - Flannel is crashing for Slave node - Stack Overflow

k8s集群flannel部署错误异常排查:pod cidr not assigned | 滩之南 (hyhblog.cn)
