CentOS 7: Deploying a Highly Available Kubernetes Cluster with RKE

I. Introduction to RKE
1. Overview: RKE (Rancher Kubernetes Engine) is a CNCF-certified Kubernetes distribution in which all components run entirely inside Docker containers.
Note that Rancher Server can only run on a Kubernetes cluster installed with RKE or K3s.
2. Prepare the node environment: open the required firewall ports
firewall-cmd --permanent --add-port=22/tcp
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=30000-32767/udp
firewall-cmd --reload
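The list above covers only SSH, ingress, and NodePort traffic. RKE clusters also need the control-plane and overlay-network ports open between nodes; a sketch of the additional rules, following the RKE port requirements and assuming the flannel network plugin used later in this guide:
# Kubernetes API server, etcd peers/clients, and kubelet
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250/tcp
# flannel VXLAN overlay traffic
firewall-cmd --permanent --add-port=8472/udp
firewall-cmd --reload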
3. Synchronize node clocks
yum install ntpdate -y
ntpdate time.windows.com
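ntpdate performs a one-shot sync, so clocks will drift again over time. One option is to schedule it with cron (a minimal sketch; the hourly interval is an arbitrary choice):
(crontab -l 2>/dev/null; echo "0 * * * * /usr/sbin/ntpdate time.windows.com") | crontab -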
4. Install Docker
Docker must be installed on any node that will run Rancher Server.
sudo yum install -y yum-utils device-mapper-persistent-data lvm2
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce-18.09.3-3.el7
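Installing the package does not start the daemon. Assuming systemd (standard on CentOS 7), enable and start it, then verify:
sudo systemctl enable --now docker
docker version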
5. Install kubectl
cat << EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

sudo yum install -y kubectl
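A quick check that the client installed correctly, plus optional shell completion (kubectl ships a bash completion generator):
kubectl version --client
echo 'source <(kubectl completion bash)' >> ~/.bashrc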
6. Install RKE
Rancher Kubernetes Engine is the CLI for building Kubernetes clusters.
Download: https://github.com/rancher/rke/releases/tag/v1.1.3
mv rke_linux-amd64 rke
chmod +x rke
mv rke /usr/local/bin
rke
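If the node has direct internet access, the binary can be fetched and installed in one step (the URL follows the GitHub release asset naming for v1.1.3):
wget https://github.com/rancher/rke/releases/download/v1.1.3/rke_linux-amd64
mv rke_linux-amd64 /usr/local/bin/rke && chmod +x /usr/local/bin/rke
rke --version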
7. Install Helm
Helm is the package manager for Kubernetes.
Download: https://github.com/helm/helm
tar -zxvf helm-v3.3.1-linux-amd64.tar.gz
cd linux-amd64
mv helm /usr/local/bin
chown -R admin:admin /usr/local/bin/helm
helm version
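The release tarball can also be pulled directly from the official download host (get.helm.sh serves the Helm binaries):
wget https://get.helm.sh/helm-v3.3.1-linux-amd64.tar.gz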
8. Configure passwordless SSH
Add the user to the docker group so it can run docker commands:
sudo usermod -aG docker admin

Switch to the admin user, generate an SSH key pair on the host that will run rke up, and distribute the public key to each node:
ssh-keygen -t rsa
ssh-copy-id 192.168.112.120
ssh-copy-id 192.168.112.121
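RKE connects to each node over SSH and runs docker as this user, so it is worth confirming that both work non-interactively before running rke up (node IP taken from the example above):
ssh admin@192.168.112.120 docker ps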
9. Configure OS parameters for the k8s cluster (run on all nodes)
sudo swapoff -a
Add the following to /etc/sysctl.conf (sudo vi /etc/sysctl.conf):
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
Then apply the settings:
sudo sysctl -p
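Two follow-ups worth noting: swapoff -a only lasts until reboot, and on CentOS 7 the net.bridge.* keys are only available once the br_netfilter module is loaded. A sketch of making both stick:
# load the bridge netfilter module required by the net.bridge.* sysctls
sudo modprobe br_netfilter
# comment out the swap entry so swap stays off across reboots
sudo sed -i '/ swap / s/^/#/' /etc/fstab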
10. Use rke to generate the cluster initialization file
rke config --name cluster.yml
RKE uses a file named cluster.yml to determine how to deploy Kubernetes on the nodes of the cluster:
# If you intend to deploy Kubernetes in an air-gapped environment,
# please consult the documentation on how to configure custom RKE images.
nodes:
- address: "192.168.30.110"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node1"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "192.168.30.129"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node2"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
- address: "192.168.30.133"
  port: "22"
  internal_address: ""
  role: [controlplane,etcd,worker]
  hostname_override: "node3"
  user: admin
  docker_socket: /var/run/docker.sock
  ssh_key: ""
  ssh_key_path: ~/.ssh/id_rsa
  ssh_cert: ""
  ssh_cert_path: ""
  labels: {}
  taints: []
services:
  etcd:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    external_urls: []
    ca_cert: ""
    cert: ""
    key: ""
    path: ""
    uid: 0
    gid: 0
    snapshot: null
    retention: ""
    creation: ""
    backup_config: null
  kube-api:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    service_cluster_ip_range: 10.43.0.0/16
    service_node_port_range: ""
    pod_security_policy: false
    always_pull_images: false
    secrets_encryption_config: null
    audit_log: null
    admission_configuration: null
    event_rate_limit: null
  kube-controller:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_cidr: 10.42.0.0/16
    service_cluster_ip_range: 10.43.0.0/16
  scheduler:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
  kubelet:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
    cluster_domain: cluster.local
    infra_container_image: ""
    cluster_dns_server: 10.43.0.10
    fail_swap_on: false
    generate_serving_certificate: false
  kubeproxy:
    image: ""
    extra_args: {}
    extra_binds: []
    extra_env: []
network:
  plugin: flannel
  options: {}
  mtu: 0
  node_selector: {}
  update_strategy: null
authentication:
  strategy: x509
  sans: []
  webhook: null
addons: ""
addons_include: []
system_images:
  etcd: rancher/coreos-etcd:v3.4.3-rancher1
  alpine: rancher/rke-tools:v0.1.58
  nginx_proxy: rancher/rke-tools:v0.1.58
  cert_downloader: rancher/rke-tools:v0.1.58
  kubernetes_services_sidecar: rancher/rke-tools:v0.1.58
  kubedns: rancher/k8s-dns-kube-dns:1.15.2
  dnsmasq: rancher/k8s-dns-dnsmasq-nanny:1.15.2
  kubedns_sidecar: rancher/k8s-dns-sidecar:1.15.2
  kubedns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  coredns: rancher/coredns-coredns:1.6.9
  coredns_autoscaler: rancher/cluster-proportional-autoscaler:1.7.1
  nodelocal: rancher/k8s-dns-node-cache:1.15.7
  kubernetes: rancher/hyperkube:v1.18.3-rancher2
  flannel: rancher/coreos-flannel:v0.12.0
  flannel_cni: rancher/flannel-cni:v0.3.0-rancher6
  calico_node: rancher/calico-node:v3.13.4
  calico_cni: rancher/calico-cni:v3.13.4
  calico_controllers: rancher/calico-kube-controllers:v3.13.4
  calico_ctl: rancher/calico-ctl:v3.13.4
  calico_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  canal_node: rancher/calico-node:v3.13.4
  canal_cni: rancher/calico-cni:v3.13.4
  canal_flannel: rancher/coreos-flannel:v0.12.0
  canal_flexvol: rancher/calico-pod2daemon-flexvol:v3.13.4
  weave_node: weaveworks/weave-kube:2.6.4
  weave_cni: weaveworks/weave-npc:2.6.4
  pod_infra_container: rancher/pause:3.1
  ingress: rancher/nginx-ingress-controller:nginx-0.32.0-rancher1
  ingress_backend: rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: rancher/metrics-server:v0.3.6
  windows_pod_infra_container: rancher/kubelet-pause:v0.1.4
ssh_key_path: ~/.ssh/id_rsa
ssh_cert_path: ""
ssh_agent_auth: false
authorization:
  mode: rbac
  options: {}
ignore_docker_version: null
kubernetes_version: ""
private_registries: []
ingress:
  provider: ""
  options: {}
  node_selector: {}
  extra_args: {}
  dns_policy: ""
  extra_envs: []
  extra_volumes: []
  extra_volume_mounts: []
  update_strategy: null
cluster_name: ""
cloud_provider:
  name: ""
prefix_path: ""
addon_job_timeout: 0
bastion_host:
  address: ""
  port: ""
  user: ""
  ssh_key: ""
  ssh_key_path: ""
  ssh_cert: ""
  ssh_cert_path: ""
monitoring:
  provider: ""
  options: {}
  node_selector: {}
  update_strategy: null
  replicas: null
restore:
  restore: false
  snapshot_name: ""
dns: null
11. Deploy
rke up
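On success, rke up writes the kubeconfig kube_config_cluster.yml and the state file cluster.rkestate alongside cluster.yml. A quick sanity check:
kubectl --kubeconfig kube_config_cluster.yml get nodes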
12. Set the environment variable
export KUBECONFIG=/home/admin/kube_config_cluster.yml
mkdir ~/.kube
cp kube_config_cluster.yml ~/.kube/config
At this point the k8s cluster has been installed successfully via RKE; some nodes start up slowly, so allow some time for everything to become Ready.

To speed this up, you can pull all the images on one host with a good network connection, save them to an archive, and copy it to every host:
docker save -o images.tgz `docker images|awk 'NR>1 {print $1":"$2}'`
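On each receiving host, import the archive with docker load (the counterpart of the save above):
docker load -i images.tgz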
II. Cleaning Up an RKE Environment
1. Run the following on rancher-node-1, 2, and 3 (note the quoted 'EOF', which keeps $2 and $(...) from being expanded while the script is written out):

mkdir rancher
cat > rancher/clear.sh << 'EOF'
df -h|grep kubelet |awk -F % '{print $2}'|xargs umount
rm /var/lib/kubelet/* -rf
rm /etc/kubernetes/* -rf
rm /var/lib/rancher/* -rf
rm /var/lib/etcd/* -rf
rm /var/lib/cni/* -rf

rm -rf /var/run/calico

iptables -F && iptables -t nat -F

ip link del flannel.1

docker ps -a|awk '{print $1}'|xargs docker rm -f
docker volume ls|awk '{print $2}'|xargs docker volume rm

rm -rf /var/etcd/
rm -rf /run/kubernetes/
docker rm -fv $(docker ps -aq)
docker volume rm $(docker volume ls -q)
rm -rf /etc/cni
rm -rf /opt/cni

systemctl restart docker
EOF
sh rancher/clear.sh   # run the cleanup script
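If the nodes are reachable over SSH, the script can be pushed out and run in one loop from the admin host (a sketch; node IPs are the ones from the cluster.yml example):
for host in 192.168.30.110 192.168.30.129 192.168.30.133; do
  scp rancher/clear.sh admin@$host:/tmp/clear.sh
  ssh admin@$host 'sudo sh /tmp/clear.sh'
done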
2. That completes the cleanup of leftover directories. If problems remain, you may also need to uninstall Docker on every node.
First check the installed Docker version:
# yum list installed | grep docker
docker-ce.x86_64  18.05.0.ce-3.el7.centos @docker-ce-edge

Uninstall it:
# yum -y remove docker-ce.x86_64

Delete the storage directories:
# rm -rf /etc/docker
# rm -rf /run/docker
# rm -rf /var/lib/dockershim
# rm -rf /var/lib/docker

If a directory cannot be deleted, umount it first, e.g.:
# umount /var/lib/docker/devicemapper

After Docker has been reinstalled, run RKE again:
rke up --config=./rancher-cluster.yml   # rke up is idempotent; it sometimes takes several runs to succeed
3. Problems from repeatedly installing and removing the k8s cluster with RKE
If startup fails the etcd cluster health check: clear all Kubernetes-related directories on every node, uninstall Docker and delete all of its directories, reinstall Docker,
and finally run the rke up command again.
III. Scaling Nodes In and Out
1. Add a node
Edit cluster.yml, add the configuration for the new node, then review it:
more cluster.yml
nodes:
  - address: 172.20.101.103
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.104
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.105
    user: ptmind
    role: [controlplane,worker,etcd]

  - address: 172.20.101.106
    user: ptmind
    role: [worker]
    labels: {traefik: traefik-outer}

2. Run the add-node operation
rke up --update-only

3. Remove a node
Edit cluster.yml, delete the configuration of the node to be removed, then review it:
more cluster.yml
nodes:
  - address: 172.20.101.103
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.104
    user: ptmind
    role: [controlplane,worker,etcd]
  - address: 172.20.101.105
    user: ptmind
    role: [controlplane,worker,etcd]

# removed:
#  - address: 172.20.101.106
#    user: ptmind
#    role: [worker]
#    labels: {traefik: traefik-outer}
4. Run the node-removal operation
rke up --update-only
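Either way, the result can be confirmed from the host holding the kubeconfig (node names will match the hostname_override values in cluster.yml):
kubectl get nodes -o wide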
Problem: when a node is in the NotReady state, no operations can be performed against it; for example, a node-removal operation will fail with an error and the node cannot be removed.
Solutions:
1. Manually delete the components on the node.
2. Remove the node's role with a command:
kubectl label node prod-129 node-role.kubernetes.io/controlplane-
Problem: a k8s cluster node is stuck in SchedulingDisabled.
Solution (note the single quotes, so the JSON survives the shell):
kubectl patch node NodeName -p '{"spec":{"unschedulable":false}}'
Or:
# mark the node unschedulable
kubectl cordon node07-ingress
# mark the node schedulable again
kubectl uncordon node07-ingress
# evict the node's pods
kubectl drain --ignore-daemonsets --delete-local-data node07-ingress
# delete the node
kubectl delete node node07-ingress