Redis Sentinel and Cluster

Redis Sentinel

Redis Sentinel is Redis's high-availability solution. Each Sentinel node periodically checks whether the Redis data nodes and the other Sentinel nodes are reachable. Once the master becomes unreachable, the Sentinels elect a new master by vote and complete the failover.

The following experiment demonstrates how to configure Sentinel:

Topology diagram (image not included)

Setting up master-slave replication

  1. Configure the master node
vim /etc/redis.conf
...
port 6379
bind 172.18.17.20
requirepass "123"
...
  2. Configure the slave nodes
# slave node 172.18.17.21
vim /etc/redis.conf
...
port 6379
bind 172.18.17.21
requirepass 123
slaveof 172.18.17.20 6379
masterauth 123
slave-read-only yes
...
# slave node 172.18.17.22
vim /etc/redis.conf
...
port 6379
bind 172.18.17.22
requirepass 123
slaveof 172.18.17.20 6379
masterauth 123
slave-read-only yes
...
  3. Start the redis service
# start the services via ansible
vim /etc/ansible/hosts
[redis]
172.18.17.20
172.18.17.21
172.18.17.22
ansible redis -m service -a "name=redis state=started"
  4. Verify that replication is established
# from the master's point of view
$ redis-cli -h 172.18.17.20 -p 6379 -a 123
172.18.17.20:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=172.18.17.22,port=6379,state=online,offset=481564,lag=0
slave1:ip=172.18.17.21,port=6379,state=online,offset=481425,lag=1
master_repl_offset:481564
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:481563
# from a slave's point of view
$ redis-cli -h 172.18.17.21 -p 6379 -a 123
172.18.17.21:6379> info replication
# Replication
role:slave
master_host:172.18.17.20
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:498217
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
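The `info replication` output above is a simple `key:value` text format that is easy to consume from scripts. As an illustration (not part of the original setup), a minimal Python parser for this format might look like:

```python
def parse_info(text: str) -> dict:
    """Parse the key:value lines of a Redis INFO reply into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and section headers such as "# Replication"
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition(':')
        info[key] = value
    return info

# Sample taken from the slave's output above.
sample = """\
# Replication
role:slave
master_host:172.18.17.20
master_port:6379
master_link_status:up
"""
repl = parse_info(sample)
print(repl["role"], repl["master_link_status"])  # slave up
```

A monitoring script could feed the raw output of `redis-cli ... info replication` into this function and alert when `master_link_status` is not `up`.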


Setting up Sentinel
The redis-sentinel service must be configured and started on all three nodes; every Sentinel node monitors all Redis data nodes as well as the other Sentinel nodes.

  1. The redis-sentinel.conf configuration file, identical on all three nodes
vim /etc/redis-sentinel.conf
# address the sentinel listens on
bind 0.0.0.0
# port the sentinel listens on
port 26379
# sentinel working directory
dir "/tmp"
# ip and port of the master to monitor, and the quorum: the number of sentinels that must agree the master is unreachable
sentinel monitor mymaster 172.18.17.20 6379 2
# if the master requires authentication, the password must be given here so the sentinels can monitor it
sentinel auth-pass mymaster 123
# a Redis data node or another sentinel node is considered unreachable if it does not answer the sentinel's PING within this many milliseconds
sentinel down-after-milliseconds mymaster 5000
# during a failover, the number of slaves that are reconfigured to replicate from the new master at the same time
sentinel parallel-syncs mymaster 1
# failover timeout, in milliseconds
sentinel failover-timeout mymaster 30000
# sentinel log file
logfile "/var/log/redis/sentinel.log"
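Two of these settings drive failure detection: down-after-milliseconds controls each Sentinel's subjective view of the master (sdown), while the trailing 2 in the monitor line is the quorum that turns the failure objective (odown) and allows a failover to start. A toy Python sketch of how the two interact (an illustration only, not Sentinel's actual implementation):

```python
DOWN_AFTER_MS = 5000  # mirrors "sentinel down-after-milliseconds mymaster 5000"
QUORUM = 2            # mirrors the trailing 2 in "sentinel monitor mymaster ... 2"

def subjectively_down(now_ms: int, last_pong_ms: int) -> bool:
    """One Sentinel's local view: no PING reply within down-after-milliseconds."""
    return now_ms - last_pong_ms > DOWN_AFTER_MS

def objectively_down(sdown_votes: int) -> bool:
    """A failover can only start once at least quorum Sentinels agree."""
    return sdown_votes >= QUORUM

# With 3 Sentinels and quorum 2, one isolated Sentinel cannot trigger a failover.
print(objectively_down(1), objectively_down(2))  # False True
```

This is why a single network hiccup between one Sentinel and the master does not cause a failover in the 3-node setup above: two of the three Sentinels must independently reach sdown first.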
  2. Start the redis-sentinel service on all three nodes
$ ansible redis -m service -a "name=redis-sentinel state=started"
  3. Verify that the Sentinel topology is established
$ redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.18.17.20:6379,slaves=2,sentinels=3
  4. Simulate a master failure
# stop the redis service on the master
$ systemctl stop redis
# master status right after the shutdown
master0:name=mymaster,status=sdown,address=172.18.17.20:6379,slaves=2,sentinels=3
# after a while, check whether a new master was elected and the failover succeeded
$ redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=172.18.17.22:6379,slaves=2,sentinels=3


Redis Cluster

Redis Cluster is Redis's distributed solution, introduced in Redis 3.0. Deploying a Redis Cluster load-balances across multiple Redis hosts, addressing both the concurrency limits and the single point of failure of a single host.

For a distributed system, the first question is how to spread data across the nodes. Common approaches are modulo-N partitioning and consistent hashing. Redis Cluster uses neither; instead it introduces the concept of slots (virtual slots), distributing the 16384 slots numbered 0-16383 across the nodes. Every key is hashed and taken modulo 16384 to map it to a slot: slot = CRC16(key) % 16384. Each node owns a subset of the slots and stores the key-value pairs that map into them.
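The slot mapping can be reproduced outside Redis. The sketch below is an illustration (Redis uses the CRC16-CCITT/XModem variant); it computes the same slot as CLUSTER KEYSLOT, including the hash-tag rule where only the part of the key between { and } is hashed:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem): polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """slot = CRC16(key) % 16384, hashing only the {tag} if one is present."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end > start + 1:  # only a non-empty tag overrides the key
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384

print(key_slot("foo"))    # 12182
print(key_slot("hello"))  # 866
```

The two expected slots match the redirects shown in the official Redis cluster tutorial for `set foo bar` and `set hello world`. Hash tags let related keys, e.g. `{user1000}.following` and `{user1000}.followers`, land on the same slot so they can be used in a single multi-key operation.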

The deployment of Redis Cluster is demonstrated below.


Deploying Redis Cluster manually

Fully manual deployment

First we deploy a Redis Cluster by hand, leaving master-slave replication aside for now.

Topology diagram (image not included)

  1. Install redis on the three master nodes
yum -y install redis
  2. Edit the configuration file on each of the three master nodes
port 6379
bind 172.18.17.20 # the other two nodes use 172.18.17.21 and 172.18.17.22 respectively
cluster-enabled yes # enable cluster mode
cluster-config-file nodes-6379.conf # cluster state file, created automatically
cluster-node-timeout 10000 # node interconnection timeout, in milliseconds
  3. Start redis
systemctl start redis
  4. Assign the virtual slots to the three nodes
# node 1 gets slots 0-5460
redis-cli -h 172.18.17.20 -p 6379 cluster addslots {0..5460}
# node 2 gets slots 5461-10921
redis-cli -h 172.18.17.21 -p 6379 cluster addslots {5461..10921}
# node 3 gets slots 10922-16383
redis-cli -h 172.18.17.22 -p 6379 cluster addslots {10922..16383}
  5. Node handshake
    On any one node, run CLUSTER MEET ip port against each of the other nodes. Once they are in contact, the handshake state propagates through the cluster until every node can communicate with every other node.
# run these two commands on node 1 (172.18.17.20) to connect the nodes
redis-cli -h 172.18.17.20 -p 6379 cluster meet 172.18.17.21 6379
redis-cli -h 172.18.17.20 -p 6379 cluster meet 172.18.17.22 6379
  6. Inspect the cluster
# list the reachable nodes and their state
[20@root ~]# redis-cli -h 172.18.17.20 -p 6379
172.18.17.20:6379> cluster nodes
53b15e66e3d6303ed29a86b0ce39e8ea3ce1d53a 172.18.17.20:6379 myself,master - 0 0 2 connected 0-5460
06aabaefc2318bf67901e8038565d9fed72cd6ea 172.18.17.22:6379 master - 0 1510999620109 1 connected 10922-16383
a6ea66562eb19e3fb4767cdbd8ec3d08e463f679 172.18.17.21:6379 master - 0 1510999622128 0 connected 5461-10921
# overall cluster state
172.18.17.20:6379> cluster info
cluster_state:ok <= cluster state
cluster_slots_assigned:16384 <= total slots assigned
cluster_slots_ok:16384 <= slots in ok state
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:3 <= number of nodes in contact
cluster_size:3
cluster_current_epoch:2
cluster_my_epoch:2
cluster_stats_messages_sent:292
cluster_stats_messages_received:292
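Given the slot assignment above, a client (or the MOVED redirection logic) can locate the node that owns any key. A minimal Python sketch hardcoding the ranges assigned in step 4 (illustration only; real cluster clients build this table dynamically from CLUSTER SLOTS or CLUSTER NODES):

```python
# Slot ranges exactly as assigned in step 4 above.
SLOT_RANGES = [
    (0, 5460, "172.18.17.20"),
    (5461, 10921, "172.18.17.21"),
    (10922, 16383, "172.18.17.22"),
]

def node_for_slot(slot: int) -> str:
    """Return the address of the node that owns the given slot."""
    for lo, hi, node in SLOT_RANGES:
        if lo <= slot <= hi:
            return node
    raise ValueError(f"slot {slot} is not assigned")

print(node_for_slot(0))      # 172.18.17.20
print(node_for_slot(16383))  # 172.18.17.22
```

If a command is sent to the wrong node, that node answers with a MOVED redirection carrying the slot number and the correct address, which is exactly this lookup performed server-side.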


Automating the steps above with Ansible

  1. Install ansible on node 1
  2. Prepare the hosts inventory file
$ cat /etc/ansible/hosts
[redis_cluster]
172.18.17.20
172.18.17.21
172.18.17.22
  3. The redis.conf template file
...
port 6379
bind {{ ansible_default_ipv4.address }} # filled in automatically from an ansible fact
cluster-enabled yes # enable cluster mode
cluster-config-file nodes-6379.conf # cluster state file, created automatically
cluster-node-timeout 10000 # node interconnection timeout, in milliseconds
...
  4. Write a playbook following the manual steps
$ cat redis_cluster.yml
- hosts: redis_cluster
  remote_user: root
  vars:
    - port: 6379
    - password: 123
  tasks:
    - name: install redis
      yum: name=redis state=installed
    - name: copy master config file
      template: src=/root/redis_master.conf dest=/etc/redis.conf
      notify: restart redis
      tags: chgconf
    - name: start redis
      service: name=redis state=started
    - name: add slots for node1
      shell: redis-cli -h {{ ansible_default_ipv4.address }} -p {{ port }} -a {{ password }} cluster addslots {0..5460}
      when: ansible_default_ipv4.address == '172.18.17.20'
    - name: add slots for node2
      shell: redis-cli -h {{ ansible_default_ipv4.address }} -p {{ port }} -a {{ password }} cluster addslots {5461..10921}
      when: ansible_default_ipv4.address == '172.18.17.21'
    - name: add slots for node3
      shell: redis-cli -h {{ ansible_default_ipv4.address }} -p {{ port }} -a {{ password }} cluster addslots {10922..16383}
      when: ansible_default_ipv4.address == '172.18.17.22'
    - name: meet other nodes
      shell: redis-cli -h {{ ansible_default_ipv4.address }} -p {{ port }} -a {{ password }} cluster meet {{ item }} {{ port }}
      with_items:
        - 172.18.17.21
        - 172.18.17.22
      when: ansible_default_ipv4.address == '172.18.17.20'
  handlers:
    - name: restart redis
      service: name=redis state=restarted
  5. Run the playbook to deploy the cluster
$ ansible-playbook redis_cluster.yml


Deploying Redis Cluster with redis-trib.rb

This experiment builds a Redis Cluster with the redis-trib.rb cluster management tool. redis-trib.rb wraps the cluster commands and simplifies operations such as slot assignment and setting up master-slave replication. It is implemented in Ruby, so a Ruby environment must be installed first.

Topology diagram (image not included)
Manual deployment

  1. Install and start redis on all six nodes
# install redis
yum -y install redis
# start redis
systemctl start redis
  2. Edit redis.conf on all six nodes
port 6379
bind 172.18.17.20 # each node binds its own IP
cluster-enabled yes # enable cluster mode
cluster-config-file nodes-6379.conf # cluster state file, created automatically
cluster-node-timeout 10000 # node interconnection timeout, in milliseconds
  3. Prepare the ruby environment
    This step is only needed on one node; redis-trib.rb on that node configures the whole cluster. We do the following on 172.18.17.20.
# 1. install ruby-devel
yum -y install ruby-devel
# 2. download the redis gem that redis-trib.rb depends on
wget https://rubygems.org/downloads/redis-3.3.5.gem
# 3. install the gem
gem install redis-3.3.5.gem
# 4. download the redis-trib.rb script
wget http://download.redis.io/redis-stable/src/redis-trib.rb
  4. Create the cluster
ruby ./redis-trib.rb create --replicas 1 172.18.17.20:6379 172.18.17.21:6379 172.18.17.22:6379 172.18.17.23:6379 172.18.17.24:6379 172.18.17.25:6379
...
Can I set the above configuration? (type 'yes' to accept): <= answer yes
...
  5. Inspect the cluster
# 1. show the slot assignment
$ ruby ./redis-trib.rb info 172.18.17.20:6379
172.18.17.20:6379 (9deb336d...) -> 0 keys | 5461 slots | 1 slaves.
172.18.17.21:6379 (fd1fb707...) -> 0 keys | 5462 slots | 1 slaves.
172.18.17.22:6379 (a6418367...) -> 0 keys | 5461 slots | 1 slaves.
# 2. check that the cluster was created correctly; any one node's address will do
$ ruby ./redis-trib.rb check 172.18.17.20:6379
>>> Performing Cluster Check (using node 172.18.17.20:6379)
M: 9deb336d8e713ba178185589dc75531246e33b71 172.18.17.20:6379
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: d321c36ba093f83598e96124062ea2f537000099 172.18.17.24:6379
slots: (0 slots) slave
replicates fd1fb707099a999fcfff2a014453dcb23bb705a6
S: 6499a3df68aeccfee85e551154768171f953af75 172.18.17.25:6379
slots: (0 slots) slave
replicates a6418367ad5839dfe22768f67d1f1c2f4f14465c
S: 767926cf68fe8ee2a042a84f0a9a6f0bc792dd5c 172.18.17.23:6379
slots: (0 slots) slave
replicates 9deb336d8e713ba178185589dc75531246e33b71
M: fd1fb707099a999fcfff2a014453dcb23bb705a6 172.18.17.21:6379
slots:5461-10922 (5462 slots) master
1 additional replica(s)
M: a6418367ad5839dfe22768f67d1f1c2f4f14465c 172.18.17.22:6379
slots:10923-16383 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
# 3. inspect the cluster with redis-cli
$ redis-cli -h 172.18.17.20 -p 6379
172.18.17.20:6379> cluster nodes
d321c36ba093f83598e96124062ea2f537000099 172.18.17.24:6379 slave fd1fb707099a999fcfff2a014453dcb23bb705a6 0 1511010828097 5 connected
6499a3df68aeccfee85e551154768171f953af75 172.18.17.25:6379 slave a6418367ad5839dfe22768f67d1f1c2f4f14465c 0 1511010829108 6 connected
767926cf68fe8ee2a042a84f0a9a6f0bc792dd5c 172.18.17.23:6379 slave 9deb336d8e713ba178185589dc75531246e33b71 0 1511010827089 4 connected
fd1fb707099a999fcfff2a014453dcb23bb705a6 172.18.17.21:6379 master - 0 1511010826079 2 connected 5461-10922
9deb336d8e713ba178185589dc75531246e33b71 172.18.17.20:6379 myself,master - 0 0 1 connected 0-5460
a6418367ad5839dfe22768f67d1f1c2f4f14465c 172.18.17.22:6379 master - 0 1511010829108 3 connected 10923-16383
172.18.17.20:6379> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_sent:1246
cluster_stats_messages_received:1246
  6. Test failover

Failover test (screenshot not included)
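The `cluster nodes` output shown earlier is line-oriented and scriptable: each line is `id address flags master-id ping-sent pong-recv config-epoch link-state [slots...]`. A small Python parser for that format (an illustration; newer Redis versions append `@cluster-port` to the address field, which this sketch does not handle):

```python
def parse_cluster_nodes(output: str) -> list:
    """Parse `cluster nodes` output lines into a list of dicts."""
    nodes = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip blank or malformed lines
        nodes.append({
            "id": parts[0],
            "addr": parts[1],
            "flags": parts[2].split(','),
            "master_id": parts[3],   # "-" for masters
            "link_state": parts[7],
            "slots": parts[8:],      # slot ranges; empty for slaves
        })
    return nodes

# Two lines taken from the `cluster nodes` output above.
sample = (
    "767926cf68fe8ee2a042a84f0a9a6f0bc792dd5c 172.18.17.23:6379 slave "
    "9deb336d8e713ba178185589dc75531246e33b71 0 1511010827089 4 connected\n"
    "9deb336d8e713ba178185589dc75531246e33b71 172.18.17.20:6379 myself,master "
    "- 0 0 1 connected 0-5460\n"
)
nodes = parse_cluster_nodes(sample)
```

A failover check could rerun this parser before and after stopping a master and compare which node ids carry the `master` flag and the slot ranges.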


Automating the deployment above with Ansible

  1. The hosts inventory file
$ cat /etc/ansible/hosts
[redis_cluster]
172.18.17.20
172.18.17.21
172.18.17.22
172.18.17.23
172.18.17.24
172.18.17.25
[manage]
172.18.17.20
  2. The playbook file
$ cat redis_cluster.yml
- hosts: redis_cluster
  remote_user: root
  vars:
    - port: 6379
  tasks:
    - name: install redis
      yum: name=redis state=installed
    - name: copy master config file
      template: src=/root/redis_master.conf dest=/etc/redis.conf
      notify: restart redis
      tags: chgconf
    - name: start redis
      service: name=redis state=started
  handlers:
    - name: restart redis
      service: name=redis state=restarted
- hosts: manage
  remote_user: root
  vars:
    - port: 6379
  tasks:
    - name: install ruby-devel
      yum: name=ruby-devel state=installed
    - name: make dir
      file: path=/app/cluster state=directory
    - name: download redis.gem
      get_url: url=https://rubygems.org/downloads/redis-3.3.5.gem dest=/app/cluster
    - name: install redis.gem
      shell: gem install /app/cluster/redis-3.3.5.gem
    - name: download redis-trib.rb
      get_url: url=http://download.redis.io/redis-stable/src/redis-trib.rb dest=/app/cluster
    - name: creating cluster
      shell: echo -e "yes\n" | ruby /app/cluster/redis-trib.rb create --replicas 1 172.18.17.20:6379 172.18.17.21:6379 172.18.17.22:6379 172.18.17.23:6379 172.18.17.24:6379 172.18.17.25:6379
  3. Run the playbook
ansible-playbook redis_cluster.yml