如何避免regionServer宕机

如题所述

为什么regionserver 和Zookeeper的session expired? 可能的原因有
1. 网络不好。
2. Java full GC, 这会block所有的线程。如果时间比较长,也会导致session expired.
怎么办?
1. 将Zookeeper的timeout时间加长。
2. 配置“hbase.regionserver.restart.on.zk.expire” 为true。 这样子,遇到ZooKeeper session expired , regionserver将选择 restart 而不是 abort
具体的配置是,在hbase-site.xml中加入
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
<description>ZooKeeper session timeout.
HBase passes this to the zk quorum as suggested maximum time for a
session. See http://hadoop.apache.org/zookeeper/docs/current/zookeeperProgrammers.html#ch_zkSessions
“The client sends a requested timeout, the server responds with the
timeout that it can give the client. The current implementation
requires that the timeout be a minimum of 2 times the tickTime
(as set in the server configuration) and a maximum of 20 times
the tickTime.” Set the zk ticktime with hbase.zookeeper.property.tickTime.
In milliseconds.
</description>
</property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
<description>
Zookeeper session expired will force regionserver exit.
Enable this will make the regionserver restart.
</description>
</property>
为了避免java full GC suspend thread 对Zookeeper heartbeat的影响,我们还需要对hbase-env.sh进行配置。

export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \ -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

修改成
export HBASE_OPTS="$HBASE_OPTS -XX:+HeapDumpOnOutOfMemoryError \ -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled \ -XX:+CMSInitiatingOccupancyFraction=70 \ -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseParNewGC -Xmn256m"

同时,当linux的maxfile设置过小时,scan多个列族也会造成regionServer宕机
温馨提示:答案为网友推荐,仅供参考
相似回答