Reference links:
1. Installation
2. Problems
2.1 Standalone HBase fails to start; hbase.log shows the following:
2021-05-14 11:10:18,086 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Master not initialized after 200000ms
at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:232)
at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:200)
at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:430)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:249)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3085)
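A Baidu search suggests that this kind of problem can be repaired with the HBCK2 tool.
Official guide: https://hbase.apache.org/book.html#HBCK2
Step 1 Clone the repository
GitHub repository: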
apache/hbase-operator-tools: Apache HBase Operator Tools (github.com)
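Step 2 Build hbase-operator-tools (run from the hbase-hbck2 directory of the cloned repository):
cd hbase-hbck2
mvn package -DskipTests
Step 3 Copy hbase-hbck2-1.2.0-SNAPSHOT.jar from the target directory to the HBase host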
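Run the following command to repair the error (recoverUnknown schedules a ServerCrashProcedure for RegionServers that are reported as unknown):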
/opt/hbase-2.3.5/bin/hbase hbck -j /opt/hbase-data/hbase/yyp/hbase-hbck2-1.2.0-SNAPSHOT.jar recoverUnknown
Tip: omit recoverUnknown to list all available repair commands.
Step 4 Restart HBase to complete the repair.
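The steps above can be collected into a small shell sketch. The HBase path and the jar location are the ones used in these notes, the clone URL is the public apache/hbase-operator-tools repository, and the helper name hbck2_cmd is hypothetical. The script only prints the commands, so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Sketch only: prints the HBCK2 workflow instead of executing it.
# Paths below are the ones from these notes; adjust them for your host.
HBASE_HOME="/opt/hbase-2.3.5"
HBCK2_JAR="/opt/hbase-data/hbase/yyp/hbase-hbck2-1.2.0-SNAPSHOT.jar"

# hbck2_cmd (hypothetical helper): build the "hbase hbck -j <jar> <command>" line.
hbck2_cmd() {
  printf '%s/bin/hbase hbck -j %s %s\n' "$HBASE_HOME" "$HBCK2_JAR" "$*"
}

echo "git clone https://github.com/apache/hbase-operator-tools.git"
echo "cd hbase-operator-tools/hbase-hbck2 && mvn package -DskipTests"
hbck2_cmd recoverUnknown
```

Calling hbck2_cmd with a different HBCK2 command name prints the matching command line in the same way.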
2.2 HBase reports an error and goes down after running for more than ten hours
WARN [RS:0;iios-hbase:16020] util.Sleeper: We slept 14148ms instead of 3000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired,
For the fix, see: http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
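Before tuning anything, it can help to see how often and how long these pauses occur by pulling the Sleeper warnings out of the RegionServer log. A minimal sketch, assuming the message format shown above; the function name long_pauses is hypothetical.

```shell
#!/bin/sh
# long_pauses (hypothetical helper): read log lines on stdin and print
# "<slept_ms> <expected_ms>" for every util.Sleeper warning found.
long_pauses() {
  sed -n 's/.*We slept \([0-9][0-9]*\)ms instead of \([0-9][0-9]*\)ms.*/\1 \2/p'
}

# Example with the warning from these notes (prints: 14148 3000)
echo 'WARN [RS:0;iios-hbase:16020] util.Sleeper: We slept 14148ms instead of 3000ms, this is likely due to a long garbage collecting pause' | long_pauses
```

Against a real log, the function would be fed the RegionServer log file on stdin; the location of that file depends on the deployment.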
Fix:
It is not yet confirmed that this works; it still needs 24/7 validation in the prod environment.
Step 1 Add to hbase-site.xml:
<property>
  <name>zookeeper.session.timeout</name>
  <value>120000</value>
</property>
<property>
  <name>hbase.zookeeper.property.tickTime</name>
  <value>6000</value>
</property>
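The two values are related. By default a ZooKeeper server only grants session timeouts between 2 and 20 times its tickTime, so a 120000 ms request needs a tickTime of at least 6000 ms to be honored in full. The sketch below works through that arithmetic. It assumes the default minSessionTimeout and maxSessionTimeout and a ZooKeeper instance managed by HBase, as in a standalone deployment; the function name negotiated_timeout is hypothetical.

```shell
#!/bin/sh
# negotiated_timeout <requested_ms> <tick_time_ms>
# Clamp the requested session timeout to [2*tickTime, 20*tickTime],
# which are ZooKeeper's default session timeout bounds.
negotiated_timeout() {
  requested=$1
  tick=$2
  min=$((tick * 2))
  max=$((tick * 20))
  if [ "$requested" -lt "$min" ]; then
    echo "$min"
  elif [ "$requested" -gt "$max" ]; then
    echo "$max"
  else
    echo "$requested"
  fi
}

negotiated_timeout 120000 6000   # prints 120000
negotiated_timeout 120000 2000   # prints 40000
```

With a tickTime of 2000 ms, the same 120000 ms request would be capped at 40000 ms, which is presumably why both properties are changed together here.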
Step 2 Add to hbase-env.sh:
export HBASE_HEAPSIZE=2G
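Step 3 Update the HBase docker-compose file
Double the maximum memory and set the minimum CPU allocation to 30%.
2.3 HBase went down after an unexpected power outage at the customer site, and data could no longer be written
Scenario: HBase is deployed in standalone mode.
Observation 1: the client code connecting to HBase reports the following error: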
---2021-09-03 15:10:54.931 - INFO - --- [shared-pool3-t2] o.a.h.h.client.AsyncRequestFutureImpl : id=1, table=realtime_data_object, attempt=11/16, failed=4ops, last exception=org.apache.hadoop.hbase.NotServingRegionException: org.apache.hadoop.hbase.NotServingRegionException: realtime_data_object,00000795\x00\x00\x01y\xFC\xCE\xD5\x14,1624794165063.9d2f38da9f615d3ff76e670db2e27f22. is not online on iios-hbase,16020,1630652847466, at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3358), at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3335), at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1492), at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2774), at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:44870), at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:393), at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133), at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338), at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318), on iios-hbase,16020,1630384166013, tracking started null, retrying after=10072ms, replay=4ops,
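Observation 2: HBase shows no obvious errors at startup.
Observation 3:
Running the command revealed no obvious errors.
Observation 4:
In summary, none of the fixes tried so far resolved the problem.
What worked:
Enter the hbase shell and run the command that lists the tables; everything looks normal.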
Took 0.3641 seconds
hbase(main):006:0> list
TABLE
energy_realtime_datapoint
energy_timing_datapoint
Query the first 10 rows of the failing table:
hbase(main):007:0> scan 'realtime_data_object',{LIMIT=>10}
ROW COLUMN+CELL
org.apache.hadoop.hbase.TableNotEnabledException: realtime_data_object is disabled.
at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:773)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:330)
at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:408)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
ERROR: Table realtime_data_object is disabled!
For usage try 'help "scan"'
Took 0.4266 seconds
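Good! So that was the problem: the table was disabled.
After manually enabling the table, its state returned to normal and HBase throughput recovered.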
hbase(main):008:0> enable "realtime_data_object"
Took 4.1879 seconds
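The same check could be scripted for the next outage. The sketch below only generates the HBase shell commands (is_disabled, enable, scan) for a given table; the helper name table_recovery_script is hypothetical, and whether enabling is the right fix still has to be judged case by case.

```shell
#!/bin/sh
# table_recovery_script (hypothetical helper): print HBase shell commands that
# check whether a table is disabled, enable it, and read back one row.
table_recovery_script() {
  table=$1
  printf "is_disabled '%s'\n" "$table"
  printf "enable '%s'\n" "$table"
  printf "scan '%s', {LIMIT => 1}\n" "$table"
}

table_recovery_script realtime_data_object
```

To execute the commands rather than print them, the output can be piped into a non-interactive shell, for example `table_recovery_script realtime_data_object | /opt/hbase-2.3.5/bin/hbase shell -n`, assuming the installation path used earlier in these notes.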