Question: HDFS HA automatic failover does not take effect

fanshuai | Hadoop, high-availability cluster | asked on 2017-06-13

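I am using three virtual machines and have configured automatic failover backed by a ZooKeeper cluster. The problem is that when I kill the active NameNode, the standby NameNode does not take over automatically. If I then restart the ZKFC on the standby machine, that NameNode does switch to active on its own. The ZKFC log on the standby machine contains the following errors: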
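For readers who want to cut a long ZKFC log down to the events that matter here, a small filter can help. The following is only a minimal sketch: the marker strings are copied verbatim from the log in this post, while the stage names and the `classify` helper are hypothetical names chosen for illustration and are not part of Hadoop.

```python
# Marker substrings copied from the ZKFC log in this post.
# The stage labels on the left are illustrative names, not Hadoop terminology.
MARKERS = [
    ("graceful_fence_rpc_failed", "failed on connection exception"),
    ("fencing_failed", "Unable to fence service by any configured method"),
    ("election_win_aborted", "Exception handling the winning of election"),
    ("zk_session_retry", "Trying to re-establish ZK session"),
]


def classify(lines):
    """Return (stage, line) pairs for lines containing a known marker, in log order."""
    hits = []
    for line in lines:
        for stage, needle in MARKERS:
            if needle in line:
                hits.append((stage, line.strip()))
                break  # one label per line is enough
    return hits


# A few lines taken from the log in this post, used as sample input.
SAMPLE = """\
java.net.ConnectException: Call From hadoop03/192.168.1.103 to hadoop02:8020 failed on connection exception: java.net.ConnectException: Connection refused
Caused by: java.net.ConnectException: Connection refused
2017-06-13 07:28:11,990 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.
2017-06-13 07:28:11,990 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
2017-06-13 07:28:11,991 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
"""

if __name__ == "__main__":
    for stage, line in classify(SAMPLE.splitlines()):
        print(f"{stage}: {line}")
```

To run it against a real log, pass an open file object to `classify` instead of the sample lines. Read in order, the matched stages show the sequence recorded in the log: the graceful transitionToStandby call to hadoop02:8020 is refused, no configured fencing method succeeds, and the standby therefore abandons the election it had just won.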

java.net.ConnectException: Call From hadoop03/192.168.1.103 to hadoop02:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

        at sun.reflect.GeneratedConstructorAccessor23.newInstance(Unknown Source)

        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:801)

        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1485)

        at org.apache.hadoop.ipc.Client.call(Client.java:1427)

        at org.apache.hadoop.ipc.Client.call(Client.java:1337)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)

        at com.sun.proxy.$Proxy9.transitionToStandby(Unknown Source)

        at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToStandby(HAServiceProtocolClientSideTranslatorPB.java:112)

        at org.apache.hadoop.ha.FailoverController.tryGracefulFence(FailoverController.java:172)

        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:518)

        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:509)

        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)

        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:895)

        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:985)

        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:882)

        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)

        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)

        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

Caused by: java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)

        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)

        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)

        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:681)

        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:777)

        at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:409)

        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1542)

        at org.apache.hadoop.ipc.Client.call(Client.java:1373)

        ... 15 more

2017-06-13 07:28:11,990 ERROR org.apache.hadoop.ha.NodeFencer: Unable to fence service by any configured method.

2017-06-13 07:28:11,990 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election

java.lang.RuntimeException: Unable to fence NameNode at hadoop02/192.168.1.102:8020

        at org.apache.hadoop.ha.ZKFailoverController.doFence(ZKFailoverController.java:537)

        at org.apache.hadoop.ha.ZKFailoverController.fenceOldActive(ZKFailoverController.java:509)

        at org.apache.hadoop.ha.ZKFailoverController.access$1100(ZKFailoverController.java:61)

        at org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.fenceOldActive(ZKFailoverController.java:895)

        at org.apache.hadoop.ha.ActiveStandbyElector.fenceOldActive(ActiveStandbyElector.java:985)

        at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:882)

        at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:467)

        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)

        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)

2017-06-13 07:28:11,991 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session

2017-06-13 07:28:11,991 INFO org.apache.hadoop.ha.SshFenceByTcpPort.jsch: Caught an exception, leaving main loop due to Socket closed

2017-06-