Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm new to Storm and am trying to submit a topology, but the supervisor reports errors. I found this in the workers' log file:

 [ERROR] Async loop died!
java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at backtype.storm.drpc.DRPCInvocationsClient.<init>(DRPCInvocationsClient.java:23)
    at backtype.storm.drpc.DRPCSpout.open(DRPCSpout.java:69)
    at storm.trident.spout.RichSpoutBatchTriggerer.open(RichSpoutBatchTriggerer.java:41)
    at backtype.storm.daemon.executor$fn__3985$fn__3997.invoke(executor.clj:460)
    at backtype.storm.util$async_loop$fn__465.invoke(util.clj:375)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused

Supervisor log file:

supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:00:54 supervisor [ERROR] Error when processing event
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
    at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:149)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:138)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:134)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:125)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:34)
    at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:78)
    at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:88)
    at backtype.storm.cluster$mk_distributed_cluster_state$reify__1996.set_ephemeral_node(cluster.clj:54)
    at backtype.storm.cluster$mk_storm_cluster_state$reify__2415.supervisor_heartbeat_BANG_(cluster.clj:300)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)

The supervisor log file also contains this:

   at java.lang.Thread.run(Unknown Source)
2015-09-15 02:00:54 supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:00:55 ClientCnxn [INFO] Client session timed out, have not heard from server in 20020ms for sessionid 0x14fce3996380015, closing socket connection and attempting reconnect
2015-09-15 02:00:58 ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181
2015-09-15 02:00:58 ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-09-15 02:00:59 supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:01:01 supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:00:59 util [INFO] Halting process: ("Error when processing an event")

1 Answer

There are several possible causes for this issue:

  1. ZooKeeper is not running.
  2. The CPU was saturated for a while, so no heartbeat was sent within the timeout. Nimbus then considers the supervisor dead and drops the connection.
  3. The worker timeout is too short. This is closely related to #2. Check the default for your Storm version and try raising it, for example to 600 seconds or more.
  4. Nimbus is not working properly. Make sure it is up and healthy.
  5. worker.childopts is incorrect, meaning the JVM memory settings are wrong. Adjust -Xmx and -XX:MaxPermSize and try again.
  6. If you start Storm through WinRM or PowerShell, the default memory limit may be too low (the default is only 1024 MB). Raise it, for example to 2048 MB, and try again.
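
To narrow down items 1 and 4 (and the "Connection refused" in the worker log), it can help to check whether the services are accepting TCP connections at all. Below is a minimal sketch in Python. The ZooKeeper address is taken from the supervisor log above; the Nimbus port (6627) and DRPC invocations port (3773) are assumed to be Storm's usual defaults and may differ in your storm.yaml. The names `ENDPOINTS`, `is_port_open` and `check_endpoints` are made up for this illustration.

```python
import socket

# Assumed endpoints: ZooKeeper comes from the supervisor log above; the Nimbus
# and DRPC invocations ports are assumed defaults, so adjust them to your storm.yaml.
ENDPOINTS = {
    "zookeeper": ("127.0.0.1", 2181),
    "nimbus": ("127.0.0.1", 6627),
    "drpc-invocations": ("127.0.0.1", 3773),
}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each endpoint name to whether its port accepted a connection."""
    return {
        name: is_port_open(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }


if __name__ == "__main__":
    for name, ok in check_endpoints(ENDPOINTS).items():
        status = "reachable" if ok else "connection refused or timed out"
        print(f"{name}: {status}")
```

An open port only shows that something is listening; it does not prove the service is healthy. If a port is closed, though, that points to a daemon that is not started or to a wrong host/port in the configuration, rather than to timeouts or memory settings.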
