Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
749 views
in Technique[技术] by (71.8m points)

apache storm - exception after submitting topology

I'm new in storm and trying to submit a topology and found this in supervisor enter image description here enter image description here I found this in log file of workers

 [ERROR] Async loop died!
java.lang.RuntimeException: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused
    at backtype.storm.drpc.DRPCInvocationsClient.<init>(DRPCInvocationsClient.java:23)
    at backtype.storm.drpc.DRPCSpout.open(DRPCSpout.java:69)
    at storm.trident.spout.RichSpoutBatchTriggerer.open(RichSpoutBatchTriggerer.java:41)
    at backtype.storm.daemon.executor$fn__3985$fn__3997.invoke(executor.clj:460)
    at backtype.storm.util$async_loop$fn__465.invoke(util.clj:375)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.thrift7.transport.TTransportException: java.net.ConnectException: Connection refused

log file of supervisor

supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:00:54 supervisor [ERROR] Error when processing event
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
    at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
    at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
    at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:149)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:138)
    at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:85)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:134)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:125)
    at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:34)
    at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:78)
    at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:88)
    at backtype.storm.cluster$mk_distributed_cluster_state$reify__1996.set_ephemeral_node(cluster.clj:54)
    at backtype.storm.cluster$mk_storm_cluster_state$reify__2415.supervisor_heartbeat_BANG_(cluster.clj:300)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)

and this is in the supervisor log file too

   at java.lang.Thread.run(Unknown Source)
2015-09-15 02:00:54 supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:00:55 ClientCnxn [INFO] Client session timed out, have not heard from server in 20020ms for sessionid 0x14fce3996380015, closing socket connection and attempting reconnect
2015-09-15 02:00:58 ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181
2015-09-15 02:00:58 ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-09-15 02:00:59 supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:01:01 supervisor [INFO] ff6460a5-aafb-44a4-a49c-2de945ffd572 still hasn't started
2015-09-15 02:00:59 util [INFO] Halting process: ("Error when processing an event")
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

There are many possible reasons for this issue.

  1. zookeeper is not started.
  2. CPU get to peak for a while, no heartbeat send in the timeout, so nimbus think the supervisor is dead, the disconnect the connection.
  3. worker timeout is too short, maybe the default is 10sec, you can change it to 600 or more to try. it's almost like #2.
  4. Make sure nimbus is working fine.
  5. worker.childopts is not correct, it means the memory setting is not correct, change the xmx and maxpermsize try again.
  6. if you start the storm with winrm or powershell, maybe the default memory is not enough, since the default memory is only 1024M, you need to set more, such as 2048M to try.

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...