0 votes
692 views
in Technique by (71.8m points)

python 2.7 - PySpark in Eclipse: using PyDev

I am running local PySpark code from the command line, and it works:

/Users/edamame/local-lib/apache-spark/spark-1.5.1/bin/pyspark --jars myJar.jar --driver-class-path myJar.jar --executor-memory 2G --driver-memory 4G --executor-cores 3 /myPath/myProject.py

Is it possible to run this code from Eclipse using PyDev? What arguments are required in the Run Configuration? I tried and got the following error:

Traceback (most recent call last):
  File "/myPath/myProject.py", line 587, in <module>
    main()
  File "/myPath/myProject.py", line 506, in main
    conf = SparkConf()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/context.py", line 234, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/pyspark/java_gateway.py", line 76, in launch_gateway
    proc = Popen(command, stdin=PIPE, preexec_fn=preexec_func, env=env)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 1308, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

Does anyone have any idea? Thank you very much!


1 Reply

0 votes
by (71.8m points)

The OSError above typically means PySpark could not spawn the spark-submit launcher, which it locates through the SPARK_HOME environment variable, so the fix is to make that variable visible to Eclipse. Considering the following prerequisites:

  • Eclipse, PyDev and Spark installed.
  • PyDev with a Python interpreter configured.
  • PyDev with the Spark Python sources configured (the paths are sketched just below this list).
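
For that last prerequisite, the entries to add under the PyDev interpreter are, as an assumption based on the Spark 1.5.1 layout from the question (the py4j version bundled under python/lib may differ on your install):

/Users/edamame/local-lib/apache-spark/spark-1.5.1/python
/Users/edamame/local-lib/apache-spark/spark-1.5.1/python/lib/py4j-0.8.2.1-src.zip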

Here is what you'll need to do:

  • From the Eclipse IDE, make sure you are in the PyDev perspective, then open the Preferences window:

    • On Mac: Eclipse > Preferences
    • On Linux: Window > Preferences
  • From the Preferences window, go to PyDev > Interpreters > Python Interpreter:

    • Select your interpreter and open the [Environment] tab.
    • Click the [New...] button to add a new environment variable.
    • Add the environment variable SPARK_HOME and validate (a quick verification sketch follows this list):
    • Name: SPARK_HOME, Value: /path/to/apache-spark/spark-1.5.1/
    • Note: use a literal path; don't reference system environment variables such as $SPARK_HOME
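
As a quick sanity check that the variable actually reaches your script, here is a minimal sketch (the app name and local[*] master are placeholders, not from the original question):

import os
from pyspark import SparkConf, SparkContext

def main():
    # If this prints None, the SPARK_HOME set in the PyDev environment
    # is not reaching the launched process.
    print(os.environ.get("SPARK_HOME"))
    conf = SparkConf().setAppName("pydev-check").setMaster("local[*]")
    sc = SparkContext(conf=conf)
    # Trivial job: if the Java gateway launched correctly, this prints 45.
    print(sc.parallelize(range(10)).sum())
    sc.stop()

if __name__ == "__main__":
    main()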

I also recommend maintaining your own log4j.properties file in each of your projects.

To do so, add the environment variable SPARK_CONF_DIR as done previously, for example:

Name: SPARK_CONF_DIR, Value: ${project_loc}/conf
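
A minimal log4j.properties sketch you could place in that conf directory (adapted from the template Spark ships with; WARN quiets the INFO chatter):

log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n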

If you experience problems with the variable ${project_loc} (e.g. on Linux), specify an absolute path instead.
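
For instance (the workspace path here is hypothetical; substitute your own):

Name: SPARK_CONF_DIR, Value: /Users/edamame/workspace/myProject/conf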

Or, if you want to keep ${project_loc}, right-click each Python source and choose Run As > Run Configurations..., then create your SPARK_CONF_DIR variable in the Environment tab as described previously.

Optionally, you can add other environment variables such as TERM, SPARK_LOCAL_IP, and so on (a programmatic alternative is sketched after this list):

  • Name: TERM, Value on Mac: xterm-256color, Value on Linux: xterm (if you want to use xterm, of course)
  • Name: SPARK_LOCAL_IP, Value: 127.0.0.1 (though it's recommended to specify your real local IP address)
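
If you'd rather not repeat these dialogs per project, a workaround sketch (not part of the original tutorial) is to set the variables in code before any pyspark import; the Spark path is the one from the question, and the conf directory is hypothetical:

import os

# Must run before importing pyspark, which reads these when launching the gateway.
os.environ.setdefault("SPARK_HOME", "/Users/edamame/local-lib/apache-spark/spark-1.5.1")
os.environ.setdefault("SPARK_CONF_DIR", "/myPath/conf")  # hypothetical conf directory
os.environ.setdefault("SPARK_LOCAL_IP", "127.0.0.1")

from pyspark import SparkConf, SparkContext  # import only after the env is set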

PS: I don't remember the source of this tutorial, so excuse me for not citing the author; I didn't come up with this by myself.

