Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - pyspark: Method isBarrier([]) does not exist

I'm trying to learn Spark with PySpark by following a hello-world-level example such as the one below. I get a "Method isBarrier([]) does not exist" error; the full error is included below the code.

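from pyspark import SparkContext

if __name__ == '__main__':
    # Local master with 6 worker threads; the second argument is the app name.
    sc = SparkContext('local[6]', 'pySpark_pyCharm')
    rdd = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8])
    print(rdd.collect())  # first action: this is where the job actually runs
    print(rdd.count())
    sc.stop()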

[Screenshot of the full error message]

However, when I start a pyspark session directly from the command line and type in the same code, it works fine:

[Screenshot of the pyspark shell session]

My setup:

  • Windows 10 Pro x64
  • Python 3.7.2
  • Spark 2.3.3 (Hadoop 2.7 build)
  • PySpark 2.4.0


1 Reply


The problem is an incompatibility between the versions of the Spark JVM libraries and PySpark. In general, the PySpark version has to match the version of your Spark installation exactly. In theory, matching major and minor versions should be enough, but incompatibilities have been introduced in maintenance releases in the past.

In other words, Spark 2.3.3 is not compatible with PySpark 2.4.0, and you have to either upgrade Spark to 2.4.0 or downgrade PySpark to 2.3.3.
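One way to spot such a mismatch before running any job is to compare the two version strings directly. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the Spark installation is a prebuilt binary distribution whose top-level RELEASE file starts with a line like "Spark 2.3.3 built for Hadoop 2.7.3". If your installation has no such file, `spark-submit --version` reports the same information.

```python
import os
import re


def parse_release_line(line):
    """Return the Spark version from a line such as
    'Spark 2.3.3 built for Hadoop 2.7.3', or None if it does not match.
    (Assumed format of the RELEASE file in a binary distribution.)"""
    match = re.match(r"Spark (\d+\.\d+\.\d+)", line.strip())
    return match.group(1) if match else None


def versions_match(pyspark_version, spark_version):
    """Exact comparison: maintenance releases are treated as incompatible."""
    return pyspark_version == spark_version


def check_installation(spark_home):
    """Hypothetical helper: compare the importable PySpark with SPARK_HOME."""
    import pyspark  # imported lazily so the pure helpers work without it

    with open(os.path.join(spark_home, "RELEASE")) as release_file:
        spark_version = parse_release_line(release_file.readline())
    return pyspark.__version__, spark_version, versions_match(
        pyspark.__version__, spark_version
    )
```

With the setup from the question, `versions_match("2.4.0", "2.3.3")` returns False, which is the situation that produces the isBarrier error.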

Overall, PySpark is not designed to be used as a standalone library. While the PyPI package is a handy development tool (it is often easier to just install a package than to extend PYTHONPATH manually), for actual deployments it is better to stick with the PySpark package bundled with the Spark deployment itself.
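If you prefer the bundled PySpark over the pip-installed one, the manual PYTHONPATH extension mentioned above can be sketched roughly as follows. This assumes the usual layout of a Spark binary distribution (a `python` directory plus a `py4j-*-src.zip` archive under `python/lib`); the function names are hypothetical, and the call has to happen before `pyspark` is first imported.

```python
import glob
import os
import sys


def bundled_pyspark_paths(spark_home):
    """Return the directories/archives that make the bundled PySpark importable.
    Assumes the layout <spark_home>/python and <spark_home>/python/lib/py4j-*-src.zip."""
    python_dir = os.path.join(spark_home, "python")
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


def use_bundled_pyspark(spark_home):
    """Put the bundled PySpark ahead of any pip-installed copy on sys.path."""
    for path in reversed(bundled_pyspark_paths(spark_home)):
        if path not in sys.path:
            sys.path.insert(0, path)
```

A typical call would be `use_bundled_pyspark(os.environ["SPARK_HOME"])` at the top of the script, followed by the usual `from pyspark import SparkContext`.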

