Python multi-threading issues are separate from Apache Spark's internals: parallelism in Spark is handled inside the JVM.
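To see what this means in practice, here is a minimal sketch (the `local[4]` master, app name, and job sizes are arbitrary choices of mine): even if several Python driver threads submit jobs at once, the GIL only serializes the thin driver-side calls, while the tasks themselves execute inside the JVM and its workers.

```python
# Minimal sketch: several Python driver threads submit Spark jobs
# concurrently. The GIL only serializes the thin driver-side calls;
# the tasks themselves run in parallel inside the JVM / its workers.
import threading
from pyspark import SparkContext

sc = SparkContext("local[4]", "gil-demo")  # master/app name are arbitrary

def run_job(n):
    # count() is a Spark action; its tasks execute in the executor JVM
    # (and in Python worker sub-processes for the lambda), not in this thread.
    total = sc.parallelize(range(n)).map(lambda x: x * x).count()
    print(n, total)

threads = [threading.Thread(target=run_job, args=(n,)) for n in (10**5, 10**6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
sc.stop()
```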
![PySpark internals](https://i.stack.imgur.com/VvEvd.jpg)
The reason is that in the Python driver program, the SparkContext uses Py4J to launch a JVM and create a JavaSparkContext.
Py4J is only used on the driver for local communication between the Python and Java SparkContext objects; large data transfers are performed through a different mechanism.
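You can inspect this wiring from a Python shell: the driver-side SparkContext holds a Py4J gateway and a proxy to the JavaSparkContext. A small sketch follows; note that the underscore-prefixed attributes are PySpark internals, not public API, and may change between versions.

```python
# Sketch: inspecting the Py4J plumbing on the driver. The
# underscore-prefixed attributes are PySpark internals, not public API.
from pyspark import SparkContext

sc = SparkContext("local[2]", "py4j-demo")

# sc._gateway is the Py4J JavaGateway that talks to the launched JVM;
# sc._jsc is the Py4J proxy for the JavaSparkContext living in that JVM.
print(type(sc._gateway))   # <class 'py4j.java_gateway.JavaGateway'>
print(sc._jsc)             # a Py4J JavaObject wrapping JavaSparkContext

# Arbitrary JVM classes are reachable through the gateway as well:
print(sc._jvm.java.lang.System.currentTimeMillis())
sc.stop()
```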
RDD transformations in Python are mapped to transformations on PythonRDD objects in Java. On remote worker machines, PythonRDD objects launch Python sub-processes and communicate with them using pipes, sending the user's code and the data to be processed.
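A quick way to observe those Python sub-processes is to compare the driver's PID with the PIDs seen inside a `map` over the data. The sketch below uses a local master; the number of distinct worker PIDs you get depends on the master setting and on worker reuse.

```python
# Sketch: show that Python lambdas execute in separate Python worker
# processes, not in the driver. Distinct PIDs depend on master/config.
import os
from pyspark import SparkContext

sc = SparkContext("local[4]", "worker-pid-demo")
print("driver pid:", os.getpid())

worker_pids = (
    sc.parallelize(range(100), 8)
      .map(lambda _: os.getpid())   # runs inside a Python worker process
      .distinct()
      .collect()
)
print("worker pids:", sorted(worker_pids))  # differ from the driver pid
sc.stop()
```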
PS: I'm not sure if this actually answers your question completely.