scala - Apache Spark : When not to use mapPartition and foreachPartition?

Question


1 Reply

深蓝 · Answer 1 · 2021-10-17T03:08:52+0000

When you write Spark jobs that use mapPartitions or foreachPartition, the function you pass can only transform the data of its own partition (mapPartitions) or iterate over it (foreachPartition). That anonymous function is executed on the executors, so there is no viable way to run, from inside one particular executor, code that involves all the nodes, e.g. reduceByKey on an RDD. Such code must be invoked from the driver node only. For the same reason, DataFrames, Datasets and the SparkSession can be accessed only from driver code, not from inside the partition function.
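To make the driver/executor split concrete, here is a minimal sketch. It assumes a local SparkSession and sample data; the object name PartitionScopeSketch and the helper sumByKey are hypothetical names chosen for illustration, not part of Spark.

```scala
import org.apache.spark.sql.SparkSession

object PartitionScopeSketch {
  // Hypothetical helper: doubles each value inside the partitions,
  // then sums per key with a distributed operation called from the driver.
  def sumByKey(spark: SparkSession, pairs: Seq[(String, Int)]): Map[String, Int] = {
    val rdd = spark.sparkContext.parallelize(pairs, numSlices = 2)

    // Fine: the closure only touches the iterator of its own partition.
    val doubled = rdd.mapPartitions(iter => iter.map { case (k, v) => (k, v * 2) })

    // Not fine (left commented out): referencing `rdd` or `spark` inside the
    // closure would ask an executor to start a distributed job, which Spark
    // does not support.
    // rdd.foreachPartition(_ => println(rdd.count()))

    // Cluster-wide operations such as reduceByKey are invoked from driver code.
    doubled.reduceByKey(_ + _).collect().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, just for this sketch.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("partition-scope-sketch")
      .getOrCreate()
    try println(sumByKey(spark, Seq("a" -> 1, "b" -> 2, "a" -> 3)))
    finally spark.stop()
  }
}
```

The point of the sketch is the placement of each call: per-record work goes inside mapPartitions, while anything that needs the whole dataset stays outside the closure, in code that runs on the driver.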

Please find here a detailed discussion of this issue and possible solutions.
