python - How to map a column to create a new column in spark sql dataframe?

Question

posted Oct 7, 2021 in Technique by 深蓝 (71.8m points)

In Python with pandas, I can create a new column like this:

First, I use two columns of a pandas dataframe to create a dict:

 dict1 = dict(zip(data["id"], data["duration"]))

Then I can apply this dict to create a new column in a second dataframe.

df['id_duration'] = df['id'].map(lambda x: dict1[x] if x in dict1.keys() else -1)
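For reference, here is a self-contained pandas sketch of the approach described in the question. The sample values are hypothetical and mirror the ones used in the answer below. It uses `dict.get` with a default instead of the explicit `in dict1.keys()` check. The result is the same, but each lookup happens once.

```python
import pandas as pd

# Hypothetical sample data, mirroring the values used in the answer.
data = pd.DataFrame({"id": [1, 2, 3], "duration": [10, 20, 30]})
df = pd.DataFrame({"id": [1, 3, 5], "x": [2, 4, 6]})

# Build the id -> duration lookup from two columns of `data`.
dict1 = dict(zip(data["id"], data["duration"]))

# dict.get(x, -1) returns -1 for ids missing from the lookup and keeps
# the column integer-typed (map(dict1).fillna(-1) would produce floats).
df["id_duration"] = df["id"].map(lambda x: dict1.get(x, -1))
print(df)
```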

How can I create a new column id_duration in a Spark SQL dataframe, given a dataframe data (with two columns, id and duration) and a dataframe df (with a column id)?

Question from: https://stackoverflow.com/questions/65838706/how-to-map-a-column-to-create-a-new-column-in-spark-sql-dataframe

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:32:32+0000

Using a dictionary would be a shame, because you would need to collect the entire dataframe data onto the driver. That is bad for performance and can cause an out-of-memory (OOM) error when data is large.

You could simply perform a left outer join between the two dataframes and use na.fill to replace the resulting null values with -1.

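from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = spark.createDataFrame([(1, 10), (2, 20), (3, 30)], ['id', 'duration'])
df = spark.createDataFrame([(1, 2), (3, 4), (5, 6)], ['id', 'x'])

# Left join keeps every row of df; ids missing from data get null, filled with -1.
(df
    .join(data.withColumnRenamed("duration", "id_duration"), ['id'], 'left')
    .na.fill(-1, subset=['id_duration'])
    .show())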

+---+---+-----------+
| id|  x|id_duration|
+---+---+-----------+
|  5|  6|         -1|
|  1|  2|         10|
|  3|  4|         30|
+---+---+-----------+
