Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


scala - How to find the unique product among the stores using spark?

I am new to Apache Spark. I want to find the products that are sold by only one store, using Scala and Spark.

The data in the file looks like the sample below; the first column in each row is the store name and the remaining columns are its products.

Sears,shoe,ring,pan,shirt,pen
Walmart,ring,pan,hat,meat,watch
Target,shoe,pan,shirt,hat,watch

I want the output to be

Only Walmart has Meat.
Only Sears has Pen.

I tried the code below in Scala Spark. It gives me the unique products, but I don't know how to get the store name for each of them. Please help.

val filerdd = sc.textFile("file:///home/hduser/stores_products")

val uniquerdd =  filerdd.map(x=>x.split(",")).map(x=>Array(x(1),x(2),x(3),x(4),x(5))).flatMap(x=>x).map(x=>(x,1)).reduceByKey((a,b)=>a+b).filter(x=>x._2==1)

uniquerdd holds - Array((pen,1),(meat,1))

Now I want to find which row of filerdd each of these products appears in, and display the output as below:

Only Walmart has Meat. 
Only Sears has Pen.

Can you please help me get the desired output?

Question from: https://stackoverflow.com/questions/65938025/how-to-find-the-unique-product-among-the-stores-using-spark


1 Reply


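The DataFrame API is probably easier than the RDD API for this. You can explode the product columns into one row per (store, product) pair, count the rows for each product with a window function, and keep only the products whose count is 1.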

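import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._  // enables the $"..." column syntax (already in scope in spark-shell)

// The file has no header, so Spark names the columns _c0, _c1, ...
val df = spark.read.csv("filepath")

val result = df.select(
    $"_c0".as("store"),
    // one output row per product column of each store
    explode(array(df.columns.tail.map(col):_*)).as("product")
).withColumn(
    "count",
    // number of rows (stores) that carry this product
    count("*").over(Window.partitionBy("product"))
).filter(
    "count = 1"
).select(
    format_string("Only %s has %s.", $"store", $"product").as("output")
)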

result.show(false)
+----------------------+
|output                |
+----------------------+
|Only Walmart has meat.|
|Only Sears has pen.   |
+----------------------+
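
If you would rather stay with the RDD API from the question, the store name can be carried along with the count instead of being dropped. The following is only a sketch under a few assumptions: the object name UniqueProducts and the method uniqueProductLines are made up for illustration, the sample rows are inlined with parallelize instead of read from the file, and each product is assumed to appear at most once per row.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; the names are illustrative, not from the original post.
object UniqueProducts {

  // Pair every product with (store, 1), sum the counts per product,
  // and keep the products that were seen exactly once.
  def uniqueProductLines(lines: RDD[String]): RDD[String] =
    lines
      .map(_.split(","))
      .flatMap(fields => fields.tail.map(product => (product, (fields.head, 1))))
      // The store kept here only matters when the final count is 1,
      // in which case no merge happened and it is the only store.
      .reduceByKey((a, b) => (a._1, a._2 + b._2))
      .filter { case (_, (_, n)) => n == 1 }
      .map { case (product, (store, _)) => s"Only ${store} has ${product}." }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UniqueProducts")
      .master("local[*]") // assumption: local run for demonstration
      .getOrCreate()

    // Sample rows from the question; sc.textFile(...) could be used instead.
    val lines = spark.sparkContext.parallelize(Seq(
      "Sears,shoe,ring,pan,shirt,pen",
      "Walmart,ring,pan,hat,meat,watch",
      "Target,shoe,pan,shirt,hat,watch"
    ))

    // Row order after a shuffle is not guaranteed.
    uniqueProductLines(lines).collect().foreach(println)

    spark.stop()
  }
}
```

This keeps the reduceByKey shape of the original attempt; the only change is that the value is a (store, count) pair rather than a bare count.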
