I am new to Apache Spark. I want to find the products that are unique to a single store, using Scala Spark.
The data file looks like this, where the first column of each row is the store name:
Sears,shoe,ring,pan,shirt,pen
Walmart,ring,pan,hat,meat,watch
Target,shoe,pan,shirt,hat,watch
I want the output to be
Only Walmart has Meat.
Only Sears has Pen.
I tried the below in Scala Spark and am able to get the unique products, but I don't know how to get the store name for those products. Please help.
val filerdd = sc.textFile("file:///home/hduser/stores_products")
val uniquerdd = filerdd
  .map(x => x.split(","))
  .flatMap(x => Array(x(1), x(2), x(3), x(4), x(5)))
  .map(x => (x, 1))
  .reduceByKey((a, b) => a + b)
  .filter(x => x._2 == 1)
uniquerdd holds Array((pen,1), (meat,1)).
Now I want to find which row of filerdd each of these products appears in, and display the output as below:
Only Walmart has Meat.
Only Sears has Pen.
Can you please help me get the desired output?
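One possible approach (sketched here with plain Scala collections so it runs without a cluster; the sample data and names below are illustrative): instead of pairing each product with the constant 1, pair it with its store name, so the store survives the aggregation. In Spark the same shape would be flatMap to (product, store) pairs, then groupByKey and filter for groups of size one.

```scala
// Sample rows in the same shape as the file: store name first, then products.
val rows = Seq(
  "Sears,shoe,ring,pan,shirt,pen",
  "Walmart,ring,pan,hat,meat,watch",
  "Target,shoe,pan,shirt,hat,watch"
)

// Pair every product with its store (RDD equivalent: flatMap).
val productToStores: Map[String, Seq[String]] =
  rows
    .map(_.split(","))
    .flatMap { cols =>
      val store = cols(0)
      cols.drop(1).map(product => (product, store))
    }
    // RDD equivalent: groupByKey, then map over the grouped values.
    .groupBy(_._1)
    .map { case (product, pairs) => (product, pairs.map(_._2)) }

// Keep products stocked by exactly one store and format the message.
val messages: Seq[String] =
  productToStores
    .collect { case (product, Seq(store)) =>
      s"Only $store has ${product.capitalize}."
    }
    .toSeq
    .sorted

messages.foreach(println)
```

With the sample rows this prints "Only Sears has Pen." and "Only Walmart has Meat.". The same idea applies to your uniquerdd pipeline: carry x(0) along through the flatMap instead of discarding it.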
question from:
https://stackoverflow.com/questions/65938025/how-to-find-the-unique-product-among-the-stores-using-spark