python 2.7 - How to use map() to convert (key, value) pairs to values only in PySpark
I have the following code in PySpark:

```python
wordslist = ['cat', 'elephant', 'rat', 'rat', 'cat']
wordsrdd = sc.parallelize(wordslist, 4)
wordpairs = wordsrdd.map(lambda w: (w, 1))  # pair each word with a count of 1
wordcounts = wordpairs.reduceByKey(lambda x, y: x + y)
print wordcounts.collect()
# prints --> [('rat', 2), ('elephant', 1), ('cat', 2)]

from operator import add
totalcount = (wordcounts
              .map(<< fill in >>)
              .reduce(<< fill in >>))
# should print 5
```

I know `wordcounts.values().sum()` would be a trick that works, but I want to use map() and reduce(). I need to use the reduce() action to sum the counts in wordcounts, and then divide by the number of unique words.
* First, I need to map() the pair RDD wordcounts, which consists of (key, value) pairs, to an RDD of values.
This is where I am stuck. I tried the attempts below, but none of them work:

```python
.map(lambda x: x.values()).reduce(lambda x: sum(x))
```

and

```python
.map(lambda d: [d[k] for k in d]).reduce(lambda x: sum(x))
```
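As a plain-Python illustration (no Spark needed, using the collected data from the question), here is why both attempts fail: the RDD's elements are tuples like `('rat', 2)`, not dicts, so `x.values()` does not exist, and `reduce()` needs a function of two arguments, not one:

```python
from functools import reduce

# The RDD's elements are plain tuples, not dicts.
pair = ('rat', 2)

# Attempt 1 fails: tuples have no .values() method.
has_values = hasattr(pair, 'values')  # False

# reduce() needs a function of TWO arguments (accumulator, next element);
# lambda x: sum(x) takes only one and would raise a TypeError.
total = reduce(lambda x, y: x + y, [2, 1, 2])  # sums the counts to 5

print(has_values, total)
```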
Any help is highly appreciated!
I finally got the answer:

```python
wordcounts.map(lambda x: x[1]).reduce(lambda x, y: x + y)
```
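The same map/reduce logic can be checked in plain Python (no Spark required), using the collected wordcounts list from the question; `functools.reduce` here stands in for the RDD's reduce() action, and the last step computes the average the question asks for (total count divided by the number of unique words):

```python
from functools import reduce

# Simulate wordcounts.collect() from the question.
wordcounts = [('rat', 2), ('elephant', 1), ('cat', 2)]

# Mimic .map(lambda x: x[1]): keep only the count from each pair.
counts = [x[1] for x in wordcounts]

# Mimic .reduce(lambda x, y: x + y): sum the counts pairwise.
totalcount = reduce(lambda x, y: x + y, counts)
print(totalcount)  # 5

# Final goal from the question: average count per unique word.
average = totalcount / float(len(wordcounts))
print(average)  # 5 / 3
```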