python 2.7 - How to use map() to convert (key, value) pairs to values only in PySpark


I have the following code in PySpark:

    wordslist = ['cat', 'elephant', 'rat', 'rat', 'cat']
    wordsrdd = sc.parallelize(wordslist, 4)
    wordpairs = wordsrdd.map(lambda w: (w, 1))
    wordcounts = wordpairs.reduceByKey(lambda x, y: x + y)
    print wordcounts.collect()
    # prints --> [('rat', 2), ('elephant', 1), ('cat', 2)]

    from operator import add
    totalcount = (wordcounts
                  .map(<< fill in >>)
                  .reduce(<< fill in >>))
    # should print 5
    # (wordcounts.values().sum()) would work, but the exercise wants map() and reduce()

I need to use a reduce() action to sum the counts in wordcounts, and then divide by the number of unique words.
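For readers without a Spark cluster at hand, the word-count step above can be mimicked in plain Python (a stand-in sketch of what reduceByKey does per key, not Spark's actual distributed implementation):

```python
from operator import add

words_list = ['cat', 'elephant', 'rat', 'rat', 'cat']

# emulate wordsrdd.map(lambda w: (w, 1))
word_pairs = [(w, 1) for w in words_list]

# emulate reduceByKey(add): fold each value into a per-key running total
counts = {}
for key, value in word_pairs:
    counts[key] = add(counts.get(key, 0), value)

print(sorted(counts.items()))  # [('cat', 2), ('elephant', 1), ('rat', 2)]
```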

* First, I need to map() the pair RDD wordcounts, which consists of (key, value) pairs, into an RDD of values.

This is where I am stuck. I tried the attempts below, but none of them work:

    .map(lambda x: x.values()).reduce(lambda x: sum(x))

and

    .map(lambda d: d[k] for k in d).reduce(lambda x: sum(x))

Any help is highly appreciated!

I finally got the answer:

    wordcounts.map(lambda x: x[1]).reduce(lambda x, y: x + y)
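The same map/reduce pipeline can be checked with plain Python on the collected pairs (an illustrative sketch, using the list that wordcounts.collect() returned rather than a live RDD):

```python
from functools import reduce  # reduce() is a builtin in Python 2.7

word_counts = [('rat', 2), ('elephant', 1), ('cat', 2)]  # result of wordcounts.collect()

# map() extracts the value from each (key, value) pair, reduce() sums them
total_count = reduce(lambda x, y: x + y, map(lambda pair: pair[1], word_counts))
print(total_count)  # 5

# the exercise then divides by the number of unique words
average = total_count / float(len(word_counts))
print(average)
```

Indexing the tuple with x[1] works where x.values() failed because the RDD elements are plain tuples, not dicts.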
