Need some inputs on feature extraction in Apache Spark
I am new to Apache Spark and am trying to use MLlib for analysis. I have collated code that converts my data into features and then applies a linear regression algorithm to it, but I am facing issues. Please excuse me if this is a silly question.
My person data looks like this:
1,1000.00,36
2,2000.00,35
3,2345.50,37
4,3323.00,45
Here is the simple example code I am working with:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint

case class Person(rating: String, income: Double, age: Int)

val personData = sc.textFile("d:/spark/mydata/persondata.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).toDouble, p(2).toInt))

def prepareFeatures(people: Seq[Person]): Seq[org.apache.spark.mllib.linalg.Vector] = {
  val maxIncome = people.map(_.income).max
  val maxAge = people.map(_.age).max

  people.map(p =>
    Vectors.dense(
      if (p.rating == "a") 0.7 else if (p.rating == "b") 0.5 else 0.3,
      p.income / maxIncome,
      p.age.toDouble / maxAge))
}

def prepareFeaturesWithLabels(features: Seq[org.apache.spark.mllib.linalg.Vector]): Seq[LabeledPoint] =
  (0d to 1 by (1d / features.length)) zip (features) map (l => LabeledPoint(l._1, l._2))

It is working till here. It breaks in the code below:

val data = sc.parallelize(prepareFeaturesWithLabels(prepareFeatures(people)))

scala> val data = sc.parallelize(prepareFeaturesWithLabels(prepareFeatures(people)))
<console>:36: error: not found: value people
Error occurred in an application involving default arguments.
       val data = sc.parallelize(prepareFeaturesWithLabels(prepareFeatures(people)))
                                                                           ^
Please advise.
You seem to be going in the right direction, but there are a few minor problems. First off, you are trying to reference a value (people) that you haven't defined; your snippet defines personData instead. More broadly, you seem to be writing code that works with local sequences, when instead you should modify your code to work with RDDs (or DataFrames). You do seem to be trying the parallelize operation; parallelize is a helper method that takes a local collection and makes it available as a distributed RDD.
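A minimal sketch of what parallelize does (this assumes the SparkContext named sc that the spark-shell provides; the collection and values are made up for illustration):

// A plain local Scala collection...
val localNumbers = Seq(1.0, 2.0, 3.0, 4.0)

// ...becomes a distributed RDD via parallelize.
val distributed = sc.parallelize(localNumbers)

// RDD operations now run on the cluster rather than only on the driver.
val total = distributed.sum()  // 10.0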
I'd recommend looking at the programming guides or additional documentation to get a better understanding of the Spark APIs.
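To make the RDD advice concrete, here is one possible end-to-end rewrite of your pipeline. It is only a sketch under the assumptions from your post (same file path, same rating weights); it names the parsed RDD people so the later steps can resolve it, and it reproduces your evenly spaced label scheme with zipWithIndex instead of zipping two local sequences:

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

case class Person(rating: String, income: Double, age: Int)

// Parse the file into an RDD[Person]; this defines the "people" value
// that the later steps reference.
val people: RDD[Person] = sc.textFile("d:/spark/mydata/persondata.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).toDouble, p(2).toInt))

// max() is an RDD action here, so the normalization constants are plain doubles.
val maxIncome = people.map(_.income).max()
val maxAge = people.map(_.age).max()

// Build the feature vectors directly on the RDD -- no parallelize needed,
// because the data is already distributed.
val features: RDD[Vector] = people.map { p =>
  val ratingWeight =
    if (p.rating == "a") 0.7
    else if (p.rating == "b") 0.5
    else 0.3
  Vectors.dense(ratingWeight, p.income / maxIncome, p.age.toDouble / maxAge)
}

// Reproduce the original label scheme (evenly spaced values starting at 0)
// with zipWithIndex instead of zipping against a local range.
val n = features.count()
val data: RDD[LabeledPoint] = features.zipWithIndex().map {
  case (v, i) => LabeledPoint(i.toDouble / n, v)
}

Since data is already an RDD[LabeledPoint], it can go straight into an MLlib regression algorithm, e.g. LinearRegressionWithSGD.train(data, numIterations), with no parallelize step at all. Best of luck in your adventures with Spark.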