apache spark - How to load 1000 files into RDDs?


I'm new to Apache Spark and need help.

I have a Python script that reads 6 TDMS files (with a tdms() function) and builds a graph from the numerical data of each of them (with a graph() function), in a loop. I want to load 1000 such files and run the script in parallel on each one. Should I create RDDs from the files and apply the function to each file?

How can I do that? Can I define the number of nodes in Spark?

I have tried making a Python list of the files I need to read, then running a loop that reads the data from each file, creates an RDD, runs the graph function, and, I guess, saves the result?

Or should I make the file list itself an RDD and run a map with a lambda (calling graph) on each element, something like the sketch below?
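This is roughly what I mean (tdms() and graph() are my own functions, and the module name and file paths here are just placeholders):

    from pyspark import SparkContext

    from my_script import tdms, graph  # my functions; the module name is hypothetical

    sc = SparkContext(appName="tdms-graphs")

    # Placeholder list of the 1000 input files.
    file_list = ["data/file_%04d.tdms" % i for i in range(1000)]

    # Distribute the file names; each worker reads and graphs its own files.
    results = sc.parallelize(file_list).map(lambda path: graph(tdms(path)))

    results.saveAsTextFile("output/graphs")  # or .collect(), if the results are small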

If you only care about the parallel run, you can keep loading the data, make one big RDD out of it, and call sc.parallelize. You can either let Spark decide how to partition it, or you can specify the number of partitions you want by calling sc.parallelize(data, numSlices).
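For example, a minimal sketch, assuming tdms() and graph() are the functions from your question and file_list is your list of input paths:

    from pyspark import SparkContext

    sc = SparkContext(appName="parallel-graphs")

    # Load all the data on the driver first (your tdms() function),
    # then hand Spark one big collection.
    data = [tdms(path) for path in file_list]

    rdd = sc.parallelize(data)                  # let Spark choose the partitioning
    rdd = sc.parallelize(data, numSlices=100)   # or set the partition count yourself

    results = rdd.map(graph).collect()

Note that this loads everything on the driver before distributing it; mapping over the file names instead (as in your sketch above) avoids that.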

