apache spark - How to load 1000 files into RDDs?


I'm new to Apache Spark and need help.

I have a Python script that reads 6 TDMS files (with a tdms() function) and builds a graph from the numerical data of each of them (with a graph() function), in a loop. I want to load 1000 such files and run the script in parallel on each one. Should I create RDDs from the files and apply the function to each file?

How can I do that? Can I define the number of nodes in Spark?

I have tried making a Python list of the files I need to read, then running a loop that reads the data from each file, creates an RDD, runs the graph function, and, I guess, saves the result?

Or should I make the file list itself an RDD and run a map with a lambda (calling graph) on each element, something like the sketch below?
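This is roughly what I mean (tdms() and graph() are my own functions, and the module name and file paths here are just placeholders):

    from pyspark import SparkContext

    from my_script import tdms, graph  # my functions; the module name is hypothetical

    sc = SparkContext(appName="tdms-graphs")

    # Placeholder list of the 1000 input files.
    file_list = ["data/file_%04d.tdms" % i for i in range(1000)]

    # Distribute the file names; each worker reads and graphs its own files.
    results = sc.parallelize(file_list).map(lambda path: graph(tdms(path)))

    results.saveAsTextFile("output/graphs")  # or .collect(), if the results are small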

If you only care about the parallel run, you can keep loading the data, make one big RDD out of it, and call sc.parallelize. You can either let Spark decide how to partition it, or you can specify the number of partitions you want by calling sc.parallelize(data, numSlices).
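For example, a minimal sketch, assuming tdms() and graph() are the functions from your question and file_list is your list of input paths:

    from pyspark import SparkContext

    sc = SparkContext(appName="parallel-graphs")

    # Load all the data on the driver first (your tdms() function),
    # then hand Spark one big collection.
    data = [tdms(path) for path in file_list]

    rdd = sc.parallelize(data)                  # let Spark choose the partitioning
    rdd = sc.parallelize(data, numSlices=100)   # or set the partition count yourself

    results = rdd.map(graph).collect()

Note that this loads everything on the driver before distributing it; mapping over the file names instead (as in your sketch above) avoids that.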

