hadoop - Clean AWS EMR to allow reuse


I have several tasks that I'm performing on AWS EMR. They don't share data, but I use the same EMR cluster to perform them one after another. Is there a way to clean a running EMR cluster back to its initial state (remove Hive tables, clean HDFS files, etc.) to avoid data collisions?

I want to reuse the EMR cluster for several reasons:

  1. Creating a new EMR cluster can take 5-10 minutes.
  2. My tasks are relatively short, 20-25 minutes.
  3. Once an EMR cluster is created, I pay for a full hour.

We didn't find a "quick and clean" API to achieve this behaviour. Instead, we settled on a simple working convention that guarantees we can clean up the data:

  • We work in a specific Hive database instead of the default one.
  • We put our internal data files under a specific location in HDFS.

So every time a task starts, we first drop the specific database if it exists and recreate it, then recursively delete the data under the specific location in HDFS.
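
The cleanup step described above could be sketched roughly as follows, run on the master node before each task. This is only an illustration under assumptions: the database name `task_db` and the directory `/user/hadoop/task_data` are hypothetical placeholders, `CASCADE` is added on the assumption that the database's tables should be dropped along with it, and `-skipTrash` assumes the deleted files do not need to be recoverable.

```python
import subprocess

# Hypothetical names, chosen for illustration only; use your own.
TASK_DB = "task_db"
TASK_HDFS_DIR = "/user/hadoop/task_data"


def build_cleanup_commands(db=TASK_DB, hdfs_dir=TASK_HDFS_DIR):
    """Return the commands that reset the task's Hive database and HDFS directory."""
    # Guard against wiping the HDFS root or a relative path by mistake.
    if not hdfs_dir.startswith("/") or hdfs_dir.strip("/") == "":
        raise ValueError("refusing to clean a relative path or the HDFS root")
    # CASCADE (an assumption here) drops the tables inside the database as well.
    hql = f"DROP DATABASE IF EXISTS {db} CASCADE; CREATE DATABASE {db};"
    return [
        ["hive", "-e", hql],
        # -f keeps the command from failing when the directory does not exist yet.
        ["hdfs", "dfs", "-rm", "-r", "-f", "-skipTrash", hdfs_dir],
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
    ]


def clean_cluster(run=subprocess.check_call, **kwargs):
    """Run each cleanup command in order; `run` is injectable for dry runs."""
    for cmd in build_cleanup_commands(**kwargs):
        run(cmd)
```

Calling `clean_cluster()` at the start of every task gives each one an empty database and an empty working directory. Passing `run=print` shows the commands without executing them.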

