amazon web services - How to use the output of a Redshift query as input to an EMR job?


I have a limited understanding of Redshift, so here is my plan for approaching this problem...

I want to take the results of a query and use them as input to an EMR job. What is the best way to go about this programmatically?

Currently my EMR job takes a flat file from S3 as input, and I use the Amazon Java SDK to set up the job and everything.

Should I write the output of my Redshift query to S3, point my EMR job there, and then remove the file after the EMR job has completed?

Or do Redshift and the AWS SDK offer a more resourceful way to pipe a query directly from Redshift to EMR, cutting out the S3 step?

Thanks

I recently spoke with members of the Amazon Redshift team, and they said a solution is in the works.

This is pretty easy, and there is no need for Sqoop. Add a Cascading Lingual step at the front of your job that executes the Redshift UNLOAD command to S3:

UNLOAD ('select_statement') TO 's3://object_path_prefix' [ WITH ] CREDENTIALS [AS] 'aws_access_credentials' [ option [ ... ] ]
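If you are not using Lingual, the same UNLOAD can be issued from plain Java over JDBC. Here is a minimal sketch, assuming you already have a `java.sql.Connection` to the cluster. The class name, method names, bucket, table, and credential placeholders are all made up for illustration. The query is wrapped in single quotes, so any single quotes inside it are doubled.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: builds and runs a Redshift UNLOAD statement.
class RedshiftUnloadSketch {

    // Wraps text in single quotes, doubling any single quotes inside it
    // so it is a valid SQL string literal.
    public static String quote(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    // Assembles UNLOAD ('select') TO 's3://prefix' CREDENTIALS 'creds'.
    public static String buildUnload(String selectSql, String s3Prefix, String credentials) {
        return "UNLOAD (" + quote(selectSql) + ") TO " + quote(s3Prefix)
                + " CREDENTIALS " + quote(credentials);
    }

    // Runs the statement on a connection supplied by the caller
    // (assumption: opened elsewhere with a JDBC driver that can reach Redshift).
    public static void runUnload(Connection conn, String unloadSql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(unloadSql);
        }
    }

    public static void main(String[] args) {
        // Placeholder values only; nothing is sent to Redshift here.
        String sql = buildUnload(
                "select * from events where state = 'NV'",
                "s3://my-bucket/emr-input/part_",
                "aws_access_key_id=<id>;aws_secret_access_key=<secret>");
        System.out.println(sql);
    }
}
```

Once `runUnload` returns, the files under the S3 prefix can be used as the input path of the EMR job, just like the flat file you use today.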

Then you can either process the export directly on S3, or add an S3DistCp step to bring the data onto HDFS first.
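For the S3DistCp route, the step mostly comes down to a source and a destination argument. A rough sketch of building that argument list in Java follows; the class name and paths are hypothetical. How the step is actually launched depends on your EMR release (for example, through `command-runner.jar` on newer releases or the S3DistCp jar directly on older ones), so treat that part as an assumption to check against your setup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: argument list for an S3DistCp copy step.
class S3DistCpArgsSketch {

    // --src and --dest are the S3DistCp options for source and destination.
    public static List<String> copyArgs(String s3Source, String hdfsDest) {
        return Arrays.asList("s3-dist-cp", "--src", s3Source, "--dest", hdfsDest);
    }

    public static void main(String[] args) {
        // Placeholder locations; the S3 prefix is wherever UNLOAD wrote its files.
        List<String> stepArgs = copyArgs("s3://my-bucket/emr-input/", "hdfs:///emr-input/");
        System.out.println(String.join(" ", stepArgs));
    }
}
```

The resulting list is what you would hand to the step configuration in the Java SDK code that already sets up your job.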

This will be a lot more performant than adding Sqoop, and a lot simpler to maintain.

