postgresql - Is Hadoop Suitable For This?
We have Postgres queries that take 6 - 12 hours to complete, and are wondering if Hadoop is suited to doing them faster. We have (2) 64 core servers with 256GB of RAM that Hadoop could use.
We're running PostgreSQL 9.2.4. Postgres only uses 1 core on 1 server for the query, so I'm wondering if Hadoop could do it about 128 times faster, minus overhead.
We have 2 sets of data, each with millions of rows.
set one:
id character varying(20), a_lat double precision, a_long double precision, b_lat double precision, b_long double precision, line_id character varying(20), type character varying(4), freq numeric(10,5)
set two:
a_lat double precision, a_long double precision, b_lat double precision, b_long double precision, type character varying(4), freq numeric(10,5)
We have indexes on all the lat, long, type, and freq fields, using btree. Both tables have "vacuum analyze" run right before the query.
The Postgres query is:
select id
from setone one
where not exists (
    select 'x'
    from settwo two
    where two.a_lat >= one.a_lat - 0.000278
      and two.a_lat <= one.a_lat + 0.000278
      and two.a_long >= one.a_long - 0.000278
      and two.a_long <= one.a_long + 0.000278
      and two.b_lat >= one.b_lat - 0.000278
      and two.b_lat <= one.b_lat + 0.000278
      and two.b_long >= one.b_long - 0.000278
      and two.b_long <= one.b_long + 0.000278
      and ( two.type = one.type or two.type = 's' )
      and two.freq >= one.freq - 1.0
      and two.freq <= one.freq + 1.0
)
order by line_id
Is this the type of thing Hadoop can do? If so, can you point me in the right direction?
Try Stado at http://stado.us. Use this branch: https://code.launchpad.net/~sgdg/stado/stado, which is to be used for the next release.
Even with 64 cores, you will only be using 1 core to process that query. With Stado you can create multiple PostgreSQL-based "nodes" even on a single box and leverage parallelism to get those cores working.
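To illustrate why this query lends itself to being split across nodes, here is a minimal sketch in Python. It is not how Stado works internally; it only shows that each row of set one can be checked against set two independently, so set one can be cut into chunks and the partial results concatenated. The helper names (`has_match`, `unmatched_ids`, `parallel_unmatched`) and the sample rows are made up for the example, and only the `a_lat`/`a_long` pair is checked to keep it short. Threads are used purely to show the partitioning; in CPython they will not give real CPU parallelism, which is what separate PostgreSQL nodes or processes would provide.

```python
from concurrent.futures import ThreadPoolExecutor

TOL = 0.000278  # coordinate tolerance taken from the question's query


def has_match(one, set_two):
    """True if any row of set two falls inside the tolerance box of `one`."""
    return any(
        abs(two["a_lat"] - one["a_lat"]) <= TOL
        and abs(two["a_long"] - one["a_long"]) <= TOL
        and (two["type"] == one["type"] or two["type"] == "s")
        and abs(two["freq"] - one["freq"]) <= 1.0
        for two in set_two
    )


def unmatched_ids(chunk, set_two):
    """Anti-join for one chunk: ids in the chunk with no match in set two."""
    return [one["id"] for one in chunk if not has_match(one, set_two)]


def parallel_unmatched(set_one, set_two, workers=4):
    """Split set one into `workers` chunks, check each chunk, merge the ids."""
    chunks = [set_one[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: unmatched_ids(c, set_two), chunks))
    return sorted(i for part in parts for i in part)


# Hypothetical sample data, not taken from the question.
set_one = [
    {"id": "A", "a_lat": 10.0, "a_long": 20.0, "type": "x", "freq": 100.0},
    {"id": "B", "a_lat": 11.0, "a_long": 21.0, "type": "x", "freq": 200.0},
    {"id": "C", "a_lat": 12.0, "a_long": 22.0, "type": "y", "freq": 300.0},
    {"id": "D", "a_lat": 13.0, "a_long": 23.0, "type": "y", "freq": 400.0},
]
set_two = [
    {"a_lat": 10.0001, "a_long": 20.0001, "type": "x", "freq": 100.5},
    {"a_lat": 12.0, "a_long": 22.0, "type": "s", "freq": 300.0},
]

serial = sorted(unmatched_ids(set_one, set_two))
parallel = parallel_unmatched(set_one, set_two, workers=2)
```

Because no row of set one depends on any other row of set one, the chunked result is the same as the serial one; the only extra step is re-sorting after the merge.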
In addition, I have had success converting correlated NOT EXISTS queries into WHERE (SELECT COUNT(*) ...) = 0.
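As a rough sketch of that conversion, the snippet below runs both forms of the query against a tiny in-memory SQLite database from Python and collects the ids each one returns. The tables are a trimmed-down version of the question's schema (only the `a_lat`/`a_long` pair is kept) and the rows are invented for the example. SQLite is used only so the example is self-contained; whether the COUNT(*) form is actually faster on PostgreSQL 9.2 is something to check with EXPLAIN ANALYZE on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE setone (id TEXT, a_lat REAL, a_long REAL,
                     line_id TEXT, type TEXT, freq REAL);
CREATE TABLE settwo (a_lat REAL, a_long REAL, type TEXT, freq REAL);

-- Hypothetical sample rows, not taken from the question.
INSERT INTO setone VALUES
  ('A', 10.0, 20.0, 'L2', 'x', 100.0),
  ('B', 11.0, 21.0, 'L1', 'x', 200.0),
  ('C', 12.0, 22.0, 'L3', 'y', 300.0);
INSERT INTO settwo VALUES
  (10.0001, 20.0001, 'x', 100.5),
  (12.0,    22.0,    's', 300.0);
""")

# Matching condition shared by both query forms (a_* columns only).
MATCH = """
      two.a_lat  >= one.a_lat  - 0.000278
  AND two.a_lat  <= one.a_lat  + 0.000278
  AND two.a_long >= one.a_long - 0.000278
  AND two.a_long <= one.a_long + 0.000278
  AND (two.type = one.type OR two.type = 's')
  AND two.freq >= one.freq - 1.0
  AND two.freq <= one.freq + 1.0
"""

# Original form: correlated NOT EXISTS.
not_exists_sql = f"""
SELECT id FROM setone one
WHERE NOT EXISTS (SELECT 'x' FROM settwo two WHERE {MATCH})
ORDER BY line_id
"""

# Rewritten form: correlated COUNT(*) compared with zero.
count_sql = f"""
SELECT id FROM setone one
WHERE (SELECT COUNT(*) FROM settwo two WHERE {MATCH}) = 0
ORDER BY line_id
"""

ids_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
ids_count = [row[0] for row in conn.execute(count_sql)]
```

Both statements select the rows of `setone` that have no counterpart in `settwo`, so they should always return the same ids; only the plan the database chooses for them may differ.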