SSIS to insert non-matching data on a non-linked server


This is regarding SQL Server 2008 R2 and SSIS.

I need to update dozens of history tables on one server with new data from production tables on another server.

The two servers are not, and cannot be, linked.

Some of the history tables have hundreds of millions of rows, and some of the production tables have dozens of millions of rows.

I currently have a process in place for each table that uses the following data flow components:

  1. An OLE DB Source task to pull the appropriate production data.
  2. A Lookup task to check whether each production row's key already exists in the history table, using the "redirect error output" option to capture the non-matching rows.
  3. An OLE DB Destination to transfer the missing data to the history table.

The process is far too slow for the large tables. There has to be a better way. Can anyone help?

I know that if the servers were linked, a single set-based query could accomplish the task quickly and efficiently, but the servers are not linked.
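For illustration, this is the kind of set-based query I have in mind if linking were possible (server, database, table, and column names here are placeholders for my actual schema):

```sql
-- Hypothetical linked-server version, run on the history server.
-- Inserts only the production rows whose keys are missing from history.
INSERT INTO HistoryDB.dbo.HistoryTable (Key1, Key2, Col1, Col2)
SELECT p.Key1, p.Key2, p.Col1, p.Col2
FROM ProdServer.ProdDB.dbo.ProductionTable AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM HistoryDB.dbo.HistoryTable AS h
    WHERE h.Key1 = p.Key1
      AND h.Key2 = p.Key2
);
```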

Segment the problem into smaller problems. That's the only way you're going to solve this.

Let's examine the problems.

  1. You're inserting and/or updating existing data. At the database level, rows are packed into pages. Rarely is that an exact fit, so there's some amount of free space left in each page. When you update a row, pretend the Name field went from "Bob" to "Robert Michael Stuckenschneider III". The row needs more room to live, and while there's some room left on the page, there's not enough. Other rows might get shuffled down to the next page to give this one elbow room. That's going to cause lots of disk activity. Yes, it's inevitable given that you are adding more data, but it's important to understand how your data is going to grow and to ensure your database is ready for that growth. Maybe you have some non-clustered indexes on the target table; disabling or dropping them should improve insert/update performance. If you still have your database and log files set to grow at 10% or 1 MB or whatever the default values are, the storage engine is going to spend all of its time trying to grow the files and won't have any time left to actually write data. Take away: ensure your system is poised to receive lots of data. Work with your DBA, LAN, and SAN team(s).
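As a sketch of the file-growth point, pre-size the data and log files and set fixed growth increments before the load runs. The database name, logical file names, and sizes below are purely illustrative; use values appropriate to your environment:

```sql
-- Pre-grow the archive database so the bulk load isn't repeatedly
-- stalled by autogrow events. Numbers are hypothetical.
ALTER DATABASE HistoryDB
    MODIFY FILE (NAME = HistoryDB_Data, SIZE = 500GB, FILEGROWTH = 10GB);

ALTER DATABASE HistoryDB
    MODIFY FILE (NAME = HistoryDB_Log, SIZE = 50GB, FILEGROWTH = 5GB);
```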

  2. You have tens of millions of rows in your OLTP system and hundreds of millions in the archive system. Starting with the OLTP data, you need to identify the rows that do not exist in the historical system. Given the data volumes, plan for the package to have a hiccup in processing at some point; it needs to be restartable. Have a package with a data flow that selects only the business keys from the OLTP source that are used to make the match against the target table. Write those keys to a table that lives on the OLTP server (ToBeTransferred). Have a second package that uses a subset of those keys (N rows) joined back to the original table as its source. It's wired directly to the destination, so no lookup is required, and the fat data row flows over the network only one time. Then have an Execute SQL task go in and delete the batch you just sent to the archive server. This batching method also means you can run the second package on multiple servers in parallel. The SSIS team describes it better in their paper: We Loaded 1TB in 30 Minutes.
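A minimal T-SQL sketch of the batching pattern described above. All table, column, and batch-size values are placeholders; the real key list depends on your schema:

```sql
-- Staging table on the OLTP server, filled by package 1 with the
-- business keys of rows that are missing from the archive.
-- CREATE TABLE dbo.ToBeTransferred (
--     Key1 int NOT NULL, Key2 int NOT NULL,
--     CONSTRAINT PK_ToBeTransferred PRIMARY KEY (Key1, Key2));

-- Source query for package 2: one batch of full-width rows, joined
-- from the staged keys. No lookup is needed on the archive side.
SELECT p.*
FROM dbo.ProductionTable AS p
JOIN (SELECT TOP (50000) Key1, Key2
      FROM dbo.ToBeTransferred
      ORDER BY Key1, Key2) AS t
    ON t.Key1 = p.Key1
   AND t.Key2 = p.Key2;

-- Execute SQL task, run after the batch commits on the archive server:
-- remove the keys just sent so a restart resumes at the next batch.
;WITH batch AS (
    SELECT TOP (50000) Key1, Key2
    FROM dbo.ToBeTransferred
    ORDER BY Key1, Key2
)
DELETE FROM batch;
```

Deleting through the ordered CTE keeps the delete aligned with the same TOP (N) slice the source query read, which is what makes the process safely restartable.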

  3. Ensure your lookup query is of the form SELECT key1, key2 FROM MyTable. Better yet, can you provide a filter for the lookup, such as WHERE ProcessingYear = 2013? There's no need to waste cache on 2012 rows if the OLTP data only contains 2013 data.

  4. You might need to modify the packet size on your connection manager and have a network person set up jumbo frames.
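For reference, the packet size can be set on the SSIS connection manager's PacketSize property or directly in the connection string. The server and database names below are hypothetical; 32767 is the maximum packet size SQL Server accepts:

```
Provider=SQLNCLI10;Data Source=ArchiveServer;Initial Catalog=HistoryDB;
Integrated Security=SSPI;Packet Size=32767;
```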

  5. Look at your queries. Are you getting good plans? Are your tables over-indexed? Remember, each index is going to result in an increase in the number of writes performed. If you can dump them and recreate them after the processing is completed, you'll think your SAN admins bought you FusionIO drives. I know I did when I dropped 14 non-clustered indexes from a billion-row table that had only 10 columns.
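A sketch of the disable-then-rebuild approach, with hypothetical index and table names. Note that only non-clustered indexes should be disabled individually; disabling the clustered index would make the table unreadable:

```sql
-- Disable each non-clustered index before the bulk load so every
-- insert doesn't pay the index-maintenance cost.
ALTER INDEX IX_History_Key2 ON dbo.HistoryTable DISABLE;
ALTER INDEX IX_History_Date ON dbo.HistoryTable DISABLE;

-- ... run the SSIS load here ...

-- REBUILD re-enables a disabled index and rebuilds it in one pass.
ALTER INDEX ALL ON dbo.HistoryTable REBUILD;
```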

If you're still having performance issues after all of that, establish a theoretical baseline (under ideal conditions that will never occur in the real world, I can push 1 GB from A to B in N units of time) and work your way from there toward what your actual throughput is. You must have a limiting factor (IO, CPU, memory, or network). Find the culprit and either throw more money at it or restructure the solution until it's no longer the lagging metric.

