[oracle] avoiding SLOW sdo_aggr_union

There is this recurring problem we have in GIS: getting road-segments and wanting to show complete roads. The naive approach would we to do something like the following:

insert into street_geoms
select ro.rd_ro_ident, ro.rd_ro_name, ro.com_code, ssdo_aggr_union(sdoaggrtype(rd.ro_geometry, 0.005)) as geom
from rd_road ro, rd_ro_sec ros
where ros.rd_ro_ident = ro.rd_ro_ident
group by ro.rd_ro_ident, ro.rd_ro_name, ro.com_code;

For good measure: we have 45.000+ roads, with a total of 230.000+ road segments. So when that query starts running and starts taking a long time, I started googling. Apparently there are two faster alternatives: SDO_AGGR_CONCAT_LINES and SDO_AGGR_SET_UNION. While the first was really quick, completed in minutes, the result was completely wrong (complete segments were missing). The second might be quicker, but it was really hard to get an idea about any progress, and if it would fail, everything should be lost (rolled back).

So I decided to write a little script, and issue a sql statement for each single road, allowing me to track progress and added restartibility. For each road I issued a statement like:

insert into street_geoms
select ro.rd_ro_ident, ro.rd_ro_name, ro.com_code, sdo_aggr_set_union(CAST(COLLECT(ros.rd_ros_geometry) AS mdsys.SDO_Geometry_Array),0.005) as geom
from rd_road ro, rd_ro_sec ros
where ros.rd_ro_ident = ro.rd_ro_ident
  and ro.rd_ro_ident = 1895101 
group by ro.rd_ro_ident, ro.rd_ro_name, ro.com_code;

I added some ruby code around it, to make sure it tracked the progress and calculated the remaining time, just to have an idea. The first "large" road it stumbled upon literally took hours. It only had to join 39 segments. A simple query learned I had 150+ roads with more segments, and a maximum of 125 segments in the database. I could not just simply ignore them :) So this was not going to work either.

Why would this be so hard? I just wanted to throw all linestrings together into one geometry. How could I do that? Querying the geometries was really easy, so what if I joined the geometries outside of oracle? And wouldn't that be hard? But there is a simple solution: convert the strings to WKT, and join all LINESTRING in a MULTILINESTRING. This would just be simple string manipulation. I can do that ;)

I had some hiccups with this approach: handling the long strings proved a bit akward (use CLOB instead) and I had to regularly call GC.start to make sure the open cursors were released. And I had to make sure not to build a string literal which was too long (ORA-06550).

But in the end I was able to join the road-sections for the 45.000 + roads in approx 1.5h, which is not blindingly fast, but faster than 1 single SDO_AGGR_SET_UNION operation :) :)

For reference you can see the full code:

class StreetGeom < ActiveRecord::Base
  self.primary_key = 'rd_ro_ident'
end

def format_time (t)
  t = t.to_i
  sec = t % 60
  min = (t / 60) % 60
  hour = t / 3600
  sprintf("% 3d:%02d:%02d", hour, min, sec)
end

def eta(count)
  if count == 0
    "ETA: --:--:--"
  else
    elapsed = Time.now - @start_time
    # eta = elapsed * @total / count - elapsed;
    eta = (elapsed / count) * (@total - count)

    sprintf("ETA: %s", format_time(eta))
  end
end

all_roads = Road.count
geoms_to_calculate = all_roads - StreetGeom.count
@total = geoms_to_calculate

puts "Joining geometries for #{all_roads} roads [still #{geoms_to_calculate} to do]"

cntr = 1
@start_time = Time.now

done = 0

Road.order(:rd_ro_ident).each do |road|
  street_count = StreetGeom.where(rd_ro_ident: road.rd_ro_ident).count
  print "\rConverting #{cntr}/#{all_roads} [#{eta(done)}] "
  if street_count == 0
    print "..."
    $stdout.flush

    ## get all geometries in WKT format
    get_geoms_sql = <<-SQL
      select sdo_cs.make_2d(ros.rd_ros_geometry).get_wkt() as wkt_geom from rd_ro_sec ros where ros.rd_ro_ident = #{road.rd_ro_ident}
    SQL

    cursor = Road.connection.execute(get_geoms_sql)

    line_strings = []

    while row = cursor.fetch
      line_string = row[0].read.to_s
      line_strings << line_string[10..-1]
    end

    insert_sql = <<-SQL
      DECLARE
        wkt_str clob;
      BEGIN
        wkt_str := 'MULTILINESTRING(#{line_strings.join(", ';\nwkt_str := wkt_str || '")})';
        insert into street_geoms(rd_ro_ident, name, com_code, geom)
        values (#{road.rd_ro_ident}, q'[#{road.rd_ro_name}]', '#{road.com_code}',
             sdo_util.from_wktgeometry(to_clob(wkt_str)) );
      END;
    SQL

    Road.connection.execute(insert_sql)
    done += 1
  else
    print "_"
  end

  cntr += 1

  # periodically cleanup GC so we release open cursors ...
  # to avoid ORA-1000 errors
  if (cntr % 50) == 0
    GC.start
  end
end

print "\n"
puts "\n\nDone!"

and I run this script in the rails environment as follows: rails runner lib\tasks\join_road_geometries.rb.


Comments
Add comment

Recent comments

Tags

ruby on rails 34 ruby 26 rails3 17 rails 15 oracle 11 rspec 9 rspec2 7 jquery 7 ubuntu 5 javascript 5 windows 5 activerecord 3 refactoring 3 geoserver 3 gis 3 arrrrcamp 3 actionmailer 2 oracle spatial 2 tdd 2 postgis 2 routing 2 rvm 2 mongoid 2 csharp 2 thin 2 win32 2 gem 2 rails4 2 git 2 service 2 haml 2 cucumber 2 view testing 2 i18n 1 displaysleep 1 spatial 1 gemsets 1 wubi 1 oracle_enhanced_adapter 1 migrations 1 watchr 1 ci 1 plugins 1 coderetreat 1 ie8 1 ssl 1 oci 1 nested model form 1 wcf 1 11.04 1 jsonp 1 ruby-oci8 1 teamcity 1 engines 1 pgadmin 1 soap 1 content_for 1 word automation 1 plugin 1 capybara 1 xml 1 bootstrap 1 migrate to rails3 1 mvc 1 unity 1 rendering 1 word2007 1 x64 1 limited stock 1 fast tests 1 pl/sql 1 delayed_job 1 pdf 1 test coverage 1 optimization 1 processing 1 borland 1 method_missing 1 cross-browser 1 devise 1 schema_plus 1 mongo 1 mongrel 1 dual boot 1 usability 1 mongrel_service 1 dba 1 mission statement 1 model 1 metadata 1 rcov 1 exceptions 1 image_tag 1 attachments 1 bde 1 css 1 yield 1 ajax 1 generative art 1 rails-assets 1 coordinate systems 1 submodules 1 netzke 1 ora-01031 1 authlogic 1 postgresql 1 shopping cart 1 agile 1 fast_tagger 1 subjective 1 wice_grid 1 generators 1 nvidia 1 mongodb 1 etsyhacks 1 staleobjecterror 1 session 1 jeweler 1 wordpress hacked 1 jasmine 1 heroku 1 rjs 1 life 1 unobtrusive-javascript 1 render_anywhere 1 html5 1 rails31 1 json 1 cocoon 1 mingw32 1 observe_field 1 osx 1 actionwebservice 1 testing 1 debugging 1 strings 1