We had more than a few troubles migrating to Oracle 10, in our production environment. We are using a clustered server, with 3 nodes, so it should be blindingly fast.

I’ll sum up our biggest issues (we had) :

  1. Sequences : we used a sequence to make sure the messages we send, can be ordered. Somehow our sequence was set to “NO ORDER”, but in a single-node environment this always works. In our environment, we suddenly go gaps, leaps, depending on which node you were connected to. This seemingly unexplainable behaviour (sequences are always in order, so the error must be something else), was luckily quickly discovered (are they really?), and easily fixed.
  2. ORA-01483 : all our c++ server software, running on NT, is built using C++ Builder and BDE. We have had troubles like hell with “max open cursor” problems. So, fix, or workaround : count maximum open cursors, if this approaches the database maximum, re-connect to the database. Dirty, but it works. So, in Oracle 8, we had upped the SESSION_CACHED_CURSORS to 250. When we did the same in the Oracle 10g, suddenly, out of the blue appeared the ORA-01483 error. Whammo !! It took us quite a while to figure out what exactly caused the errors. Resetting the aforementioned to it’s default value fixed that problem.
  3. Database drop-out : every night our database just crashes, hangs. This is related to “open cursors problem”. Our applications count the amount of open cursors using the v$-tables, which apparently after a while causes Oracle to hang.

    Here is the query which correctly counts the open cursors given the program name :

    select distinct a.value, s.sid, s.machine, s.program, s.inst_id, s.server
      from gv$sesstat a, gv$statname b, gv$session s
      where a.statistic# = b.statistic# and b.name = 'opened cursors current'
        and s.sid = a.sid and s.inst_id = a.inst_id and upper(s.program) like '[program-name]%'
    order by 1 desc
    

    The query we used before, was :

    select count(*) from v$open_cursor where sid=:sid

    To solve this, i changed the server-process causing the database to hang each night, to use DOA instead of BDE. Apparently this also alleviates our open cursor problem. I did notice that open cursors, using DOA, are only really closed in Oracle after a commit. This is taken into production tonight, let’s hope it will finally improve our databases up-time.

For the moment we manually stop and start our database twice a day, in a controlled way, in the hope our database can keep up for the rest of the time. Oracle are looking into our case, i am not quite sure why our setup is different than the rest of the world.

It has been a very hectic two weeks. Finally we are reaching a somewhat stable environment, although it still requires a lot of manual intervention to keep it running smoothly.