Updating millions of rows
I spend an inordinate proportion of design time of an ETL system worrying about the relative proportion of rows inserted vs updated.
I want to test on a level playing field and remove special factors that unfairly favour one method, so there are some rules: TEST (Update Source) - 100K rows TEST (Update target) - 10M rows Name Type Name Type ------------------------------ ------------ ------------------------------ ------------ PK NUMBER PK NUMBER FK NUMBER FK NUMBER FILL VARCHAR2(40) FILL VARCHAR2(40) Not many people code this way, but there are some Pro*C programmers out there who are used to Explicit Cursor Loops (OPEN, FETCH and CLOSE commands) and translate these techniques directly to PL/SQL.
The requests that did not timeout will be executed once the transaction ends if the original table was not dropped.
Note that, even if you create a new table with the same name the requests will still fail because they use the table OID.
Doing these kind of operations without downtime is an even harder challenge.
In this blog post I will try to outline a few strategies to minimize the impact in table availability while managing large data sets.
The UPDATE portion of the code works in an identical fashion to the Implicit Cursor Loop, so this is not really a separate "UPDATE" method as such.
DECLARE CURSOR c1 IS SELECT * FROM test6; rec_cur c1%rowtype; BEGIN OPEN c1; LOOP FETCH c1 INTO rec_cur; EXIT WHEN c1%notfound; UPDATE test SET fk = rec_, fill = rec_WHERE pk = rec_cur.pk; END LOOP; CLOSE C1; END; / This is the simplest PL/SQL method and very common in hand-coded PL/SQL applications.Update-wise, it looks as though it should perform the same as the Explicit Cursor Loop.The difference is that the Implicit Cursor internally performs bulk fetches, which should be faster than the Explicit Cursor because of the reduced context switches. I generally recommend against it for high-volume updates because the SET sub-query is nested, meaning it is performed once for each row updated. Using BULK COLLECT and FORALL statements is the new de-facto standard for PL/SQL programmers concerned about performance because it reduces context switching overheads between the PL/SQL and SQL engines.To support this method, I needed to create an index on TEST8. The biggest drawback to this method is readability. LAST UPDATE test SET fk = fk_tab(i) , fill = fill_tab(i) WHERE pk = pk_tab(i); END LOOP; CLOSE rec_cur; END; / The modern equivalent of the Updateable Join View. Parallel PL/SQL ORA-00060: deadlock detected Well, if further proof was needed that Bitmap indexes are inappropriate for tables that are maintained by multiple concurrent sessions, surely this is it.Since Oracle does not yet provide support for record collections in FORALL, we need to use scalar collections, making for long declarations, INTO clauses, and SET clauses. Gaining in popularity due to its combination of brevity and performance, it is primarily used to INSERT and UPDATE in a single statement. Note that I have included a FIRST_ROWS hint to force an indexed nested loops plan. The Deadlock error raised by Method 8 occurred because bitmap indexes are locked at the block-level, not the row level.Besides this, here is a list of things that you should know when you need to update large tables: With this in mind, let’s look at a few strategies that you can use to effectively update a large number of rows in your table: If you can segment your data using, for example, sequential IDs, you can update rows incrementally in batches.