Powered By Blogger

Monday, March 14, 2011

How Materialized Views Work


Starting with Oracle 8.1.5, introduced in March 1999, you can have a materialized view, also known as a summary. Like a regular view, a materialized view can be used to build a black-box abstraction for the programmer. In other words, the view might be created with a complicated JOIN, or an expensive GROUP BY with sums and averages. With a regular view, this expensive operation would be done every time you issued a query. With a materialized view, the expensive operation is done when the view is created and thus an individual query need not involve substantial computation.

Materialized views consume space because Oracle is keeping a copy of the data or at least a copy of information derivable from the data. More importantly, a materialized view does not contain up-to-the-minute information. When you query a regular view, your results includes changes made up to the last committed transaction before your SELECT. When you query a materialized view, you're getting results as of the time that the view was created or refreshed. Note that Oracle lets you specify a refresh interval at which the materialized view will automatically be refreshed.

At this point, you'd expect an experienced Oracle user to say "Hey, these aren't new. This is the old CREATE SNAPSHOT facility that we used to keep semi-up-to-date copies of tables on machines across the network!" What is new with materialized views is that you can create them with the ENABLE QUERY REWRITE option. This authorizes the SQL parser to look at a query involving aggregates or JOINs and go to the materialized view instead. Consider the following query, from the ArsDigita Community System's /admin/users/registration-history.tcl page:

 
select 
  to_char(registration_date,'YYYYMM') as sort_key,
  rtrim(to_char(registration_date,'Month')) as pretty_month, 
  to_char(registration_date,'YYYY') as pretty_year, 
  count(*) as n_new
from users
group by 
  to_char(registration_date,'YYYYMM'), 
  to_char(registration_date,'Month'), 
  to_char(registration_date,'YYYY')
order by 1;
 
 
SORT_K PRETTY_MO PRET     N_NEW
------ --------- ---- ----------
199805 May      1998        898
199806 June     1998        806
199807 July     1998        972
199808 August   1998        849
199809 September 1998      1023
199810 October  1998       1089
199811 November  1998      1005
199812 December  1998      1059
199901 January  1999       1488
199902 February  1999      2148

For each month, we have a count of how many users registered at photo.net. To execute the query, Oracle must sequentially scan the users table. If the users table grew large and you wanted the query to be instant, you'd sacrifice some timeliness in the stats with

 
create materialized view users_by_month
   enable query rewrite
   refresh complete
   start with 1999-03-28
   next sysdate + 1 
   as 
   select 
     to_char(registration_date,'YYYYMM') as sort_key,
     rtrim(to_char(registration_date,'Month')) as pretty_month, 
     to_char(registration_date,'YYYY') as pretty_year, 
     count(*) as n_new
   from users
   group by 
     to_char(registration_date,'YYYYMM'), 
     to_char(registration_date,'Month'), 
     to_char(registration_date,'YYYY')
   order by 1

Oracle will build this view just after midnight on March 28, 1999. The view will be refreshed every 24 hours after that. Because of the enable query rewrite clause, Oracle will feel free to grab data from the view even when a user's query does not mention the view. For example, given the query

 
select count(*) 
from users 
where rtrim(to_char(registration_date,'Month')) = 'January'
and to_char(registration_date,'YYYY') = '1999'

Oracle would ignore the users table altogether and pull information from users_by_month. This would give the same result with much less work. Suppose that the current month is March 1999, though. The query

 
select count(*) 
from users 
where rtrim(to_char(registration_date,'Month')) = 'March'
and to_char(registration_date,'YYYY') = '1999'

will also hit the materialized view rather than the users table and hence will miss anyone who has registered since midnight (i.e., the query rewriting will cause a different result to be returned).

No comments: