DB2 Temporary Tables

Temporary Tables:
db2 has four kinds of temporary tables which we will study.
  • Temporary tables which exist in a where_clause only during the execution of a single sql statement,
  • Temporary tables which exist in a from_clause only during the execution of a single sql statement,
  • Temporary tables which survive for the duration of a single session.
  • Temporary tables that are part of a with-statement.


 
Single SQL Statement:
Subquery tables are anonymous temporary tables with any name. These tables disappear at the end of the query
 
-- print out the borrowers of a single book
select b_name, b_addr from cardholder ch
  where ch.borrowerid in (select borrowerid from borrows
                             group by borrowerid
                             having count(*) =1)
                             
-- print out the accession_no of the most expensive copy of each book
select accession_no, p_price from copy c
  where c.p_price = (select max(p_price) 
                        from copy c1 
                        where c1.isbn = c.isbn)


Other temporary tables which survive for the duration of a single SQL statement are those defined in the from_clause of such a statement. They are used when you wish to break up a query into steps but don't need the intermediate tables for anything other than the final results of the query being developed.
  1: Find the most expensive books sold by each publisher.

  select pub_name, author, title, c_price
    from book bk, table(select pub_name, max(c_price) as maxprice
                           from book group by pub_name) as temp1 t
    where bk.pub_name = t.pub_name and bk.c_price = t.maxprice
The table temp1 is a temporary table which will not exist after the execution of the query (t is its alias). We use temp1 to break the query logic up into two steps - first find the highest book price for each publisher and second find the books published by that publisher that cost that much.

The alternative to doing things this way is to use a corelated subquery.

  1': Find the most expensive books sold by each publisher.

  select pub_name, author, title, c_price
    from book bk
    where bk.c_price = ( select max(c_price) from book bk1
                            where bk1.pub_name = bk.pub_name)
This query, since it is corelated on pub_name, will need to have its subquery re-executed for every row in book and so theoretically, if there are many books published by the same publisher, we may need to recalculate the exactly same max price over and over again instead of once per publisher.

The use of the temporary table means we build the table only once but it will be "bigger" since we collect all max(c_price) values for all publishers. In order to know which is faster, it is probably a good idea to execute both and see which runs faster.


Session:
Temporary tables which survive for an entire session need to be created in a user tablespace which is a user temporary space. This kind of table space is not created automatically and so needs to be created by the sysadmin. You will need to do this if you are running db2 on your own computer. This is most easily done using the Control Center however the default syntax for doing this is:
-- don't worry about this; the sysadmin executes this
CREATE  USER TEMPORARY  TABLESPACE some_name PAGESIZE 4 K
   MANAGED BY SYSTEM  USING ('/home/db2inst1/db2inst1/NODE0000/container_name')
   EXTENTSIZE 32 OVERHEAD 24.1 PREFETCHSIZE 32
   TRANSFERRATE 0.9 BUFFERPOOL IBMDEFAULTBP;
This code has already been run on avalon.

Once this tablespace exists and you are granted use of this space (by me),

   db2 grant use of tablespace some_name to user some_user
you are able to create temporary tables which can be used to hold temporary, non-logged data for the duration of your session (until you terminate the database connection). To terminate a session just execute
   db2 connect reset
or
   db2 terminate
Upon doing this all temporary tables you created during the recent session are dropped.

There are a couple of ways you can define a temporary table. A complete list of the possible syntaxes can be found in the SQL Reference Manual under declare global temporary table. For now we will look at only two.

   declare global temporary table table_name1
     LIKE <table_name2>  on commit preserve rows not logged
   
or
   declare global temporary table table_name
     (column1 some_data_type, column2 some_data_type, ...)
     on commit preserve rows not logged
   
or
   declare global temporary table table_name
     as select ...
     on commit preserve rows not logged
   
Neither of these statements actually puts any rows into the temporary table. To do that you must use the normal insert into table_name syntax. However to do this you need to know what schema the temporary tables are stored in. The schema is called SESSION. This schema name can not be left off whenever you are using a global temporary table. This is because no reference to such a table is to be found in SYSIBM.SYSTABLES so if you leave off the schema name and a table exists in your current schema with the same name then that table will be used instead.

If we were to answer the same question these notes began with using a global temporary table the SQL syntax would be

  1'': Find the most expensive books sold by each publisher.

  

  declare global temporary table Temp1
     (pub_name varchar(7), maxprice  numeric(8,2))
     on commit preserve rows not logged ;

  insert into SESSION.Temp1 (pub_name, maxprice )
    select pub_name, max(c_price) from book group by pub_name;

  select bk.pub_name, bk.author, bk.title, bk.c_price
    from book bk, SESSION.Temp1 t
    where bk.pub_name = t.pub_name and bk.c_price = t.maxprice
 
One curious characteristic of the declare global ... command is that if you do not include the on commit preserve rows section then the default behaviour is to wipe out the contents of the temporary table whenever a commit statement is issued (explicitly or implicitly) which would otherwise write all table data to disk. Since a commit transaction statement might be executed by db2 between the insert into and select statements above you might end up with an empty answer set of you don't explicitly save the data in the temporary table.

The WITH Statement:
The with statement allows you to create temporary tables that can be used within a single query, like our first example, but with the added advantage that the tables can refer to one another.

For example, suppose we want to print out the most expensive books published by each publisher ( pub_name, isbn, author, title, c_price) and to put an "*" beside the most expensive books in the entire library. The output should look like

1 PUB_NAME ISBN AUTHOR  TITLE  C_PRICE
- -------- ---- ------- ------ -------
  AW       1-52 Date    DB       28.00
  CSP      7-45 Baer    Arch     35.00
* PH       2-34 T'baum  Netw     37.00
  Wiley    3-56 K'rock  Queue    25.00

  4 record(s) selected.
 
The SQL is:
 with
      temp1 as (select pub_name, max(c_price) as maxprice
                           from book group by pub_name),
      temp2 as (select  pub_name,  maxprice from temp1
                    where maxprice = (select max(maxprice) from temp1) )
      select case when t.maxprice = t2.maxprice  then '*' else '' END,
        k.pub_name, isbn, author, title, c_price
        from book k, temp1 t , temp2 t2
        where k.pub_name = t.pub_name and
              k.c_price = t.maxprice
        order by k.pub_name;          
Notice how we referred to the table temp1 in the query which defined table temp2. This would not be possible if we had used our first approach of putting both temp1 and temp2 in the from-clause of the select statement as follows:
 select case when temp1.maxprice = temp2.maxprice  then '*' else '' END,
        k.pub_name, isbn, author, title, c_price
        from book k,
         table (select pub_name, max(c_price) as maxprice
                           from book group by pub_name) as temp1  ,
         table (select  pub_name,  maxprice from temp1
                    where maxprice = (select max(maxprice) from temp1) )
                         as  temp2
        where k.pub_name = temp1.pub_name and
              k.c_price = temp1.maxprice
        order by k.pub_name;
This is because the table temp1 does not exist in the schema we are currently using - db2inst1 or pletcha - but only in the query currently being executed. However, in the definition of temp2 that is precisely where the query looks for the temp1 mentioned. Since it doesn't find it we get an error message (SQL0204N).

So the value of the with-clause is that you can have several temporary tables and refer to one in the construction of some other. You also have the advantage of simplifying the final select statement so it is easier to read.

One last comment about the case statement used. In C, C++ and Java there is a statement of the form

   (condition) ? <expression_1> : <expression_2>;
 
This statement evaluates (condition), if it is true it returns <expression_1> else it returns <expression_2>. This is precisely the behaviour of the case statement. The statement
 case when temp1.maxprice = temp2.maxprice  then '*' else '' END
evaluates temp1.maxprice = temp2.maxprice and if it is true then the case statement returns '*'; otherwise it returns the empty string.

An observation. Using the case statement gives you a means of taking some of the search logic, which normally appears within the where-clause, and moving it into the select-list. There are moments when this can be very useful.



pletcha@newpaltz.edu