Shawn
Guest
|
 |
« on: August 07, 2007, 07:36: AM » |
|
Advanced SQL - 1
This advanced SQL tutorial focuses on the design of more complex SQL statements and the strategies available for implementing them, it concentrates on sub queries and joins because they are often inter-changeable, and views because these are often used to hide the complexity of queries involving sub-queries and joins. Sub Queries
Sub queries are also known as nested queries and are used to answer multi-part questions. They are often interchangeable with a join in SQL. In fact, when executed, a query containing a sub-query may well be treated by the Oracle optimiser exactly as if it were a join.
Let's use a trivial example of finding the names of everybody who works in the same department as a person called Jones to illustrate this point. The SQL could be written using a sub query as follows:
SELECT name FROM emp WHERE dept_no = (SELECT dept_no FROM emp WHERE name = 'JONES')
or as a join statement, like this:-
SELECT e1.name FROM emp e1,emp e2 WHERE e1.dept_no = e2.dept_no AND e2name = 'JONES'
With a trivial example like this there would probably be very little difference in terms of performance of the SQL for such a simple query, but with more complex queries there could well be performance implications. For this reason it is always worth trying a few variations of the SQL for a query and examining the execution plans before deciding on a particular approach, unless they're very simple queries. Learn more about Oracle performance tuning here.
Non Correlated Sub-Queries
There are, in fact, two types of sub query: correlated and non-correlated. The example shown above is a non-correlated sub query. The difference between them is that a correlated sub query refers to a column from a table in the parent query, whereas a non-correlated sub query doesn't. This means that a non-correlated sub query is executed just once for the whole SQL statement, whereas correlated sub queries are executed once per row in the parent query.
The most common use of sub queries is in the WHERE clause of queries to define the limiting condition for the rows returned (i.e. what value(s) the rows must have to be of interest), as in the above example. However, they can also be used in other parts of the query.
Specifically, sub queries can be used:
* to define the limiting conditions for SELECT, UPDATE and DELETE statements in the following clauses:- o WHERE o HAVING o START WITH
* Instead of a table name in o INSERT statements o UPDATE statements o DELETE o statements the FROM clause of SELECT statements
* To define the set of rows to be created in the target table of a CREATE TABLE AS or INSERT INTO sql statement.
* To define the set of rows to be included by a view or a snapshot in a CREATE VIEW or CREATE SNAPSHOT statement.
* To provide the new values for the specified columns in an UPDATE statement
The first example of sub query in SQL shown above, used a simple equality expression as we were interested in only one row, but we can also use the sub query to provide a set of rows.
For example, to find the names of all employees in the same departments as Smith and Jones, we could use the following SQL statement :-
SELECT name FROM emp WHERE dept_no IN (SELECT dept_no FROM emp WHERE name = 'JONES' OR name = 'SMITH')
In fact, the original example could also return more than one row from the sub query if there were two or more people that were called Jones working in different departments. In the first example a run-time SQL error would be generated in that case, because the first example, by using '=', specified that the sub query should produce no more than one row (it is perfectly legitimate for a sub query to return no rows).
We can reverse the question to ask for the names of all the employees that are NOT in the same department as Jones, To do this, the sense of the sub query just has to be reversed by prefixing it with 'NOT' or '!'. Again depending on whether there might be more than one Jones, we would either use 'IN' or '='
SELECT name FROM emp WHERE dept_no NOT IN ( SELECT dept_no FROM emp WHERE name = 'JONES')
Or
SELECT name FROM emp WHERE dept_no != ( SELECT dept_no FROM emp WHERE name = 'JONES')
Performance issues are not thoroughly explored as that subject is covered by another series of articles Oracle Performance Tuning Sub Queries (ctd)
Nested Sub-Queries
The syntax of SQL allows queries to be nested, which means that a sub query itself can contain a sub query, enabling very complex queries to be built.
For example, the SQL statement to find the departments that have emplyess with a salary higher than the average employee salary could be written as:
SELECT name FROM dept WHERE id IN (SELECT dept_id FROM emp WHERE sal > (SELECT avg(sal)FROM emp) )
Any of the other comparison operators instead of '=' or 'IN' such as '>', or '<' can also be used with a sub query.
Sub Queries In The From Clause
The previous SQL examples used sub queries in the where clause, but sub queries can also be used in the from clause instead of a table name. In these circumstances the sub query acts as if it had been predefined as a view.
For example, the following SQL statement returns the amount of used space, the free space and the total allocated space for all tablespaces in a database.
SELECT ts.tablespace_name ,ROUND(fs.mbytes,2) "Free (Mbytes)" FROM dba_tablespaces ts ,( SELECT tablespace_name , SUM(bytes)/1024/1024 mbytes FROM dba_free_space GROUP BY tablespace_name) fs WHERE ts.tablespace_name = fs.tablespace_name Sub Queries That Return No Rows
Up until now the queries shown have all been expected to produce a result, but when creating tables, it can be very useful to write the SQL to use a sub query which will not return any rows - when just the table structure is required and not any of the data.
In the following example we create a copy of the policy table with no rows:
CREATE TABLE new_policy AS (SELECT * from policy WHERE FALSE=TRUE);
The sub query returns no data but does return the column names and data types to the 'create table' statement.
Correlated Sub-Queries
As we've seen already, there are two types of
sub query: correlated and non-correlated. We've already looked at non-correlated sub queries (in part 1). All of the examples of sub queries up until now have been non-correlated sub queries.
Just like non-correlated sub queries, correlated sub queries are used to answer multi-part questions, but they are most often used to check for existence or absence of matching records in the parent table and the related table in the sub query.
A correlated sub query refers to a column from a table in the parent query. This type of query can often be performed just as easily by a join query or a non-correlated sub query, but the SQL may be significantly faster when a correlated sub-query is used. As correlated sub queries refer to a column from their parent queries, they are executed once per row in the parent query whereas non-correlated sub queries are executed once for the whole statement.
For example, using the emp and dept tables from before, to find out which departments have no employees assigned to them, we can write the SQL statement in 3 different ways - as a non-correlated sub query, as an outer join, or as a correlated sub-query. Example 1 - non-correlated sub query
SELECT dept.name FROM dept WHERE dept.id NOT IN ( SELECT dept_id FROM emp WHERE dept_id IS NOT NULL) Example 2- outer join
SELECT dept.name FROM dept,emp WHERE emp.dept_id (+) = dept.id Example 3 - correlated sub query
SELECT dept.name FROM dept WHERE NOT EXISTS (SELECT dept_id FROM emp WHERE emp.dept_id = dept.id)
The second example is an outer join SQL statement. This in fact does more than just return the names of departments which have no employees assigned to them, it also returns the names of those departments that do have employees assigned to them. This is because an outer join returns both matching rows and the non-matching rows on one side of the join.
The first and the third SQL statements would produce exactly the same results, but the first would probably be slower than the third if the dept_id column in the emp table were indexed (depending on the sizes of the tables).
The first SQL statement can not use any indexes - the where clause of the sub query is just checking for NOT NULL rows - so a full table scan would be performed. Also the sub query would be executed once for each row in the dept table.
On the other hand, the sub query in the third example can use the index and since only the dept_id is returned by the sub query, there is no need for any subsequent table access. For these reasons, the third query would normally perform better than the first.
Summary
In summary, there are nearly always several ways in which the SQL for a query may be written, and it is therefore best to try alternative SQL statements particularly for complex queries before deciding on the preferred one. This advanced SQL tutorial has just touched on the possibilities but hopefully it has given you some ideas which you can test.
This advanced SQL tutorial also only touched on performance tuning which is a whole subject in itself.
|