SQL is invented primarily to provide a method to access structured
data in order to transparentise the physical storage scheme, so a lot of various
types of English vocabularies and syntaxes are used in SQL to reduce the
difficulty in understanding and writing it. And the relational algebra as the
basic theory of SQL is a complete computation system, which can compute everything
in principle. In terms of this, we certainly should use SQL to satisfy various
demands for data computation.
But, though relational database has achieved a huge success,
evidently SQL fails to realize its original aim of invention. Except very few
simple queries can be completed by end user using SQL, most of SQL users are
still technical personnel, and even many complex queries are no easy job for
technical personnel.
Why? We inspect the disadvantage of SQL in computation through a
very simple example.
Suppose there is a sales performance table consisting of three
fields (to simplify the problem, date information is omitted):
sales_amount
|
Sales performance table
|
sales
|
Name of salesman, suppose there is no duplicate
name.
|
product
|
Products sold
|
amount
|
Sales amount of the salesman on the
product
|
Now we want to know the name list of the salespersons whose sales
amounts rank among the top 10 places both in air-conditioners and TV sets.
This question is rather simple and people will very naturally design
out the computation process as follows:
1)
Arrange the sequence according to the sales
amount of air-conditioner and find out the top 10 places.
2) Arrange the sequence according to the sales
amount of TV and find out the top 10 places.
3) Get the intersection of the results of 1 and
2 and obtain the answer.
Now we use SQL to do
it.
1) Find out the top 10 places of the sales
amount of air-conditioner. This is very simple:
select
top 10 sales from sales_amount where product='AC' order by amount desc
2) Find out the top 10 places of the sales
amount of TV. The action is the same:
select
top 10 sales from sales_amount where product='TV' order by amount desc
3) Seek the intersection of 1 and 2. This is
somewhat troublesome, as SQL does not support computation by steps. The
computation result of the above two steps cannot be saved, and thus it is
necessary to copy it once again:
select
* from
(
select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
(
select top 10 sales from sales_amount where product='TV' order by amount desc )
A simple 3-step computation has to be written like this using SQL,
and daily computations of more than 10 steps are in great numbers. So this
evidently goes beyond the acceptability of many people.
In this way, we know the first important disadvantage of SQL: Do not support computation by steps.
Dividing complex computation into several steps can reduce the difficulty of a
problem to a great extent. On the contrary, completing many steps of
computation into one step can increase the difficulty of a problem to a great
extent.
It can be imagined that, if a teacher requires pupils to create only
one calculation formula to complete the calculation in solving application
problems, how distressed the pupils will feel (of course, there are certain
clever children who can solve the problem)!
SQL query cannot by conducted by steps, but the stored procedure
written out with SQL can operate by steps. Then, is it possible to use the
stored procedure to conveniently solve this problem?
For the time being, we just ignore how complex is the technical
environment in which the stored procedure is used (this is enough to make most
people give it up) and the incompatibility caused by differences of databases.
We only try to know theoretically whether it is possible to use SQL to make
this computation simpler and faster.
1)
Compute the top 10 places sales amount of
air-conditioners. The statement is still the same, but we need to save the
result for use by Step 3, while in SQL, it is only possible to use table to
store set data. So we need to create a temporary table:
create
temporary table x1 as
select
top 10 sales from sales_amount where product='AC' order by amount desc
2)
Compute the top 10 places of the sales
amount of TV. Similarly
create
temporary table x2 as
select
top 10 sales from sales_amount where product='TV' order by amount desc
3)
Seek the intersection, the preceding steps
are troublesome but this step is simpler.
select
* from x1 intersect x2
After the computation is done in steps, the working thought becomes
clear, but it still appears over-elaborate to use a temporary table. In the
computation of mass structured data, temporary set, as intermediate result, is
rather common. If the temporary table is created for storage in all cases, the
computation efficiency is low and it is not intuitive.
Moreover, SQL does not allow the value of a certain field to be a
set (namely temporary table), so in this way, it is impossible to implement
some computations even if we tolerate the over elaborate.
If we change the problem into computing the salespersons whose sales
amounts of all products rank among the top 10 places, try thinking how to
compute it. By continuing to use the above-mentioned working thought, it is
very easy to get the below points:
1) Group the
data according to products, arrange the sequence of each group, and get the top
10 places;
2) Get the
intersection of the top 10 places of all products;
As we do not know beforehand how many products there are, so it is
necessary to also store the grouping result in a temporary table. There is a
field in this table that needs to store the corresponding group members, which
is not supported by SQL, so the method is unfeasible.
If supported by window function (SQL2003 standard), it is possible
to change the working thought. After grouping by product, compute the number of
times each salesman appears in the top 10 places of the sales amounts of all
product category group. If the number of times is the same as the total number
of the product categories, it indicates this salesman is within the top 10
places regarding the sales amounts of all product categories.
select
sales
from
( select sales,
from ( select sales,
rank() over
(partition by product order by amount desc ) ranking
from sales_amount)
where ranking <=10 )
group
by sales
having
count(*)=(select count(distinct product) from sales_amount)
How many people can write such complex SQL?
Moreover, in many databases, the window functions are not supported.
Then, it is only possible to use the stored procedure to develop a loop, according
to the sequence, the top 10 places of each product, and seek the intersection
of the result of the preceding time. This process is not very much simpler than
using high level language to develop, and it is also necessary to cope with the
triviality of the temporary table.
Now, we know the second important disadvantage of SQL: Set-lization is not complete. Though
SQL has the concept of set, it fails to provide set as a kind of basic data
type, which makes it necessary to transform a lot of natural set computations
in thinking and writing.
没有评论:
发表评论