2014年7月22日星期二

The Disadvantages of SQL Computation (II)

In the above computation, we have used the keyword top. In fact there is not such a thing (it can be combined out by other computation computations) in the theory of relational algebra, and this is not the standard writing style of SQL.

Let us see how difficult it is to look for the top 10 places when there is no top.
Rough working thought: Seek out the number of members whose sales amount are higher than itself to rank the sales person, and then get the members whose places do not exceed 10, and the SQL is written as follows:
select sales
from ( select A.sales sales, A.product product,
(select count(*)+1 from sales_amount
where A.product=product AND A.amount<=amount) ranking
        from sales_amount A )
where product='AC' AND ranking<=10
or
select sales
from ( select A.sales sales, A.product product, count(*)+1 ranking
from sales_amount A, sales_amount B
where A.sales=B.sales and A.product=B.product AND A.amount<=B.amount
group by A.sales,A.product )
              where product='AC' AND ranking<=10

Professional technical personnel may not necessarily write such SQL statement well! And only the first ten places are computed.

To say the least, even if there is top, it only makes it easy to get the preceding part lightly. If we change the problem into getting the 6th place to the 10th place, or seeking the salesman whose sales amount is 10% more than that of the next one, the difficulty is still there.

The reason causing this phenomenon lies in the third important disadvantage of SQL: Lack the support of ordered set. SQL inherits the unordered set in mathematics, which directly causes the fact that the computations relating to sequence are rather difficult. And it can be imagined how common the computations relating to sequence (such as over the preceding month, over the same period last year, the first 20%, and rankings) will be.

The newly added window functions in SQL2003 standard provides some computation capabilities relating to sequence, which makes it possible to solve some problems in a relatively simple method and alleviate the problem of SQL to a certain extent. But the use of window functions is often accompanied by sub-query, and it cannot enable user to directly use the sequence number to access set member, so there are still many ordered computations that are difficult to solve. 

Now we want to pay attention to the gender proportion of the “good” salespersons that are computed out, that is, how many males and females there are respectively. Generally, the gender information is recorded in the employee table but not in the performance table, and it is simplified as follows:
employee
Employees table
name
Names of employees, suppose there is no repeated name.
gender
Genders of employees.

We have already computed out the name list of “good” salespersons, and the relatively natural idea is to seek out their genders from the employee table using name list, and count the number. But in SQL, it is necessary to use join operation to get information across tables . In this way, following the initial result, SQL will be written as:
select employee.gender,count(*)
from employee,
( ( select top 10 sales from sales_amount where product='AC' order by amount desc )
intersect
( select top 10 sales from sales_amount where product='TV' order by amount desc ) ) A
where A.sales=employee.name
group by employee.gender

With only an associated table more, it is made so over-elaborate and in reality there are rather more cross-table storages and they are often multi-layered. For example, for salespersons, there are departments where there are managers, and now we want to know by which managers these “good” salespersons are managed. Then there are three table joins, and it is indeed no easy job to write clear where and group in this computation.

This is just the fourth important disadvantage of SQL as we want to say: Lack of object reference, in relational algebra, the relations between objects completely depends on foreign key. This not only makes the efficiency very low in looking for relation, but also makes it impossible to directly treat the record pointed by foreign key as the attribute of primary record . Try thinking, can the above statement be written as this:
select sales.gender,count(*)
from (…)          // …is the SQL computing the “good” salespersons above
group by sales.gender

Evidently, this statement is not only clearer, and at the same time, the computation will also be more efficient (without join computation).

We have analyzed, through a simple example, the four important difficulties of SQL. We believe this is just the main reason why SQL fails to reach the original intention of its invention. The process of solving business problem based on a kind of computation system is in fact the process of translating business problems into formalized computation syntax (similar to the case in which a pupil solves application problem, translates the problem into formalized four arithmetic operations). Before overcoming these difficulties, SQL model system rather does not comply with people’s natural thinking habit, causing great barriers in translating problems, making it very difficult for SQL to be applied, on a large scale, in data computation for business problems.

For still another example which is easily understood by programmer, use SQL as data computation, which is similar to the case in which assembly language is used to complete four arithmetic operations. We very easily write out the calculation expression such as 3+5*7, but to use assembly language (take X86 as the example), it needs to be written as
mov ax,3
mov bx,5
mul bx,7
add ax,bx
In either writing or reading, such code is far inferior to 3+5*7 (it will be more troublesome if we come across decimal). Though it cannot be regarded as a big problem to a skilled programmer, to most people, however, this kind of writing is too hard to understand. In this sense, FORTRAN is really a great invention.


没有评论:

发表评论