2014年7月17日星期四

Two Approaches to a Complex Report

The following is a relatively complex report.Two approaches are tried and here is a record on my thoughts.

The report is as following (to protect the customer’s information assets,we simplified the table structure in the testing environment):





After the simplification, the table structure are:

order_details table, contains the following fields:
Product id, productid
Order id, orderid
Unit price, price
Sales quantity, quantity

Sales order table, orders, with fields:
Order number, orderid
Sales date, orderdate
Oder status, ordertype (1 means valid order)
Customer id, customerid

Approach 1: with complex SQL statements
The report can be implemented with the following complex SQL statement:

select b.*,a.DAY1,a.QTY1,a.DAY2,a.QTY2
from
(select t2.productid,t2.quantity as QTY1,t1.orderdate as Day1,
row_number() over (partition by t2.productid order by t1.orderdate desc) rn,
lead (orderdate) over (PARTITION BY t2.productid order by t1.orderdate desc) as Day2,
lead (t2.quantity) over (PARTITION BY t2.productid order by t1.orderdate desc) as QTY2
from orders t1,order_details t2
where t1.ordertype='1'and t2.orderid=t1.orderid 
)a,
(select t2.productid, sum(t2.quantity)as ZS, count(t2.productid) asCS, 
sum(case when t2.quantity<=50 then 1 else 0 end) as CS1, 
sum(case when t2.quantity>50 and t2.quantity<=100 then 1 else 0 end)as CS2,
sum(case when t2.quantity>100 then 1 else 0 end) as CS3 
from orders t1,order_details t2 where t1.ordertype='1'and t2.orderid=t1.orderid 
group by t2.productid)b
where a.productid=b.productid and a.rn=1

This complicated SQL statement consists of two parts: sub query a leverages window function to compute the value for last order and second last order; sub query b leverage case when function to calculate the order count for sales quantities falling into different value groups. Generally speaking the statement is complicated. After 2 or 3 months, even the developer himself/herself will need more time to understand such SQL statement.

Approach 2: Solving the problem in a step-by-step way
The above SQL is hard to read, because it’s not in accordance with the human way of thinking. Normally, as in Excel, the problem could be resolved in such way:

1.Create an empty result table with the following fields: "Product id, total sales quantity, last order date, last sales quantity, second last sales date, second last sales quantity, order count for orders with sales quantity < 50, order count for orders with sales quantity between 50 and 100 , order count for orders with sales quantity >100. "

2.Retrieve the data from sales order table and sales order details table, and associate them with order number to form an associated table.

3.Separate the associated table into groups according to different products , with the members of each group corresponding to the order details of a product. These groups can be processed one by one.

4.Each group has the same product id, and sum of the group’s sales quantity is the " total sales quantity of the product ."

5.After each group is sorted by order date ( in descending order), the first record will be last order and the second will be second last order.

6.If each group is further separated into sub sets according to the order quantity of less than 50, between 50 to 100, and more than 100, the number of record in each group is then number of orders for each category.

7.In this way, the result for each product can be computed and added to the result table. Repeat loop to compute the data for next group of products.

However, SQL is not a procedural language. It’s hard for it to implement such computation. Here we tested the step-by-step computation of esProc. Personally, I think this is more in line with human way of thinking. The script is as following:


My Thoughts: comparison of two approaches

In fact, the aboveSQL statement is not the most complex onein the project. There are ones with tens or hundreds of linesofSQL statements in the project.Normally we start the writing from simple SQL statement, and then the complex ones. After they are done we feel a great deal of achievements. However, the ongoing maintenanceand modificationis becoming a headache, even for the developer themselves. If they are maintained by others, the difficulties are doubled. This is one of the reason for added project costs.

Possible reason for this is that, to meet are query request, SQL does not work in the same way as human mind. Human beings are used to solve the problems in a step-by-step way. If the codes are in line with this habit, it will be easy to understand, maintain and modify.

esProc’s approach is no better in terms of the length of codes. However, the process is easy for understanding. This makes it much easier for future maintenance. For complicatedcomputation process this is in great advantage.

没有评论:

发表评论