2014年7月16日星期三

esProc’s Contribution to Report Development

With the constant improvement and maturing of reporting tools, the presentation features of the reports are growing stronger, supporting more and more reporting styles. At the same time, some report development aspects has gradually become bottlenecks. Two obvious ones are: 1) data preparation and computation for reports with complex business logic; 2) report development with heterogeneous and diversified data sources. A new development tool, esProc, can remove both bottlenecks easily. This will reduce the difficulty of report development and improve efficiency. 

Regarding the first bottleneck, let’s see the issue with data preparation and computation for reports with complex business logic. 
In project practice, report developers need to write some very long and complex SQL statements (or stored procedure), to prepare and compute the data for reporting. 

The main reasons are: 

1.The original database structure is normally designed for transactional system, and is not suitable for direct presentation in statistical reports. This results in very complex business logic to get report data from original data. 

2. Even for data warehouse, in which all data has been sorted and prepared to some degree, the structure is not suitable for all reports, especially some new reports needed in later stage of the projects. Usually, more complex computation is needed on the basis of data warehouse before submitting data to reporting tools for presentation. 

Report developers normally find it’s very difficult to do complex statistical computation with SQL (or stored procedures), especially computation related to sequence and sets, eg., computing for products which are top 10 revenue contributors every month, or stocks rising for five consecutive days. SQL statements for preparing such data is extremely complicated and could be tens or hundreds lines long. 

Let's look at a real life example with SQL. The following statements are written for Oracle, leveraging the window function. Other DBMS does not support the window function to the same degree as Oracle, which means the SQL statements written for them could be longer. 

Example 1: Computing for customers who are Top 10 revenue contributor in every month. 

 select Client from(
 select * from(
 select B.*,row_number() over(partition by month order by SumValuedesc)
 rown from(
 select to_char(SellDate,'mm') month,Client,sum(Quantity*Amount) SumValue
 from contract
 where SellDate>=to_date('2012-01-01','yyyy-mm-dd')
 and bSellDate<=to_date('2012-12-31','yyyy-mm-dd')
 group by to_char(SellDate,'mm'),Client order by month,client
 ) B
 ) C
 where rown<=10
 ) D
 group by Client
 having count(Client)=(
 select count(distinct(to_char(SellDate,'mm')))
 from contract
 where SellDate>=to_date('2012-01-01','yyyy-mm-dd')
 and SellDate<=to_date('2012-12-31','yyyy-mm-dd') )

Example 2: computing for stocks rising for 5 consecutive days.

 select max(number_of_consecutive_days)
 from (select count(*) number_of_consecutive_days
      from (select sum(flag_for_rising_or_falling) over(order by trading_days)                days_not_rising
        from (select trading_days,
              case when closing_price >lag(closing_price) over(order by                           trading_days)
              then 0 else 1 end flag_for_rising_or_falling
              from price_index) )
      group by days_not_rising) 

These two examples are relatively simple. For more complex business logic, there will be multiple levels of nesting, and SQL statements with many window functions, running to tens or even hundreds of lines. These complex SQL statements does not follow our natural way of thinking. They are hard to write or debug, and difficult to modify. After a period of time, even the original developer might need extra time to understand.

Moreover, complex SQL statements is normally slow to execute, leaving the end users waiting long for the report. Since it’s difficult to specify the full execution path for the database, it’s usually hard for the report developers to find a way to manually optimize these complex SQL.

As the new generation of development tools, esProc can easily resolve such bottlenecks.

First, esProc is more in line with our natural way of thinking. When used to solve the same problem, its codes are several times shorter than SQL. It’s easier to debug and maintain. The execution process of esProc can be decided by the developer, which means it’s easy to be optimized.
Also with the two examples mentioned above, let’s see what’s esProc's way of implementation:

Example 1: Computing for customers who are Top 10 revenue contributor in every month.


Example 2: computing for stocks rising for 5 consecutive days.



Meanwhile, esProc's provides standard JDBC interface, which means reporting tools can access the result-set from esProc in the same easy and seamless way as they do with databases.

The second bottleneck is the trouble of developing reports with heterogeneous and diversified data sources.

Many complex reports contains data from multiple heterogeneous databases, non-relational databases or even file system. A typical example is the payroll report: basic information about employees are from HR database; salary information is from accounting system; and attendance record is in text file.
All mainstream databases support integration of heterogeneous data sources, but in reality, these functionality does not prove to be handy. Even if they support integration of heterogeneous data sources, the performance is very bad.
Reporting tools can also be used to integrate heterogeneous data sources, but a considerable number of such tools do not support multiple data sources association. For the few of them which does support, the association feature is very limited, and might suffer from poor performance with any increase of data amount. Developers need to take care of the codes for data integration.
However, ordinary high-level languages (Java, C#) used for developing customer-defined data sources for reports lack class libraries for batch data computing. It’s very tedious to develop codes for such computation. A simple filtering summary will require tens of lines of codes, which is time-consuming, and code readability is poor. 

This bottleneck can also be easily resolved by esProc.

esProc can access all kinds of heterogeneous databases, non-relational databases and files. Its complete computing capability can easily associate multiple data sources for computing. The code length is several times, or even ten times shorter than Java.

For example, there is a "Project Payment Schedule Report", in which the data are from project information table project underlying the Project Management System (Oracle based), payment information table payProcess underlying the Accounting System (SQL Server based), and contract information table contract underlying the Contract Management System (SQL Server based) .

With traditional methods, it’s very difficult to join these three tables from three databases through project number projectNo. However, with esProc, it could be done easily. The scripts in following illustration shows that with 3 JDBC connections, oracledb, mssqlFin and msssqlCon, data from 3 databases could be retrieved for easy association with esProc's join and new functions.


Note that esProc's computing power does not depend on a database or other third-party engine. Thus regardless of the data sources, once they are retrieved into esProc, you can rely on esProc's wealth of computing functions and class libraries to do unified computing.

In summary, esProc can easily resolve the two major bottlenecks encountered in report development, and significantly improve development efficiency.

没有评论:

发表评论