Let's look the steps for report processing, which is shown below:
Figure 1：Report handling process illustration
As shown in the above figure,the report process is:
1. The user selectsparameters for report, and the report engine will interpret the report according to the report template and input parameters. Requests for data computation and retrieval are then sent to data layers.
2. The data layer is responsible for data retrieval, computation and the return of the data. Data layers are normally traditional databases: Oracle, DB2, etc.
3. The data layer pass the data result, through interfaces like JDBC, to reporting tools.
4. Reporting toolspresents to users the data they received with visual reports, according to the report template & styles required.During the whole process of report preparation, each stage in the diagram, report layer and data layer can become a performance bottleneck.
One thing worth to discuss is that in Figure 1, the stage"R2 generate visual report" normally includes some data computation process: data returned from the data layer has to be further processed to meet the report requirement.It’s especially true for those reporting tools with programming capabilities that this stage will consume some computation time.For example: the the custom scripts for BIRT, custom dataset for Runqian Report, and inter-cell computation.
Therefore, we can separate this stage, which is good for us to analyze the factors affecting the performance of the report. After being separated,the process is as below:
Figure 2 Report handling process illustration after being further separated
In Figure 2, the highlighted stagesare the main reasons for performance degradation. Obviously, the interpretation of report by the report engine, and the presentation of the report seldom become the key factor for performance issues. Instead, data computation, as well as JDBC data transmission is normally the root of such issues. Therefore, optimization on data computation or JDBC transmission is a more rewarding approach.
As we are clear with the report generation, let’s now deep dig for reasons in the highlighted stages of Figure 2, which are causing the performance issues.
I.The data layer"D1 executing data computation"stage.
In development practices, report developers often need to write very long and complex SQL statements (or stored procedure) to prepare data for the report.
Developers usually find that it’s difficult to implement complex computation with SQL (or stored procedures), especially when they are related to ordered sets. For example, when computing for products, which ranked in the monthly top 10 products lists, or stocks, whose price rise for last five consecutive days. The SQL statement to prepare data for these report are too complicated and may run for tens or even hundreds of lines.
Since it’s difficult to manually specify the full execution path for the database, the execution of complex SQL statements are often slow. It’s hard for developers to manually optimize these complex SQL statements. This is the main factor that affects the performance of "D1 executing data computation" stage.
Example: Computing for customers who rank in Top 10 revenue contributor in every month.
select Client from(
select * from(
select B.*,row_number() over(partition by month order by SumValuedesc)
selectto_char(SellDate,'mm') month,Client,sum(Quantity*Amount) SumValue
group by to_char(SellDate,'mm'),Client order by month,client ) B
group by Client
II.The data layer "D2 JDBC data transmission" stage.
The JDBC drivers from traditional mainstream database are usually less efficient when transfer data stream into Java objects. However, they cannot be bypassed. When handling large amounts of data the performance is not acceptable. Even with new versions of databases, the problem remains. Therefore, for reports against large amount of data, JDBC is often the reason that the report is becoming too slow.
III.The report layer "R2 data re-computation"stage.
1. Many reporting tools handle " R2 data re-computation" and "R3 generating visual presentation"together. The data computation stage includes lot of display attributes(font, color, cell size, etc), which will slow down the computationspeed.
3. Reporting tools do not have the ability for parallel computing. This means complex task with large amount of data cannot be split into small and separated tasks to be processed by stand-alone or multiple machines.Thusit is difficult to effectively improve computing speed.
Know the causes for low performance issues existed in these stages, we can now consider the following optimization alternatives:
I.The data layer"D1 executing data computation"stage.
To cope with the difficulties associated with complex SQL statement optimization, we can use procedural programming language. In this regard, esProc is a good choice. Its code is more in line with the natural way of thinking, and isseveral times shorter than SQL when solving same issues. On the one hand, it’s easy to develop, debug and maintain, reducing development costs; on the other hand, it’s up for the programmer to select the execution path and process, which is easier to improve computing performance.
With use esProc, it’s very easy to computefor customers who rank in Top 10 revenue contributor in every month:
II. The data layer "D2 JDBC data transmission" stage
To resolve the issue of slow data transmission with JDBC in traditional database, we can consider storing the relatively large-scale historical data in file system. It’s much faster to access data in file system through esProc, than JDBC transmission from traditional database. Although the data transmission from esProc to reporting tools is also done through JDBC, it’s much faster as esProc’s JDBC doesn’t need to convert the data stream object in any way.
Another possible solution is the use of parallel computing with several esProc subroutines to retrieve data from database through multiple JDBC connections. During actual test against Oracle we found that with 16 parallel tasks the data transmission speed can be increased by ten times. Considering the increased overhead to the database, this solution is suitable for systems where the database is less stressed.
III. The report layer "R2 data re-computation" stage
1．esProc does not include display properties in its computation. Thus the computing speed is improved .
2. esProc comes with a rich library for relational data computation, which offers great varietyof performance optimization options. Especially when used with ordered sets, programmers can effectively leverage the regularity ofbusiness data to control the execution plan and improve computing speed.
3. esProc comes with parallel computing capabilities, with which we can split large data and more complex tasks into small parallel tasks running on multiple machines. This will effectively improve the computing speed.
Note: for specific performance test measurements, please refer to esProc’s performance test reports.