2014年7月17日星期四

esProc Enables Oracle JDBC to Speed Up Retrieve Process

Recently, a performance issue occurs in a big detail report, where the original data volume in the two data tables as it involves has exceeded 3 million rows; the data volume that the report can retrieve after filtering remains at round 2 million rows. As we expected before, by using paged retrieve method, this report would present a much higher performance. However, for the fact that the user also needs to export the data from the report, the only option is that the data retrieve can be achieved from Oracle at a time. As the JDBC of Oracle runs quite slowly, it takes least 6-7 minutes to retrieve the data of more than 2 million rows from the Oracle. In general, such performance cannot meet the expectation of the customer.
      
Through great efforts we have made for a long period, this issue was finally solved by a way that the retrieve time can be cut down to dozens of seconds, thus to achieve the performance as the user requires. The solution mainly involves the use of parallel mechanism of esProc, and by multiple JDBCs, to achieve the synchronization of data retrieves.

Basic Information

Hardware:
HP ProLiant DL380 G7, CPU Intel Xeon E5620*2, RAM 20G, HDD Raid5 1T

Software:
Oracle11g64-bit, esProc, CentOS 6.0, JDK 1.6u43

Data Volumes of 2 Data Tables:
VM: 100K rows, 168 cols; and 65k rows after filtering
CT: 3 million rows, 16 cols; and 1.95 million rows after filtering

Basic Idea

By utilizing the parallel mechanism of esProc, the task of retrieving data from the two Oracle tables can be divided into a number of sub-tasks, which can be executed synchronously. And then the retrieved results are merged into one for each table. Finally, the two tables with merged results are joined to return the set of results for the report.

Implementation

Main Program Script


First, the database connection is created, in order to fetch the records from the two tables. Second, the data is segmented as per the number of sub-tasks (the screen shot shows the case of 16 sub-tasks), and the query range is assigned for a subroutine. Third, the subroutine is called by using callx to complete the query process. In the end, the main program is available to merge the results and produce a set of the end results which will then be returned to the report.

Subroutine Script

It is easy for the subroutine to complete the query process within a specified query range.

Performance Comparison


Pre-parallel (No Parallel)
Post-parallel (Parallel)
4 Parallels
8 Parallels
16 Parallels
436s
113s
81s
43s

As a result, here we see a leap from 400s to 40s in the data retrieve process.

Note:

1. What is described here covers only the part on how the data retrieve process can be expedited and how the esProc is used to generate the data source for the report in a fast way. However, the description on design and presentation of report is not mentioned here.

2. When the parallel retrieve process is used, it is required to create a connection with the database by each subtask, for example, connect(conn), which can be found in the subroutine script;  


3. Parallel retrieve process is suitable for the case where the Oracle resources are spare (e.g. the number of connections is still far below the upper limit). In other words, this feature can enable the report to make full use of the database resources.

没有评论:

发表评论