Recently, a performance issue
occurs in a big detail report, where the original data volume in the two data
tables as it involves has exceeded 3 million rows; the data volume that the
report can retrieve after filtering remains at round 2 million rows. As we
expected before, by using paged retrieve method, this report would present a
much higher performance. However, for the fact that the user also needs to
export the data from the report, the only option is that the data retrieve can
be achieved from Oracle at a time. As the JDBC of Oracle runs quite slowly, it
takes least 6-7 minutes to retrieve the data of more than 2 million rows from
the Oracle. In general, such performance cannot meet the expectation of the customer.
Through great efforts we have
made for a long period, this issue was finally solved by a way that the
retrieve time can be cut down to dozens of seconds, thus to achieve the
performance as the user requires. The solution mainly involves the use of
parallel mechanism of esProc, and by multiple JDBCs, to achieve the synchronization
of data retrieves.
Basic
Information
Hardware:
HP ProLiant DL380 G7,
CPU Intel Xeon E5620*2, RAM 20G, HDD Raid5 1T
Software:
Oracle11g64-bit,
esProc, CentOS 6.0, JDK 1.6u43
Data
Volumes of 2 Data Tables:
VM:
100K rows, 168 cols; and 65k rows after filtering
CT: 3 million rows, 16
cols; and 1.95 million rows after filtering
Basic
Idea
By utilizing the parallel
mechanism of esProc, the task of retrieving data from the two Oracle tables can
be divided into a number of sub-tasks, which can be executed synchronously. And
then the retrieved results are merged into one for each table. Finally, the two
tables with merged results are joined to return the set of results for the
report.
Implementation
Main
Program Script
First, the database connection
is created, in order to fetch the records from the two tables. Second, the data
is segmented as per the number of sub-tasks (the screen shot shows the case of 16
sub-tasks), and the query range is assigned for a subroutine. Third, the
subroutine is called by using callx to complete the query process. In the end,
the main program is available to merge the results and produce a set of the end
results which will then be returned to the report.
Subroutine
Script
It is easy for the subroutine to
complete the query process within a specified query range.
Performance
Comparison
Pre-parallel (No
Parallel)
|
Post-parallel
(Parallel)
|
||
4 Parallels
|
8 Parallels
|
16 Parallels
|
|
436s
|
113s
|
81s
|
43s
|
As a result, here we see a leap
from 400s to 40s in the data retrieve process.
Note:
1. What is described here covers only the part on how the data
retrieve process can be expedited and how the esProc is used to generate the
data source for the report in a fast way. However, the description on design
and presentation of report is not mentioned here.
2. When the parallel retrieve process is used, it is required to
create a connection with the database by each subtask, for example,
connect(conn), which can be found in the subroutine script;
3. Parallel retrieve process is suitable for the case where the
Oracle resources are spare (e.g. the number of connections is still far below
the upper limit). In other words, this feature can enable the report to make full
use of the database resources.
没有评论:
发表评论