To date, reporting tools have been proven and developed quite well. We have embraced emerging tools such as QlikView, Tableau, and Spotfire, while classic tools such as SAP Visual Intelligence, JasperReports, and Cognos remain solid choices. However, one key obstacle remains in report development: data source computation.
The goal of data source computation is to produce a convenient, ready-to-use result set for the report. SQL and stored procedures (SQL/SP) are the traditional way to perform this computation, but they are relatively difficult to design, debug, and migrate.
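For concreteness, here is a minimal sketch of the traditional SQL/SP route as seen from Java: a stored procedure computes the report's result set, and the report fetches it over JDBC. The connection string, procedure name, and columns are hypothetical, and I assume a SQL Server-style procedure that ends with a SELECT.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class SpReportSource {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; the pattern is the same for any JDBC database.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://db-host;databaseName=sales", "report_user", "secret");
             // Invoke the stored procedure that prepares the report's result set.
             CallableStatement call = conn.prepareCall("{call monthly_sales_report(?)}")) {
            call.setInt(1, 2014); // e.g. the reporting year
            try (ResultSet rs = call.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("region") + "\t" + rs.getDouble("total"));
                }
            }
        }
    }
}
```

The pain point is that the real logic lives inside monthly_sales_report: it must be designed and debugged within the database, and it rarely migrates cleanly to another vendor's SQL dialect.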
What's worse, when the computation spans multiple databases or involves non-database data sources, this traditional method becomes useless, especially for JasperReports and other tools that are restricted to a single data source. To overcome this obstacle, we need other techniques; the typical ones include the data warehouse, the data source computation layer, the semantic layer, table-joining, and API programming. I would like to share my views on the pros and cons of each, based on my personal experience.
Data Warehouse. As a way to build a single consistent data view through ETL, the data warehouse is backed by many proven, well-developed tools, for example Teradata FastLoad, the DB2 data warehouse, and Oracle Warehouse Builder. It has many advantages: cross-database computation, non-database computation, historical data tracking, high performance, and low pressure on the business database.
The major drawback of the data warehouse is its high cost, which results mainly from building and maintaining it rather than from product purchases. A data warehouse requires dual experts who are skilled in both IT and business: they must be excellent at Perl, VB Script, Java, SQL, and stored procedures as well as the business logic. Warehouse builders have to perform complicated data splitting and consolidation, design the normal forms and the incremental-update algorithms, and prioritize task scheduling across multiple levels. Keeping the data warehouse efficient and usable requires more than one ETL process. A data warehouse project can easily fail because these costs become uncontrollable.
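To make the incremental-update point concrete, here is a minimal sketch of a single ETL step: copy the rows added to the business database since the last load into a warehouse fact table. The hosts, credentials, and the orders/fact_orders schema with its created_at watermark column are all hypothetical.

```java
import java.sql.*;

public class IncrementalLoad {
    public static void main(String[] args) throws Exception {
        try (Connection src = DriverManager.getConnection(
                 "jdbc:mysql://oltp-host/sales", "etl", "secret");
             Connection dwh = DriverManager.getConnection(
                 "jdbc:postgresql://dwh-host/warehouse", "etl", "secret")) {
            Timestamp lastLoad = readWatermark(dwh); // newest row already loaded
            try (PreparedStatement read = src.prepareStatement(
                     "SELECT id, region, amount, created_at FROM orders WHERE created_at > ?");
                 PreparedStatement write = dwh.prepareStatement(
                     "INSERT INTO fact_orders (id, region, amount, created_at) VALUES (?, ?, ?, ?)")) {
                read.setTimestamp(1, lastLoad);
                try (ResultSet rs = read.executeQuery()) {
                    while (rs.next()) {
                        write.setLong(1, rs.getLong("id"));
                        write.setString(2, rs.getString("region"));
                        write.setBigDecimal(3, rs.getBigDecimal("amount"));
                        write.setTimestamp(4, rs.getTimestamp("created_at"));
                        write.addBatch(); // accumulate rows, then load them in one round trip
                    }
                }
                write.executeBatch();
            }
        }
    }

    // Read the watermark: the latest timestamp already present in the warehouse.
    static Timestamp readWatermark(Connection dwh) throws SQLException {
        try (Statement st = dwh.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT COALESCE(MAX(created_at), TIMESTAMP '1970-01-01') FROM fact_orders")) {
            rs.next();
            return rs.getTimestamp(1);
        }
    }
}
```

Even this toy step ignores updates, deletions, and late-arriving rows; handling those, plus scheduling many such steps across levels, is where the real cost of a warehouse accumulates.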
Data source computation layer. An approach that has emerged in recent years, the data source computation layer tries to solve the data source computation problem at its root through a general-purpose method. With computational scripts specially designed for structured data, it supports data retrieval from various databases and non-database data sources, and even interactive computation between heterogeneous data sources. It also offers professional development and debugging tools that reduce the difficulty of complex computation. All in all, this method is more efficient than SQL/SP. It is also more portable, because by nature it is independent of any specific database or reporting tool.
esProc and R are typical data source computation layers. esProc can output its result via JDBC for direct reference by the report, but it does not support correlation analysis and other statistical models. Conversely, R supports various ready-made analysis models but is awkward to invoke from a report.
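As an illustration of the JDBC route, the sketch below calls an esProc script from Java the way a report engine would. The driver class and local-URL form follow esProc's JDBC convention as I understand it, so verify them against the version you deploy; the script name crossSource and its parameter are hypothetical.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class EsprocSource {
    public static void main(String[] args) throws Exception {
        // Load the esProc JDBC driver (class name per esProc's documentation).
        Class.forName("com.esproc.jdbc.InternalDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:esproc:local://");
             // Invoke an esProc script as if it were a stored procedure.
             CallableStatement call = conn.prepareCall("call crossSource(?)")) {
            call.setString(1, "2014-01"); // e.g. a reporting-period parameter
            try (ResultSet rs = call.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```

Because the result arrives as an ordinary ResultSet, the reporting tool needs no special integration beyond its usual JDBC data source.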
This method typically computes entirely in memory, so its performance on a standalone machine is poorer than that of the data warehouse. However, both esProc and R are excellent at distributed computation, since both support big-data computing models such as Hadoop.
Semantic Layer. The semantic layer was introduced as the data warehouse grew disappointing. It redefines the physical data model, which used to be quite difficult to understand, and stops drawing strict distinctions among databases, text files, Excel, and XML, so report designers can operate on the data in a more user-friendly way. SAP Universe and QlikView are the most characteristic examples.
Both the semantic layer and the data warehouse are designed to build an easy-to-use data source, so their advantages are basically the same. However, building a semantic layer consumes fewer resources and less time, because it involves fewer procedures and is comparatively more agile. Its setup and maintenance cost is lower than that of the data warehouse, though higher than that of the data source computation layer.
The main drawback of the semantic layer is its insufficient computational capability. It works with a wizard at its core, complemented by some elementary functions for data source consolidation and computed columns, but it lacks a scripting language specially tailored to structured data computation. The semantic layer can therefore hardly perform data source computation independently; additional assistance from a simple data warehouse or ETL process is often necessary. Its full-memory computation is also weak in performance. Lastly, the semantic layer is often bundled with a reporting tool and is thus poorly portable.
Table-joining. This is one of the basic functions of the semantic layer. Many reporting tools define it separately as a simple way to solve the multi-data-source problem for single-source reports. For example, JasperReports and BIRT support only a single data set; if data from Oracle and MSSQL are to be presented in one report at the same time, you can take advantage of their table-joining functionality, which works roughly as the sketch below illustrates.
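The sketch below shows, in plain Java, the kind of work such table-joining performs behind the scenes: read each source separately, then join the rows in memory on a shared key. All connection strings, tables, and columns are hypothetical.

```java
import java.sql.*;
import java.util.HashMap;
import java.util.Map;

public class CrossDbJoin {
    public static void main(String[] args) throws Exception {
        // Pass 1: load the Oracle side into memory, keyed by customer_id.
        Map<String, String> customerNames = new HashMap<>();
        try (Connection ora = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//ora-host:1521/ORCL", "rpt", "secret");
             Statement st = ora.createStatement();
             ResultSet rs = st.executeQuery("SELECT customer_id, name FROM customers")) {
            while (rs.next()) {
                customerNames.put(rs.getString("customer_id"), rs.getString("name"));
            }
        }
        // Pass 2: stream the MSSQL side and look each row up in the map.
        try (Connection ms = DriverManager.getConnection(
                 "jdbc:sqlserver://ms-host;databaseName=sales", "rpt", "secret");
             Statement st = ms.createStatement();
             ResultSet rs = st.executeQuery("SELECT customer_id, amount FROM orders")) {
            while (rs.next()) {
                String name = customerNames.get(rs.getString("customer_id"));
                if (name != null) { // simple inner-join semantics on the key
                    System.out.println(name + "\t" + rs.getBigDecimal("amount"));
                }
            }
        }
    }
}
```

Everything joined must fit in memory, which is exactly why this approach handles only the simplest joins and degrades on large tables.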
Table-joining is superior to the other methods in having the lowest workload and staffing costs, but it carries comparatively more feature restrictions: only the simplest joins can be performed, and general data source computation is not possible; support for non-database data sources is insufficient; the bundled reporting tool and database are not easy to migrate; and performance is quite low due to full-memory computation.
API Programming. Here the reporting tool's API is used to compute the data source programmatically and ultimately return the result set. Most reporting tools provide such a mechanism, for example SAP's CDS framework and Jasper's Bean data source.
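As an example of the Jasper route, the sketch below computes the rows in Java and wraps them in a JRBeanCollectionDataSource, a real JasperReports class; the SalesRow bean, its fields, and the sample values are hypothetical. Report fields map to the bean's getter names.

```java
import java.util.Arrays;
import java.util.List;
import net.sf.jasperreports.engine.data.JRBeanCollectionDataSource;

public class BeanSource {
    // A plain bean: Jasper resolves a report field "region" to getRegion().
    public static class SalesRow {
        private final String region;
        private final double total;
        public SalesRow(String region, double total) { this.region = region; this.total = total; }
        public String getRegion() { return region; }
        public double getTotal() { return total; }
    }

    public static JRBeanCollectionDataSource build() {
        // In practice these rows would be computed from files, web services,
        // or several databases, the part SQL alone cannot reach.
        List<SalesRow> rows = Arrays.asList(
            new SalesRow("East", 1200.0),
            new SalesRow("West", 950.5));
        return new JRBeanCollectionDataSource(rows);
    }
}
```

The returned data source is then handed to JasperFillManager.fillReport together with the compiled report template.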
This method has the utmost flexibility and is theoretically capable of solving any data source computation problem. Among all these methods, however, it involves the greatest development complexity.
To access non-database data sources, this method generally relies on Java or other high-level languages. Unlike languages designed for data computation, such as SQL, esProc, or R, these may require large chunks of code to implement even a simple algorithm, so even a simple algorithm can be quite difficult to implement and demand a great deal of work. In fact, the method's value largely lies in its role as a complement to the semantic layer or table-joining.
In conclusion, the five methods above fit their respective scenarios: table-joining fits simple joins between databases; API programming fits simple non-database computations; the data source computation layer fits computations that require high portability or a complex data environment; the data warehouse fits LAN environments with high performance demands; and the semantic layer fits business users who process large volumes of simple reports.