esProc TSeq
and SQL data table are all the structured 2-dimensional data
objects. The records, indexes, and primary keys can
all be
applied to the structured data computing. Although both of
them can be applied to the structured data computing, their application
scenarios differ distinctly. TSeq is fit for solving the complex computing
problem involving not-so-big data volume, while data table is fit for the
common computation involving possible big data volume.
Their
differences are determined by the underlying mechanism.
TSeq
is ordered. Each record and data in each column has a specific sequence number.
TSeq supports the explicit set, and set operations can be performed directly
between multiple TSeq; TSeq is also the generic set. The basic element can be
numeric value, reference, or another set.
SQL
data table does not have the above features. But SQL data table is transparent
to the memory and external storage. A consistent syntax is used to access to
the memory, external storage, or blended data.
In
the below sections, we will further discuss their differences and similarities.
Common basic functions
Both esProc TSeq and SQL data
table are the structured 2-dimensional data objects.
In other words, they are row-style 2-dimensional table based on records with one
row for one record. The 2-dimensional table together with multiple column names
forms a complete data structure. Because the structures are largely similar,
their basic usages are almost the same.
Example 1:Query the data in the object. Find the orders in which the number of Freight is greater than 100
and placed before 2013.
SQL: SELECT * FROM Orders WHERE Freight
> 100 AND OrderDate<'2013-01-01'
TSeq: =Orders.select(Freight >
100 &&OrderDate< date("'2013-01-01"))
Note:In this example, the data object
name is Orders, and another data object of Customers will be used in the
subsequent sections.
Example
2: Sort.
Sort the orders by EmployeeID in ascending order, then sort by Freight in
reverse order.
SQL: SELECT * FROM Orders ORDER BY EmployeeID
,Freight DESC
TSeq: =Orders.sort(EmployeeID,Freight:-1)
Example 3:Grouping and summarizing. Group by employee, summarize the freight charge, and count the
orders.
SQL: SELECT EmployeeID, COUNT(OrderID),
SUM(Freight)
FROM Orders GROUP
BY EmployeeID
TSeq: =
Orders.groups(EmployeeID;sum(Freight),count(OrderID))
Example 4:Join. Join the two data objects Orders and Customers to form a new data
object. Use the left join and the join field is CustomerID.
SQL: Select * from Orders left
join Customers on Orders. CustomerID =Customers.
CustomerID
TSeq: =join@1(Orders:, CustomerID; Customers:,
CustomerID)
Besides
the above several basic usages, esProc TSeq
and SQL data table are very alike in the algorithms like distinct, count, sum
up, average, maximum, and minimum, just a few example here.
Difference of being in order
Thanks to the ordered record set of
TSeq, the order-related computing can be solved easily. SQL data table lacks
the sequence number and sequence-number-related access method, making it a bit
inconvenient to handle the order-related computing.
Example 1: As for the sales data object, compute
the sales increment of this month compared with that of the previous month.
SQL:
select
salesAmount, salesMonth,
(case when
prev_price ! =0
then ((salesAmount)/prev_price)-1
else 0
end) compValue
from (select
salesMonth, salesAmount,
lag(salesAmount,1,0)
over(order by salesMonth) prev_price
from sales) t
TSeq:
sales.derive(salesAmount / salesAmount [-1]-1: compValue)
Comparison:
The monthly sales have nothing to do
with orders. It is represented as salesAmount in the TSeq and the data table. The
one related to the order is the Sales amount of previous month, which is
equivalent to the salesAmount of the previous record relative to the current record.
TSeq is ordered, sosalesAmount[-1]
can be used to represent the sales of previous month. By comparison, SQL data
table is not ordered. Since SQL2003 standard was introduced, the order-related
features are added to the window functions. It is so tedious to use the
unavoidable complex method lag(salesAmount,1,0)
over(order by salesMonth) to
compute the sales of the previous month.
TSeq can also be used to represent the
relative interval, for example, the total five months comprising the current
month, the two months before, and two months after the current month. With TSeq,
it can be represented like this: salesAmount{-2,2}. In SQL, the window function
can also be used to represent the summarizing, but much more troublesome.
Example 2: For
the sales data table, find the first 10 records with the highest sales for each
product.
SQL:
select salesMan,
product ,amount
from ( select
salesMan, product ,amount, rank() over (partition by product order by amount
desc ) ranking from sales)
where ranking
<=10
TSeq:
=
sales.group(product).(~.top(-amount;10))
Comparison:
In this example, the most intuitive thinking
pattern to implement is to group data by product, and then
perform the order-related computations in the groups. The simplest method is to
retrieve the first ten records with the highest amount in the group. The
intuitive method is to sort the data in the group in reverse order by amount,
and then get the records whose sequence numbers in the group is from 1 to 10.
TSeq supports the order-related
computation quite well. For TSeq, not only the function top can be used to
implement the first algorithm, as demonstrated in the example, but also the
sort function and the sequence number of record can be used to implement the
second algorithm, as shown below: =sales.group(product).(~.sort(Amount:-1)).(~([to(10)])).
The record in the SQL data table is not
ordered. We must compute out a sequence number or field to act as the sequence
number, such as ranking. The algorithm adopted in the above example is to
compute the data rankings in the group, and then get the records among the top
10. Obviously, SQL syntax is a bit zigzag and difficult to understand. SQL
users will have to use the window function over (partition by…… order by……) and the sub-query
which is hard to track and debug.
By comparison, TSeq is more
straightforward and simpler for easy track and debug. For another example,
programmers can firstly write =sales.group(product) code for test. This code represents
that the data will be grouped. This code snippet can run and display the result
independently. If the grouping results meet your expectation, then programmers
can proceed to add the second paragraph of codes: sort the data in the group in
reverse order, that is, .(~.sort(Amount:-1)).
In which, the "." indicates the former computing result will be taken as a
whole for further processing, "~" indicates the data in the current group, and
“-1” indicates the reverse order. The current code line is =sales.group(product).(~.sort(Amount:-1)), which can run and display the result
independently. If the grouping results meet your expectation, then programmers
can proceed to add the third paragraph of codes: retrieve the records whose sequence
numbers in the group is 1-10, i.e. .(~([to(10)])).
As can be seen, the computational
procedure of TSeq can be pushed forward step by step. In solving the complex
computing problems, the computation can be simplified to avoid errors. In
facts, the above three continuous paragraphs of code can be written into three
rows of codes step by step, so as to decompose the computing goal more clearly:
One thing worthy of noticing is that
SQL cannot perform data sorting before order-related computing,even with the
temporary table.The two steps must be combined into one step. The reason for
this is that the data table does not support the explicit set and generic set.
It is rather weak regarding its syntax representation.
In addition, although SQL uses the window
function up to the ANSI standard, the database vendor does not implement it
fully to the standard. The writing styles vary for various databases and some
databases does not provide the window function at all. The function syntax of
TSeq is independent of data source. For whatever data sources such as database,
TXT file, Excel file, and binary file, programmers are not required to modify
the code when using TSeq for computing.
Difference between explicit sets
Although SQL has the concept of set,
the explicit set is not provided, and cannot be taken as the independent
variable. The set operations can only be implemented with temporary table. TSeq
is the true explicit set for implementing the set operation.
Example:
Compute over the Contract data object. Suppose the contract with over 40 order
quantity is the big contract, and the unit price over 2000 is the important
contract. Please find the contract settled in this year which are
both the big contract and the important contract, and all other contracts except these contracts.
SQL:
select
SellDate,Quantity,Amount,Client from Contract where
to_char(SellDate,'yyyy')='2014' and quantity>=40 and AMOUNT>=2000
select SellDate,Quantity,Amount,Client
from Contract where not(to_char(SellDate,'yyyy')='2014' and
quantity>=40 and AMOUNT>=2000)
TSeq:
=big= Contract.select(Quantity>40)
=importance = Contract.select(AMOUNT>2000)
=answer=thieYear^big^
importance
Comparison
and Analysis:
The typical natural thinking
pattern is to find the contracts that are big contract and also the
important contract of this year. The intersection operation is most intuitive.
If the big contract is defined as big,
the important contract is defined as importance,
and the contract in the current year is defined as thisYear, then we can write the pseudocode easily:
big∩importance∩thisYear. TSeq is the explicit set capable of representing the
equivalent expression very intuitively, i.e. thieYear^big^importance. SQL does not allow for representing
it with the variables for set. So, we have to find another way, for example,
converting it to the Boolean condition, as demonstrated in the example: to_char(SellDate,'yyyy')='2014' and
quantity>=40 and AMOUNT>=2000.
It
is easy to solve the first problem because the respective development
difficulty with these two tools are not great. With the further research into the
problems, their differences become evident.
Problem
2: “Other contacts except for those big and
important contracts” - it is also a typical thinking pattern - the computing
goal can be achieved with difference set in one step. TSeq expression:Contract\answer, very
intuitive. Using Boolean conditions, SQL can also be used to compute out the
answer. But the expression writing style and business description vary greatly,
as shown below: not(to_char(SellDate,'yyyy')='2014'
and quantity>=40 and AMOUNT>=2000).
With
SQL, we can solve it using the set operation, and the algorithm is also quite
intuitive. However, the code may appear very lengthy since the data table
cannot be represented with the set variables.
(select SellDate,Quantity,Amount,Client
from Contract)
minus
(Select select
SellDate,Quantity,Amount,Client from Contract from(
(select
SellDate,Quantity,Amount,Client from Contract where to_char(SellDate,'yyyy')='2012')
Intersect
(select
SellDate,Quantity,Amount,Client from Contract where quantity>=40)
Intersect
(select
SellDate,Quantity,Amount,Client from Contract where AMOUNT>=2000))
Because
the code is lengthy, many people would rather use Boolean condition to
implement the set operations indirectly.
Undoubtedly,
in most cases, it is more convenient for data table to use operation set than
Boolean condition, for example, the set operations between multiple physical
tables, or the set operations between multi-level sub-queries. In this case,
the cost would be high if converting the set operation to the Boolean
conditions, and programmers have to adopt the lengthy set operation.
Difference between generic sets
TSeq
is the generic set to store both the physic data and the reference to the
associated data. With this advantage, TSeq can implement the associated
computing through the intuitive object reference. By comparison, the data table
can only be used to store the physical data. The equivalent computation can
only be completed using the associated complex statements.
Example:Please compute which
department managers received the President Award, and of
these managers, which subordinates are the annual outstanding employees.
This involves two data object:department and employee, in which, the deptName
field of department and the empDept field of employee is in the one-to-one
relation. Still, the manager field in department and the empName field in
employee is also the one-to-one relation. In addition, the code for president
award is PA; and the code for annual outstanding employee is EOY.
Data
table:
SELECT
A.*
FROM
employee A,department B,employee C
WHERE
A.empDept=B.deptName AND B.manager=C.empName AND
A.empHonor=‘EOY’ AND C.empHornor=‘PA’
TSeq:
employee.select(empHonor: "EOY",empDept.manager.empHornor:"PA")
Comparison and Analysis:
The
SQL solution can definitely work out.But the relevant associating statement is
fairly complex that the average programmer could feel hard to comprehend. By
comparison, TSeq solution is rather intuitive, empHonor:"EOY"is one of the
conditions: “Who are Year Outstanding Employees”, while empDept.manager.empHornor indicates the “the department.the
department manager.awards received by the department manager”. Obviously, if
this value is PA, then the condition 2 in the problem is met: "Department managers who received
the president award". This is the object
reference by definition.
The
object reference enables programmers to use "." operator to make reference to
the related data. By this way, the business association relation can be
translated into the computer language intuitively, so that multilevel relations
can be represented conveniently, and the associating computation can be
performed intuitively.
Differences regarding Memory and
External Storage Transparencies
SQL
Because
SQL data table does not support the generic and set data,no data will be lost
when writing the data from memory into the external storage. Thus, it is
transparent to the computation over memory and external storage. On accessing
the data table for the first time, data are usually from the external storage;
then, for the subsequent accesses to the same data table, data can be from the
memory cache; for the data table with relatively large volume of data, part of
the data can be from the external storage, and part from the memory. No matter
the data is from memory or external storage, and data volume is big or small,
the syntax for retrieving data table is always the same, and programmers need
not to write different SQL statements for it.
TSeq
TSeq
supports the generic type (reference in particular) and set data. Data could
lost when writing the data from memory to the external storage, and can not always
ensure to be loaded properly, which making the computation intransparent to the
memory or external storage. TSeq is the pure memory data object, and is only
capable of handling limited data volume; If the data volume is relatively big,
then cursor (another data object of esProc) should be used for the computation
in external storage, with different syntaxes for the cursor and the TSeq; esProc
programmers also must convert the data between cursor and TSeq if they want to
improve performance or handle the computation involving complex business
logics.
Comparisons
TSeq
is not transparent to the memory and external storage. esProc programmers need to
write different codes to meet the requirement of computing in memory, external
storage, or mixed computing. In addition, they need to modify the codes to meet
the needs of data volume increase. Therefore, the workload for the initial
design and post-maintenance is relatively great. SQL data table is transparent
to the memory and external storage. Programmers only need to write one set of
code to fit various scale of data. The designing and maintenance workloads are
relatively small.
Through
the above comparison, we can conclude: Characterized with the ordered data,
explicit set, and generic set, TSeq can easily solve the order-related complex
problem and reduce the complexity of set operations, with support for the
intuitive object reference to handle the complex multi-table association. SQL
data table is transparent to the memory and external storage, and SQL code is commoner.
没有评论:
发表评论