1.Basic principle of cursor
The esProc cursor is like the database cursor in the stored procedure: In fact, the cursor is a pointer. Records can be retrieved row by row according to the position in cursor. The data will not be returned in full. By using the cursor, the data retrieval can be completed progressively so as to ensure the big data table will not bring overburden pressure on system memory.
In order to facilitate the usage, esProc saves the effort to process the records one by one in the cursor. However, esProc also limits the usage of cursor. In esProc cursor,moving backward is forbidden, only moving forward is allowed, which is similar to TYPE_FORWARD_ONLY type of result set in JDBC. In this way, esProc cursor can only traverse the records in TSeq once, and the related computations are also to be completed in this traversal. This differs from the computational mode of normal TSeq greatly. Once a traversal is completed, the cursor will be closed automatically and invalidated. So, records can no longer be retrieved with it anymore.
In esProc, no matter the big data tables are from the databases or data files, they can all form cursors. Sometimes, the existing TSeq in memory can form a cursor to get involved in the cursor computation. For example:
In which,
A3 holds a file cursor, A4 holds a database cursor,
and A5 holds a memory RSeq cursor.
When using
the cursor, the function cs.fetch() can be used
to retrieve the records of one or several rows according to the position
specified by the cursor to compute or debug.
2.Inheritance
thoughts for cursor
When using
cursor to handle the computation, the result returned by cursor is usually the
records from a data table or result set. However, the data in cursor is usually
not the final result we demand, and we need to process the query result.
When
handling the result in the cursor, you may need to perform filtering,
summarizing, sorting, and other jobs. Sometimes, you may also need to
consolidate or merge the data in multiple cursors. To
complete these jobs, in esProc, you can directly use a cursor to generate a new
one. No need to query the data actually when
generating new cursors. The actual retrieval will be performed when you
actually get the data. For example, to complete a certain data
processing job, use cursor A to generate new cursor
B. The cursor B will use A to retrieve data, and return the retrieved data
after it processes the data correspondingly. Based on the cursor B, you can
generate cursor C as necessary, and further the processing of the data in B……When
handling the joining and consolidating jobs, you can use multiple cursors to
generate a new cursor.
In the
mode of multiple inheritances, the complex data processing
job can be divided into multiple steps to accomplish, with a clear purpose for
each step. When the data is processed in each cursor, the records are
retrieved through the original cursor. So, for the final data computing, the
data retrieval is actually conducted by the original cursor. Such mode can
ensure that the original data can only be traversed once.
The data conflict can be avoided and the communications with databases can be
reduced while ensuring the data efficiency.
In the mode
of multiple inheritances, processing the data in the cursor will not incur the
extra data operations. In addition, the multiple-inheritance cursor object only
takes very limited amount of memory. So, this mode
will not compromise the performance of accessing the external storage.
Likewise, the computational performance also won't be obviously affected if inheritance
hierarchy is added.
3.Basic computation
of cursor
3.1.Basic usage of
cursor
Once the cursor is
created, the cs.fetch() function
can be used to retrieve data from data table. You can also use cs.skip() function to skip several rows of records. To close the cursor
before all data is traversed, you can use cs.close(). For example:
This cellset is
designed to find the name of the first employee who is older than 40. Once the file cursor is generated in A2, loop in A3, and
retrieve one employee record from B3 each time. If all records are traversed
with the cursor, then the loop will break in C4;otherwise in C6, where the
application will judge if the age of employee is over 40. If the condition is
met, then exit the loop and close the cursor. In A8, you will see the
computation result:
In retrieving data
with a cursor, you may retrieve multiple records all at once:
For the above
program, all data of text data cursor in A2 will be
traversed. Retrieve 100 rows of records each time, and compute the Total number of employees above 40. The final result can be viewed in A7:
Because
the cursor will be closed when the traversal over all data is completed, there
is no need to use the cs.close(). Please note that
the cs.fetch() always get
one TSeq each time, instead of an RSeq composed of various records from a same
TSeq.
3.2.Filtering on cursor
When you need to
filter the desired data out from the records in the cursor, you can use
function cs.select(). Thus, the
above example problem can be solved with the following method:
The final result in
A7 is the same as above:
Because the returned
result is also the big data when filtering the big data in the cursor, the cs.select() also
returns a cursor,you can perform the summarizing and other operations in the
further processing. If the data returned by cursor is relatively less, then all
of them can be returned at once. For example, find all female employees who are older than 40in California:
In
A6, you will see the results:
In the computation,
multiple inheritance of cursor is used for filtering step by step. Data can be
retrieved from text data table and computed to get the required result only if
they are retrieved in A6. This method almost will not
affect performance.
3.3.Creating and
modifying cursor
Sometimes,the data
retrieved from big data table is not the final result we need, and further
computation maybe required to get the new records. In such case, you can use
the cs.new() function to
generate a new cursor:
Likewise, data will
be retrieved from text data table and computed to get the required result only
if they are retrieved in A7. The result in A7 is
shown below:
Besides the cs.new() used for generating new data, function cs.run() can also be used to modify the fields of records directly:
The results in A8 are shown below:
3.4 Foreign key in cursor
Sometimes, the big
data table requires foreign keys to relate to data in other tables. In this case,
you can use cs.switch() function,
for example:In this example, find the profiles for all female employees whose name initials are W and older than 40. In this example, A7 sets the field State as the foreign key, and references the record of TSeq in A6.The results in A8 are shown below:
没有评论:
发表评论