1. The process of data analysis and task presentation is:
a. Retrieve data from a database through SQL or stored procedures.
b. Get data from other sources (other databases, texts, hdfs, nosql databases, http data sources, json data sources, etc).
c. Process heterogeneous data uniformly.
d. Provide data for application programs or present data with report forms.
2. Tasks processed in batches that are similar to ETL
The process is similar to task analysis and presentation. Their difference is that data of the last operation is not used for presentation, but is written to other databases or other data sources.
3. Modify the current database in batches
One way is to retrieve data from the current database, process them and write them back to the database; the other is to directly process data of the database through SQL or stored procedures.
Now let’s look at in detail some examples of the three basic operations.
A.Retrieve data from databases.
In the above figure, cell A1 has connected to a hsql database named demo. Cell A2 uses SQL statements to query table employee, which is stored in this cell, a variable, as esProc's table sequence; arg1 is a parameter from outside. Cell A3 closes database connection. Cell A4 returns query results outward. In order to make it easier for programmers to write SQL statements, table name and field name of database demo are displayed in the red box in bottom right corner of esProc's integrated development environment.
B.Write data to databases.
esProc can conveniently execute operations of add, delete and update, the simplest code is:
In the above table, insert, update and delete are respectively executed from A2 to A4. Execution of each SQL will be submitted automatically. Note that:
1. It is too frequent access to a database that three SQL statements are submitted three times.
2. There exists no transaction relation between the three SQL statements. So, if the execution of one SQL statement fails, the previous SQL statement remains unaffected.
esProc can update in batches by directly using table sequences. For example: import students’ information from students.txt to update table students1 in the database. Since there are a lot of records to be modified, submitting transactions in batches is more reasonable.
A1:Define a file object in which students’ information is saved.
A2:Import file content.
A3:Use students’ information in A2 to update table students1 in batches. Here submitting SQL in batches can avoid accessing the database too frequently. Meanwhile, this can ensure data consistency, that is, simultaneous success or fail of writing the whole batch of data to the database.
esProc can aslo deal with the complete database transaction consisting of multiple SQL statements. For example, we’ll add a new student, the student’s id should be modified to 9 after data are inserted. In order to ensure data consistency, submission must be executed after the insertion and modification are proved to be successful. Otherwise rollback should be executed.
A1:Connect to the database. Note that connect function has used @e option and the subsequent code will return error message when something wrong happens. If the option is not used, the database will terminate esProc program immediately when errors occur.
A2:Execute the insert SQL statement. Note that execute function uses @k option, meaning the transaction will not be submitted automatically after it is executed. If the option is not used, the insert SQL statement will be submitted immediately.
A3:Get the result of last operation in the database, i.e., the insert statement. If err variable is zero, the execution is successful; otherwise, err is the error code.
A4:Judging whether err variable, the execution result, is zero. If the answer is yes, the last operation of the insert statement is successful and modification in B4 can be executed.
C4:Get execution result of update SQL.
A5:Make judgment over variable err. If it is zero, submit the database; otherwise execute rollback.
A6:Close database connection.
C.Call stored procedures
For stored procedures that don’t return parameters, esProc's method of calling them is simple:
A1:Connect the database.
A2:Call the stored procedure, value of output parameter is 4.
esProc call stored procedures with result sets in this way:
Cell A2 uses proc function to call the stored procedure: orac.proc("{call proAA(?,?)}",:101:"o":a,:101:"o":b). It returns two result sets (table sequences) to form a sequence, i.e., a set of table sequence, which assigns value to A1. The following is to explain proc function’s input parameters one by one.
1) SQL strings
"{call proAA(?,?)}" contains name of the stored procedure to be called, the question marks represent SQL's parameters.
2) Output parameter 1
:101:"o":a defines an output parameter in which 101 represents that its data type is cursor ( for other types, please see appendix) and ”o” represents that it is an output parameter. a defines a variable by which returned results can be referenced.
3) Output parameter 2
:101:"o":b defines an output parameter in which 101 represents that that its data type is cursor and "o" represents that it is an output parameter. b defines a variable by which returned results can be referenced.
Cell A3 returns cell A2's first table sequence (table emp's result set).
Cell A4 and A5 use output variables a and b respectively in A2 to get the execution results corresponding to the stored procedure. a corresponds to data of table emp and assigns value to A3; b corresponds to data of table test and assigns value to A4.
Appendix: Definition of Parameter Type
Values of type are:
public final static byte DT_DEFAULT = (byte) 0; // by default, identify automatically
public final static byte DT_INT = (byte) 1;
public final static byte DT_LONG = (byte) 2;
public final static byte DT_SHORT = (byte) 3;
public final static byte DT_BIGINT = (byte) 4;
public final static byte DT_FLOAT = (byte) 5;
public final static byte DT_DOUBLE = (byte) 6;
public final static byte DT_DECIMAL = (byte) 7;
public final static byte DT_DATE = (byte) 8;
public final static byte DT_TIME = (byte) 9;
public final static byte DT_DATETIME = (byte) 10;
public final static byte DT_STRING = (byte) 11;
public final static byte DT_BOOLEAN = (byte) 12;
public final static byte DT_INT_ARR = (byte) 51;
public final static byte DT_LONG_ARR = (byte) 52;
Publicfinal static byte DT_SHORT_ARR = (byte) 53;
public final static byte DT_BIGINT_ARR = (byte) 54;
public final static byte DT_FLOAT_ARR = (byte) 55;
public final static byte DT_DOUBLE_ARR = (byte) 56;
public final static byte DT_DECIMAL_ARR = (byte) 57;
public final static byte DT_DATE_ARR = (byte) 58;
public final static byte DT_TIME_ARR = (byte) 59;
public final static byte DT_DATETIME_ARR = (byte) 60;
public final static byte DT_STRING_ARR = (byte) 61;
public final static byte DT_BYTE_ARR = (byte) 62;
public final static byte DT_CURSOR = (byte) 101;
public final static byte DT_AUTOINCREMENT = (byte) 102;
没有评论:
发表评论