2014年8月31日星期日

Referencing Thoughts in esProc

1.Foreign key referencing

The foreign key is a common concept for various relational databases. In a relational table, you can define one or multiple fields as the foreign key. The foreign key field values can be associated with the data in another table. For example, in the game scoring report, player scores are usually recorded with the player ID. To get the details of this player from another player profile table using the player ID, you can query in the player profile table according to the player ID. In this case, the player ID field can be set as the foreign key.

The foreign key can associate two relational tables and help ensure the data consistency, so as to facilitate some cascade operations. Take the above game scoring report for example. If defining the player ID as the foreign key field, then the player name, country, age, and other data can be queried at the same time while getting the game scoring record of the player.

In this example of foreign key definition, it is actually equal to referencing the records n the player data table from the game scoring record table. The core idea of foreign key concept is to reference another record.

In esProc, unlike the database, there is no particular requirement on the data type of record fields. In other words, esProc users can get the value at will. Thanks to this convenience, in the esProc TSeq, fields can be directly assigned with record for referencing. The foreign key associating can thus be established naturally and straightforwardly.

For example, create an employee education background table comprising the employee ID field and education field. This example is designed to illustrate the way to reference the esProc foreign key, and the education field is simply set through the looping in proper order:

At this point, associate this TSeq with employee table EMPLOYEE. A foreign key field Info can be added to the education table so as to reference the records in the table EMPLOYEE with the foreign key field value:

To this point, the TSeq in B4 is as follows. In this TSeq, the Info field is the employee record on which you can double-click to query.

With the record reference, the foreign-key-related query becomes more convenient to implement. The query statement gets much more clear and intuitive. To this point, the record fields can be used in the same way of using the normal fields or records. For example, list the data of serial number, full name, education background, state, and department:
In esProc, not only the record of other TSeq but also that of the current TSeq can be assigned to the foreign key field.

In esProc, besides the function A.derive() for assign a certain field of record with the objects of another entire record when adding fields to the newly-generated TSeq; the function A.run() can be used to assign value to the existing fields. For example:

In B4, modify the employee serial number field in the Education Background TSeq to the corresponding records. In A1, the TSeq is as follows:

If the primary key of the corresponding table is a single field, you can also use the function A.switch():

The TSeq in A1 is the same to that in the previous example, but much more concise and clear. One thing to note is that the switch function can be used to have the primary keys of specified fields and TSeq associated automatically. If the primary key of TSeq is not the first field, you need to set it with the function T.primary(). For example, the STATE field of employee is associated with the STATE data table:

In this case, the State information can be further viewed in the employee data of TSeq in A1:

To this point, the query or computation for multi-table association can be further processed. For example, find the state, in which the employee is located, has a population of below 1,000,000:
As can be seen, the referencing foreign key of records can make the query syntax of multi-table association more concise and readable with an improved computing speed.

2 Referencing set

In assignment, esProc fields can be assigned with records or sets in a similar way - referencing the set in the foreign key field. In most cases, what being referenced in the foreign key fields is the RSeq composed of records from other TSeqs. Please notice that the "Foreign Key Field" is not the same thing as the foreign key field in the database. It is much more like the query to reference the sub table data from the primary table with SQL.

Like the foreign key referencing, fields can be added when generating the TSeq. With the function derive, the foreign key field can be assigned with RSeq, for example:

In A3, add foreign key field Employees to A1, and assign this field with the employees in this State, then return a new TSeq. By doing so, all values of Employees fields in the TSeq are RSeq, as shown below:

For the foreign key field with referring sets, they can be called directly as the fields of TSeq. The values of foreign key fields can be taken as the set for direct use in the computation. For example, add the field Count and compute the respective total employees in each State:

The total employees can be computed with the foreign keys, and the resulting TSeq is shown in A4:

Like referencing the foreign key field of record, the sets or RSeq are referenced in the foreign key field. In addition, the expressions on foreign key fields can also be used to solve the multi-table-associated query, filtering, and other problems. For example, find the State in which there are more than 50 employees:

By referencing sets in foreign key fields, the syntax of multi-table-associated querying can be more clear and intuitive. In the TSeq of A4, the sole purpose of introducing the field Count is to have a more clear view of the result. This field is not used when filtering records in A5.

When referencing the set in the TSeq field, the switch function cannot be used, but the function A.run() can be used to assign value to the existing fields. For example:

In A3, assign the blank field Employees with the record of employees born in each month. Once executed, the resulting TSeq in A1 is as follows:

In facts, the result of grouping the RSeq is just a sequence composed of sets. It is similar to the referencing fields of a single set. For example, if grouping by the birth month of employee, the result is:

Like the result in the above example, the result of grouping can be further used in various computations as necessary:


In A3, count the total numbers of employees who was born in each respective month. The oldest employee born in each month is listed in A4.

2014年8月28日星期四

Cursor Thoughts in esProc

With esProc, the big structured data can be handled easily, and the computation and analysis over big data can be accomplished smoothly. Because the big data table cannot be loaded into the memory all at once, the basic solution is cursor when handling the big data.

1.Basic principle of cursor
The esProc cursor is like the database cursor in the stored procedure: In fact, the cursor is a pointer. Records can be retrieved row by row according to the position in cursor. The data will not be returned in full. By using the cursor, the data retrieval can be completed progressively so as to ensure the big data table will not bring overburden pressure on system memory.

In order to facilitate the usage, esProc saves the effort to process the records one by one in the cursor. However, esProc also limits the usage of cursor. In esProc cursor,moving backward is forbidden, only moving forward is allowed, which is similar to TYPE_FORWARD_ONLY type of result set in JDBC. In this way, esProc cursor can only traverse the records in TSeq once, and the related computations are also to be completed in this traversal. This differs from the computational mode of normal TSeq greatly. Once a traversal is completed, the cursor will be closed automatically and invalidated. So, records can no longer be retrieved with it anymore.

In esProc, no matter the big data tables are from the databases or data files, they can all form cursors. Sometimes, the existing TSeq in memory can form a cursor to get involved in the cursor computation. For example:
In which, A3 holds a file cursor, A4 holds a database cursor, and A5 holds a memory RSeq cursor.

When using the cursor, the function cs.fetch() can be used to retrieve the records of one or several rows according to the position specified by the cursor to compute or debug.

2.Inheritance thoughts for cursor
When using cursor to handle the computation, the result returned by cursor is usually the records from a data table or result set. However, the data in cursor is usually not the final result we demand, and we need to process the query result.

When handling the result in the cursor, you may need to perform filtering, summarizing, sorting, and other jobs. Sometimes, you may also need to consolidate or merge the data in multiple cursors. To complete these jobs, in esProc, you can directly use a cursor to generate a new one. No need to query the data actually when generating new cursors. The actual retrieval will be performed when you actually get the data. For example, to complete a certain data processing job, use cursor A to generate new cursor B. The cursor B will use A to retrieve data, and return the retrieved data after it processes the data correspondingly. Based on the cursor B, you can generate cursor C as necessary, and further the processing of the data in B……When handling the joining and consolidating jobs, you can use multiple cursors to generate a new cursor.

In the mode of multiple inheritances, the complex data processing job can be divided into multiple steps to accomplish, with a clear purpose for each step. When the data is processed in each cursor, the records are retrieved through the original cursor. So, for the final data computing, the data retrieval is actually conducted by the original cursor. Such mode can ensure that the original data can only be traversed once. The data conflict can be avoided and the communications with databases can be reduced while ensuring the data efficiency.

In the mode of multiple inheritances, processing the data in the cursor will not incur the extra data operations. In addition, the multiple-inheritance cursor object only takes very limited amount of memory. So, this mode will not compromise the performance of accessing the external storage. Likewise, the computational performance also won't be obviously affected if inheritance hierarchy is added.

3.Basic computation of cursor
3.1.Basic usage of cursor
Once the cursor is created, the cs.fetch() function can be used to retrieve data from data table. You can also use cs.skip() function to skip several rows of records. To close the cursor before all data is traversed, you can use cs.close(). For example:

This cellset is designed to find the name of the first employee who is older than 40. Once the file cursor is generated in A2, loop in A3, and retrieve one employee record from B3 each time. If all records are traversed with the cursor, then the loop will break in C4;otherwise in C6, where the application will judge if the age of employee is over 40. If the condition is met, then exit the loop and close the cursor. In A8, you will see the computation result:

In retrieving data with a cursor, you may retrieve multiple records all at once:

For the above program, all data of text data cursor in A2 will be traversed. Retrieve 100 rows of records each time, and compute the Total number of employees above 40. The final result can be viewed in A7:
Because the cursor will be closed when the traversal over all data is completed, there is no need to use the cs.close(). Please note that the cs.fetch() always get one TSeq each time, instead of an RSeq composed of various records from a same TSeq.

3.2.Filtering on cursor

When you need to filter the desired data out from the records in the cursor, you can use function cs.select(). Thus, the above example problem can be solved with the following method:

The final result in A7 is the same as above:

Because the returned result is also the big data when filtering the big data in the cursor, the cs.select() also returns a cursor,you can perform the summarizing and other operations in the further processing. If the data returned by cursor is relatively less, then all of them can be returned at once. For example, find all female employees who are older than 40in California:

In A6, you will see the results:
In the computation, multiple inheritance of cursor is used for filtering step by step. Data can be retrieved from text data table and computed to get the required result only if they are retrieved in A6. This method almost will not affect performance.

3.3.Creating and modifying cursor

Sometimes,the data retrieved from big data table is not the final result we need, and further computation maybe required to get the new records. In such case, you can use the cs.new() function to generate a new cursor:
Likewise, data will be retrieved from text data table and computed to get the required result only if they are retrieved in A7. The result in A7 is shown below:

Besides the cs.new() used for generating new data, function cs.run() can also be used to modify the fields of records directly:

The results in A8 are shown below:

3.4 Foreign key in cursor
Sometimes, the big data table requires foreign keys to relate to data in other tables. In this case, you can use cs.switch() function, for example:

In this example, find the profiles for all female employees whose name initials are W and older than 40. In this example, A7 sets the field State as the foreign key, and references the record of TSeq in A6.The results in A8 are shown below:

2014年8月27日星期三

A Handy Method of Accessing json Data in Java

It is possible that json data is needed to be processed in Java projects. Here we'll introduce a handy method of accessing data of json files through an example. Ajson file, test.json, contains information of class, serial numbers, names, subjects and scores,etc. In this example, data will be imported, sorted by scores in descending order and written to the file test_result.json. The format of test.json is as follows:
[
    {
        "class": "Class one",
        "id": 7,
        "name": "Nicholas",
        "subject": "PE",
        "score": 60
    },
{
        "class": "Class one",
        "id": 1,
        "name": "Emily",
        "subject": "English",
        "score": 84
},
    ......

    {
        "class": "Class one",
        "id": 1,
        "name": "Emily",
        "subject": "Math",
        "score": 77
}
]

The open source project json-lib is needed to be imported into Java. To do this, the necessary packages include:
json-lib-2.4-jdk15.jar
ezmorph-1.0.6.jar
commons-lang.jar
commons-beanutils.jar
commons-logging.jar
commons-collections.jar

With the json-lib available, way of coding will be like this:
1.Import data from the file, parse them into jsonArr, the json object, using json-lib's JSONArray. 

2.Sort the values of “score” of jObject, representing all members of jsonArr object, in descending order using comparison sort algorithm. 

3.Write the sorted jsonArr to the file. 

The code is as follows: 
public static void myJson() throws Exception{
//then import data from the file
File file = new File("D:/file/test.json"); 
FileInputStreamfis = null; 
fis = new FileInputStream(file); 
InputStreamReader input = new InputStreamReader(fis); 
BufferedReader reader = new BufferedReader(input); 
String laststr = ""; 
   String tempString = null;
while ((tempString = reader.readLine()) != null) {
laststr = laststr+ tempString;
   }
reader.close();
   //then parse the imported data into json object
JSONArrayjsonArr = JSONArray.fromObject(laststr ); 
   //then sort the json data (in descending order)
JSONObjectjObject = null;
for(inti = 0;i<jsonArr.size();i++){
long l = Long.parseLong(jsonArr.getJSONObject(i).get("score").toString());
for(int j = i+1; j<jsonArr.size();j++){
longnl = Long.parseLong(jsonArr.getJSONObject(j).get("score").toString());
if(l<nl){
jObject = jsonArr.getJSONObject(j);
jsonArr.set(j, jsonArr.getJSONObject(i));
jsonArr.set(i, jObject);
}
}
}
//then write the result to the file
FileOutputStream out = new FileOutputStream("D:/file/test_result.json"); 
out.write(jsonArr.toString().getBytes()); 
out.close(); 
}

my Json function has been able to access and sort json data, but it lacks some universality. When sorting in ascending order or by several fields is required, program has to be modified. But if we want to make the function perform more universal and flexible sorting as SQL does, analysis of dynamic expressions will be required, which will result in quite complicated code. 

esProc can help Java with accessing and processing json data. It has the advantage of creating dynamic sorting expressions using simple code. It can deal with tasks about json data, like importing, accessing, computing and writing to the file, conveniently. In order to perform dynamic sorting, a sorting expression can be transferred to esProc as a parameter as follows: 


The value of parameter sortBy is score:-1. Only 5 lines of code are needed to develop a program of sorting json data in esProc. Please see below: 

A1Use read() to load data from the json file in the format of strings. (esProc's developing tool can display the computed result visually, as shown in the left part of the above figure).
A2Parse json data into a table sequence using the import@j() method.
A3Sort the data; then esProc will first compute the parameter sortBy in macro ${sortBy}. After that, the statement to be executed is A2.sort(score:-1), meaning sorting by score in descending order.
A4Parse the sorting result into strings of json format.
A5Output the stings to the file.

If the fields for and way of sorting are changed, what we need is to modify parameter sortBy instead of the program. For instance, sort by id in descending order and by score in ascending order. We may just modify sortBy into id:-1,score:1. The statement of sorting we finally execute is A2.sort(id:-1,score:1), and the result is as follows: 
This piece of esProc code can be called conveniently in Java using jdbc provided by esProc. To save the above esProc program as test.dfx file, Java will call the following code:
//create a connection between esProc and jdbc
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
//call esProc program (the stored procedure), in which test is the file name of dfx
com.esproc.jdbc.InternalCStatementst;
st =(com.esproc.jdbc.InternalCStatement)con.prepareCall("call test(?)");
// set parameters
st.setObject(1,"id:-1,score:1");//esProc’s input parameters
// execute the esProc stored procedure
st.execute();

Here the relatively simple esProc code can be called directly in Java, so it is unnecessary to write esProc script file (like the above-mentioned test.dfx). See the following for detailed code:
st=(com. esproc.jdbc.InternalCStatement)con.createStatement();
st.executeQuery(">file(\"d:/file/test_result.json\").write(export@j(file(\"d:/file/test.json\").read().import@j().sort(score:-1)))");

The above Java code calls directly a line of esProc code, that is, import data from the text file, sort them according to specified fields and write the result to the file.

2014年8月26日星期二

A Handy Method of Accessing Excel Files in Java

Processing data from Excel files is common in Java projects. Here is an example to illustrate how to accomplish this. Read information of sales orders from an Excel file orders.xls and select orders that are placed on and after January 1, 2010 and whose SELLERID field is 18. Then write the selected data to order_result.xls

The content of Excel file orders.xls is as follows:

ORDERID
CLIENT
SELLERID
AMOUNT
ORDERDATE
1
UJRNP
17
392
2008/11/2 15:28
2
SJCH
6
4802
2008/11/9 15:28
3
UJRNP
16
13500
2008/11/5 15:28
tha4
PWQ
9
26100
2008/11/8 15:28
5
PWQ
11
4410
2008/11/12 15:28
6
HANAR
18
6174
2008/11/7 15:28
7
EGU
2
17800
2008/11/6 15:28
8
VILJX
7
2156
2008/11/9 15:28
9
JAYB
14
17400
2008/11/12 15:28
10
JAXE
19
19200
2008/11/12 15:28
11
SJCH
7
13700
2008/11/10 15:28
12
QUICK
11
21200
2008/11/13 15:28
13
HL
12
21400
2008/11/21 15:28
14
JAYB
1
7644
2008/11/16 15:28
15
MIP
16
3234
2008/11/19 15:28
16
AYWYN
4
6566
2008/11/21 15:28
Steps of coding in Java are like this:
1.Import data from the Excel file by rows and store them in sourceList, the List object.
2.Traverse sourceList, the List object, and store the eligible data in resultList, the List object.
3.Traverse resultList and store the data in the output Excel file by rows.
The code contains some statements handling different data types. See the following for detail:

   public static void myExcel() throws Exception {
      //import data from the excel file
      HSSFWorkbook workbook=
newHSSFWorkbook(new FileInputStream(new File("d:/file/orders.xls")));
      Sheet sheet=workbook.getSheetAt(0);//  import the data from the first sheet
      List<Map<String,Object>>sourceList= new ArrayList<Map<String,Object>>();
      List<Map<String,Object>>resultList= new ArrayList<Map<String,Object>>();
      SimpleDateFormat format = new SimpleDateFormat("yyyy-M-ddHH:mm:ss");
      for (inti = 1; i<sheet.getPhysicalNumberOfRows(); i++) {//process excel data by rows
         Row row=sheet.getRow(i);
         Map<String,Object> order=new HashMap<String,Object>();
         Cell cell0 = row.getCell(0);
         cell0.setCellType(Cell.CELL_TYPE_STRING);
              //integer data should be converted into the type of txt, otherwise, they would become the floating point
        Cell cell1 = row.getCell(1);
        cell1.setCellType(Cell.CELL_TYPE_STRING);
        Cell cell2 = row.getCell(2);
        cell2.setCellType(Cell.CELL_TYPE_STRING);
        order.put("ORDERID",cell0.toString());
        order.put("CLIENT",cell1.toString());
        order.put("SELLERID",cell2.toString());
        order.put("AMOUNT",row.getCell(3).toString());
        //process data of date type
        order.put("ORDERDATE",
HSSFDateUtil.getJavaDate(row.getCell(4).getNumericCellValue()));
        sourceList.add(order);
     }
     for (inti = 0, len = sourceList.size(); i<len; i++) {//filter according to the specified condition
        Map<String,Object> order =(Map<String,Object>) sourceList.get(i);                 System.out.println("1order.get(\"SELLERID\")="+order.get("SELLERID"));
        if ( Integer.parseInt(order.get("SELLERID").toString())==18 &&
        ((Date)order.get("ORDERDATE")).after(format.parse("2009-12-31 23:59:59")) )
        {//evaluatethe data’s eligibility
            resultList.add(order); //add the eligible orders to resultList, the List object
        }
     }
     //write excel file
     HSSFWorkbook workbook1 = new HSSFWorkbook();//create an excel file object
     Sheet sheet1 = workbook1.createSheet();//create a sheet object
     Row row1;
     row1 = sheet1.createRow(0);//the first row: headline
     row1.createCell(0).setCellValue("ORDERID");
     row1.createCell(1).setCellValue("CLIENT");
     row1.createCell(2).setCellValue("SELLERID");
     row1.createCell(3).setCellValue("AMOUNT");
     row1.createCell(4).setCellValue("ORDERDATE");
     for (inti = 1, len = resultList.size(); i<len; i++) {//create data rows by loop
        row1 = sheet1.createRow(i);
      row1.createCell(0).setCellValue(resultList.get(i).get("ORDERID").toString());
                            row1.createCell(1).setCellValue(resultList.get(i).get("CLIENT").toString());
                            row1.createCell(2).setCellValue(resultList.get(i).get("SELLERID").toString());
                            row1.createCell(3).setCellValue(resultList.get(i).get("AMOUNT").toString());
                            row1.createCell(4).setCellValue(format.format((Date) resultList.get(i).get("ORDERDATE")));
                   }
FileOutputStreamfos = new FileOutputStream("d:/file/orders_result.xls"); 
workbook1.write(fos);// write a file
fos.close(); 
         }

The resulting Excel data after the code is executed are as follows:

ORDERID
CLIENT
SELLERID
AMOUNT
ORDERDATE
432
ERNSH
18
6272.0
2010-1-13 15:28:05
444
SJCH
18
4312.0
2010-1-25 15:28:05
452
HP
18
4312.0
2010-2-01 15:28:05
492
HP
18
27900.0
2010-3-07 15:28:05
512
BTMMU
18
18000.0
2010-3-27 15:28:05
524
PJIPE
18
15600.0
2010-4-09 15:28:05
637
HP
18
9114.0
2010-8-01 15:28:05
638
JOPO
18
294.0
2010-8-01 15:28:05

Most of the code written with myExcel function is responsible for accessing and processing Excel files and data types. Yet it is rather complicated. There is just a small amount ofcode used for filtering based on conditions,and it is not universal enough. As filtering conditions are fixed, if there is any change about them, the code should be modified accordingly. We have to write code for analyzing dynamic expressions if we want to set dynamic filtering conditions.By doing so, we can get dynamic filtering conditions similar as the SQL theoretically. But programming would be difficult and the adjustment would be comparatively complicated.

If esProc, a programming language specially designed for processing structured (semi-structured) data, is used to help to process Excel data, programming will become much easier. With esProc, we can produce comparatively simple code to process Excel files. Besides, it is also simple for Java to call esProc script.

Programmers may pass the condition that “orders that are placed on and after January 1, 2010 and whose SELLERID field is 18” on to esProc program as the parameter where. See the chart below: 

The value of where is ORDERDATE>=date(2010,1,1) && SELLERID==18. The code ofesProc program is as follows:

A1: Define a file object, import data to it and set the first row as the headline. esProc's IDE can show the imported data visually, as shown in the right part of the above chart. The code for accessing file xlsx is =file("D:/file/orders.xlsx").importxls@xt().

A2: Filter according to the condition. Here macro is used to analyze the expression dynamically. where is the input parameter. esProc will first compute the expression enclosed by ${…}, replace ${…} with the computing result acting as the macro string value, and then interpret and execute the result. The code finally executed is =A1.select(ORDERDATE>=date(2010,1,1) && SELLERID==18).

A3: Write the eligible result set to the Excel file.

The code needn't to be rewritten when the filtering condition is changed, we just need to modify the parameter where. For example, if the filtering condition becomes "orders that are placed on and after January 1, 2010 and whose SELLERID field is 18, or orders whose CLIENT is PWQ", the value of parameter where can be expressed like this:CLIENT=="PWQ"||ORDERDATE>=date(2010,1,1) && SELLERID==18. After the code is executed, the result set of A2 is as follows:

This piece of code can be called in Java application by using JDBC provided by esProc. The code called in Java for saving the above esProc program as test.dfx is as follows:
// create a connection between esProc and JDBC
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
//call the esProc program (the stored procedure) in which test is the file name of dfx
com.esproc.jdbc.InternalCStatementst=(com. esproc.jdbc.InternalCStatement)con.createStatement();
//set parameters
st.setObject(1,"ORDERDATE>=date(2010,1,1) && SELLERID==18 || CLIENT==\"PWQ\"");//the parameter is the dynamic filtering condition
//execute esProc stored procedure
st.execute();

For the script with relatively simple code, it is not necessary to write specially the esProc script file (test.dfx). esProc code can be written directly to the Java application thatcalls esProc JDBC.

    String where="CLIENT==\"PWQ\"||ORDERDATE>=date(2010,1,1) && SELLERID==18 ";
    String resultpath=" D:/file/orders_result.xls ";
    String sourcepath=" D:/file/orders.xls ";
    ResultSet set =     st.executeQuery("=file(\""+resultpath+"\").exportxls@t(file(\""+sourcepath+"\").importxls@t().select("+where+"))");

This piece of Java code calls a line of code from esProc script directly: get data from the Excel file, filter them according to specified conditions and write the result set to the Excel file.