Processing data stored in text files is a common task. Here we'll look at how to perform conditional filtering on a text file with Java through an example: read employee information from the text file employee.txt and select the female employees who were born on or after January 1, 1981.
The text file employee.txt has the following format:
EID NAME SURNAME GENDER STATE BIRTHDAY HIREDATE DEPT SALARY
1 Rebecca Moore F California 1974-11-20 2005-03-11 R&D 7000
2 Ashley Wilson F New York 1980-07-19 2008-03-16 Finance 11000
3 Rachel Johnson F New Mexico 1970-12-17 2010-12-01 Sales 9000
4 Emily Smith F Texas 1985-03-07 2006-08-15 HR 7000
5 Ashley Smith F Texas 1975-05-13 2004-07-30 R&D 16000
6 Matthew Johnson M California 1984-07-07 2005-07-07 Sales 11000
7 Alexis Smith F Illinois 1972-08-16 2002-08-16 Sales 9000
8 Megan Wilson F California 1979-04-19 1984-04-19 Marketing 11000
9 Victoria Davis F Texas 1983-12-07 2009-12-07 HR 3000
10 Ryan Johnson M Pennsylvania 1976-03-12 2006-03-12 R&D 13000
11 Jacob Moore M Texas 1974-12-16 2004-12-16 Sales 12000
12 Jessica Davis F New York 1980-09-11 2008-09-11 Sales 7000
13 Daniel Davis M Florida 1982-05-14 2010-05-14 Finance 10000
…
The plain Java approach is to read the file line by line, save the rows in a List, traverse the List, and save the eligible records in a result List. Finally, it prints the number of eligible employees.
The detailed code is as follows:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public static void myFilter() throws Exception {
    File file = new File("D:\\employee.txt");
    FileInputStream fis = new FileInputStream(file);
    InputStreamReader input = new InputStreamReader(fis);
    BufferedReader br = new BufferedReader(input);
    String line = null;
    String[] info = null;
    List<Map<String,String>> sourceList = new ArrayList<Map<String,String>>();
    List<Map<String,String>> resultList = new ArrayList<Map<String,String>>();
    try {
        // skip the first (header) line; exit if the file is empty
        if ((line = br.readLine()) == null) return;
        while ((line = br.readLine()) != null) { // load the file into memory
            info = line.split("\t");
            Map<String,String> emp = new HashMap<String,String>();
            emp.put("EID", info[0]);
            emp.put("NAME", info[1]);
            emp.put("SURNAME", info[2]);
            emp.put("GENDER", info[3]);
            emp.put("STATE", info[4]);
            emp.put("BIRTHDAY", info[5]);
            sourceList.add(emp);
        }
    } finally {
        br.close(); // release the file handle, including on the early return
    }
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    for (int i = 0, len = sourceList.size(); i < len; i++) { // process the data row by row
        Map<String,String> emp = sourceList.get(i);
        // keep the records that satisfy the condition
        if (emp.get("GENDER").equals("F")
                && !sdf.parse(emp.get("BIRTHDAY")).before(sdf.parse("1981-01-01"))) {
            resultList.add(emp);
        }
    }
    System.out.println("count=" + resultList.size()); // print the number of eligible employees
}
The filtering condition in this function is fixed. If the condition changes, the conditional statement in the program must be modified accordingly. Multiple pieces of code are needed if there are multiple conditions, and the program cannot handle ad hoc, dynamic conditions. Now we'll rewrite the code to make it somewhat more general by slightly changing the loop that traverses sourceList:
for (int i = 0, len = sourceList.size(); i < len; i++) {
    Map<String,String> emp = (Map) sourceList.get(i);
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    boolean isRight = true;
    if (gender != null && !emp.get("GENDER").equals(gender)) { // check the gender condition
        isRight = false;
    }
    if (start != null && sdf.parse(emp.get("BIRTHDAY")).before(start)) { // check the start condition on BIRTHDAY
        isRight = false;
    }
    if (end != null && sdf.parse(emp.get("BIRTHDAY")).after(end)) { // check the end condition on BIRTHDAY
        isRight = false;
    }
    if (isRight) resultList.add(emp); // save the eligible records in the result list
}
In the rewritten code, gender, start and end are input parameters of the function myFilter. The program handles the case where the GENDER field equals the input value gender and the BIRTHDAY field is greater than or equal to the input value start and less than or equal to the input value end. If any input value is null, the corresponding condition is ignored. Conditions are joined by AND.
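For reference, the parameterized loop can be wrapped into a complete method. The following is a minimal sketch rather than the original program: the class name ParamFilterSketch, the in-memory sampleRows() (a few rows copied from the listing above instead of reading D:\employee.txt), and the parse and countMatches helpers are assumptions made only so that the example runs on its own.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ParamFilterSketch {
    // A few rows from the sample listing; only the fields the filter reads (assumption: in-memory data).
    static List<Map<String, String>> sampleRows() {
        String[] names = {"EID", "NAME", "SURNAME", "GENDER", "STATE", "BIRTHDAY"};
        String[][] rows = {
            {"1", "Rebecca", "Moore", "F", "California", "1974-11-20"},
            {"4", "Emily", "Smith", "F", "Texas", "1985-03-07"},
            {"6", "Matthew", "Johnson", "M", "California", "1984-07-07"},
            {"9", "Victoria", "Davis", "F", "Texas", "1983-12-07"},
        };
        List<Map<String, String>> list = new ArrayList<>();
        for (String[] r : rows) {
            Map<String, String> emp = new HashMap<>();
            for (int i = 0; i < names.length; i++) emp.put(names[i], r[i]);
            list.add(emp);
        }
        return list;
    }

    // Hypothetical helper: parses yyyy-MM-dd, passing null through unchanged.
    static Date parse(String s) {
        if (s == null) return null;
        try {
            return new SimpleDateFormat("yyyy-MM-dd").parse(s);
        } catch (ParseException ex) {
            throw new IllegalArgumentException("expected yyyy-MM-dd: " + s, ex);
        }
    }

    // Any of gender, start, end may be null, in which case that condition is ignored.
    static List<Map<String, String>> myFilter(List<Map<String, String>> sourceList,
                                              String gender, Date start, Date end) {
        List<Map<String, String>> resultList = new ArrayList<>();
        for (Map<String, String> emp : sourceList) {
            Date birthday = parse(emp.get("BIRTHDAY"));
            boolean isRight = true;
            if (gender != null && !emp.get("GENDER").equals(gender)) isRight = false;
            if (start != null && birthday.before(start)) isRight = false; // start is inclusive
            if (end != null && birthday.after(end)) isRight = false;      // end is inclusive
            if (isRight) resultList.add(emp);
        }
        return resultList;
    }

    // Hypothetical convenience wrapper so callers can pass dates as strings.
    static int countMatches(String gender, String start, String end) {
        return myFilter(sampleRows(), gender, parse(start), parse(end)).size();
    }

    public static void main(String[] args) {
        System.out.println("count=" + countMatches("F", "1981-01-01", null));
    }
}
```

With these four sample rows, countMatches("F", "1981-01-01", null) returns 2 (Emily Smith and Victoria Davis), and passing null for all three parameters returns every row.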
If we want to make myFilter a more general function, for example by joining conditions with OR or allowing computation between fields, the code becomes more complicated: it requires a program for parsing and evaluating dynamic expressions. Such a program can be as flexible and general as SQL in a database, but it is really difficult to develop.
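To see where the difficulty lies, it may help to look at what plain Java can do without an expression parser. The sketch below is not part of the original program; it assumes Java 8 or later and uses java.util.function.Predicate to join conditions with AND and OR, including one condition computed from two fields. All class and method names (PredicateSketch, genderIs, bornOnOrAfter, fullNameIs, filter, demoCount) are hypothetical, and the rows are copied from the sample listing.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class PredicateSketch {
    static Map<String, String> emp(String name, String surname, String gender, String birthday) {
        Map<String, String> m = new HashMap<>();
        m.put("NAME", name);
        m.put("SURNAME", surname);
        m.put("GENDER", gender);
        m.put("BIRTHDAY", birthday);
        return m;
    }

    // A few rows from the sample listing (assumption: in-memory data instead of the file).
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(emp("Rebecca", "Moore", "F", "1974-11-20"));
        rows.add(emp("Emily", "Smith", "F", "1985-03-07"));
        rows.add(emp("Matthew", "Johnson", "M", "1984-07-07"));
        rows.add(emp("Victoria", "Davis", "F", "1983-12-07"));
        return rows;
    }

    static Predicate<Map<String, String>> genderIs(String gender) {
        return e -> gender.equals(e.get("GENDER"));
    }

    static Predicate<Map<String, String>> bornOnOrAfter(String isoDate) {
        LocalDate start = LocalDate.parse(isoDate);
        return e -> !LocalDate.parse(e.get("BIRTHDAY")).isBefore(start);
    }

    // A condition computed from two fields: NAME + SURNAME.
    static Predicate<Map<String, String>> fullNameIs(String fullName) {
        return e -> (e.get("NAME") + e.get("SURNAME")).equals(fullName);
    }

    static List<Map<String, String>> filter(List<Map<String, String>> rows,
                                            Predicate<Map<String, String>> cond) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> e : rows) {
            if (cond.test(e)) result.add(e);
        }
        return result;
    }

    // (born on/after 1981-01-01 AND female) OR full name is "RebeccaMoore"
    static int demoCount() {
        Predicate<Map<String, String>> cond =
            bornOnOrAfter("1981-01-01").and(genderIs("F")).or(fullNameIs("RebeccaMoore"));
        return filter(sampleRows(), cond).size();
    }
}
```

Even in this form the condition is still assembled in compiled Java code. Accepting it as a string at run time would still need the expression parser and evaluator described above.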
In view of this, we can turn to esProc to assist with this task. esProc is a programming language designed for processing structured (and semi-structured) data. It can perform the general query task above quite easily and integrates with Java, so that Java can access and process text file data as flexibly as SQL does.
For example, to query female employees who were born on or after January 1, 1981, esProc can take an external input parameter "where" as the dynamic condition, as shown in the following chart:
The value of "where" is: BIRTHDAY>=date(1981,1,1) && GENDER=="F". esProc needs only three lines of code, as follows:
A1: Define a file object and import the data from it. The first row holds the column headers, and tab is the default field separator. esProc's IDE can display the imported data visually, as shown on the right of the above chart.
A2: Filter according to the condition. Here a macro is used to parse the expression dynamically; "where" is the input parameter. esProc first computes the expression enclosed in ${…}, then replaces ${…} with the computed result as the macro string value, and finally interprets and executes the resulting code. In this example, the code finally executed is =A1.select(BIRTHDAY>=date(1981,1,1) && GENDER=="F").
A3: Return the eligible result set to the external program.
When the filtering condition changes, we just need to change the parameter "where" without rewriting the code. For example, suppose the condition is changed to query female employees who were born on or after January 1, 1981, or employees whose NAME+SURNAME equals "RebeccaMoore". The value of the parameter "where" can be written like this: BIRTHDAY>=date(1981,1,1) && GENDER=="F" || NAME+SURNAME=="RebeccaMoore". After execution, the result set in A2 is shown in the following chart:
Finally, call this piece of esProc code from Java to get the filtering result, using the JDBC driver provided by esProc. With the above esProc code saved as the file test.dfx, the Java code that calls it is as follows:
// create the esProc JDBC connection
Class.forName("com.esproc.jdbc.InternalDriver");
con = DriverManager.getConnection("jdbc:esproc:local://");
// call the esProc program (the stored procedure), in which test is the file name of the dfx
st = (com.esproc.jdbc.InternalCStatement) con.prepareCall("call test(?)");
// set the parameter: the dynamic filtering condition
st.setObject(1, "BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\"");
// execute the esProc stored procedure
st.execute();
// get the result set: the set of eligible employees
ResultSet set = st.getResultSet();
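The condition string passed to setObject does not have to be a literal. As a hedged illustration, the sketch below assembles it from typed inputs in Java. WhereBuilderSketch and buildWhere are hypothetical names, not part of esProc or its JDBC driver; the only esProc syntax used is what already appears in this article (date(y,m,d), ==, >= and &&).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class WhereBuilderSketch {
    /**
     * Builds a condition string in the form used in this article.
     * Either argument may be null, in which case that condition is left out.
     * bornOnOrAfter is expected as yyyy-MM-dd.
     */
    static String buildWhere(String gender, String bornOnOrAfter) {
        List<String> parts = new ArrayList<>();
        if (bornOnOrAfter != null) {
            // LocalDate.parse throws DateTimeParseException on malformed input
            LocalDate d = LocalDate.parse(bornOnOrAfter);
            parts.add("BIRTHDAY>=date(" + d.getYear() + "," + d.getMonthValue()
                    + "," + d.getDayOfMonth() + ")");
        }
        if (gender != null) {
            // Assumption: only plain letters are accepted, so the value cannot
            // break out of the quoted string in the generated expression.
            if (!gender.matches("[A-Za-z]+")) {
                throw new IllegalArgumentException("unexpected gender value: " + gender);
            }
            parts.add("GENDER==\"" + gender + "\"");
        }
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("at least one condition is required");
        }
        return String.join(" && ", parts);
    }

    public static void main(String[] args) {
        System.out.println(buildWhere("F", "1981-01-01"));
    }
}
```

The returned string could then be passed as the first parameter, for example st.setObject(1, WhereBuilderSketch.buildWhere("F", "1981-01-01")). Because esProc interprets the string as code, any value that comes from user input should be validated first, as the letters-only check here tries to do.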
When the esProc script is relatively simple, we can write the esProc code directly in the Java code that calls the esProc JDBC driver. This saves us from having to write an esProc script file (test.dfx):
st = (com.esproc.jdbc.InternalCStatement) con.createStatement();
ResultSet set = st.executeQuery("=file(\"D:\\\\esProc\\\\employee.txt\").import@t().select(BIRTHDAY>=date(1981,1,1) && GENDER==\"F\" || NAME+SURNAME==\"RebeccaMoore\")");
This piece of Java code directly calls a single line of esProc script: it gets the data from the text file, filters it according to the specified condition, and returns the result set to set, the ResultSet object.