Data Input / Output and JUnit
[Practical 8 of 11 - Part 2]
How To Begin:
When starting to satisfy the test conditions the first thing to be considered is the order
of code execution. Figure 19 shows the order of execution for the CSVReader class.
Code execution enters the CSVReader class from a calling class first through the
dataLoad method. The dataLoad method in turn makes several calls to the
parseLine method to parse the fields out of each line of the input file. Finally, once
the file processing has been completed the calling class makes a call directly to the getData
method to retrieve the data.
When deciding where to start with satisfying the test conditions the method dependency dictates the code
creation order. The call to getData happens after the other methods have occured, so this method
is dependent on the other two methods to return successfully. Therefore, the getData test conditions
will be the last to be satisfied.
The loadData method is dependent on calls to the parseLine method returning correctly.
However, the parseLine method is not dependant on any other method in our test scope. Therefore, we
first get the test conditions for parseLine to be satisfied and then loadData and finally
getData. We work up through the method dependencies.
Specifying Delimiters:
Before we do anything with the code for satisfying the test conditions we need to be able to specify
the delimiters for our file. Because the CSVReader and CSVWriter both need
to have this functionality the code is placed in the super class that both of these
inherit or extend from, CSV.
We want delimiters to be specified from a list of valid character selections, but nothing outside of this
list. We could make the delimiter a String data type with no other limits but that would
allow anything to be entered, and the behaviour we desire is to have a selection list from which users choose.
This could be achieved through the use of constants (declaring a variable final static meaning
it cannot be changed and can be accessed at the class level. Anything declared final cannot
be altered after declaration so a final method cannot be overridden). This approach would be
messy and difficult to control.
Fortunately for us Java provides a data type that we can specify very rigidly and very elegantly. It is called
an enum.
Insert the code shown in Figure 20 into your CSV class. Although it looks quite complicated
most of this code you will be relatively familiar with. Lines 36 through to 53 provide a private
variable to store the current set delimiter which is instantiated to be a Delimiter type and
a COMMA type of the Delimiter data type (we will explain this more in a moment) and
two mutator methods to get and set the delimiter.
The real magic is happening between lines 13 and 34. To start with think about an enum as a
small class within a class. All it does is store a specific set of valid values and any variable declared as
this particular enum has to be one of those values. It is a way to enforce strict data types, similar
to domains in SQL if you are familiar with those.
Lines 22 to 25 provide a constructor for the enum, this can only be called from
within the enum declaration. The lines at 29 and 33 provide variables to hold and provide access to
the values in this particular enum.
lines 19 and 20 specify the list of valid enum types, by making calls to the constructor. The constructor
can be called here because it is inside the enum declaration. To create a comma Delimiter
the code Delimiter.COMMA would be used, as it is in setting the default value on line 37.
To access the delimiter character sequence for a hammer delimiter
the code Delimiter.HAMMER.delimiter would be used, and to get the description
Delimiter.HAMMER.description. These calls would return '¬' and 'hammer' respectively.
Implementing parseLine:
Now that the delimiter can be specified the parseLine method can be written. Lets start by
using the inbuilt functionality of the String class. Type the line
return line.split(this.getDelimiter().delimiter); into the parseLine method as
shown in Figure 21.
Run the tests as you did at the end of part 1, right click the file and select Test File. All of the tests conditions are still failing. Lets have a closer look at why.
We can see the output from our tests in Figure 22. Note: you can toggle the view of failed and passed tests and those that resulted in an unexpected error by using the three buttons (green tick, yellow exclamation and red exclamation)
By clicking on the little arrows at the side of the tests we can
see more details about why the test failed. We can see that the testParseLine failed because of a space
before the character 2, the square brackets highlight the character causing the discrepancy. The test
testParseLineEmptyValues failed because of the length of the two arrays different, expected = 7 and
actual = 5!
Lets have a closer look at what is happening. To do this we are going to use functionality in the IDE to debug the code line by line. First left click in the margin of the text editor to add a breakpoint. This is a line of code where the execution will stop and wait when we run in debug mode. Figure 23 shows the location of the breakpoint.
Next right click on the CSVReader class in the project area and select Debug
Test File. When the code execution reaches your breakpoint it will turn green as shown
in Figure 24.
At the bottom of the screen in the output area you will have a variables tab which is active. where
it says <Enter new watch> type the line of code that we are interested in seeing
the return values for line.split(this.getDelimiter().delimiter) and press return.
A watch expression is a way of viewing what is in a variable of method return during the debug process. You can add, alter and delete watch expressions as you see fit to help you understand what your code is doing.
Clicking on the arrow to the left of the new watch expression expands it showing all of the elements in
the array returned by the split comment, Figure 25. We can see that the split is returning leading and
trailing spaces for the values between delimiters, the spaces are in the original input line, shown at
the bottom of the variables display.
We need to chop the extra spaces from the beginning and end of each value. There is a function to help us
do just that in the String class. First we need to exit the debugging session.
There are a series of debug buttons that help you step through the code line by line once you have hit a breakpoint, explained in Figure 26.
Stop debugging by clicking the stop debugging button. Left click in the margin on your breakpoint and that will also be removed.
This is a very brief introduction to debugging code there are more in-depth tutorials on-line such as this one from netbeans.org.
To remove the leading and trailing white space we need to use the trim in the
String class. Adjust your code in the parseLine method to reflect
that in Figure 27.
The comments above each block of code explains what is happening.
Re-run your tests, you should now see that one of them has passed, Figure 28.
The second parseLine test is still failing with the difference in size of array returned.
The test clearly passes in 6 commas in the line to be parsed, String line = ",,val1, ,2,,";.
Lets look at the documentation for the String.split method to see what its behaviour is.
Go to the parseLine method and highlight the split method call and then select
Show Documentation from the Source menu. You will see the documentation
for the split method appear on screen as in Figure 29.
The documentation states that this method will remove trailing empty strings, which is the problem. The last two commas in the input line have no value following them and therefore they are not being returned because they equate to empty strings.
The documentation does also say that the method is overloaded and there is a two argument implementation which has extended behaviour. Click on the link to the two-argument split.
Note: full documentation for Java SE 7 can be found on the Oracle Java home pages. Documentation for older releases is also available.
The documentation is shown in Figure 30. The two argument split method will return all of the values
if the second parameter, limit is a negative number.
Change the call to the split method as shown in Figure 31 and then rerun the tests.
you will now see the output as shown in Figure 32, both of the parseLine tests pass!
Implementing loadData:
Insert the code shown in Figure 33 into your loadData method.
When you have inserted the code you will have a compile error warning in the left hand margin. Click on the warning and select to Add import for java.io.BufferedReader. The compile error marker will persist. Click on it again and select to Add import for java.io.FileReader. Your imports are now correct.
Most of this code will be relatively familiar to you from previous practical exercises and the lectures. Two
lines are more complex, BufferedReader br = new BufferedReader(new FileReader(file));
and while ( (line = br.readLine()) != null ){.
The first simply uses an anonymous object of type FileReader to bridge between the File
parameter and the BufferedReader constructor. BufferedReader does not have a constructor
which will take a File object as a parameter. We could have created a referenced FileReader
object, it is just neater and more elegant this way.
The second assigns the next line read from our BufferedReader in the variable line.
Each time this happens the line variable is checked to see if it has been set to null indicating
that the end of the file has been reached. The code cycles through the file until the return is null.
This is OK, the file is being read but we are not putting the returned values anywhere. Ideally we would use an array
to stored these values but we don't know the length of the file. To get the file length we could open up the
BufferedReader once to count the lines, close it, set up the array, open up the BufferedReader
again to read in the values, but this seems wasteful and not very elegant.
There is another way. Java provides another set of classes for storing groups of data together. These classes are known
as the collections framework. There are different structures to optimise dealing with data of different
structures. We are not going to look at all of these, you can read about them in the books recommended on the
recommended reading list. We are going to use one specific type of collection
called an ArrayList.
The major benefit of the ArrayList is that you don't have to dimension it before you use it. It expands as
you add objects to it! This is great, and I hear you cry so why bother with arrays? The truth is that for most small
tasks the collections framework is ideal. However, you cannot hold primitive data types in a collection. This again
is OK because Java supplies wrapper classes for all of the primitive data types. So we can wrap a primitive to store it
as an object as easily as Double d = new Double(yourPrimitiveDouble);.
Again great. However, it is true that a wrapper object takes up more memory than a primitive. Collections are also slower to cycle than primitive arrays. These are generally not concerns unless you are dealing with lots of data, but in social science modelling we can be! So knowing both methods is essential. Additionally the collections framework was not in early Java releases so you may come across legacy code that uses arrays and understanding them is useful.
Lets create an ArrayList object to hold our data. Type in the code as an instance variable as shown in
Figure 34, line 20. Click on the compile error marker and select to Add the import for java.util.ArrayList.
Add the returned String array into the new ArrayList as shown on line 35.
The last thing to do in this method is check that the number of fields is always the same for each line in the file. Adjust the code to be consistent with Figure 35.
On line 30 we create a new variable to hold the count of fields. The if statement starting at line 37
checks to see if we are on the first line using the ArrayList.isEmpty method. If this is the first
line then we assign the length of the returned array of values from parseLine into the variable
fields. If it is not the first line we check the length of the returned array against our field count
variable fields. If the values are not the same a new IOException is thrown.
Rerun the tests on the CSVReader file. You should now see three tests passing as in Figure 36.
Implementing getData:
The final change to make is to the getData method to make it return the data in the correct format.
Make the changes to the getData method as shown in Figure 37. Our spatial interaction model uses
arrays rather than ArrayList objects so we are going to return a two dimensional array.
Line 69 creates a new two domensional String array using the ArrayList.size method
to supply the size of the first dimension.
Line 70 uses a converter method built into ArrayList class called toArray which copies
the contents from the ArrayList into the array supplied. This is OK because our ArrayList data
contains String arrays. Remember that multidimensional arrays are just arrays of arrays...
Line 71 returns the new two dimensional String array.
Rerun the tests for CSVReader, you should now see them all pass, Figure 38.
Summary:
- Test driven development is more of a complete programming style than a tool.
- Test driven development is powerful and produces robust easily maintained code.
- Test driven development is time consuming.
- Line by line debugging is a useful and powerful tool to assist in understanding problems with code.
- Full Application Programmable Interface (API) documentation is available online for the Java language.
- The collections framework provides a flexible alternative to arrays.
- Enums are a useful way of enforcing strict data types where select lists of information are required.
- Enums look like a small class within a class but there constructor cannot be accessed outside of the enum.
- The Java core language provides wrapper classes for all of the primitive data types.