Video based identification of benthic objects

CB 7.01. Milestone report for task 1.2.6, Development of enhanced video based techniques for the identification of benthic objects. May 2006.
James Wise. Research Officer School of Plant Biology, Faculty of Natural and Agricultural Sciences. The University of Western Australia. Research Associate RMIT University Victoria.

Table of Contents

1.0 Introduction and Problem Definition

1.1 Introduction
1.2 Classification Groups

2.0 Initial Classifier Selection

3.0 The Test Data

3.1 Origins of the Data
3.2 The Data Features
3.3 Jittering
3.4 Data Quality
3.5 Data Details

4.0 The Test Plan

4.1 Leave-One-Out Testing
4.2 Classification Possibilities
4.3 The Confusion Matrix

5.0 Initial Testing

5.1 Initial Findings
5.2 Test Data Modifications
5.3 Refined Findings
5.4 Changes to Testing Requirements and Data Sets

6.0 Final results

6.1 General Fish Results
6.1.2 A Note About The SVM Classifier
6.2 Analysis of Fish Results
6.3 The Final Fish Classifier
6.4 General Sponge Results
6.5 Analysis of Sponge Results
6.6 The Final Sponge Classifier
6.7 Classification Summary
6.8 Future Work and Possible enhancements

7.0 Project Summary

8.0 References

9.0 Appendix

List of Tables

Table 3.5.1 The 10 specific fish species tested
Table 3.5.2 Sponge classes and example images
Table 4.3.1 Example confusion matrix, with 3 classes
Table 6.1.1 Best of network results for fish classifications
Table 6.3.1 Confusion matrix for final fish classifier and expanded high quality data set
Table 6.3.2 Confusion matrix for final fish classifier with all data
Table 6.4.1 Best of Network Results for Sponge Classifications
Table 6.6.1 Confusion matrix for final sponge classifier



1.0 Introduction and Problem Definition


1.1 Introduction

One of the most difficult tasks for marine scientists is the accurate identification of benthic objects. Historically, identification of benthic objects has been a manual process. Over the past five to ten years traditional quadrat based sampling methods have been extended and enhanced by video based techniques. As video processing techniques and available computer hardware have advanced, the possibilities for video based sampling have also advanced. Taking advantage of these new technologies, an image-based application capable of automated classification of benthic objects has been developed. The application combines work from diverse fields such as marine science, image processing, and artificial intelligence. This report outlines the classification system and how it has been built. Several classifiers were compared, using different features derived from the original images, in order to determine the best overall classification system.

The problem of correctly classifying benthic objects is composed of many highly complex tasks. However, the whole process can be broken down into a series of smaller, simpler tasks: object identification, object extraction, feature selection, feature extraction, classifier (neural network) selection, classifier training, and finally the actual classification of the selected features with the classifier of choice. Wise (2004, 2005a) and Seager and Wise (2004) detail the object identification, object extraction and feature extraction/selection stages of this process, with Wise (2005b) detailing initial research into the selection of appropriate classifiers.

This report will focus on the training, testing and eventual selection of an appropriate classification system for benthic objects. The process of determining the best classifier is very complex; hence it has been broken down into several steps. The first step is to devise a rigorous testing methodology that will allow the different classification systems to be compared on equal terms. In addition to a detailed understanding of the testing methodology, the interpretation of the results needs to be clear and concise, which is not a simple task when dealing with complex classification systems such as neural networks. The last step is to isolate and refine the best performing classifier, so that the best possible classification system can be developed.

1.2 Classification Groups

This report will investigate the performance of various classifiers given the task of benthic object classification. Two distinct groups of benthic objects have been selected for study: fish and sponges. The fish group consists predominantly of saltwater fish from Western Australia; however, some additional species have been included to provide a sufficiently broad test group. The species selected for the sponge group are also native to Western Australia. The sponges vary in colour, shape and size.



2.0 Initial Classifier Selection

Wise (2005b) looked at several classification systems ranging from basic Bayesian based classifiers to complex neural networks such as support vector machines. Based on this review the four most applicable classifiers were selected for implementation. These classifiers were: a back-propagation neural network (back-prop) that uses a genetic algorithm for increased training speed (Salomon and Van Hemmen, 1996), an adaptive resonance theory type II network (ART), a Kohonen network, and a support vector machine (SVM). Of these four classification systems, the back-prop and the SVM both feature supervised learning, whereas the ART and Kohonen networks are both unsupervised networks.

The back-prop network was selected for implementation because it is a well-known and well-understood neural network classification system. Back-prop networks have been in use in commercial classification systems for several decades. Back-prop networks generate definite outputs, which allow for a simple and easy to understand classification of the data. Due to these features a back-prop network appeared to be a highly suitable network for implementation.

A Kohonen network is an unsupervised network that offers a unique self-organising output of classification results. Based around a multi-dimensional arrangement of output nodes, the results of a Kohonen network not only offer a node specific classification, but also information about how the classification relates to other neighbouring classes (this is the self-organising aspect). This requires slightly more interpretation on behalf of the user, but also provides more output information. This form of output could be of use when classifying similar benthic objects, as it allows for an automated computer-based classification, while providing room for some human interpretation of results. Hence a Kohonen network was selected for implementation on the basis of its self-organising properties and unique presentation of output data.

The ART network was selected for implementation because of its method of determining the appropriate number of output classes. ART based networks self determine the appropriate number of output classes, based primarily on the setting of a vigilance parameter. This system offered an interesting alternative to the other networks selected for implementation.

Support Vector Machines are a relatively new form of supervised classification system. They have been used with success to solve highly complex classification problems (Tsantis et al., 2005). Given the complexity of fish and sponge classification it seemed likely that an SVM might be a successful classifier, where other classification systems may fail.



3.0 The Test Data


3.1 Origins of the Data

The original purpose of the project was to develop an application capable of automated classification of benthic objects. The raw data is collected in the form of digital imagery as either audio video interleaved (AVI) files or still images. In both instances the data is treated as still imagery, as AVI files are examined on a frame-by-frame basis. Each of the classification systems takes a feature vector of values as an input; how such a feature vector is generated from the original still image is briefly outlined below. Initially the image is enhanced. This typically involves focusing on a specific area of the image containing the object of interest. Subsequently contrast enhancement and brightness adjustments are made. The next step is to identify the outline of the object. This process is automated, although in some circumstances user definition of the outline is required. Once the object has been located and an outline acquired, the object can be extracted. Extraction simply involves isolating all of the pixels that belong to the object of interest from the rest of the image. This results in an image that has a black background and the extracted object of interest in the foreground. It is this extracted image that is used to generate the various features.

3.2 The Data Features

The specific features themselves were discussed in detail in Wise (2005a), and are briefly outlined here. The first and most obvious feature is the outline of the object. In fish classification the outline is a strong indicator of which class the object belongs to. However, the object outline cannot be used in sponge classification. In order to remove reliance on the initial size and orientation of the object a method of representing shapes in a scale and orientation independent representation was required. A Fourier descriptor provides such a representation. Fourier descriptor values are both resolution and rotation independent, making them perfect candidates for use in the input feature vector. It was decided to take a total of 15 Fourier descriptor samples to represent the shape of an object. The software application that was used to generate the feature vectors from the original image data is capable of controlling the number of features selected, but it has been found that using 20 Fourier descriptor samples did not give significantly greater accuracy in results, whilst using only 10 yielded significantly poorer results.
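The idea behind a scale and orientation independent shape representation can be illustrated with a minimal sketch. It is not the application's actual implementation: the function name, the choice of normalisation (dropping the first coefficient, taking magnitudes, dividing by the magnitude of the second coefficient) and the assumption that the outline is traversed counter-clockwise are all illustrative assumptions.

```python
import cmath
import math


def fourier_descriptors(contour, n_descriptors=15):
    """Return shape descriptors that ignore position, scale and rotation.

    contour: list of (x, y) boundary points ordered counter-clockwise
    around the outline (an assumption of this sketch). The contour must
    contain more than n_descriptors + 1 points.
    """
    n = len(contour)
    points = [complex(x, y) for x, y in contour]
    # Discrete Fourier transform of the boundary treated as a complex signal.
    coeffs = [
        sum(p * cmath.exp(-2j * math.pi * k * t / n) for t, p in enumerate(points))
        for k in range(n)
    ]
    # Coefficient 0 encodes position only, so it is skipped.
    # Magnitudes discard rotation and the choice of starting point.
    # Dividing by |coefficient 1| discards overall scale.
    scale = abs(coeffs[1])
    return [abs(coeffs[k]) / scale for k in range(2, n_descriptors + 2)]
```

With this normalisation, an outline that is moved, enlarged or rotated yields the same 15 values, which is the property the input feature vector relies on.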

The next feature concerns the texture of the object. This texture feature is extracted using a wavelet transformation of the image. A wavelet filter bank is generated by scaling and rotating a ‘mother wavelet’. In this case four orientations were used at each of six scales in a frequency range of 0.01 to 2.0 hertz. This produced 24 complex wavelet filtered images. The complex results of the wavelet-transformed images were treated as phase and magnitude components, and used to generate statistics across the 24 filtered images. The statistics used were the density, mean, standard deviation, skew and kurtosis. Each image generates five data values for phase and magnitude components, resulting in a feature vector of 240 values.
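The size of the wavelet feature vector follows directly from the counts given above. The short sketch below simply enumerates one label per value; the ordering and the label format are assumptions made for illustration, not the layout used by the application.

```python
from itertools import product

SCALES = range(6)         # six scales
ORIENTATIONS = range(4)   # four orientations at each scale
COMPONENTS = ("magnitude", "phase")
STATISTICS = ("density", "mean", "std", "skew", "kurtosis")


def wavelet_feature_names():
    """Return one hypothetical label per wavelet texture value."""
    return [
        f"s{scale}_o{orientation}_{component}_{stat}"
        for scale, orientation, component, stat
        in product(SCALES, ORIENTATIONS, COMPONENTS, STATISTICS)
    ]


names = wavelet_feature_names()
print(len(names))  # 6 scales x 4 orientations x 2 components x 5 statistics = 240
```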

The last feature is another texture measure based on the principal component analysis (PCA) of a range of image statistics. This method of turning the textural information into numbers is radically different from the previous wavelet method. This method relies upon generating a feature vector for each pixel within the object of interest. This pixel level feature vector is 73 terms long, and is composed of a number of different pixel tests. These tests range from things as simple as the red, green and blue values through to more complex values generated from a gray level co-occurrence matrix centered on the target pixel. The full range of features has previously been discussed in Wise (2005a) under the title of “MegaFeature”. This feature vector is generated for every pixel in the object area and then averaged across the entire sample. Thus every sample in the training set has a 73-value feature vector associated with it. This 73-value feature vector is then used in a principal component analysis across all of the training data in order to isolate the most important features. The first five principal components were selected as these generally accounted for more than 90% of the variation in the data.
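The rule of keeping enough principal components to account for more than 90% of the variation can be sketched as follows. The function name is hypothetical and the eigenvalues in the example are invented purely for illustration; they are not values from this project.

```python
def components_needed(eigenvalues, target=0.90):
    """Return how many leading components explain more than `target` of the variance.

    eigenvalues: variances along each principal component, in any order.
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total > target:
            return count
    return len(ordered)


# Illustrative eigenvalues only (not measured data).
example = [4.0, 2.5, 1.5, 0.8, 0.5, 0.4, 0.3]
print(components_needed(example))
```

For the example values the first four components explain 88% of the variance and the first five explain 93%, so five components are kept.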

3.3 Jittering

Jittering is the process of replicating a sample whilst adding some noise. The samples used in this experiment were jittered once to double the total number of available samples. For each sample the jittering process involved extracting the original object and boundary, then applying a slight rotation, a vertical mirror operation, followed by a blur operation. Jittering and adding noise to the original data samples not only helps create more training data, but also helps the network to better generalise when classifying unseen samples (Sarle, 2002).

It is important to note that while jittering can increase the number of samples and help with classifier generalisation, it is possible that the jittering process can add too much noise to the entire data set. If the jittered samples add too much noise to the data set, the classifiers may not be able to determine an adequate solution or the classifier itself may not converge to a satisfactory result.
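The jittering steps described above (slight rotation, mirror, blur) can be sketched on a small greyscale image stored as a list of rows. This is a simplified illustration rather than the application's code: the rotation angle, the nearest-neighbour sampling, the 3x3 box blur and the reading of "vertical mirror" as a left-to-right flip are all assumptions.

```python
import math


def rotate(image, degrees):
    """Rotate about the image centre using nearest-neighbour sampling."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    out = [[0.0] * w for _ in range(h)]  # black background
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source location.
            sx = c * (x - cx) + s * (y - cy) + cx
            sy = -s * (x - cx) + c * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


def mirror(image):
    """Flip each row left to right (one reading of 'vertical mirror')."""
    return [row[::-1] for row in image]


def box_blur(image):
    """Average each pixel with its in-bounds 3x3 neighbours."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [image[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def jitter(image, degrees=5.0):
    """Produce one jittered copy: rotate, then mirror, then blur."""
    return box_blur(mirror(rotate(image, degrees)))
```

Applying `jitter` once to every extracted sample doubles the number of samples, as in this experiment; a larger angle or a stronger blur would add more noise, which is the risk noted above.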

3.4 Data Quality

In order to interpret the results of this experiment it is important to understand the influence of data quality on the final results. Good quality images are more likely to result in a well-trained classifier. For example, well-lit, clear, high-resolution images will produce good quality data values in the extracted feature vector. However, images that are of a low resolution, are excessively blurred, exhibit interlacing, do not exhibit good colour strength, or have been taken with a lot of noise in the water column will all result in lower quality information being extracted in the feature vector. Low quality data in the feature vector will make it difficult for the classification system to converge towards a suitable result during the training process. Much of the data within the training set is of medium or low quality, and whilst this may not represent the best data for training the networks, it is likely to be an accurate representation of the quality of future input data.

Reliance on medium and low quality data may present a significant problem in the training of a reliable network. Whilst training networks with only high quality data will have a much greater chance of resulting in a stable and reliable classifier, the resulting classifier may not be as capable of correctly classifying the noisy and low quality inputs that are expected to be typical of real world use. The quality of the images themselves also differs between the sponge and fish data sets. The fish images are generally of lower quality for several reasons. Most of the fish images originated as frames in interlaced AVI movie files, which are of lower resolution than comparable still image files. The movement of the fish also contributes to lower quality, and fish are harder to get close to, resulting in larger object distances. The sponge images are all still images, and most were captured using artificial lighting. It is also important to note that the sponge images were downsampled from an original resolution of 2272x1704 to 800x600. This was done purely because of time constraints: calculating the data for all the features for a single image at a resolution of 800x600 can take up to 45 minutes, hence the same image at the original size would take approximately eight times as long, or up to 6 hours.
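The factor of eight quoted above can be checked with a few lines of arithmetic. The sketch assumes that processing time grows in proportion to the number of pixels, which is an assumption made here for illustration; the function name is hypothetical.

```python
def scaled_processing_minutes(minutes_at_small, small_size, full_size):
    """Estimate processing time at full size, assuming time scales with pixel count."""
    small_pixels = small_size[0] * small_size[1]
    full_pixels = full_size[0] * full_size[1]
    ratio = full_pixels / small_pixels
    return ratio, minutes_at_small * ratio


ratio, minutes = scaled_processing_minutes(45, (800, 600), (2272, 1704))
print(f"{ratio:.2f}x pixels, about {minutes / 60:.1f} hours")
# prints "8.07x pixels, about 6.0 hours"
```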

3.5 Data Details

Table 3.5.1 lists the class names and scientific names for the fish classes used in this experiment. Most of the fish classes have been selected due to their proximity to the project location in Western Australia, and the ease of collecting sample imagery. However, some species have been included purely to broaden the scope of the data. An example is the Teira-Batfish group that has been included to complement the Old Wife class. The Old Wife and Batfish exhibit a similar complex shape and they provide a test to establish the importance of shape in making correct classifications.

Table 3.5.1 The 10 specific fish species tested


Class Name               Scientific Name
Teira Batfish            Platax teira
Bullseye                 Pempheris klunzingeri
Old Wife                 Enoplosus armatus
Samson Fish              Seriola dumerili
Silver Drummer           Kyphosus sydneyanus
Swallowtail              Centroberyx lineatus
Tarwhine                 Rhabdosargus sarba
Skipjack Trevally        Pseudocaranx dentex
Southern Bluefin Tuna    Thunnus maccoyii
Woodwards Pomfret        Schuettea woodwardi

[Example image for each class: typical of the training images used]

A similar table has been included for the six sponge species being used in the classification analysis. As not all of the sponge species are fully documented, the scientific names are not given. However, it can be seen that the sponges consist of six groups, ranging in size, shape and colour. Three of the sponge groups are of an orange/red/pink colour, one group is blue-grey, one group is dark brown and the last group is a dark green colour. All groups exhibit different textures. By using several groups exhibiting similar colour it can be determined if the classification systems are capable of separating groups using texture-based features, and not by colour alone.

Table 3.5.2 Sponge classes and example images

Sponge Group
Orange Group 1
Orange Red Group 2
Orange Pink Group 3
Blue Grey Group 4
Brown Haliclona Group 5
Green Haliclona Group 6

[Example image for each sponge group]

4.0 The Test Plan

The test plan involved running each of the different networks with different network configurations. The networks and parameters that were varied are listed below:

  • Back-prop – different numbers of hidden layers and hidden layer neurons
  • Kohonen – different output map dimensions
  • ART – different vigilances and maximum number of output classes
  • SVM – different kernels and values of the “C” parameter

Each of the networks was tested with many combinations of input feature vectors. For fish classifications the different combinations tested are given below:

  • Shape only
  • Shape and wavelet
  • Shape and PCA
  • Shape, wavelet, and PCA

For the sponge classifications the following feature combinations were tested:

  • Wavelet only
  • PCA only
  • Wavelet and PCA

Results of these initial classifier trials were used to compare different network performance, and guide any required network or input data tuning.

4.1 Leave-One-Out Testing

Classifier performance is generally measured by training a classifier with a data set, then testing its performance on an independent data set. Ideally the training and testing data sets are of the same size and are completely independent. This ideal situation is rarely attainable as it requires a very large data set to divide amongst training and testing. A standard approach when limited training and test data is available is leave-one-out testing. Leave-one-out testing has been found to be effective when limited data are available for testing and training of the network (Sarle, 2002). Leave-one-out testing works by removing one sample from the training data, training the network and then testing the excluded sample. This process is repeated once for each item of data in the training set, resulting in a network being trained and results generated for every data sample in the training set. At the end of this process the number of correct classifications can be calculated, indicating the overall performance of the classifier. The drawback of the approach is that the classifier must be trained many times to establish its performance.

Many variations on the leave-one-out process exist; the most obvious is to change the number of samples excluded during the training of a network. A modified version of the leave-one-out methodology was used with the complete fish data set, resulting in eight iterations of leave-ten-out. When using the smaller high quality fish data set, a system of ten iterations of leave-two-out was used. For the sponge network comparisons the system used was five iterations of leave-three-out.
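The leave-k-out schemes described above can be sketched as a generator of training and testing splits. The sketch assumes that k samples are held out from each class in every iteration, which is consistent with the iteration counts quoted (for example, eight iterations of leave-ten-out for 80 samples per class) but is not stated explicitly; the function name is hypothetical.

```python
def leave_k_out_folds(samples_by_class, k):
    """Yield (train, test) lists of (label, sample) pairs.

    Each fold holds out k consecutive samples from every class, so every
    sample is tested exactly once when the class size is a multiple of k.
    """
    n_folds = min(len(samples) for samples in samples_by_class.values()) // k
    for fold in range(n_folds):
        start, stop = fold * k, (fold + 1) * k
        train, test = [], []
        for label, samples in samples_by_class.items():
            for i, sample in enumerate(samples):
                target = test if start <= i < stop else train
                target.append((label, sample))
        yield train, test
```

A classifier would be trained from scratch on each `train` list and evaluated on the matching `test` list, with all test results accumulated into a single set of results.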

4.2 Classification Possibilities

There are three possible classification results from a neural network: classification into the correct class, classification into an incorrect class, or null classification. A null classification arises when a pre-determined classification confidence level is not reached. Some of the classification systems under consideration are capable of giving a confidence level for every classification. The current implementations of the back-prop and Kohonen networks generate a confidence value. For the back-prop and Kohonen networks, any classification that fails to reach a pre-defined confidence level results in a null classification. All classification results are tabulated in a Confusion Matrix.

4.3 The Confusion Matrix

A confusion matrix is a matrix with N rows, and N+1 columns, where N is equal to the number of classification categories. The last column, the N+1 column, is used to record the null classifications. During evaluation of classification performance a series of samples is presented to the classifier; the result of each sample presentation is recorded as an entry in the confusion matrix. A correct classification will increase the number in the diagonal location that represents that class. Thus if Batfish is the first class, a correct Batfish classification would increase the value held at [1,1] by one. If the Batfish class is represented by the 1st column and row, a batfish sample that was unable to be classified (null classification) would increase the last column value in row one by one. If a sample is classified incorrectly it will be placed into the row of the actual sample and the column of the sample it has been classified as.

Table 4.3.1 Example confusion matrix, with 3 classes


              Batfish   Samson Fish   Old Wife   Null Classifications
Batfish            33             0          4                      3
Samson Fish         0            38          0                      2
Old Wife            8             2         28                      2

In Table 4.3.1 it can be seen that 33 of the Batfish were classified correctly, four were classified incorrectly as Old Wife, and three were null classifications. Of the Samson Fish, 38 were classified correctly and two were null classifications. In the Old Wife group eight samples were classified as Batfish, two as Samson Fish, 28 correctly classified as Old Wife and two were null classifications. It can be seen from this example that the confusion matrix not only provides information about the number of correct classifications, but also extra information regarding null classifications and the nature of misclassifications.
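The bookkeeping behind an N by N+1 confusion matrix can be sketched as follows. The function names and the convention of representing a null classification as `None` are assumptions made for this illustration.

```python
def build_confusion_matrix(class_names, results):
    """Tabulate (actual, predicted) pairs into an N x (N + 1) matrix.

    predicted is None for a null classification, which is counted in
    the last column of the row for the actual class.
    """
    index = {name: i for i, name in enumerate(class_names)}
    n = len(class_names)
    matrix = [[0] * (n + 1) for _ in range(n)]
    for actual, predicted in results:
        column = n if predicted is None else index[predicted]
        matrix[index[actual]][column] += 1
    return matrix


def classification_rate(matrix):
    """Fraction of all samples that fall on the diagonal (correct class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total
```

Applied to the counts in Table 4.3.1, the diagonal holds 33 + 38 + 28 = 99 of the 120 samples, giving a classification rate of 82.5%.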

Each of the classification systems has a slightly different way of outputting classification results. All of the classifiers generate an output node for every classification. Some classifiers generate additional information in the form of a confidence value. This confidence value can be used to separate reliable classifications from less reliable classifications or, in this experiment, successful classifications from null classifications.

For the Back-prop network a confidence level of greater than 50% was deemed as an appropriate value to separate the correct and incorrect classifications from the null classifications. Samples with less than a 50% confidence value were automatically considered a null classification. Samples with greater than a 50% confidence value were considered classified, regardless of whether or not the classification was correct.
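The 50% rule for the back-prop network can be sketched as below. The sketch assumes that the confidence value is the largest output activation, scaled between 0 and 1; the report does not specify how the confidence is derived, so this and the function name are illustrative assumptions.

```python
def classify_with_threshold(outputs, threshold=0.5):
    """Return the index of the winning class, or None for a null classification.

    outputs: one activation per class, assumed to lie between 0 and 1.
    A sample exactly at the threshold is treated as null here; the text
    only defines the cases above and below 50%.
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best if outputs[best] > threshold else None


print(classify_with_threshold([0.1, 0.7, 0.2]))    # 1
print(classify_with_threshold([0.4, 0.45, 0.15]))  # None
```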

The Kohonen network has very different requirements due to the different nature of its output. In order for a successful classification in a Kohonen network it was deemed that the test sample must have a confidence value of greater than 30% and be matched with a node that, during training, was matched with more of the target class than any other class. Because a Kohonen network can implement more output nodes than input classes, the requirements for the confidence value have been relaxed. It is also possible for a test sample to be classified to an empty output node; in this case a null classification will be generated.
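The Kohonen classification rule can be sketched in two parts: labelling each output node with the class that matched it most often during training, then applying the 30% confidence requirement to a test sample. The data structures and function names are hypothetical, and the handling of ties (the class seen first wins) is an assumption, since the text requires a strict majority over every other class.

```python
from collections import Counter


def label_nodes(training_hits):
    """Map each output node to the class that matched it most often in training.

    training_hits: iterable of (node, class_label) pairs.
    """
    counts = {}
    for node, label in training_hits:
        counts.setdefault(node, Counter())[label] += 1
    return {node: c.most_common(1)[0][0] for node, c in counts.items()}


def classify_kohonen(node, confidence, node_labels, threshold=0.3):
    """Return the class label, or None for a null classification.

    A null classification arises when the winning node was empty after
    training or when the confidence does not exceed the threshold.
    """
    if node not in node_labels or confidence <= threshold:
        return None
    return node_labels[node]
```

The returned label counts as a correct classification only if it matches the actual class of the test sample; otherwise it is recorded as a misclassification in the confusion matrix.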

Both the ART and SVM networks use similar classification requirements to the back-prop network. However, since neither the ART nor the SVM networks give confidence levels for the resulting classification, these classifiers are treated in a different manner. In the case of an SVM network, input samples are always classified into one group, meaning that an SVM classifier will never have any results in the null classification category. In the case of an ART network a test sample can be marked as stable (which represents a confidence of 100%), or not stable (a confidence of 0%), when a classification is made. For the data collection purposes of the confusion matrix all the SVM results will be placed into the appropriate classes. As for the ART based classifiers, only results that are deemed unstable will be placed in the null classification column on the confusion matrix.



5.0 Initial Testing


5.1 Initial Findings

As the researchers had previous experience in the successful application of back-prop networks, these were selected as the starting point for testing. This initial round of testing made use of the fish data set, and testing results were not encouraging. Four networks were tested with the following configurations:

256-30-10
256-50-25-10
256-15-10
256-128-64-10

The feature vector inputs were 15 shape-based inputs and 241 wavelet based texture features. The above network architectures were tested using the complete data set of eighty samples from each class. All of the networks produced null classifications for all samples. That is, none of the samples across any of the networks were classified with a confidence value of greater than 50%. There are many reasons that could have contributed to this result: the poor quality of the input data may have introduced too much noise to the network; the requirements for a classification may have been too restrictive (the required confidence value may have been too high); the network may not have been trained for a sufficient number of iterations to allow it to properly converge towards a solution; the actual configuration of the network itself may not have been suited to this specific problem; or the classification problem in question may simply have been too complex to be learned by this network given the data used. In light of these initial results several modifications were made to the test data and the method of recording results in the hope that a successful network could be built.

5.2 Test Data Modifications

Several different tests were conducted in an attempt to discover what was causing the networks to generate such poor results. Firstly, a smaller fish data set was created. This data set consisted of 20 high-quality data samples from each class. It was hoped that the use of this smaller high-quality data set would reduce the effects of the low quality and noisy data in training the networks. Secondly, a series of tests was conducted comparing only two classes at a time (rather than the entire data set consisting of 10 classes). The goal of these two-class tests was to reduce the overall complexity of the classification problem. Lastly, a network was tested while ignoring the confidence values, to see if the confidence value had been set too high. Since multiple network configurations had already been tested, the actual network configuration was not suspected to be a likely cause of the problem. Similarly, the previous networks had been trained to 10,000 iterations, which would indicate that the number of training iterations was sufficient (for the accelerated back-prop training algorithm being used). It was anticipated that with these modifications to the testing data set, the true nature of the problem could be ascertained.

Samples for the “high-quality” test set were selected based on a number of criteria. These criteria were: actual image quality, the number of pixels that the object is represented by (more is better), the amount of colour present within the image, and the overall shape definition of the object.

5.3 Refined Findings

Results of tests with the high quality fish data set were informative and surprising. Initially, five two-class networks were trained to test the complexity theory. Of these five networks, four showed accurate classification results: each correctly classified over 130 of the 160 samples, and three of them achieved over 140 correct classifications. However, one network showed only just above a 50% classification rate. This indicates that while the networks in question were more capable of successfully classifying objects when dealing with a two-class classification problem compared to a 10-class classification problem, some of the two-class classification results were still poor. It is also important to note that none of the two-class networks yielded any null classifications.

The networks trained with just the smaller high quality data set were not highly successful, thus indicating that the quality of data may not have been a significant contributor to the poor quality of the classification results. Initially two classes showed good results, but the remaining classifications had a confidence value too low to be deemed a successful classification; thus the confusion matrix had the majority of values in the last (null classification) column. Given the strange appearance of these results it was decided to run the test again, ignoring the (greater than 50%) confidence value requirement for a successful classification. This gave some very positive results. Of the total of 150 samples tested, 112 were classified correctly, leaving only 38 incorrect classifications. This result indicated two things: firstly, that training using the high quality data set gave much better results, and secondly, that the confidence value was probably set too high, since there were many correctly classified samples that had low confidence values. Given these results it was decided to use the high quality subset of the data samples for testing the remaining networks, as well as to remove the confidence level requirement from all classifications. It was also decided to use this new method for the classification of the sponges.

5.4 Changes to Testing Requirements and Data Sets

In addition to using the high quality data set, further testing with the back-prop and Kohonen networks has ignored the confidence value. However, results from an ART network will still use the unstable value of an output node to indicate a null classification. Together with the change in the method of tabulating results, and the new high quality data set, two additional sets of fish training and testing data were created.

The first additional test set was an expanded version of the high quality data set. It consisted of the original high quality data set, plus ten of the jittered high quality samples. The last set of test data consists of all of the samples not selected for the high-quality set. This last test set was used to evaluate the ability of a network to generalise over a broad range of data: a network that has been trained with the initial training set can be tested against the entire remaining data set. This enables a second, fully independent set of results to be acquired from each network configuration. The intention was not to use this last test set in the initial training, but to use it as a secondary means of testing for the detailed analysis of specific network configurations.



6.0 Final results


6.1 General Fish Results

With the revised test set, the different classification systems generated some significant results. Initial testing with the revised data set indicated that the back-prop networks were the most successful. The best configuration of all the networks correctly classified 87.5% of the test samples. This configuration was a back-prop network using only the shape feature. However, when tested against the entire data set (of all 80 samples per class), the shape only network performed very poorly compared to the shape and wavelet network. The next best network was also a back-prop network, with a correct classification rate of 83.5%, which is significantly better than the best Kohonen result of 65.5%, the SVM rate of 63%, and the best ART result of just 32.5%. In each of these cases the best results were obtained using both the shape and Gabor wavelet features (an input feature vector of 256 values). Table 6.1.1 summarises the results, with full confusion matrices given in the appendix.

Table 6.1.1 Best of network results for fish classifications


Network | Best Classification Rate | Configuration Details
Back-prop | 83.5% | 256-25-10 (Shape + Gabor Wavelets)
Back-prop (Shape only) | 87.5% | 15-12-10 (Shape only)
Kohonen | 65.5% | 256-5x5 (Shape + Gabor Wavelets)
ART | 32.5% | 256-Max 20 outputs, Vigilance = 0.975 (Shape + Gabor Wavelets)
SVM (slightly different testing methodology) | 63.0% | Linear Kernel, C = 0.1 (Shape + Gabor Wavelets)

These results clearly indicate that the back-prop network is the best performing classifier for the given fish species. Further testing was conducted on the back-prop network to establish the optimum network configuration.
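As an illustration of the configuration notation, a "256-25-10" back-prop network can be read as 256 input features, 25 hidden nodes and 10 output nodes (one per fish species). The sketch below shows only the forward pass of such a network. The sigmoid activation, the bias terms, the random weights and the helper names `make_layer` and `forward` are all assumptions made for illustration; the report does not specify these details, and no training is shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create n_out weight rows, each with n_in weights plus a trailing bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)]
            for _ in range(n_out)]


def forward(layer, inputs):
    """Apply one fully connected layer with a sigmoid activation (assumed)."""
    activations = []
    for row in layer:
        # zip stops after the n_in weights; row[-1] is the bias term.
        z = row[-1] + sum(w * x for w, x in zip(row, inputs))
        activations.append(1.0 / (1.0 + math.exp(-z)))
    return activations


rng = random.Random(0)
hidden = make_layer(256, 25, rng)   # 256 inputs -> 25 hidden nodes
output = make_layer(25, 10, rng)    # 25 hidden nodes -> 10 classes

features = [rng.random() for _ in range(256)]  # placeholder feature vector
scores = forward(output, forward(hidden, features))
predicted = max(range(len(scores)), key=lambda i: scores[i])
n_weights = sum(len(row) for row in hidden) + sum(len(row) for row in output)
```

Under these assumptions the network has 25 x 257 + 10 x 26 = 6685 adjustable weights, which gives a sense of why the number of hidden nodes was tuned separately.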

6.1.2 A Note About The SVM Classifier

Due to the extreme complexity of the SVM algorithm, it was decided to use an SVM network that was available as open source on the Internet (Joachims, 1998). At the time this appeared to be a perfect solution, as the SVM code provided was based on a scientific paper used as a reference in a previous milestone report (Crammer et al., 2001). However, it was found that this code would fail when dealing with multiple repetitions of very large training sets. As a result, the training of the SVM network consisted of two iterations of leave-five-out, which yielded only 100 results. Whilst this resulted in a significantly smaller test set, the number of samples used for training was only slightly reduced. It should also be noted that, due to the smaller size of the sponge data set, this problem did not arise during the training and testing of the various sponge networks.
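The arithmetic behind the 100 results can be sketched as follows. The sketch assumes that five samples are held out from each of the 10 classes in each iteration, and that each class has 20 high quality samples (as the row totals of the appendix matrices suggest). The function `leave_k_out_splits` and the sample names are hypothetical and do not reproduce the actual test harness.

```python
import random


def leave_k_out_splits(samples_by_class, k, iterations, seed=0):
    """Yield (train, test) lists of (label, sample) pairs.

    Each iteration holds out a different block of k samples per class.
    """
    rng = random.Random(seed)
    # Shuffle each class once so the held-out blocks are random but disjoint.
    shuffled = {label: rng.sample(items, len(items))
                for label, items in samples_by_class.items()}
    for it in range(iterations):
        lo, hi = it * k, (it + 1) * k
        train, test = [], []
        for label, items in shuffled.items():
            test += [(label, s) for s in items[lo:hi]]
            train += [(label, s) for s in items[:lo] + items[hi:]]
        yield train, test


# Hypothetical data: 10 classes with 20 high quality samples each.
data = {c: [f"class{c}_sample{i}" for i in range(20)] for c in range(10)}
splits = list(leave_k_out_splits(data, k=5, iterations=2))
total_tested = sum(len(test) for _, test in splits)  # 2 x 10 x 5
train_size = len(splits[0][0])                       # 10 x 15
```

Under these assumptions each iteration trains on 150 samples and tests on 50, so two iterations give 100 test results while the training set stays close to the size used in leave-one-out testing.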

6.2 Analysis of Fish Results

Unsurprisingly, shape proved to be an important factor in fish classification. Using the shape feature alone, the back-prop network correctly classified 87.5% of the samples. Similarly, the Kohonen network correctly classified 52.5% of the samples using only the shape feature. Conversely, the PCA feature does not appear to contribute significantly to classification performance. Networks trained with only the shape and PCA values showed better classification results than networks trained on shape alone, but inferior results compared to networks trained using the shape and wavelet features. This indicates that there is some useful information in the PCA feature, but that the wavelet feature contains more. The poor performance of the PCA feature may be due to the original 73 values representing an average of values across the entire object in question. It is likely that this averaging is “blurring” the PCA data features too much for them to be useful in this context.

The Kohonen networks were the most successful at classifying the Batfish and Swallowtail groups. These two groups, together with the Silver Drummer group, were amongst the most successful groups in the back-prop and ART based networks. However, the SVM based networks were the most successful at classifying the Samson Fish and Bluefin Tuna groups. Across all of the networks Batfish were often misclassified as Old Wife, which is not surprising given the strong similarity between the shapes of these two species (see the images in table 3.5.1). The Swallowtail group was the only other group to receive a significant number of incorrect Batfish classifications. Considering the very different shapes of these two species, it appears that similarities in texture are contributing to these misclassifications.

Using the high quality data set, all the networks had difficulty classifying either Skipjack Trevally or Tarwhine. It is likely that the poor classification rates for these two groups are due to the lack of unique or distinguishing features within those groups. The best back-prop network only managed correct classification rates of 45% and 65% for the two species respectively. The best performing Kohonen network had trouble classifying the Skipjack Trevally and Silver Drummer groups, yielding a correct classification rate of just 50%. The SVM network achieved a classification rate of 0% for the Skipjack Trevally group. The next worst result for the SVM network was for the Bullseye group, which had a correct classification rate of just 40%. This indicates that while the average classification rate for this SVM network was reasonably high, at 63%, at least two of the groups had extremely poor classification results. Interestingly, if the results of these two groups are ignored, the average classification rate of the SVM network rises to 74%.
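The SVM averages quoted above can be checked with a short worked example. The per-class counts are the diagonal of the SVM confusion matrix in the appendix, and the figure of 10 test samples per class is taken from the row totals of that matrix; the variable and function names are illustrative only.

```python
# Correct classifications per class, from the SVM matrix in the appendix.
correct = {
    "Batfish": 8, "Bullseye": 4, "Old Wife": 7, "Woodwards Pomfret": 6,
    "Samson Fish": 9, "Bluefin Tuna": 9, "Silver Drummer": 8,
    "Swallowtail": 5, "Tarwhine": 7, "Skipjack Trevally": 0,
}
SAMPLES_PER_CLASS = 10  # row total of each class in the SVM matrix


def mean_rate(counts):
    """Average correct classification rate (percent) over the given classes."""
    return 100.0 * sum(counts.values()) / (SAMPLES_PER_CLASS * len(counts))


overall = mean_rate(correct)  # all ten classes
kept = {name: n for name, n in correct.items()
        if name not in ("Bullseye", "Skipjack Trevally")}
without_worst = mean_rate(kept)  # the two poorest classes removed
```

The overall rate is 63 / 100 = 63%, and removing the Bullseye and Skipjack Trevally groups gives 59 / 80 = 73.75%, which rounds to the 74% quoted in the text.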

The distribution of misclassification rates did not show any significant trends, except for the ART based networks, which had a strong tendency to misclassify samples into either the Silver Drummer or Swallowtail classes.

6.3 The Final Fish Classifier

The final fish classifier used a 256-25-10 back-prop network. This configuration was selected because it provided superior results when compared to the other networks tested, namely a 256-30-10 and a 256-20-10 network. The final network was trained using the high quality data set, extended with 10 of the jitter samples for each class. Using ten iterations of leave-three-out testing and the expanded high quality data set, the network obtained a correct classification rate of 80.3%; when tested against just the high quality data subset, it achieved a correct classification rate of 92.5%. The confusion matrix for this result is given in table 6.3.1.

Table 6.3.1 Confusion matrix for final fish classifier and expanded high quality data set.


Class | Batfish | Bullseye | Old Wife | Woodwards Pomfret | Samson Fish | Bluefin Tuna | Silver Drummer | Swallowtail | Tarwhine | Skipjack Trevally
Batfish | 19 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Bullseye | 0 | 14 | 0 | 0 | 5 | 1 | 0 | 0 | 0 | 0
Old Wife | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Woodwards Pomfret | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0
Samson Fish | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0
Bluefin Tuna | 0 | 0 | 0 | 0 | 2 | 18 | 0 | 0 | 0 | 0
Silver Drummer | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0
Swallowtail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0
Tarwhine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0
Skipjack Trevally | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 0 | 0 | 14
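The 92.5% figure can be reproduced from Table 6.3.1 with a short calculation. The sketch assumes that each row of the matrix is the true species and each column is the class assigned by the network, which is consistent with the row totals of 20 samples per species. The function names are illustrative and are not part of the JEHP application.

```python
SPECIES = ["Batfish", "Bullseye", "Old Wife", "Woodwards Pomfret",
           "Samson Fish", "Bluefin Tuna", "Silver Drummer", "Swallowtail",
           "Tarwhine", "Skipjack Trevally"]

# Table 6.3.1: rows assumed to be the true class, columns the assigned class.
MATRIX = [
    [19, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 14, 0, 0, 5, 1, 0, 0, 0, 0],
    [0, 0, 20, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 20, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 20, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 18, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 20, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 20, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 20, 0],
    [0, 0, 0, 0, 6, 0, 0, 0, 0, 14],
]


def overall_rate(matrix):
    """Percentage of samples on the diagonal (correct classifications)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def per_class_rates(matrix, names):
    """Correct classification rate (percent) for each row of the matrix."""
    return {name: 100.0 * row[i] / sum(row)
            for i, (name, row) in enumerate(zip(names, matrix))}


overall = overall_rate(MATRIX)           # 185 of 200 samples
rates = per_class_rates(MATRIX, SPECIES)
```

The diagonal sums to 185 of 200 samples, giving 92.5%. The per-class rates show that the remaining errors are concentrated in the Bullseye and Skipjack Trevally rows (70% each), mostly as misclassifications into the Samson Fish class.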

When tested against the entire data set, including all original and jittered samples, a correct classification rate of 72.3% was achieved. The detailed results can be seen in table 6.3.2.

Table 6.3.2 Confusion matrix for final fish classifier with all data.


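Class | Batfish | Bullseye | Old Wife | Woodwards Pomfret | Samson Fish | Bluefin Tuna | Silver Drummer | Swallowtail | Tarwhine | Skipjack Trevally | Total Samples
Batfish | 65 | 0 | 9 | 0 | 4 | 2 | 0 | 0 | 0 | 0 | 80
Bullseye | 0 | 38 | 0 | 0 | 17 | 12 | 5 | 9 | 5 | 0 | 86
Old Wife | 20 | 0 | 62 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 82
Woodwards Pomfret | 0 | 0 | 3 | 74 | 5 | 0 | 0 | 0 | 0 | 0 | 82
Samson Fish | 0 | 0 | 0 | 0 | 80 | 0 | 0 | 0 | 0 | 0 | 80
Bluefin Tuna | 2 | 0 | 0 | 0 | 17 | 77 | 0 | 0 | 1 | 0 | 97
Silver Drummer | 0 | 0 | 0 | 0 | 22 | 2 | 79 | 0 | 1 | 0 | 104
Swallowtail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 88 | 0 | 1 | 89
Tarwhine | 0 | 0 | 0 | 0 | 9 | 4 | 5 | 16 | 58 | 0 | 92
Skipjack Trevally | 10 | 0 | 0 | 0 | 41 | 12 | 2 | 4 | 5 | 33 | 107
Total | | | | | | | | | | | 899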

When tested against the entire data set, the final classification rates are approximately 20% higher than those of the next best performing network. One misclassification of note is the Bullseye class, which is commonly misclassified into the Samson Fish class. Also of interest are the Silver Drummer results: given a correct classification rate of approximately 71%, the misclassified samples were distributed amongst three classes, with the Samson Fish and Bluefin Tuna classes accounting for 25% of these (leaving a minor 1% of misclassified samples going into the Tarwhine class). Skipjack Trevally had the worst classification rate, at only 31%. The misclassifications of the Skipjack Trevally samples were distributed amongst six other classes, with the Samson Fish class accounting for 39% of the misclassifications.

6.4 General Sponge Results

The results of the initial sponge classifications were similar to the results of the fish classifications, with a few exceptions. When training the sponge classifiers, the data inputs consisted of the Gabor wavelet values, the PCA values, or both. Every network was found to generate better results when the wavelet feature was used. None of the networks performed well using only the PCA data: the best result attained by any network using just the PCA data was a correct classification rate of only 6%. The best performing network was the back-prop network, using just the Gabor wavelet inputs. Table 6.4.1 summarises the results, with confusion matrices given in the appendix.

Table 6.4.1 Best of Network Results for Sponge Classifications


Network | Best Classification Rate | Configuration Details
Back-prop | 70.0% | 241-15-6 (Wavelets only)
Kohonen | 53.3% | 241-3x3 (Wavelets only)
ART | 35.5% | 241-Max 15 outputs, Vigilance = 0.95 (Gabor Wavelets only)
SVM | 61.1% | 241 inputs, Linear Kernel, C = 0.1 (Wavelets)

6.5 Analysis of Sponge Results

Sponges can be very difficult to classify, even for a skilled human operator. Colour and texture are important classification cues. Sponge shape is also important, but in a three-dimensional morphological sense, as opposed to the two-dimensional silhouette that was very successful with fish classification. All of the networks tested had very little success in correctly classifying the Green Haliclona group, with the best performing back-prop network only managing a success rate of 13%. This is most likely because the images that make up this group exhibit a wide range of lighting conditions. Across all networks the Brown Haliclona group was the most successfully classified group; however, it should be noted that this group also received the highest number of misclassifications from other classes. An extreme example of this is the SVM network: when using just the PCA feature, all but 10 of the 90 samples were classified in the Brown Haliclona group. As expected, the first three classes all exhibit a large degree of misclassification amongst each other. This was apparent across all networks except the SVM networks, which produced significant misclassifications into the Blue-Grey group, but not into the first Orange only group.

6.6 The Final Sponge Classifier

The final sponge classifier selected for use is a back-prop network with a 241-15-6 configuration. The confusion matrix for this network is given in table 6.6.1.

Table 6.6.1 Confusion matrix for final sponge classifier


Class | Orange | Orange Red | Orange Pink | Blue Grey | Brown Haliclona | Green Haliclona | Null
Orange | 11 | 1 | 0 | 1 | 0 | 2 | 0
Orange Red | 2 | 11 | 0 | 1 | 0 | 1 | 0
Orange Pink | 0 | 1 | 13 | 1 | 0 | 0 | 0
Blue Grey | 0 | 0 | 2 | 13 | 0 | 0 | 0
Brown Haliclona | 0 | 1 | 0 | 0 | 13 | 1 | 0
Green Haliclona | 4 | 2 | 2 | 1 | 4 | 2 | 0

The correct classification rate for this classifier is 70%. In line with the results of the other networks, the Green Haliclona group has the poorest classification results. The misclassifications of the Green Haliclona group are distributed among all five other groups, with the most misclassifications going to the Orange group and the Brown Haliclona group. The remaining groups all show much higher results, with correct classification rates of between 66% and 86%, compared to a rate of 13% for the Green Haliclona group.

6.7 Classification Summary

The results presented above demonstrate that it is possible to construct a reliable classification system for benthic objects. Many factors contribute to the overall success of the classification system, such as the quality of the original data, the number of training samples, the nature of the objects to be classified, and the network selected as the classifier.

Fish classification was far more successful than sponge classification. This is primarily due to the use of shape as a fish feature descriptor. A three-dimensional shape measure should improve sponge classification, but would be difficult to implement, and slow to execute.

6.8 Future Work and Possible enhancements

There are many subtle improvements that could be made to the existing classification infrastructure. The lack of significantly useful information in the PCA feature was surprising, and is worthy of a more detailed investigation. One option would be to segment the generation of the PCA data to physically distinct parts of the object. For example the inner texture of a fish could be treated as one segment, the fins and tail as another, and the eye and mouth area as the last segment.

It is likely that the wavelet feature could also be improved by standardising the orientation of the images prior to the wavelet transform. Another improvement would be the integration of stereo image support, which would allow for much greater accuracy when dealing with object shapes by moving from a two-dimensional to a three-dimensional shape. This would also allow shape information to be used as a sponge feature.
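One possible way to standardise orientation, given here only as a sketch and not as a method used in this project, is to estimate the major axis of the extracted object from its second-order central moments and rotate the object so that this axis is horizontal. The functions `principal_axis_angle` and `align_to_x_axis` are hypothetical, and the input is assumed to be the list of (x, y) coordinates of the object's pixels.

```python
import math


def principal_axis_angle(pixels):
    """Angle (radians) of the major axis of a set of (x, y) pixel coordinates."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Second-order central moments of the pixel distribution.
    mu20 = sum((x - cx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - cy) ** 2 for _, y in pixels) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pixels) / n
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)


def align_to_x_axis(pixels):
    """Rotate coordinates about the centroid so the major axis lies along x."""
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    angle = principal_axis_angle(pixels)
    c, s = math.cos(angle), math.sin(angle)
    # Rotation by -angle applied to centroid-relative coordinates.
    return [((x - cx) * c + (y - cy) * s, -(x - cx) * s + (y - cy) * c)
            for x, y in pixels]


# A thin diagonal object: its major axis is at 45 degrees to the x axis.
diagonal = [(i, i) for i in range(10)]
angle = principal_axis_angle(diagonal)
aligned = align_to_x_axis(diagonal)
```

The rotated coordinates are generally fractional, so a real implementation would need to resample the image after rotation. The moment-based angle is also ambiguous by 180 degrees, so an additional rule (for example, head to the left) would be needed to make the orientation fully consistent.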

Classifier performance could also be improved by increasing the amount of high quality training data. Controlling lighting conditions and improving the general image quality could further improve classifier performance. Using non-interlaced video footage, and not downsampling existing imagery, could also help. Whilst using higher resolution imagery would result in longer feature generation times, it could also help to increase the overall classifier accuracy.

There may also be potential for improvements to the classification speed of the JEHP application. Currently the classification process can take up to 50 minutes to build the required feature vector; however, there is scope for improvement through changes to the actual implementation of the program.



7.0 Project Summary

The aim of the project was to research and develop a system capable of automated detection, extraction and classification of benthic objects. This involved research across many areas including image processing, feature extraction and artificial intelligence. The research resulted in the design and implementation of a computer application capable of classification tasks. A significant part of the development process included collecting a library of calibrated imagery to serve as input data for the classification process.

The first phase in this project was research into different methods of extracting objects from images. This research involved image-processing topics such as edge detection, active contour models, image enhancement, and region growing. In addition, several feature extraction methods were investigated to identify what features would be the most useful in the classification process.

As a result of the research completed up to this point a software application was developed to support the implementation and further investigation of image processing and object extraction techniques. This application also served as the basis for investigating the effectiveness of different image features such as shape descriptors, and texture features in the form of wavelets, grey level co-occurrence matrices, and image covariance analysis. As part of the investigation into different image features a powerful system of region growing and image segmentation was developed based upon a number of customisable pixel based image features.

The next phase of the project involved the collection of a library of benthic objects to serve as test organisms. This benthic object library consists of a large number of still images and movie files of assorted fish and sponge species. Some of the imagery that makes up the library is measured and calibrated, and large amounts of the library are identified and classified. This library of benthic objects served as the raw data for testing the effectiveness of different image features and as the test and training data for comparing various classification systems.

Considering the image library content and selected feature extraction tools, a review of modern classification techniques was undertaken. This looked at several different classification systems, ranging from Bayesian classifiers to complex neural network systems such as support vector machines. Each classifier was reviewed with respect to its ability to solve complex image based classification problems. Following this review, four of the most promising classifiers were implemented in a software application designed for training, testing and comparing different classifiers. The four methods implemented were a standard back-propagation network, a Kohonen network, an adaptive resonance theory network, and a support vector machine. The performance of each of the classifiers was then compared using combinations of image features derived from the images within the image library.

Algorithms and techniques resulting from the research program have been implemented in a user-focused application called JEHP. This application was developed with the goal of being a simple, easy to use, marine science oriented application. JEHP is capable of basic counting and measuring tasks, as well as more complex tasks such as image segmentation and object classification.

The JEHP application allows users to access the final fish and sponge classifiers developed as part of this project. Both classifiers are back-propagation neural networks, and provide users with a classification for the object in question. Whilst this project has dealt with a limited subset of benthic species it is likely that the results achieved here could be replicated with a greater selection of benthic objects, including sponges, coral and fish.



8.0 References

  1. Crammer, K. and Singer, Y., 2001. On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines. Journal of Machine Learning Research, pp. 265-292.
  2. Joachims, T., 1998. Making Large-Scale SVM Learning Practical. In: B. Schölkopf, C. J. C. Burges and A. J. Smola (eds.), Advances in Kernel Methods - Support Vector Learning. Cambridge, USA, MIT Press. http://svmlight.joachims.org/
  3. Salomon, R. and Van Hemmen, J. L., 1996. Accelerating Backpropagation through Dynamic Self-Adaption. Neural Networks, Vol. 9, pp. 589-601.
  4. Sarle, W., 2002. comp.ai.neural-nets FAQ. ftp://ftp.sas.com/pub/neural/FAQ.html
  5. Seager, J. and Wise, J.D.G., 2004. Milestone Report 1.2.3 Collect a calibrated image library of test organisms. Implement manual measurement of the three dimensional structure of epibenthic organisms. December 2005 milestone report for CWHM Project, Coastal CRC.
  6. Tsantis, S., Cavouras, D., Kalatzis, I., Piliouras, N., Dimitropoulos, N. and Nikiforidis, G., 2005. Development of a support vector machine-based image analysis system for assessing the thyroid nodule malignancy risk on ultrasound. Ultrasound in Medicine & Biology, Vol. 31, Is. 11, pp. 1451-1459.
  7. Wise, J.D.G., 2004. Milestone Report 1.2.2 Review the applicability of image processing techniques, AI, wavelets, region growing, etc. to extracting epibenthic organisms from imagery (still and video). May 2004 milestone report for CWHM Project, Coastal CRC.
  8. Wise, J.D.G., 2005a. Milestone Report 1.2.4 Implementation of appropriate techniques from 1.2.2 in software. May 2005 milestone report for CWHM Project, Coastal CRC.
  9. Wise, J.D.G., 2005b. Milestone Report 1.2.5 Review of Classification techniques for classifying benthic organisms extracted from images using 1.2.4. December 2005 milestone report for CWHM Project, Coastal CRC.



9.0 Appendix

Confusion matrix for the best back-prop network, trained with the high quality fish data set. Configuration: 256-25-10.

Class | Batfish | Bullseye | Old Wife | Woodwards Pomfret | Samson Fish | Bluefin Tuna | Silver Drummer | Swallowtail | Tarwhine | Skipjack Trevally | Null
Batfish | 19 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Bullseye | 1 | 13 | 0 | 0 | 1 | 5 | 0 | 0 | 0 | 0 | 0
Old Wife | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Woodwards Pomfret | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Samson Fish | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0
Bluefin Tuna | 0 | 1 | 0 | 0 | 1 | 17 | 0 | 0 | 0 | 1 | 0
Silver Drummer | 0 | 0 | 0 | 0 | 0 | 3 | 16 | 0 | 1 | 0 | 0
Swallowtail | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 20 | 0 | 0 | 0
Tarwhine | 0 | 0 | 0 | 2 | 2 | 2 | 1 | 0 | 13 | 0 | 0
Skipjack Trevally | 0 | 0 | 0 | 0 | 2 | 5 | 0 | 2 | 2 | 9 | 0

Correct classifications = 167 -> 83.5%; Incorrect classifications = 33 -> 16.5%

Confusion matrix for the best Kohonen network, trained with the high quality fish data set. Configuration: 256 -> 5x5.

Class | Batfish | Bullseye | Old Wife | Woodwards Pomfret | Samson Fish | Bluefin Tuna | Silver Drummer | Swallowtail | Tarwhine | Skipjack Trevally | Null
Batfish | 14 | 0 | 4 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0
Bullseye | 0 | 12 | 0 | 1 | 0 | 0 | 4 | 0 | 2 | 1 | 0
Old Wife | 3 | 0 | 15 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0
Woodwards Pomfret | 0 | 5 | 0 | 15 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Samson Fish | 0 | 0 | 0 | 0 | 11 | 1 | 5 | 0 | 2 | 1 | 0
Bluefin Tuna | 0 | 0 | 0 | 0 | 2 | 15 | 2 | 0 | 0 | 1 | 0
Silver Drummer | 0 | 3 | 0 | 0 | 3 | 2 | 10 | 0 | 2 | 0 | 0
Swallowtail | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 18 | 0 | 1 | 0
Tarwhine | 0 | 0 | 0 | 3 | 3 | 0 | 1 | 0 | 11 | 2 | 0
Skipjack Trevally | 0 | 2 | 0 | 0 | 1 | 0 | 1 | 2 | 4 | 10 | 0

Correct classifications = 131 -> 65.5%; Incorrect classifications = 69 -> 34.5%

Confusion matrix for the best ART network, trained with the high quality fish data set. Configuration: 256 inputs, max 20 classes, vigilance = 0.975.

Class | Batfish | Bullseye | Old Wife | Woodwards Pomfret | Samson Fish | Bluefin Tuna | Silver Drummer | Swallowtail | Tarwhine | Skipjack Trevally | Null
Batfish | 7 | 0 | 4 | 4 | 0 | 0 | 0 | 5 | 0 | 0 | 0
Bullseye | 0 | 4 | 0 | 5 | 3 | 0 | 6 | 2 | 0 | 0 | 0
Old Wife | 2 | 0 | 10 | 0 | 0 | 0 | 1 | 6 | 0 | 1 | 0
Woodwards Pomfret | 2 | 1 | 0 | 9 | 2 | 0 | 4 | 1 | 0 | 1 | 0
Samson Fish | 0 | 3 | 0 | 1 | 6 | 0 | 9 | 1 | 0 | 0 | 0
Bluefin Tuna | 0 | 2 | 0 | 0 | 7 | 0 | 9 | 1 | 1 | 0 | 0
Silver Drummer | 0 | 2 | 0 | 0 | 7 | 0 | 10 | 0 | 1 | 0 | 0
Swallowtail | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 15 | 0 | 1 | 0
Tarwhine | 0 | 3 | 0 | 3 | 4 | 0 | 7 | 1 | 2 | 0 | 0
Skipjack Trevally | 0 | 3 | 0 | 2 | 6 | 0 | 3 | 4 | 0 | 2 | 0

Correct classifications = 65 -> 32.5%; Incorrect classifications = 135 -> 67.5%

Confusion matrix for the best SVM network, trained with the high quality fish data set. Configuration: Linear kernel, C = 0.1.

Class | Batfish | Bullseye | Old Wife | Woodwards Pomfret | Samson Fish | Bluefin Tuna | Silver Drummer | Swallowtail | Tarwhine | Skipjack Trevally | Null
Batfish | 8 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0
Bullseye | 1 | 4 | 0 | 0 | 0 | 1 | 0 | 3 | 1 | 0 | 0
Old Wife | 3 | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Woodwards Pomfret | 1 | 0 | 0 | 6 | 0 | 0 | 1 | 0 | 2 | 0 | 0
Samson Fish | 0 | 0 | 0 | 0 | 9 | 0 | 0 | 0 | 1 | 0 | 0
Bluefin Tuna | 0 | 0 | 0 | 0 | 1 | 9 | 0 | 0 | 0 | 0 | 0
Silver Drummer | 0 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 2 | 0 | 0
Swallowtail | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 5 | 2 | 0 | 0
Tarwhine | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 7 | 0 | 0
Skipjack Trevally | 2 | 1 | 0 | 0 | 1 | 3 | 0 | 0 | 3 | 0 | 0

Correct classifications = 63 -> 63%; Incorrect classifications = 37 -> 37%

Confusion matrix for the best back-prop network, trained with the sponge data set. Configuration: 241-15-6.

Class | Orange | Orange Red | Orange Pink | Blue Grey | Brown Haliclona | Green Haliclona | Null
Orange | 11 | 1 | 0 | 1 | 0 | 2 | 0
Orange Red | 2 | 11 | 0 | 1 | 0 | 1 | 0
Orange Pink | 0 | 1 | 13 | 1 | 0 | 0 | 0
Blue Grey | 0 | 0 | 2 | 13 | 0 | 0 | 0
Brown Haliclona | 0 | 1 | 0 | 0 | 13 | 1 | 0
Green Haliclona | 4 | 2 | 2 | 1 | 4 | 2 | 0

Correct classifications = 63 -> 70%; Incorrect classifications = 27 -> 30%

Confusion matrix for the best Kohonen network, trained with the sponge data set. Configuration: 241 -> 3x3 output network.

Class | Orange | Orange Red | Orange Pink | Blue Grey | Brown Haliclona | Green Haliclona | Null
Orange | 2 | 3 | 0 | 1 | 3 | 1 | 0
Orange Red | 0 | 6 | 1 | 1 | 2 | 0 | 0
Orange Pink | 0 | 3 | 3 | 3 | 1 | 0 | 0
Blue Grey | 1 | 2 | 0 | 3 | 3 | 1 | 0
Brown Haliclona | 0 | 1 | 0 | 1 | 9 | 0 | 0
Green Haliclona | 1 | 0 | 0 | 0 | 4 | 9 | 0

Correct classifications = 63 -> 70%; Incorrect classifications = 27 -> 30%

Confusion matrix for the best ART network, trained with the sponge data set. Configuration: 241 -> max 15 output classes, vigilance = 0.95.

Class | Orange | Orange Red | Orange Pink | Blue Grey | Brown Haliclona | Green Haliclona | Null
Orange | 3 | 3 | 0 | 5 | 4 | 0 | 0
Orange Red | 0 | 3 | 0 | 6 | 6 | 0 | 0
Orange Pink | 0 | 0 | 0 | 7 | 8 | 0 | 0
Blue Grey | 0 | 0 | 0 | 13 | 2 | 0 | 0
Brown Haliclona | 0 | 0 | 0 | 0 | 13 | 0 | 2
Green Haliclona | 2 | 0 | 0 | 6 | 5 | 0 | 2

Correct classifications = 32 -> 35.5%; Incorrect classifications = 58 -> 64.4%

Confusion matrix for the best SVM network, trained with the sponge data set. Configuration: Linear kernel, C = 0.1.

Class | Orange | Orange Red | Orange Pink | Blue Grey | Brown Haliclona | Green Haliclona | Null
Orange | 9 | 2 | 2 | 2 | 0 | 0 | 0
Orange Red | 0 | 13 | 0 | 2 | 0 | 0 | 0
Orange Pink | 0 | 1 | 8 | 3 | 3 | 0 | 0
Blue Grey | 1 | 0 | 3 | 10 | 1 | 0 | 0
Brown Haliclona | 0 | 0 | 0 | 0 | 15 | 0 | 0
Green Haliclona | 4 | 2 | 1 | 2 | 6 | 0 | 0

Correct classifications = 55 -> 61.1%; Incorrect classifications = 35 -> 38.9%
