1.
INTRODUCTION
1.1
The Problem Considered
There are many computational systems capable of
learning to perform a task. Most of
these systems have a fixed structure, and learn by altering various parameters until
an internal representation of the problem is built up. However, there is no systematic method for
learning the structure of such systems.
The process of manually evaluating a task to
determine which structures might be suitable is a time-consuming and complex
one, and often requires a very high level of domain knowledge. This project is concerned with automatically
generating the structure, without requiring an in-depth knowledge of the task.
Three main systems will combine to produce an overall system:
1.
Artificial
neural networks will be used as “experts” capable of learning a simple task
given suitable input-output pairs.
2.
A
modular network (sometimes referred to in literature as a “mixture of experts”)
will govern a set of these experts and learn which are most suited to
particular regions of input space. This
has the effect of decomposing the task into a set of sub-tasks, each performed
by a different expert.
3.
Finally,
a genetic algorithm will take a population of experts and “breed” them to
combine the best properties of existing experts to produce new experts more
capable of performing the task.
The combination of a modular network, which produces
a decision as to which experts are most suited to a sub-task, and a genetic
algorithm, which makes beneficial structural changes to the experts, should
allow expert structures to be built.
Its worth noting that a modular network is a type of
neural network, and modular networks can themselves be used as experts to
produce a hierarchical structure [1].
1.2
Practical Relevance
Neural networks have been shown to be very
successful learning models. The range
of tasks they can learn to perform is vast, and their ability to generalise
makes them particularly impressive.
Whilst many of the applications they have been applied to are fairly
abstract, they have also been used to perform many popular tasks such as object
and speech recognition. But probably
the most fundamental use is research into the human brain, on which neural
networks are modelled.
The manual design of neural networks a laborious and
complex process. If the design can be
achieved quickly by an automated process, then the time saved can be put to
better use evaluating the structures that have emerged. Also, the solution to tasks that cannot
currently be performed may become possible.
The evaluation of the structures generated would in turn lead to a
greater understanding of those tasks.
1.3
Previous Work
1.3.1 Neural Networks
Neural networks are very common in the field of artificial
intelligence. A range of variations of
both the structure and training methods has been devised, but the general
principles remain unchanged. Material
on neural networks can be found in a great deal of artificial intelligence
literature, such as [7] or [8].
Modular networks are a more recent development of
neural networks, see Jacobs, Jordan, Nowlan, & Hinton (1991) [3]. Since this is a relatively new idea only a
few variations have been developed. The
most substantial of these are the use of a localised gating for the modular
network, Ramamurti & Ghosh [9], and the EM training algorithm by Jacobs,
and Jordan (1993) [4].
The neural networks I will be using are standard,
and have neither a structure nor use training techniques other than those
originally developed.
1.3.2 Genetic Algorithms
Genetic algorithms are also very common. The basic algorithm is the same for most of
these systems [7]. A genetic algorithm
uses a number of other algorithms. An
example is the mutation algorithm that is responsible for making small random
changes to the population. Many
implementations of these algorithms exist.
There is a difference between existing genetic
algorithms and the algorithm used in this project. While most genetic algorithms encode the population members into
bit-strings before operating on them, my algorithm works directly on the expert
networks, without encoding them first.
1.3.3 Structure Generation
There exist systems that can build up the structure
of neural networks. Two main techniques
are used - both rely on some form of genetic algorithm. The first method applies only mutation. This will alter the structure of the
networks, such as adding or removing nodes, to attempt to produce a better
network. The mutations that are applied
can either be randomly chosen, or are determined by a method to predict which
will produce the best network [6]. The
second method uses a genetic algorithm that implements some form of crossover
as well as mutation. This both combines
the structural properties of existing networks in the population, and applies
mutation, to create new networks. The
mutation is nearly always chosen using a random process.
There also exist systems to build the structure of
modular networks. The techniques alter
the number of experts in the modular network.
There are two ways this can happen.
Either, a small initial modular network is created and then grown by
adding experts. Or, a large initial
network is created and shrunk, by removing experts. The network size will continue to be altered until an optimal
size is found [9], [10].
I know of no techniques that use genetic algorithms
to breed the experts for a modular network.
1.4
Objectives
There are four stages to this project. The three systems described above (1.1) must
be coded. These systems must then be
combined to make an overall system capable of generating the structure of
experts, and hence the structure of the modular network. There is a specific order to these stages:
1.
The
code to instantiate and train a neural (expert) network must be written. Once this is completed, they can be used as
experts in a modular network.
2.
It
seems a natural progression to write the code to instantiate and train a
modular network next.
3.
The
genetic algorithm can be coded.
4.
Finally,
the three previous sub-systems must be combined to produce an overall system
capable of automatically generating the structure. I will refer to this system as the “modular breeder”.
The bulk of the work will be in the first three
stages. The fourth stage will call
methods defined on the previous three sub-systems, to produce the overall
system. Most of the computational work
will be done by the three sub-systems.
1.5
Working Environment
I primarily
used machines running Linux, with the Java Development Kit (jdk). A number of versions of jdk are available on
the university machines. The most
recent is version 2.0. My code is known
to be compatible with versions 2.0, 1.1.7, and 1.1.8. However, the graphical interface requires Swing version 1.1,
rather than version 1.0 which is provided on the university computers. Downloading the required classes from Sun’s
website and appending the location of the classes to my classpath easily solved
this.
My program can write information to files, such as
the error of the system as it is trained on a given task. The information is written in a format that
is compatible with MATLAB. This allows
MATLAB to be used to read the files and generate graphs.
1.6
Outline of Remaining Report
The remainder of the report is organised as
follows. Section 2 presents the theory
behind the three sub-systems, and describes their combination into the complete
system. Section 3 illustrates the
design and implementation of the system.
This includes the requirements of the system, the choice of language,
and the structure of the program.
Section 4 introduces the experiments that were carried out to test the
system, why they were chosen, and presents various results that demonstrate the
system’s current performance. Finally,
section 5 gives conclusions of the project, including the achievements,
possible improvements and extensions.