1. INTRODUCTION

 

 

1.1             The Problem Considered

 

There are many computational systems capable of learning to perform a task.  Most of these systems have a fixed structure, and learn by altering various parameters until an internal representation of the problem is built up.  However, there is no systematic method for learning the structure of such systems. 

 

The process of manually evaluating a task to determine which structures might be suitable is a time-consuming and complex one, and often requires a very high level of domain knowledge.  This project is concerned with automatically generating the structure, without requiring an in-depth knowledge of the task. 

 

Three main systems will combine to produce an overall system:

 

1.      Artificial neural networks will be used as “experts” capable of learning a simple task given suitable input-output pairs.

2.      A modular network (sometimes referred to in literature as a “mixture of experts”) will govern a set of these experts and learn which are most suited to particular regions of input space.  This has the effect of decomposing the task into a set of sub-tasks, each performed by a different expert. 

3.      Finally, a genetic algorithm will take a population of experts and “breed” them to combine the best properties of existing experts to produce new experts more capable of performing the task.

 

The combination of a modular network, which produces a decision as to which experts are most suited to a sub-task, and a genetic algorithm, which makes beneficial structural changes to the experts, should allow expert structures to be built.

 

Its worth noting that a modular network is a type of neural network, and modular networks can themselves be used as experts to produce a hierarchical structure [1].

 

 

1.2             Practical Relevance

 

Neural networks have been shown to be very successful learning models.  The range of tasks they can learn to perform is vast, and their ability to generalise makes them particularly impressive.  Whilst many of the applications they have been applied to are fairly abstract, they have also been used to perform many popular tasks such as object and speech recognition.  But probably the most fundamental use is research into the human brain, on which neural networks are modelled.

 

The manual design of neural networks a laborious and complex process.  If the design can be achieved quickly by an automated process, then the time saved can be put to better use evaluating the structures that have emerged.  Also, the solution to tasks that cannot currently be performed may become possible.  The evaluation of the structures generated would in turn lead to a greater understanding of those tasks.

 

 

1.3             Previous Work

 

1.3.1 Neural Networks

 

Neural networks are very common in the field of artificial intelligence.  A range of variations of both the structure and training methods has been devised, but the general principles remain unchanged.  Material on neural networks can be found in a great deal of artificial intelligence literature, such as [7] or [8].

 

Modular networks are a more recent development of neural networks, see Jacobs, Jordan, Nowlan, & Hinton (1991) [3].  Since this is a relatively new idea only a few variations have been developed.  The most substantial of these are the use of a localised gating for the modular network, Ramamurti & Ghosh [9], and the EM training algorithm by Jacobs, and Jordan (1993) [4].

 

The neural networks I will be using are standard, and have neither a structure nor use training techniques other than those originally developed.

 

1.3.2 Genetic Algorithms

 

Genetic algorithms are also very common.  The basic algorithm is the same for most of these systems [7].  A genetic algorithm uses a number of other algorithms.  An example is the mutation algorithm that is responsible for making small random changes to the population.  Many implementations of these algorithms exist.

 

There is a difference between existing genetic algorithms and the algorithm used in this project.  While most genetic algorithms encode the population members into bit-strings before operating on them, my algorithm works directly on the expert networks, without encoding them first.

 

1.3.3 Structure Generation

 

There exist systems that can build up the structure of neural networks.  Two main techniques are used - both rely on some form of genetic algorithm.  The first method applies only mutation.  This will alter the structure of the networks, such as adding or removing nodes, to attempt to produce a better network.  The mutations that are applied can either be randomly chosen, or are determined by a method to predict which will produce the best network [6].  The second method uses a genetic algorithm that implements some form of crossover as well as mutation.  This both combines the structural properties of existing networks in the population, and applies mutation, to create new networks.  The mutation is nearly always chosen using a random process.

 

There also exist systems to build the structure of modular networks.  The techniques alter the number of experts in the modular network.  There are two ways this can happen.  Either, a small initial modular network is created and then grown by adding experts.  Or, a large initial network is created and shrunk, by removing experts.  The network size will continue to be altered until an optimal size is found [9], [10].

 

I know of no techniques that use genetic algorithms to breed the experts for a modular network. 

 

 

1.4             Objectives

 

There are four stages to this project.  The three systems described above (1.1) must be coded.  These systems must then be combined to make an overall system capable of generating the structure of experts, and hence the structure of the modular network.  There is a specific order to these stages:

 

1.      The code to instantiate and train a neural (expert) network must be written.  Once this is completed, they can be used as experts in a modular network. 

2.      It seems a natural progression to write the code to instantiate and train a modular    network next. 

3.      The genetic algorithm can be coded.

4.      Finally, the three previous sub-systems must be combined to produce an overall system capable of automatically generating the structure.  I will refer to this system as the “modular breeder”.

 

The bulk of the work will be in the first three stages.  The fourth stage will call methods defined on the previous three sub-systems, to produce the overall system.  Most of the computational work will be done by the three sub-systems.

 

 

1.5             Working Environment

 

 I primarily used machines running Linux, with the Java Development Kit (jdk).  A number of versions of jdk are available on the university machines.  The most recent is version 2.0.  My code is known to be compatible with versions 2.0, 1.1.7, and 1.1.8.  However, the graphical interface requires Swing version 1.1, rather than version 1.0 which is provided on the university computers.  Downloading the required classes from Sun’s website and appending the location of the classes to my classpath easily solved this.

 

My program can write information to files, such as the error of the system as it is trained on a given task.  The information is written in a format that is compatible with MATLAB.  This allows MATLAB to be used to read the files and generate graphs. 

 

 

1.6             Outline of Remaining Report

 

The remainder of the report is organised as follows.  Section 2 presents the theory behind the three sub-systems, and describes their combination into the complete system.  Section 3 illustrates the design and implementation of the system.  This includes the requirements of the system, the choice of language, and the structure of the program.  Section 4 introduces the experiments that were carried out to test the system, why they were chosen, and presents various results that demonstrate the system’s current performance.   Finally, section 5 gives conclusions of the project, including the achievements, possible improvements and extensions.