In an attempt to demystify artificial neural
networks (ANNs) and to set the
tone of this paper, it is best to start by
saying that ANNs are mathematical functions
of the form y = f(x, w), where y is
the output or response that is determined
from a given input x by a series of calculations
through a network of connected
processing elements or nodes. This article
demonstrates the feasibility of utilizing a
Microsoft® Excel™ workbook for defining,
training, and subsequently using ANNs as
a prediction tool. The example given here
involves the prediction of nitrogen oxide
(NOx) emissions from boilers. It is very
important to be able to predict the emissions
of pollutants such as CO and NOx
in boilers in order to determine the effectiveness
of the combustion process and to
evaluate the environmental contribution
of gas emissions. This requires measured
data such as effluent temperature and gas
flow rate.
The nodes or neurons are arranged in
layers and are connected by unidirectional
communication arcs, which carry
numerical data. The input nodes represent
a set of variables, x, and the output
nodes correspond to the predicted output
set of y variables. These two sets of
nodes form the input and output layers,
respectively. The information between
layers is processed through various intermediate
layers, the so-called hidden
layers. The parameters that modify the
information transferred from one layer
to the next, w, represent the strength of
the connection between the nodes, and
whose numeric values give the network
its predictive power.
Figure 1 - Schematics of a three-layer neural network model.
The neural network in Figure 1 contains
three layers. The input layer has two nodes,
I1 and I2, one for T (temperature) and one
for F (flow), respectively. The hidden layer
has three nodes, H1, H2, and H3; the output
layer has only one node, O1, which
corresponds to the calculated concentration
of NOx.
Calculations in the ANN start
when values are applied to the
input nodes. These inputs are then
multiplied by the weights wk,i connected
to all nodes in the hidden
layer. The products are summed
up for each node and a bias is subtracted
(sum = Σxkwk,i – bk). xThis
sum is passed along to an activation
function to produce the output
of the node; the sigmoidal
function is the most commonly
used activation function, f(sum) =
1/(1+exp(-sum)). The resulting values
are then used as the inputs to
the output layer. Each neuron in
the output layer receives weighted
inputs minus bias from each neuron in the
hidden layer to determine the activation
function of the output layer, which corresponds
to the neural network output.
The activation function serves to model
nonlinear behaviors. The input nodes act
as distribution nodes, with no calculations
associated; the inputs are just transferred
to the nodes in the hidden layer.
For example, in the neural network model
of Figure 1, operations are performed in the
hidden nodes that receive the two inputs
(input #1 has a value of 0.4 and input #2 a
value of 0.6). Each input has a corresponding
weight that connects to the hidden
nodes (weight #1 connects input #1 and
has a value of –1.26, etc.). Three values
of f(sum) will be calculated, one for each
node in the hidden layer 0.341, 0.979, and
0.947, respectively. Then, the weighted
sum of inputs and calculation of the activation
function for the output node are
applied, and a value of 0.609 is obtained
in the network output.
This particular method of performing the
calculations in the ANN is associated with a
structural design named feed forward. If the
actual value of the target is equal to 0.62, the
error between the target value and the output
would be equal to –0.011 (i.e., 0.62 – 0.609 = –0.011). All errors for all the observations in
the dataset are calculated for these particular
sets of weights and bias values.
When a neural network is trained, the
weights and bias terms of neurons are
adjusted individually. Training a neural
network is an iterative process; it uses a
nonlinear optimization algorithm to obtain
the optimal values of the weights. The data
must be divided into two subsets: a training
dataset and a test dataset. An error back-propagation
is the most widely used algorithm,
and it is known as a learning algorithm.
A full mathematical explanation of
this algorithm can be found elsewhere.1–3
Initially, the connection weights and bias
are set to random values. The weights are
then adjusted until a training error criterion
is minimized, that is, until learning is
successful. The most common convergence
criterion is the mean squared error (MSE),
which is the squared difference between
the actual output and the predicted output,
divided by the number of observations
(patterns) of the training dataset.
Neural networks were studied as early as
the 1940s by McCulloh and Pitts.4 However,
they did not become popular until
around 1985, when the method of back-propagation
for training neural networks
was introduced by Rumelhart et al.5 Neural models are now enjoying a resurgence, and
there is a substantial amount of research
in the area of neural networks, because of
their ability to represent nonlinear relationships,
which is useful in making function
approximation, forecasting, and recognizing
patterns.6,7
Use of Excel to train a neural network
The objective of this study was to develop
a simple, easily understood methodology
for training neural networks using an Excel
workbook. Data from a boiler stack were
used to build and train a neural network
that predicts the concentration of NOx
emissions. A combustion analyzer was used
to measure NOx in the flue gas.
The Excel workbook that predicts the
NOx emissions consists of three modules
(DATA, TRAINER, and QUERY). The
DATA worksheet is an interface for users
to keyboard the data and normalize them.
The TRAINER worksheet is an interactive
interface and work area. The user
can interact with the spreadsheet by setting
and adjusting parameters needed for
training the neural network. The QUERY
worksheet is designed for query and computing
results.
Configuring the DATA worksheet
The data must be preprocessed in order for
the ANN to effectively learn from them.
All values that appear to be scattered
far away from the majority of values are
considered outliers and must be excluded
from the dataset. The numeric data of all
the variables are linearly scaled between
0 and 1, considering the minimum and
maximum values. This scaling of variable
values is made in order to avoid using
data spanning different orders of magnitude.
Following the common practice,
the dataset to train the network was made
by randomly selecting about 70% of the
database. The remaining 30% of data
were then used to check the generalization
capability of the model.
Microsoft Excel makes it possible to
access more than one worksheet. Each
worksheet is represented in the bottom
section of the interface with a tab. The
left-most tab is marked Sheet1. The
second from the left is marked Sheet2.
The last is marked Sheet3. To rename a worksheet, the user double-clicks its
sheet tab and then types a new name.
Once this is done, the user changes
the tabs Sheet1, Sheet2, and Sheet3
to the new names DATA, TRAINER,
and QUERY, respectively. To move
between the different worksheets, the
corresponding tabs are clicked.
Figure 2 - DATA worksheet.
The steps below are followed:
- Enter the number for each record
into column A, type 1 into cell
A4, and the equation =A4+1 into
cell A5. Then highlight cell A5;
the cursor box will enclose the
cell, and a small black square will
be seen at the lower right corner
called the fill handle. Point to the
square (the pointer should change
to a thin cross), press and hold the
left mouse button, drag the pointer
downward to cell A33, and release.
This yields a column of serial numbers
by copying the equation from
cell A5 down to and including A33.
See Figure 2.
- Enter the headings for columns B, C,
and D into rows 2 and 3.
- Enter the values for the flow, temperature,
and NOx into columns B, C, and
D, respectively. These will correspond
with the actual values for input nodes
1 and 2 and the output node of the
network.
- Enter the function to determine the
largest number for each input and output
node, starting in cell B36 to cell
D36, using the MAX function. To do
so, position on cell B36 and click on
the Insert Function button to the left
of the formula bar. Several functions
are available; select MAX and the
functions arguments dialog box will
pop up. With the mouse, select range
B4:B33, click the OK button, then
into cell C36 enter MAX(C4:C33);
into cell D36 enter MAX(D4:D33).
- Enter the minimum values for data
values for each input and output node
as in step 4, using the MIN formula,
into cells B37 to D37.
- The next step is to normalize the data
by subtracting the minimum value
and dividing by the difference of the
maximum minus the minimum. The
normalized values for input #1 of column
B will be in column E. Into cell
E4 type (B4-$B$37)/($B$36-$B$37).
Copy this formula into the rest of column
E; use the fill handle to copy values
into neighbor cells from rows 4 to
33, as in step 1. When this is done, cell
references that do not include $ signs
are updated (cell address B4 in the formula
changes to B5, B6, etc.), whereas
cell references with $ signs (cells with
absolute address) are not; a simple way
to do this is to hit the function key F4
when typing the cell address. This is
important because $B$36 and $B$37
are fixed parameters. Do the same for
input #2 values on column F and the
output values on column G using the
proper address for max. and min. Save
the workbook.
This ends the easy part of laying out the
DATA worksheet. The TRAINER worksheet
is prepared as shown below.