Implementing a Neural Network with Backpropagation in Python to Perform an XOR Gate

Cover: Diagram comparing the CNOT logic gate of a quantum computer (left) and a classical computer’s XOR gate (right). By George.ad.Stamatiou and Alksentrs, April 1, 2009. Source: Wikimedia Commons.

Maurício Pinheiro

1. Introduction to Neural Networks and Backpropagation

Neural networks are a type of machine learning algorithm [1] inspired by the structure and function of the human brain. They are capable of learning complex patterns and relationships in data, making them useful for tasks such as image classification, natural language processing, and predictive modeling.

At a high level, a neural network consists of interconnected nodes, or neurons, organized into layers. Each neuron receives input from other neurons or external data, processes that input using a mathematical function, and passes the output to other neurons in the next layer. The output of the final layer represents the prediction or output of the network.
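As a minimal sketch of what a single neuron computes, the snippet below forms a weighted sum of its inputs, adds a bias, and applies a sigmoid. The input, weight, and bias values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values, chosen only for illustration.
inputs = np.array([1.0, 0.0])
weights = np.array([0.5, -0.5])
bias = 0.0

# Weighted sum of the inputs plus the bias, then the activation function.
weighted_sum = np.dot(inputs, weights) + bias
activation = sigmoid(weighted_sum)

print(f"weighted sum = {weighted_sum}, activation = {activation:.4f}")
```

A layer is many such neurons evaluated at once, which is why the later code works with weight matrices rather than weight vectors.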

Training a neural network involves adjusting the weights and biases of the neurons so that the network can make accurate predictions on new data. Backpropagation [2] is a widely used algorithm for optimizing the weights and biases of a neural network. It works by propagating the error, or difference between the predicted output and the actual output, back through the network and using that information to update the weights and biases.

During backpropagation, the output of the final layer is compared to the true output, and the error is calculated. The error is then propagated backward through the network, with each neuron in each layer contributing to the error based on its weighted connections to the neurons in the next layer. This allows the algorithm to determine how much each weight and bias contributed to the error, so they can be adjusted to improve the accuracy of the network’s predictions.
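The idea of assigning each weight its share of the error can be written with the chain rule. The notation below is an assumption made for illustration: a single neuron with input x, weight w, bias b, pre-activation z, activation a, and some error E that depends on a.

```latex
z = w x + b, \qquad a = \sigma(z)

\frac{\partial E}{\partial w}
  = \frac{\partial E}{\partial a}\,
    \frac{\partial a}{\partial z}\,
    \frac{\partial z}{\partial w}
  = \frac{\partial E}{\partial a}\,\sigma'(z)\,x,
\qquad
\frac{\partial E}{\partial b}
  = \frac{\partial E}{\partial a}\,\sigma'(z)
```

In a multi-layer network, the factor \partial E / \partial a of a hidden neuron is itself obtained from the layer after it, which is what "propagating the error backward" refers to.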

Backpropagation is an iterative algorithm, which means it is run multiple times on a dataset to continually improve the weights and biases of the network. By minimizing the error between the predicted output and the true output, the network becomes better at making predictions on new data.
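The iterative part can be illustrated on a toy problem that has nothing to do with XOR: minimizing the one-parameter function (w - 3)^2 by repeatedly stepping against its gradient. The function, starting point, and learning rate here are assumptions chosen to keep the sketch short.

```python
def loss(w):
    # Toy error function with its minimum at w = 3.
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the toy error function with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0    # arbitrary starting value
lr = 0.1   # learning rate (step size)
history = []

for epoch in range(50):
    history.append(loss(w))
    w -= lr * grad(w)   # same update rule a network applies to every weight

print(f"w after 50 steps: {w:.5f}")
print(f"loss went from {history[0]:.3f} to {history[-1]:.2e}")
```

Each pass shrinks the distance to the minimum by a constant factor here; a real network behaves less regularly, but the update rule has the same shape.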

2. Defining the Problem: Implementing an XOR gate with a Neural Network

The XOR gate is a logical operation that takes two binary inputs and returns a binary output. The output is true (1) if the inputs are different, and false (0) if they are the same.

The truth table for the XOR gate is as follows:

Input 1    Input 2    Output
0          0          0
0          1          1
1          0          1
1          1          0

XOR truth table
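The same table can be reproduced with Python's built-in bitwise XOR operator, `^`, which is a quick way to generate the expected outputs used as training targets later on. The variable name `table` is just an illustrative choice.

```python
# Build the XOR truth table with Python's bitwise XOR operator.
table = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]

print("Input 1  Input 2  Output")
for a, b, out in table:
    print(f"{a:^7}  {b:^7}  {out:^6}")
```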

The XOR operation is a classic example of a problem that a linear model cannot solve. No single straight line separates the inputs that map to 1 from the inputs that map to 0, so a linear classifier or a simple linear regression cannot represent it. A nonlinear model is required to capture the relationship between the inputs and the outputs, and a neural network with a hidden layer is one such model.
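One way to see the limitation is to fit an ordinary least-squares linear model to the four XOR points. This is only a sketch using `np.linalg.lstsq`; the article itself does not use this method. The best linear fit predicts 0.5 for every input, which carries no information about the XOR output.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Add a column of ones so the linear model can learn an intercept.
A = np.hstack([np.ones((4, 1)), X])

# Least-squares fit of y ≈ c0 + c1 * x1 + c2 * x2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coef

print("coefficients:", np.round(coef, 6))
print("predictions: ", np.round(pred, 6))
```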

To learn to perform the XOR operation, a neural network must be trained on a dataset of input-output pairs. For the XOR gate, the input is a pair of binary values, and the output is a single binary value representing the result of the XOR operation.

The key to the neural network’s ability to learn the XOR operation is its ability to model complex nonlinear relationships between the inputs and outputs. By using multiple layers of neurons, each with their own nonlinear activation functions, the network can learn to represent complex functions and decision boundaries.
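To make the role of the hidden layer concrete, here is a hand-built two-layer network that computes XOR exactly. It is a sketch under assumptions that differ from the trained network in this article: the weights are set by hand rather than learned, there are two hidden neurons instead of three, and the activation is a step function instead of a sigmoid. The hidden neurons compute OR and NAND, and the output neuron combines them with AND.

```python
import numpy as np

def step(z):
    # Threshold activation: 1 where z > 0, else 0.
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked weights (not learned). Column 0 is OR, column 1 is NAND.
W1 = np.array([[1.0, -1.0],
               [1.0, -1.0]])
b1 = np.array([-0.5, 1.5])

# The output neuron is AND of the two hidden neurons.
W2 = np.array([[1.0],
               [1.0]])
b2 = np.array([-1.5])

hidden = step(X @ W1 + b1)
output = step(hidden @ W2 + b2)

for x, h, o in zip(X, hidden, output):
    print(f"input {x} -> hidden {h} -> output {o}")
```

A step function has zero gradient almost everywhere, so this version cannot be trained with backpropagation; a smooth activation such as the sigmoid is what makes learning the weights possible.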

The backpropagation algorithm then allows the network to adjust its weights and biases to minimize the error between its predicted output and the true output, ultimately learning to accurately perform the XOR operation.

3. Implementing the Neural Network in Python

Now that we have defined the problem of implementing an XOR gate with a neural network, we can move on to the implementation. In this section, we will explain the code provided and how it trains the neural network to perform the XOR operation. We will also discuss the parameters used in the code and how they affect the performance of the neural network.

For that, we will implement a neural network consisting of three layers: an input layer, a hidden layer, and an output layer. The input layer has two neurons, which correspond to the two input values of the XOR gate. The hidden layer has three neurons, a number chosen arbitrarily. The output layer has one neuron, which corresponds to the output value of the XOR gate. The network therefore has six neurons in total, counting the input and output layers. The weights and biases of the hidden and output layers are learned during training, which uses backpropagation to update them.

A three layer Perceptron net capable of calculating XOR. original PD-version (from ru-Wikipedia) by Сергей Яковлев, vector version by Alex Krainov. January 20, 2009
Source: Wikimedia Commons.

The code uses the Python programming language and the NumPy library for numerical computations. The neural network is implemented using a feedforward architecture, with one hidden layer and one output layer. The activation function used for all neurons is the sigmoid function, which squashes the neuron’s output between 0 and 1.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(X, y, n_hidden, n_epochs, lr):
    n_input = X.shape[1]
    n_output = y.shape[1]

    # initialize weights and biases
    w1 = np.random.randn(n_input, n_hidden)
    b1 = np.zeros((1, n_hidden))
    w2 = np.random.randn(n_hidden, n_output)
    b2 = np.zeros((1, n_output))

    # train the model for n_epochs
    cost_values = []
    for i in range(n_epochs):
        # forward pass
        z1 = np.dot(X, w1) + b1
        a1 = sigmoid(z1)
        z2 = np.dot(a1, w2) + b2
        y_pred = sigmoid(z2)

        # compute cost
        cost = np.mean(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred))
        cost_values.append(cost)

        # backward pass
        dz2 = y_pred - y
        dw2 = np.dot(a1.T, dz2)
        db2 = np.sum(dz2, axis=0, keepdims=True)
        dz1 = np.dot(dz2, w2.T) * a1 * (1 - a1)
        dw1 = np.dot(X.T, dz1)
        db1 = np.sum(dz1, axis=0, keepdims=True)  # keep shape (1, n_hidden), matching b1

        # update weights and biases
        w1 -= lr * dw1
        b1 -= lr * db1
        w2 -= lr * dw2
        b2 -= lr * db2

    return w1, w2, b1, b2, cost_values

# train the model
X = np.array([[0,0], [0,1], [1,0], [1,1]])
y = np.array([[0], [1], [1], [0]])
w1, w2, b1, b2, cost_values = train(X, y, n_hidden=3, n_epochs=1000, lr=0.1)

# print final weights and biases
print("Final weights and biases:")
print(f"w1 = {w1}")
print(f"b1 = {b1}")
print(f"w2 = {w2}")
print(f"b2 = {b2}")

The code defines a small helper, sigmoid(), and the main function, train(), which takes five arguments: X, y, n_hidden, n_epochs, and lr. X is the input matrix, y is the output matrix, n_hidden is the number of neurons in the hidden layer, n_epochs is the number of training epochs, and lr is the learning rate, which controls the step size of the weight updates during training.

The first step of the train() function is to initialize the weights and biases of the neural network. This is done using the np.random.randn() function to generate random values for the weights, and the np.zeros() function to initialize the biases to zero.
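Because the weights start from random values, two runs of the original code will generally end with different weights. If reproducible runs are wanted, one option is to draw the initial weights from a seeded generator. The helper name `init_params` and the `seed` argument below are hypothetical and not part of the original code.

```python
import numpy as np

def init_params(n_input, n_hidden, n_output, seed=0):
    # Hypothetical helper: same shapes as in train(), but seeded for reproducibility.
    rng = np.random.default_rng(seed)
    w1 = rng.standard_normal((n_input, n_hidden))   # input -> hidden weights
    b1 = np.zeros((1, n_hidden))                    # hidden biases
    w2 = rng.standard_normal((n_hidden, n_output))  # hidden -> output weights
    b2 = np.zeros((1, n_output))                    # output bias
    return w1, b1, w2, b2

w1, b1, w2, b2 = init_params(n_input=2, n_hidden=3, n_output=1, seed=0)
print(w1.shape, b1.shape, w2.shape, b2.shape)
```

Random weights matter here: if every weight started at the same value, all hidden neurons would receive identical updates and stay identical. The biases can safely start at zero.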

Next, the function enters a loop that runs for n_epochs iterations. In each iteration, the neural network performs a forward pass, followed by a backward pass, and updates its weights and biases using the backpropagation algorithm.

During the forward pass, the input X is multiplied by the weight matrix w1, and the bias vector b1 is added.

The resulting values are then passed through the sigmoid activation function to produce the output of the hidden layer, a1.

The output of the hidden layer is then multiplied by the weight matrix w2, and the bias vector b2 is added. The resulting values are passed through the sigmoid activation function to produce the final output of the neural network, y_pred.
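It can help to trace the array shapes through the forward pass. The sketch below repeats the same four lines as the training loop on freshly initialized weights (seeded here only so the run is repeatable) and prints the shape at each stage for the four XOR inputs.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)  # seed is an assumption, for repeatability only
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

w1 = rng.standard_normal((2, 3))
b1 = np.zeros((1, 3))
w2 = rng.standard_normal((3, 1))
b2 = np.zeros((1, 1))

z1 = np.dot(X, w1) + b1      # (4, 2) x (2, 3) -> (4, 3); b1 broadcasts over rows
a1 = sigmoid(z1)             # (4, 3): one row of hidden activations per example
z2 = np.dot(a1, w2) + b2     # (4, 3) x (3, 1) -> (4, 1)
y_pred = sigmoid(z2)         # (4, 1): one prediction per example

for name, arr in [("X", X), ("z1", z1), ("a1", a1), ("z2", z2), ("y_pred", y_pred)]:
    print(f"{name:7s} shape = {arr.shape}")
```

All four training examples are processed in a single matrix product, so no explicit loop over examples is needed.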

The cost of the neural network’s predictions is then calculated using the binary cross-entropy loss function.

This cost is a measure of how well the network is performing on the training data, and is used to update the weights and biases during the backward pass.
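A short worked example shows how the binary cross-entropy behaves. The function below uses the same formula as the training code, with one addition that is not in the original: predictions are clipped away from exactly 0 and 1 (the `eps` parameter) so that `np.log` never receives zero. The two prediction arrays are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y, y_pred, eps=1e-12):
    # Clipping is an added safeguard, not part of the original code.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return np.mean(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred))

y = np.array([[0], [1], [1], [0]])

# Made-up predictions: one confident and correct, one uninformative.
confident = np.array([[0.01], [0.99], [0.99], [0.01]])
uninformed = np.full((4, 1), 0.5)

cost_confident = binary_cross_entropy(y, confident)
cost_uninformed = binary_cross_entropy(y, uninformed)

print(f"confident and correct: {cost_confident:.4f}")
print(f"always 0.5:            {cost_uninformed:.4f}")
```

A network that always outputs 0.5 has a cost of ln 2, about 0.693, so a cost curve that stalls near that value suggests the network has not yet learned anything useful.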

During the backward pass, the gradients of the cost with respect to the weights and biases are computed using the chain rule of calculus. These gradients are then used to update the weights and biases using the update rule: w -= lr * dw, where w is a weight or bias, lr is the learning rate, and dw is the gradient of the cost with respect to w.
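A common way to check backward-pass formulas like these is to compare them with numerical derivatives. The sketch below does this for dw1 using central differences. One detail worth noting: the gradient expressions in the training code match the cross-entropy summed over the four examples, not the mean that is reported as the cost; the two differ only by a constant factor, which in effect rescales the learning rate. The helper `summed_bce`, the seed, and the step size `eps` are assumptions made for this check.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def summed_bce(X, y, w1, b1, w2, b2):
    # Cross-entropy summed (not averaged) over all examples.
    a1 = sigmoid(X @ w1 + b1)
    y_pred = sigmoid(a1 @ w2 + b2)
    return np.sum(-y * np.log(y_pred) - (1 - y) * np.log(1 - y_pred))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
w1, b1 = rng.standard_normal((2, 3)), np.zeros((1, 3))
w2, b2 = rng.standard_normal((3, 1)), np.zeros((1, 1))

# Analytic gradient, using the same expressions as the training loop.
a1 = sigmoid(X @ w1 + b1)
y_pred = sigmoid(a1 @ w2 + b2)
dz2 = y_pred - y
dz1 = (dz2 @ w2.T) * a1 * (1 - a1)
dw1 = X.T @ dz1

# Numerical gradient: nudge one weight at a time and measure the change in loss.
eps = 1e-6
num_dw1 = np.zeros_like(w1)
for i in range(w1.shape[0]):
    for j in range(w1.shape[1]):
        w_plus, w_minus = w1.copy(), w1.copy()
        w_plus[i, j] += eps
        w_minus[i, j] -= eps
        num_dw1[i, j] = (summed_bce(X, y, w_plus, b1, w2, b2)
                         - summed_bce(X, y, w_minus, b1, w2, b2)) / (2 * eps)

max_diff = np.max(np.abs(dw1 - num_dw1))
print(f"largest difference between analytic and numerical dw1: {max_diff:.2e}")
```

The same loop can be repeated for w2, b1, and b2 if a fuller check is wanted.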

Finally, the function returns the learned weights and biases, as well as an array of the cost values over the training epochs.

In the main section of the code, the train() function is called with the XOR input and output matrices, X and y, as well as the values of n_hidden=3, n_epochs=1000, and lr=0.1. These parameters were chosen empirically, and can be adjusted to improve the performance of the neural network.

The n_hidden parameter controls the number of neurons in the hidden layer, and therefore the complexity of the neural network. A larger value of n_hidden may allow the network to better model the XOR operation, but may also lead to overfitting and slower training times.

One example of the output (from a run in JupyterLite) is:

Final weights and biases:
w1 = [[-3.09192944  3.84673602  0.86697588]
      [ 5.43416261  5.64932615 -2.87473456]]
b1 = [[ 2.09802913 -0.99750375 -1.18703525]]
w2 = [[-4.42330887]
      [ 5.18063322]
      [ 0.44384871]]
b2 = [[-0.76509979]]

The output shows the final values of the weights and biases learned by the neural network during training. The weights and biases are the parameters that are optimized by the backpropagation algorithm to minimize the cost function and make accurate predictions on the training data.

In this specific case, the neural network has one hidden layer with three neurons and an output layer with one neuron. The weights are represented as matrices, where each element represents the weight of the connection between two neurons. For example, the element w1[0,1] represents the weight of the connection between the first neuron in the input layer and the second neuron in the hidden layer.

The biases are represented as vectors, where each element represents the bias of a neuron. For example, the element b1[0,1] represents the bias of the second neuron in the hidden layer.
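The indexing can be tried directly on the values printed above. The arrays below are copied from that example output; a different training run would give different numbers.

```python
import numpy as np

# Values copied from the example output above.
w1 = np.array([[-3.09192944, 3.84673602, 0.86697588],
               [5.43416261, 5.64932615, -2.87473456]])
b1 = np.array([[2.09802913, -0.99750375, -1.18703525]])

# Row index = input neuron, column index = hidden neuron (both zero-based).
print("w1[0, 1] =", w1[0, 1])   # first input -> second hidden neuron
print("w1[:, 1] =", w1[:, 1])   # all weights feeding the second hidden neuron
print("b1[0, 1] =", b1[0, 1])   # bias of the second hidden neuron
```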

The values of the weights and biases determine how the neural network processes the inputs and makes predictions. Whether the network has actually learned the XOR operation is checked in the next section, where the trained weights are applied to test inputs.

4. Testing the Neural Network

Now that we have trained the neural network to perform the XOR operation, we can use it to make predictions on new inputs. In our case, the neural network should be able to predict the output of the XOR gate for any input combination of 0s and 1s.

To use the trained neural network to make predictions, we simply need to pass the input through the network’s forward pass, which will apply the weights and biases learned during training to compute the output.

We can test our neural network by providing it with the input [1, 0] and checking if it produces the expected output of 1. We can use the following code to make this prediction:

x_test = np.array([[1, 0]])
z1_test = np.dot(x_test, w1) + b1
a1_test = sigmoid(z1_test)
z2_test = np.dot(a1_test, w2) + b2
y_pred_test = sigmoid(z2_test)
print(f"Prediction for input {x_test}: {y_pred_test}")

This code will apply the learned weights and biases to the input [1, 0] and compute the predicted output of the neural network. The output should be close to 1, indicating that the neural network has successfully learned to perform the XOR operation. With one run we got:

Prediction for input [[1 0]]: [[0.9578651]]

We can also test the neural network on all possible input combinations of 0s and 1s, and compare its predictions with the expected outputs of the XOR gate. We can use the following code to do this:

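def predict(x, w1, w2, b1, b2):
    # helper used by the loop below; it repeats the forward pass from train()
    a1 = sigmoid(np.dot(x, w1) + b1)
    return sigmoid(np.dot(a1, w2) + b2)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
expected_outputs = np.array([[0], [1], [1], [0]])

for x, expected in zip(inputs, expected_outputs):
    y_pred = predict(x, w1, w2, b1, b2)
    print(f"Input: {x}, Expected Output: {expected}, Predicted Output: {y_pred}")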

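This code loops through all possible input combinations and prints the predicted output of the neural network along with the expected output of the XOR gate. If the neural network has learned the XOR operation, each prediction should be close to the expected output: near 0 when the inputs match and near 1 when they differ. The output below comes from one such run. Because the weights are initialized randomly, the exact values vary between runs, which is why the prediction for [1, 0] here differs from the one shown earlier.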

Input: [0 0], Expected Output: [0], Predicted Output: [[0.01065188]]
Input: [0 1], Expected Output: [1], Predicted Output: [[0.98866357]]
Input: [1 0], Expected Output: [1], Predicted Output: [[0.99169754]]
Input: [1 1], Expected Output: [0], Predicted Output: [[0.01046418]]
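The network outputs values between 0 and 1 rather than exact bits. To compare them with the truth table, the predictions can be rounded at a threshold. The 0.5 threshold below is a conventional choice and an assumption here; the predicted values are copied from the output above.

```python
import numpy as np

# Predicted outputs copied from the run shown above.
predicted = np.array([[0.01065188], [0.98866357], [0.99169754], [0.01046418]])
expected = np.array([[0], [1], [1], [0]])

# Convert probabilities to binary labels with an assumed 0.5 threshold.
labels = (predicted >= 0.5).astype(int)
all_correct = np.array_equal(labels, expected)

print("binary predictions:", labels.ravel())
print("matches XOR truth table:", all_correct)
```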

Overall, by training the neural network with backpropagation, we have successfully implemented a solution to the XOR problem using a neural network in Python. With this foundation, we can begin to explore more complex neural network architectures and applications.

5. Conclusion and Further Applications

In this article, we have explored the implementation of a neural network in Python to perform the XOR operation using backpropagation. We started with a brief introduction to neural networks and backpropagation, followed by a discussion on how a neural network can be trained to perform the XOR operation. We then presented the Python code for implementing the neural network and discussed the role of various parameters used in the code.

To test the performance of our neural network, we used the test inputs of the XOR gate and presented the results of the neural network’s predictions.

In conclusion, we have demonstrated that a neural network can be trained to perform a complex logical operation like XOR using backpropagation. The techniques used in this article can be applied to other problems where the relationship between inputs and outputs is not immediately apparent. Neural networks have become an important tool for solving a wide range of problems in areas such as image recognition, natural language processing, and finance.

For those interested in further exploration and learning, we suggest experimenting with different parameters in the code to see how they affect the performance of the neural network. Additionally, one can try using different activation functions, different architectures, or even building a deep neural network to improve the performance of the model. Finally, one can explore more advanced techniques like convolutional neural networks, recurrent neural networks, or generative adversarial networks for more complex problems.

6. References

[1] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[2] Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. “Learning representations by back-propagating errors.” Nature 323.6088 (1986): 533-536.




Copyright 2024 AI-Talks.org
