Hello! Let's get right into it. By the end of this guide, you will have created and trained your first neural network in Shkyera Grad!
Setup
This is easy: Shkyera Grad is a header-only library, so simply clone the repository into your project:
git clone https://github.com/fszewczyk/shkyera-grad.git
and include the main header of the library in your own project:
#include "shkyera-grad/include/ShkyeraGrad.hpp"
Now, you can use all the features of this small engine.
- Note
- Shkyera Grad is tested with C++17. Make sure your compiler supports this standard.
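For example, if your code lives in a file called main.cpp (a hypothetical name), a typical GCC invocation could look like this; the exact command depends on your compiler and build setup:
g++ -std=c++17 main.cpp -o main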
Scalars
Internally, Shkyera Grad always operates on individual scalars. For most purposes, you do not need to deal with them directly, but it's nice to understand how they work. Each scalar is wrapped inside a Value class. However, you should never instantiate objects of this type yourself. Instead, you should use the provided interface in the following way:
// Several equivalent ways to create a floating-point scalar holding 5.2
ValuePtr<float> a = Value<float>::create(5.2);
ValuePtr<Type::f32> a = Value<Type::f32>::create(5.2);
ValuePtr<Type::float32> a = Value<Type::float32>::create(5.2);
auto a = Value<float>::create(5.2);
auto a = Value<Type::float32>::create(5.2);
auto a = Value<Type::float64>::create(5.2);
auto a = Val32::create(5.2);

// ...and a scalar holding 6.9
ValuePtr<Type::float64> b = Value<double>::create(6.9);
auto b = Value<Type::f64>::create(6.9);
auto b = Val64::create(6.9);

// Scalars of other types can be created the same way
auto c = Value<int>::create(7);
You can also perform various operations directly on scalars!
using T = Type::float32;
auto a = Value<T>::create(2.1);
auto b = Value<T>::create(3.7);
auto c = a - b;
auto d = a * b / c;
c = d->log();
auto e = (a + b - c)->pow(d);
- Note
- Check out the cheatsheet for the list of all operations.
The magic behind Shkyera Grad is that it keeps track of all the operations, so that you can later calculate the derivatives of your expression.
auto a = Value<T>::create(2.0);
auto b = Value<T>::create(3.0);
auto c = a * b;

c->getValue();      // 6.0
c->backward();      // compute the gradients of c with respect to a and b
a->getGradient();   // dc/da = b = 3.0
b->getGradient();   // dc/db = a = 2.0
If you want a refresher on derivatives, check out this wonderful video.
Vector
Multiple scalars can be grouped together in a Vector to simplify operating on them. The input to any Module (more on these later) is a Vector. This abstraction provides some functionality that allows you to compute, for example, a dot product.
// Two equivalent ways to create a Vector from raw numbers
auto a = Vector<T>::of(1, 2, 3);
auto b = Vector<T>::of({1, 2, 3});

// A Vector can also be constructed from individual Values
auto c = Vector<T>(Value<T>::create(2), Value<T>::create(3), Value<T>::create(4));

// Element-wise product of a and b
auto d = Vector<T>::of({a[0]*b[0], a[1]*b[1], a[2]*b[2]});

for(auto &entry : d)
    std::cout << entry << std::endl;

// Dot product of b and c
auto e = b.dot(c);
e->backward();
Vectors are very useful since this is the way both the input and the output data are represented. Each sample consists of an input Vector and a target output Vector.
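For example, a single (hypothetical) sample for a model with two inputs and one output could be stored like this, reusing T = Type::float32 from before:
auto input = Vector<T>::of(0.5, 1.0);   // input Vector with two features
auto target = Vector<T>::of(1.0);       // target output Vector with one value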
Sequential
Nice! You got the basics! Let's build a network. The best way to create a model is through the Sequential interface. Each function that transforms an input Vector into some output Vector is implemented as a Module. This includes neural layers as well as activation functions. Hey, even Sequential is a Module. This allows for creating complex structures while using a common, simple interface.
You can create your first neural network using SequentialBuilder in the following way.
auto network = SequentialBuilder<T>::begin()
                   .add(Linear<T>::create(2, 15))
                   .add(ReLU<T>::create())
                   .add(Linear32::create(15, 10))
                   .add(Sigmoid32::create())
                   .add(Dropout32::create(10, 2, 0.5))
                   .build();
- Warning
- Remember that subsequent layers have to have matching input and output sizes.
- Note
- For the full list of available layers and activation functions, check out the Cheat Sheet.
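Once built, the whole network behaves like any other Module: you pass a Vector in and get a Vector out. As a small sketch using the network defined above (which expects an input of size 2):
auto input = Vector<T>::of(1.0, 2.0);    // size must match the first layer (2 here)
auto output = network->forward(input);   // Vector produced by the last layer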
Training
To train our network, we need to define an Optimizer that will optimize the parameters, as well as the Loss function that we will minimize. Shkyera Grad comes with a set of well-known optimizers and loss functions. Again, check out the Cheat Sheet for a complete list.
auto optimizer = Optimizer<T>(network->parameters(), 0.01);                // basic optimizer, learning rate 0.01
auto betterOptimizer = SGD32(network->parameters(), 0.01, 0.99);           // SGD, learning rate 0.01, momentum 0.99
auto awesomeOptimizer = Adam32(network->parameters(), 0.01);               // Adam, learning rate 0.01
auto awesomeCustomOptimizer = Adam32(network->parameters(), 0.01, beta1, beta2, epsilon);  // Adam with custom hyperparameters
Here's a list of some available Loss functions:
Loss::MAE<T>
Loss::MSE<T>
Loss::CrossEntropy<T>
They are implemented as lambda functions, not as objects, so they do not need to be instantiated.
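Since a Loss is just a callable, you invoke it directly on a prediction and a target. A minimal sketch, with pred and target as hypothetical Vectors of matching size:
auto pred = Vector<T>::of(0.9, 0.1);
auto target = Vector<T>::of(1.0, 0.0);
auto loss = Loss::MSE<T>(pred, target);  // returns a scalar Value holding the loss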
Learning XOR
XOR (Exclusive OR) is a simple Boolean function that maps two values to one:
| X1 | X2 | Result |
|----|----|--------|
| 0  | 0  | 0      |
| 0  | 1  | 1      |
| 1  | 0  | 1      |
| 1  | 1  | 0      |
Let's define our dataset.
Here, we simply translate the table above into Vectors.
std::vector<Vec32> xs;  // inputs
std::vector<Vec32> ys;  // targets
xs.push_back(Vec32::of(0, 0)); ys.push_back(Vec32::of(0));
xs.push_back(Vec32::of(1, 0)); ys.push_back(Vec32::of(1));
xs.push_back(Vec32::of(0, 1)); ys.push_back(Vec32::of(1));
xs.push_back(Vec32::of(1, 1)); ys.push_back(Vec32::of(0));
Neural Network
We define a simple neural network to predict this function. Our network has a total of three layers. It is a bit of overkill for this task, but we will use it for learning purposes.
auto network = SequentialBuilder<Type::float32>::begin()
                   .add(Linear32::create(2, 15))
                   .add(ReLU32::create())
                   .add(Linear32::create(15, 5))
                   .add(ReLU32::create())
                   .add(Linear32::create(5, 1))
                   .add(Sigmoid32::create())
                   .build();
Training Loop
Now, we just need to specify the optimizer and the loss function we want to use:
auto optimizer = Adam32(network->parameters(), 0.05);
auto lossFunction = Loss::MSE<T>;
We train our model for 100 epochs. After each epoch, we print the average loss.
for (size_t epoch = 0; epoch < 100; epoch++) {
    auto epochLoss = Val32::create(0);

    optimizer.reset();                                  // clear the gradients from the previous epoch
    for (size_t sample = 0; sample < xs.size(); ++sample) {
        Vec32 pred = network->forward(xs[sample]);      // forward pass
        auto loss = lossFunction(pred, ys[sample]);     // loss for this sample
        epochLoss = epochLoss + loss;
    }
    optimizer.step();                                   // update the parameters

    auto averageLoss = epochLoss / Val32::create(xs.size());
    std::cout << "Epoch: " << epoch + 1 << " Loss: " << averageLoss->getValue() << std::endl;
}
Verifying the results
After the training, let's inspect how our network behaves.
for (size_t sample = 0; sample < xs.size(); ++sample) {
    Vec32 pred = network->forward(xs[sample]);
    std::cout << xs[sample] << " -> " << pred << "\t| True: " << ys[sample] << std::endl;
}
In case you got lost along the way, check out the examples/xor.cpp file. It contains the exact same code and is ready to run :)
Results
Nice! After compiling and running this code (make sure to use C++17), you should see something like this:
Epoch: 1 Loss: 0.263062
Epoch: 2 Loss: 0.211502
(...)
Epoch: 99 Loss: 0.000222057
Epoch: 100 Loss: 0.00020191
Vector(size=2, data={Value(data=0) Value(data=0) }) -> Value(data=0.0191568) | True: Value(data=0)
Vector(size=2, data={Value(data=1) Value(data=0) }) -> Value(data=0.99998) | True: Value(data=1)
Vector(size=2, data={Value(data=0) Value(data=1) }) -> Value(data=0.999984) | True: Value(data=1)
Vector(size=2, data={Value(data=1) Value(data=1) }) -> Value(data=0.0191568) | True: Value(data=0)
WOW! The network actually learned the XOR function.
This is it. You should have enough knowledge to start experimenting with Shkyera Grad. Let us know on GitHub what you think about this project :)