A neural network is a machine learning model that mimics biological neurons, where data enters from an input layer and flows through multiple nodes with activation thresholds.
Recurrent neural networks are a type of neural network that can internally store memories of previous input data, allowing them to learn the time-dependent structure of a data stream.
Machine learning is used in many products today: intelligent assistants such as Siri and Google Now, recommendation engines like the one Amazon.com uses to suggest products, and the ad-ranking systems used by Google and Facebook. More recently, advances in deep learning have brought machine learning into the public eye, with the defeat of Go master Lee Sedol by AlphaGo and the emergence of new products for image recognition and machine translation.
In this section, we'll cover some powerful and commonly used machine learning techniques, including deep learning as well as traditional methods that still serve modern business needs well. After reading this series of articles, you will have the knowledge necessary to run concrete machine learning experiments in your own field.
Research on AI and deep learning has grown as the accuracy of deep neural networks has improved, and speech and image recognition technology has captured the public's attention. How deep learning can become still more influential and widely used remains an open question. This article covers: a brief introduction to feedforward neural networks and recurrent neural networks, and how to build a recurrent neural network for anomaly detection on time series data. To make the discussion concrete, we will demonstrate how to build a neural network with Deeplearning4j.
1. What is a neural network?
Artificial neural network algorithms were originally conceived to mimic biological neurons, but the analogy is pretty loose. Each feature of an artificial neural network is only a rough reflection of its biological counterpart: each node has an activation threshold that acts as a trigger.
Once the connected artificial neurons are assembled into a system, we can train that system to learn patterns in the data; once trained, it can perform functions such as regression, classification, clustering, and prediction.
Artificial neural networks can be thought of as a collection of computational nodes. Data enters the network's input layer and then passes through its hidden layers until a conclusion about the data emerges. That output is compared with the expected result, and the difference between the two is used to adjust the activation thresholds of the network's nodes. As this process is repeated over and over, the network's output comes ever closer to the expected result.
2. The training process
Before you can build a neural network system, you have to understand the training process and how the network's output is produced. We don't want to delve too deeply into the equations, so here's a short introduction.
The input nodes of the network receive an array of values (a multidimensional array, often called a tensor) representing the input data. For example, each pixel in an image can be represented as a scalar and passed to a node. The input data is multiplied by the network's parameters, amplifying or dampening each input according to its significance; in other words, according to how much that pixel should affect the network's conclusion about the input as a whole.
At first these parameters are random, which means the network starts without any knowledge of the structure of the data. The activation function of each node then determines that node's output: whether a node activates depends on whether it receives sufficient stimulus, i.e. whether the combination of input data and parameters exceeds the activation threshold.
In so-called densely or fully connected layers, each node's output is passed on to the nodes of the next layer, and after passing through all the hidden layers it finally reaches the output layer, where the network's result is produced. There, the network's conclusion is compared with the expected conclusion (e.g., do these pixels in the picture represent a cat or a dog?). The error between the network's guess and the correct answer is used to update the network's parameters, changing the importance it assigns to different pixels. The goal of the whole process is to reduce the error between output and expectation until the network labels the image correctly.
Deep learning is a complex process that involves matrix algebra, derivatives, probability, and intensive hardware use because of the large number of matrix coefficients that need to be modified, but users don't need to understand all of this complexity.
However, you should also know some basic parameters that will help you understand neural network functions. These include the activation function, the optimization algorithm, and the objective function (also known as the loss, cost, or error function).
The activation function determines whether, and to what extent, a signal should be sent to connected nodes. A step function is a commonly used activation function: it outputs 0 if its input is below a certain threshold and 1 if its input is above it, so a node with a step activation sends either a 0 or a 1 to the nodes connected to it. The optimization algorithm determines how the network learns, specifically how the weights are adjusted after the error is measured; the most common optimization algorithm is stochastic gradient descent. Finally, the cost function measures the error, evaluating how well the network performs by comparing its results on a given training sample with the expected results.
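To make these three ideas concrete, here is a toy single neuron in plain Java (no framework), with a step activation function and a simple error-driven weight update (the classic perceptron rule, a simplified stand-in for stochastic gradient descent). The weights, threshold, learning rate, and training data are all made up for the example.

public class StepNeuron {
    static double[] weights = {0.1, -0.2};  // start "random": no knowledge of the data
    static final double THRESHOLD = 0.5;    // activation threshold
    static final double LEARNING_RATE = 0.1;

    // Step activation: output 1 only if the weighted input exceeds the threshold.
    static int activate(double[] input) {
        double sum = 0;
        for (int i = 0; i < input.length; i++) sum += weights[i] * input[i];
        return sum > THRESHOLD ? 1 : 0;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 1}, {1, 1}, {1, 0}};  // made-up training samples
        int[] expected = {0, 1, 0};                    // the "correct" answers
        for (int epoch = 0; epoch < 10; epoch++) {
            for (int n = 0; n < inputs.length; n++) {
                // Error: the difference between the expected and actual output.
                int error = expected[n] - activate(inputs[n]);
                // Update rule: nudge each weight in the direction that shrinks the error.
                for (int i = 0; i < weights.length; i++)
                    weights[i] += LEARNING_RATE * error * inputs[n][i];
            }
        }
        System.out.println(activate(new double[]{1, 1}));  // prints 1 once trained
    }
}

Real frameworks replace the step function with differentiable activations (sigmoid, tanh, ReLU) precisely so that gradient descent can compute how much each weight contributed to the error.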
Open source frameworks such as Keras and Deeplearning4j make creating neural networks easy. The main considerations when designing a network structure are matching your data type to a known class of problem, and then modifying an existing structure to your actual needs.
3. Three Types of Neural Networks and Applications
Neural networks have been understood and used for decades, but it's only recently that a number of technology trends have made deep neural networks more efficient.
GPUs have made matrix operations faster; distributed computing architectures have provided much greater computational power; and it has become quicker to iterate over combinations of hyperparameters. All of this makes training faster and helps find the right structure quickly.
Big data has brought large, high-quality labeled datasets such as ImageNet. The more data a machine learning algorithm is trained on, the more accurate it becomes.
Finally, as our understanding of neural network algorithms continues to improve, neural networks keep setting accuracy records in areas such as speech recognition, machine translation, and many machine perception and goal-oriented tasks.
Although there are a great many neural network architectures, the main kinds in use are the following.
3.1 Feedforward Neural Networks
Feedforward neural networks consist of an input layer, an output layer, and one or more hidden layers. Feedforward neural networks make good general approximators and can be used to create generalized models.
This type of network can be used for classification and regression. For example, when a feedforward network is used for classification, the number of neurons in the output layer equals the number of classes. Conceptually, the activated output neuron determines the class the network predicts; more precisely, each output neuron returns the probability that the input matches its class, and the class with the highest probability is chosen as the model's output.
The advantage of feedforward neural networks is that they are easy to use, simpler than other types of neural networks, and have a long list of example applications.
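As a sketch of what this looks like in practice, a small feedforward classifier might be configured as follows, using the same (older) Deeplearning4j builder API as the recurrent network example later in this article; the layer sizes and class count are placeholders, and imports from deeplearning4j/nd4j are omitted as in that example.

int numInputs = 784, hiddenSize = 100, numClasses = 10;  // placeholder dimensions
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(123)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
    .weightInit(WeightInit.XAVIER)
    .learningRate(0.005)
    .list()
    // One hidden layer; more can be stacked the same way.
    .layer(0, new DenseLayer.Builder().nIn(numInputs).nOut(hiddenSize)
        .activation("relu").build())
    // Output layer: one neuron per class; softmax turns outputs into probabilities.
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation("softmax").nIn(hiddenSize).nOut(numClasses).build())
    .pretrain(false).backprop(true).build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();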
3.2 Convolutional Neural Networks
Convolutional neural networks are very similar to feedforward neural networks, at least in the way data is passed through them. Their structure roughly mimics the visual cortex. A convolutional network passes its input through a number of filters, each of which focuses on recognizing features within a subset of the image: a patch or block. Each filter looks for a different pattern in the visual data; for example, some may look for horizontal lines, some for diagonal lines, and some for vertical ones. These lines are treated as features, and as the filters pass over the image they build feature maps locating where each kind of line appears. Different objects in images (cats, 747s, juicers, and so on) produce different feature maps, and these features let the network classify the image. Convolutional neural networks are very effective in image recognition and speech recognition.
How do convolutional and feedforward neural networks compare for image recognition? Although both network types are capable of image recognition, they go about it in different ways. A convolutional network is trained on overlapping patches of an image and learns to recognize the features of each patch, whereas a feedforward network is trained on the entire image at once. A feedforward network therefore learns features tied to a particular region of the image, so when the same feature appears somewhere else it is not recognized; convolutional networks avoid this problem.
Convolutional neural networks are mainly used for image, video, speech, and sound recognition, as well as tasks such as drone navigation. Although this article is mainly about recurrent neural networks, convolutional networks are so effective in image recognition that they are important to understand.
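For comparison, a convolutional configuration in the same builder style might look like the sketch below: a convolution layer that scans the image with small filters, a pooling layer that downsamples the resulting feature maps, and a softmax classifier on top. The image size, filter counts, and class count are illustrative placeholders, and exact builder methods vary between Deeplearning4j versions.

int numClasses = 10;  // placeholder
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(123)
    .list()
    // 20 filters, each scanning the image in 5x5 patches looking for one pattern.
    .layer(0, new ConvolutionLayer.Builder(5, 5)
        .nIn(1).nOut(20).stride(1, 1).activation("relu").build())
    // Max pooling keeps the strongest response in each 2x2 region of the feature maps.
    .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
        .kernelSize(2, 2).stride(2, 2).build())
    .layer(2, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation("softmax").nOut(numClasses).build())
    .setInputType(InputType.convolutionalFlat(28, 28, 1))  // 28x28 grayscale input
    .pretrain(false).backprop(true).build();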
3.3 Recurrent Neural Networks
Unlike feedforward neural networks, recurrent neural networks keep internal memory in the nodes of the hidden layer, and this memory is continually updated as new input arrives. A recurrent network's conclusions are based on both the current input and the previously stored state. Recurrent networks can exploit this internal memory to process arbitrary sequences of data, such as time series.
Recurrent neural networks are often used for handwriting recognition, speech recognition, log analysis, fraud detection, and cybersecurity.
Recurrent neural networks are the best fit for datasets with a time dimension, such as web logs and server activity, sensor data from hardware or medical devices, financial transactions, and phone records. Tracking dependencies and correlations within such data requires knowledge of both the current and previous states. Although we could feed a feedforward network a fixed window of events sliding over time, that limits it to dependencies of a fixed length, which is very inflexible.
A better way to track data with long-term dependencies in the time dimension is to use memory to store important events, so that later events can be understood and classified in their light. The great strength of recurrent neural networks is the "memory" inside their hidden layers, which lets them learn the importance of time-dependent features.
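To see what this "memory" means mechanically, here is a toy hidden-state update in plain Java (not framework code): at each time step the new hidden state is computed from both the current input and the previous hidden state, so every output depends on the whole history. The weights and input sequence are made up for illustration.

public class TinyRnnCell {
    public static void main(String[] args) {
        double[] x = {0.5, -1.0, 0.3, 0.8};  // a made-up input sequence
        double w = 0.7;  // weight on the current input
        double u = 0.4;  // weight on the previous hidden state (the "memory")
        double b = 0.1;  // bias
        double h = 0.0;  // hidden state, initially empty
        for (int t = 0; t < x.length; t++) {
            // h now summarizes everything the network has seen up to time t.
            h = Math.tanh(w * x[t] + u * h + b);
            System.out.printf("t=%d h=%.4f%n", t, h);
        }
    }
}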
Next we will discuss the application of recurrent neural networks to character generators and network anomaly detection. The ability of a recurrent neural network to detect time-dependent features allows it to perform anomaly detection on time-series data.
Applications of Recurrent Neural Networks
The web is full of examples of text generation with RNNs: a recurrent neural network is trained on a corpus so that, given a character, it predicts the next one. Let's explore more features of RNNs with some practical examples below.
Application 1: RNNs for character generation
A recurrent neural network can be trained to treat English text as a series of time-dependent events. After training it learns that one character often follows another ("e" often follows "h", as in "the", "he", "she"). Because it predicts what the next character will be, it can be used to reduce text entry errors.
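The training data for this task is simply the text shifted by one position: each character is an input whose label is the character that follows it. A minimal sketch, with "hello" standing in for a real corpus:

String corpus = "hello";  // stand-in for a real training corpus
for (int i = 0; i < corpus.length() - 1; i++) {
    char input = corpus.charAt(i);
    char label = corpus.charAt(i + 1);  // the answer the network must learn to predict
    System.out.println(input + " -> " + label);
}

In a real character generator, each character would also be one-hot encoded into a vector before being fed to the network.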
Java is an interesting example because its structure includes a lot of nesting: an open parenthesis is necessarily followed at some point by a closing one, and the same goes for curly braces. The dependency between them is not a matter of position, since the relationship between the two events is not determined by the distance separating them. Yet even without being explicitly told about these dependencies, a recurrent neural network can learn to understand them on its own.
In anomaly detection, we ask the neural network to detect similarly hidden, perhaps obscure, patterns in data. Just as a character generator can produce a convincing likeness of its data once it fully understands the data's structure, a recurrent network that fully understands the structure of its data can determine whether new input is normal, and flag it when it is not.
The character generation example shows that recurrent neural networks have the ability to learn temporal dependencies over different time scales, and that this ability can also be used to detect anomalies in network activity logs.
Anomaly detection can surface grammatical errors in text because what we write is shaped by grammatical structure. Similarly, network behavior is structured: it has predictable patterns that can be learned. A recurrent neural network trained on normal network activity can spot an intrusion, which looks as anomalous as a sentence without punctuation.
Application 2: A network anomaly detection project
Suppose we want to detect network anomalies, on the understanding that an anomaly may indicate a hardware failure, an application failure, or an intrusion.
What will the model show us?
As large volumes of network activity logs are fed into a recurrent neural network, it learns what normal network activity looks like. When this trained network is then fed new data, it can judge which activity is normal and expected and which is anomalous.
Training a neural network to recognize expected behavior is advantageous because anomalous data is scarce and hard to classify accurately. We train the network on normal data, and it can then alert us when abnormal activity appears in the future.
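One common way to act on this is sketched below, assuming `net` is the trained network and `testIterator` iterates over new activity: score each batch of new data with the network's loss, and raise an alert when the score is unusually high, since a high loss means the data does not fit the "normal" structure the network learned. The threshold is a placeholder you would calibrate on held-out normal data.

double threshold = 5.0;  // placeholder: calibrate on scores from held-out normal data
while (testIterator.hasNext()) {
    DataSet batch = testIterator.next();
    double score = net.score(batch);  // the network's loss on this batch
    if (score > threshold) {
        System.out.println("Possible anomaly, score = " + score);
    }
}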
As an aside, the trained network does not need to recognize that a specific thing happens at a specific point in time (it doesn't know, for example, that a particular day is a Sunday), but it will find connections between the more obvious temporal patterns we would expect and events that may be far less obvious.
We'll give an overview of how to address this with Deeplearning4j, an open-source deep learning library that is widely used on the JVM. Deeplearning4j provides a number of useful tools during model development, including DataVec, an integrated tool that prepares model training data and handles ETL (extract, transform, load) tasks. Just as Sqoop loads data into Hadoop, DataVec cleans, preprocesses, normalizes, and standardizes data before loading it into the neural network. It is also similar to Trifacta's Wrangler, except that it focuses more on binary data.
Starting Phase
The first phase consists of typical big data and ETL tasks: we need to collect, move, store, prepare, normalize, and vectorize the logs, and the length of the time window has to be specified. Transforming the data takes some effort, since JSON logs, text logs, and other non-uniform formats have to be recognized and converted into arrays of numeric values; DataVec can help with this transformation and normalization. As in any machine learning project, the data then needs to be divided into a training set and a test set.
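As a sketch of that last step, assuming the transformed logs have already been written out as numeric CSV (the file name, column layout, and sizes below are placeholders, and constructor signatures vary slightly between DataVec versions):

int numLinesToSkip = 0, batchSize = 128, labelIndex = 10, numClasses = 2;  // placeholders
RecordReader reader = new CSVRecordReader(numLinesToSkip, ",");
reader.initialize(new FileSplit(new File("network_logs.csv")));  // placeholder file
DataSetIterator iterator = new RecordReaderDataSetIterator(reader, batchSize, labelIndex, numClasses);
DataSet allData = iterator.next();
allData.shuffle();
SplitTestAndTrain split = allData.splitTestAndTrain(0.8);  // 80% train, 20% test
DataSet trainingData = split.getTrain();
DataSet testData = split.getTest();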
Training Neural Networks
The initial training of a neural network needs to be done on the training data set.
During the first training run, you need to adjust some hyperparameters so that the model learns from the data in a reasonable amount of time. We will discuss hyperparameters later. As the model trains, you should see its error decrease.
But this presents a risk of overfitting the model. An overfitted model scores highly on the training set but draws false conclusions when presented with new data; in machine learning terms, it does not generalize well. Deeplearning4j provides regularization tools and early stopping to avoid overfitting during training.
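Here is a sketch of Deeplearning4j's early stopping support, assuming `conf` is the network configuration shown later in this article and `trainIterator`/`testIterator` come from the ETL phase: training halts after a fixed number of epochs or when the test-set score stops improving, and the best model seen so far is kept.

EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
    new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))  // hard cap
        .scoreCalculator(new DataSetLossCalculator(testIterator, true))     // watch test loss
        .evaluateEveryNEpochs(1)
        .modelSaver(new InMemoryModelSaver<MultiLayerNetwork>())            // keep best model
        .build();
EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, conf, trainIterator);
EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();
MultiLayerNetwork bestModel = result.getBestModel();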
Training a neural network is the most time- and hardware-intensive step. Training on GPUs can greatly reduce training time, especially for image recognition, but additional hardware comes at a cost, so your deep learning framework must use the hardware efficiently. Cloud services such as Azure and Amazon provide GPU-based instances, and neural networks can also be trained on heterogeneous clusters.
Creating models
Deeplearning4j provides ModelSerializer to save a trained model. A model can be saved to disk and later reloaded for use or for further training.
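A minimal sketch of saving and reloading; the file name is a placeholder, and `net` is the trained network:

File modelFile = new File("anomaly-model.zip");  // placeholder path
// The boolean asks ModelSerializer to also save the updater state,
// which is needed if you want to resume training later.
ModelSerializer.writeModel(net, modelFile, true);
// ...later, e.g. in the monitoring service:
MultiLayerNetwork restored = ModelSerializer.restoreMultiLayerNetwork(modelFile);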
When running anomaly detection, the format of the log files needs to match that of the training data, and based on the neural network's output you will be able to determine whether the current activity fits the expectations of normal network behavior.
Code example
The structure of a recurrent neural network should look something like this:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(123)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
    .weightInit(WeightInit.XAVIER)
    .updater(Updater.NESTEROVS).momentum(0.9)
    .learningRate(0.005)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(0.5)
    .list()
    .layer(0, new GravesLSTM.Builder().activation("tanh").nIn(1).nOut(10).build())
    .layer(1, new RnnOutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .activation("softmax").nIn(10).nOut(numLabelClasses).build())
    .pretrain(false).backprop(true).build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();
Here's an explanation of a few important lines of code:
.seed(123)
Sets the seed used to randomly initialize the neural network's weights, so that results can be reproduced. Weights are normally initialized at random, which would give different results on every run; fixing the seed lets us reuse the same random weights while tuning other hyperparameters and testing.
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
Determines which optimization algorithm to use (here, stochastic gradient descent) to adjust the weights and improve the error score. You will probably not need to modify this.
.learningRate(0.005)
When we use stochastic gradient descent, the error gradient is calculated and the weights are changed as we try to minimize the error. SGD gives us a direction that makes the error smaller, and the learning rate determines how big a step we take in that direction. If the learning rate is too high, you will probably overshoot the error minimum; if it is too low, training will take practically forever. This is a hyperparameter you will need to tune.