20.7: Neural Networks


Neural networks grew out of efforts to model the brain and the nervous system in an attempt to achieve brain-like learning. They are highly parallel, and by learning simple concepts they can achieve very complex behaviors. Of particular relevance to this book, they have also proved to be very good biological models (not surprising, given their origins).

Feed-forward nets

In a neural network, the input is mapped to the output by passing through hidden states whose parameters are learned.

Figure 20.16: Illustration of a neural network

• Information flow is unidirectional:
  • Data is presented to the input layer
  • Passed on to the hidden layer
  • Passed on to the output layer
• Information is distributed
• Information processing is parallel
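The unidirectional flow listed above can be sketched as a forward pass through a single hidden layer. This is a minimal illustration, assuming sigmoid units and hand-chosen weights; the function names are hypothetical and not taken from any library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # Each unit computes a weighted sum of all inputs independently of the
    # other units, so the units of one layer could be evaluated in parallel.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feed_forward(inputs, hidden_w, hidden_b, out_w, out_b):
    hidden = layer(inputs, hidden_w, hidden_b)  # input layer -> hidden layer
    return layer(hidden, out_w, out_b)          # hidden layer -> output layer

# One input vector, two hidden units, one output unit (weights chosen arbitrarily).
print(feed_forward([1.0, 0.0], [[0.5, -0.3], [0.2, 0.8]], [0.0, 0.1], [[1.0, -1.0]], [0.0]))
```

Information only moves forward: nothing computed in the output layer is fed back into the hidden layer.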

Back-propagation

Back-propagation is one of the most influential results for training neural nets and allowing us to easily deal with multi-layer networks.

• Requires a training set (input/output pairs)
• Starts with small random weights
• Error is used to adjust weights (supervised learning)

It essentially performs gradient descent on the error landscape, repeatedly adjusting the weights in the direction that reduces the error. Because this usually takes many small steps, back-propagation can be slow.
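The training procedure described above can be sketched as a tiny two-layer network trained by back-propagation on XOR, a problem that a network without a hidden layer cannot solve. This is an illustrative sketch, assuming sigmoid units, a squared-error loss, and full-batch gradient descent; the function name, learning rate, and epoch count are arbitrary choices, not values from the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(hidden=4, lr=0.5, epochs=5000, seed=0):
    """Train a 2-hidden-1 sigmoid network on XOR; return the loss per epoch."""
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Start with small random weights.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, t in data:
            # Forward pass.
            h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
            y = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2)
            loss += 0.5 * (y - t) ** 2
            # Backward pass: propagate the error derivative through the chain rule.
            delta_out = (y - t) * y * (1.0 - y)
            gb2 += delta_out
            for j in range(hidden):
                gw2[j] += delta_out * h[j]
                delta_h = delta_out * w2[j] * h[j] * (1.0 - h[j])
                gw1[j][0] += delta_h * x[0]
                gw1[j][1] += delta_h * x[1]
                gb1[j] += delta_h
        losses.append(loss)
        # Gradient-descent step: move every weight against its gradient.
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            w1[j][0] -= lr * gw1[j][0]
            w1[j][1] -= lr * gw1[j][1]
            b1[j] -= lr * gb1[j]
        b2 -= lr * gb2
    return losses

losses = train_xor()
print(losses[0], losses[-1])
```

Thousands of passes over just four examples are needed, which illustrates why gradient descent via back-propagation can be slow.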

Deep Learning

Deep learning is a collection of statistical machine learning techniques used to learn feature hierarchies, often based on artificial neural networks. Deep neural networks have more than one hidden layer, and each successive layer uses the features of the previous layer to learn more complex features. One of the (relevant) aims of deep learning methods is to perform hierarchical feature extraction. This makes deep learning an attractive approach to modeling the hierarchical generative processes commonly found in systems biology.
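Stacking layers so that each one builds on the previous layer's output can be sketched as follows. This is a minimal illustration, assuming ReLU hidden units and a linear output layer; the layer format (a list of `(weights, biases)` pairs) and the function names are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def dense(x, weights, biases):
    # weights holds one row per output unit; each row is dotted with x.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def deep_forward(x, layers):
    """Pass x through a stack of layers and return every layer's activations.

    Each hidden layer computes its features from the previous layer's output,
    so later layers can represent combinations of earlier features.
    """
    activations = [x]
    for i, (w, b) in enumerate(layers):
        z = dense(activations[-1], w, b)
        # ReLU on hidden layers; the final layer is left linear.
        activations.append(relu(z) if i < len(layers) - 1 else z)
    return activations

layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [-1.0, 0.0]], [0.0, 0.0]),  # hidden layer 2
    ([[0.5, 1.0]], [1.0]),                    # output layer
]
print(deep_forward([2.0, -1.0], layers))
```

Returning the activations of every layer makes the intermediate feature representations visible, which is the point of hierarchical feature extraction.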

Example: DeepBind (Alipanahi et al. 2015)

DeepBind[1] is a machine learning tool developed by Alipanahi et al. to predict the sequence specificities of DNA- and RNA-binding proteins using deep-learning-based methods.

The authors point out three difficulties encountered when training models of sequence specificities on the large volumes of sequence data produced by modern high-throughput technologies: (a) the data come in qualitatively different forms, including protein binding microarrays, RNAcompete assays, ChIP-seq and HT-SELEX; (b) the quantity of data is very large (typical experiments measure ten to a hundred thousand sequences); and (c) each data acquisition technology has its own formats and error profile, so an algorithm is needed that is robust to these unwanted effects.

The DeepBind method resolves these difficulties by (a) a parallel implementation on a graphics processing unit, (b) tolerating a moderate degree of noise and misclassified training data, and (c) training predictive models automatically, avoiding the need for hand-tuning. Figures 20.17 and 20.18 illustrate aspects of the DeepBind pipeline.
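The convolve, rectify, and pool stage that DeepBind applies to each sequence can be sketched with a single motif detector. This is a simplified illustration, not DeepBind's code: DeepBind learns many filters and passes the pooled features through a neural network, whereas here one hand-built filter (from the hypothetical helper `motif_to_filter`) feeds a single linear output, and the threshold is an arbitrary assumption.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as 4-element indicator vectors in A, C, G, T order."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def motif_to_filter(motif):
    """Hypothetical helper: +1 for the motif base at each position, -1 otherwise."""
    return [[1.0 if b == base else -1.0 for b in BASES] for base in motif.upper()]

def score_sequence(seq, motif_filter, conv_bias=0.0, out_weight=1.0, out_bias=0.0):
    x = one_hot(seq)
    width = len(motif_filter)
    # Convolution: scan the filter along the sequence, one response per position.
    conv = [
        sum(motif_filter[k][c] * x[i + k][c] for k in range(width) for c in range(4)) + conv_bias
        for i in range(len(x) - width + 1)
    ]
    # Rectification: negative responses (weak matches) become zero.
    rectified = [max(0.0, v) for v in conv]
    # Max pooling: keep the strongest response anywhere in the sequence.
    pooled = max(rectified, default=0.0)
    # A single linear output stands in for DeepBind's neural network stage.
    return out_weight * pooled + out_bias

tata = motif_to_filter("TATA")
print(score_sequence("GGTATAGG", tata, conv_bias=-3.0))
print(score_sequence("GGTATTGG", tata, conv_bias=-3.0))
```

With a bias of -3, only an exact four-base match produces a positive response, so the first sequence scores above zero and the one-mismatch sequence scores zero.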

To address the concern of overfitting, the authors used several regularizers, including dropout, weight decay and early stopping.
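Two of these regularizers can be sketched in a few lines. This is a generic illustration, assuming L2 weight decay applied inside a plain gradient-descent update and early stopping with a "patience" rule on validation loss; the function names and the patience parameter are hypothetical and do not describe DeepBind's exact settings.

```python
def sgd_step_with_weight_decay(weights, grads, lr, decay):
    # Weight decay adds decay * w to each gradient, shrinking weights toward zero.
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]

def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch with the best validation loss, stopping once
    `patience` epochs pass without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch

print(sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, decay=0.5))
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.9, 0.95, 0.6], patience=3))
```

In the second call the search stops at epoch 5 and returns epoch 2, never seeing the later value 0.6; a larger patience trades extra training time for a chance of finding such later improvements.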

Dropout: Prevention of Over-Fitting

Dropout[5] is a technique for addressing overfitting to the training data in large networks. Because gradients are multiplied together when the chain rule is applied, hidden units can become co-adapted (each relying on the particular behavior of others), which can lead to overfitting. One way to avoid co-adaptation of hidden units is simply to drop units at random during training. A beneficial consequence of dropping units is that each training step effectively trains a different thinned network, approximating an ensemble of many large networks that would be far more computationally intensive to train separately.

However, dropout networks take longer to train, and tuning the step size is more challenging. The authors provide an appendix whose part (A) is a helpful “Practical Guide for Training Dropout Networks.” They note that typical values for the dropout parameter p (which, in their formulation, is the probability that a unit is retained) are between 0.5 and 0.8 for hidden layers and 0.8 for input layers.

Figure 20.17: A flowchart of the DeepBind procedure (taken from the DeepBind paper). Five sequences are processed in parallel by the model. The model convolves the sequences (the DeepBind model can be thought of as a filter scanning through the sequences), rectifies and pools them to produce a feature vector, which is then passed through a deep neural network. The output of the network is compared against the desired output, and the error is back-propagated through the pipeline.

Figure 20.18: An illustration of the calibration, training and testing procedure used by the DeepBind method (taken from the DeepBind paper).

Courtesy of Macmillan Publishers Limited. Used with permission.

Source: Alipanahi, Babak, Andrew Delong, et al. "Predicting the Sequence Specificities of DNA- and RNA-binding Proteins by Deep Learning." Nature Biotechnology (2015)
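Randomly dropping units during training can be sketched as below. Following the convention of the dropout paper cited above [5], the sketch treats p as the probability that a unit is retained. It uses the "inverted" variant, which scales retained activations by 1/p during training so nothing needs rescaling at test time; the original paper instead scales the weights by p at test time, and the two are equivalent in expectation. The function name and arguments are hypothetical.

```python
import random

def dropout(activations, p_retain, rng, training=True):
    """Inverted dropout: keep each unit with probability p_retain and scale
    kept values by 1 / p_retain so the expected activation is unchanged."""
    if not training or p_retain >= 1.0:
        return list(activations)  # at test time, every unit is used as-is
    if p_retain <= 0.0:
        raise ValueError("p_retain must be in (0, 1]")
    return [a / p_retain if rng.random() < p_retain else 0.0 for a in activations]

rng = random.Random(0)
hidden = [0.3, 1.2, 0.7, 0.0, 2.5]
print(dropout(hidden, p_retain=0.5, rng=rng))                   # a random thinned layer
print(dropout(hidden, p_retain=0.5, rng=rng, training=False))   # unchanged
```

Because a fresh random mask is drawn on every call, each training step updates a different thinned network.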