In this video, I will talk about the use of neural networks in attentional cascades. Neural networks have been used effectively as classifiers in sliding-window detectors. For example, the Rowley face detector was the best face detector before the Viola-Jones detector appeared. It had a fixed architecture and was trained with backpropagation. But strong modern CNN classifiers are very slow, so it is impractical to apply them to every sliding window; instead, we can apply them as the last stage of an attentional cascade. For example, we can use an extension of the Viola-Jones detector as a proposal generator and apply a CNN only to the selected proposals, as was done in the paper "Taking a Deeper Look at Pedestrians". The experiments showed that this approach led to a substantial improvement in pedestrian detection, setting the state of the art for 2015.

The next step is to replace all the previous stages of the attentional cascade with neural networks as well. This was done in the paper "A Convolutional Neural Network Cascade for Face Detection". In order to reach real-time performance, the neural networks at the first stages must be very simple and fast. The number of stages can be reduced by using bounding-box regression between stages, which I will explain later. In this paper, the first-stage classifier has one convolutional layer, one max-pooling layer, and one fully connected layer, so it is very simple. The classifiers at the next stages are more complex, and they concatenate the output of the fully connected layer of the previous-stage classifier with the output of the fully connected layer of the current-stage classifier before the final classification of a proposal.

Convolutional neural networks have demonstrated great results in image classification. They can detect the presence of an object even if it occupies only a small portion of the image. Thus, we can train a CNN to regress the position of an object in the image. Of course, it would be difficult to regress the position of a small object from a large image, but it is much easier to refine the position of an object within a small neighborhood, taking the current window as input. So we can add a separate neural network after each stage of the cascade to refine the bounding box of the proposal. In this particular algorithm, the refinement of the bounding box is formulated as a classification problem: the classifier selects a transformation from a predefined set of transformations. Such classifiers are called calibration networks. In this paper, the architecture of the calibration networks is basically the same as that of the classification networks.

Here are examples of the outputs of each stage of the cascade. You can see how the positions of the proposals are refined after each calibration network. You can also apply non-maximum suppression after each stage to reduce the number of proposals to be processed by the next stage.

There are two main conclusions. First, each stage of an attentional cascade can be implemented as a neural network model. Second, because bounding-box regression can refine the bounding-box prediction, we can use larger sliding-window strides and look at fewer windows, thus greatly improving the speed of the detector.
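
To make the stage classifiers concrete, here is a minimal PyTorch sketch of a first-stage classifier in the spirit of the paper's smallest network: one convolutional layer, one max-pooling layer, and one fully connected layer. The exact layer sizes and input resolution here are illustrative assumptions, not necessarily the paper's values.

```python
import torch
import torch.nn as nn

class Stage1Net(nn.Module):
    """Tiny first-stage cascade classifier (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)         # one convolutional layer
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)   # one max-pooling layer
        self.fc = nn.Linear(16 * 4 * 4, 16)                 # one fully connected layer
        self.out = nn.Linear(16, 2)                         # face / non-face scores

    def forward(self, x):                       # x: (N, 3, 12, 12) image windows
        h = torch.relu(self.conv(x))            # -> (N, 16, 10, 10)
        h = self.pool(h)                        # -> (N, 16, 4, 4)
        h = torch.relu(self.fc(h.flatten(1)))   # -> (N, 16) stage features
        return self.out(h), h                   # scores, plus features for later stages
```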
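
The later stages can then reuse the features of the earlier stages. The sketch below shows how a second-stage classifier might concatenate the fully connected features passed along from the first stage with its own fully connected output before the final decision. Again, the sizes and the 24x24 input resolution are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Stage2Net(nn.Module):
    """Second-stage classifier that concatenates previous-stage features."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 32, kernel_size=5)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2)
        self.fc = nn.Linear(32 * 9 * 9, 64)
        self.out = nn.Linear(64 + 16, 2)        # 16 extra dims from stage-1 features

    def forward(self, x, prev_features):        # x: (N, 3, 24, 24); prev_features: (N, 16)
        h = torch.relu(self.conv(x))            # -> (N, 32, 20, 20)
        h = self.pool(h)                        # -> (N, 32, 9, 9)
        h = torch.relu(self.fc(h.flatten(1)))   # -> (N, 64)
        h = torch.cat([h, prev_features], dim=1)  # concatenate across stages
        return self.out(h)
```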
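
The calibration step can also be sketched in code. The calibration network classifies a window into one of N predefined transformation patterns (a scale plus an x/y shift), and the chosen pattern is then applied in reverse to adjust the box. The 45-pattern grid below follows the setup described in the paper; the adjustment formula is my sketch of how such a pattern is inverted, assuming boxes in (x, y, w, h) form.

```python
import itertools

# 5 scales x 3 x-offsets x 3 y-offsets = 45 calibration patterns
SCALES = [0.83, 0.91, 1.0, 1.10, 1.21]
OFFSETS = [-0.17, 0.0, 0.17]
PATTERNS = list(itertools.product(SCALES, OFFSETS, OFFSETS))  # (s, dx, dy) triples

def calibrate(box, class_id):
    """Adjust an (x, y, w, h) box according to the predicted calibration pattern."""
    x, y, w, h = box
    s, dx, dy = PATTERNS[class_id]
    # Undo the predicted scale and shift to better center the box on the object.
    return (x - dx * w / s, y - dy * h / s, w / s, h / s)
```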
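
Finally, here is a plain greedy non-maximum suppression routine of the kind applied between stages to prune overlapping proposals before the next, more expensive classifier. This is a generic sketch, not the paper's exact procedure.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < threshold for j in kept):
            kept.append(i)
    return kept
```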