Dimension reduction in physical and data sciences

Training reliably shallow neural nets from fewest samples

Massimo Fornasier

Technische Universität München


We address the uniform approximation of (the structure of) a shallow feed-forward neural network from a small number of query samples, under mild assumptions on the smoothness of the activation functions and on the weights $a_i$. Our general approximation strategy is developed as a sequence of algorithms, each performing an individual sub-task. We consider both active and passive sampling, which allow strong and weak differentiation of the network, respectively. From first- and second-order differential information, we first approximate the span of the rank-1 matrices $a_i \otimes a_i$ generated by the weights. We then perform a whitening procedure to near-orthogonalize the weights. The core of the construction is the approximation of the ridge directions, expressed in terms of near-orthonormal rank-1 matrices $a_i \otimes a_i$. Each of these matrices is identified by solving a suitable nonlinear program, which maximizes the spectral norm of certain competitors constrained to the unit Frobenius sphere. We show how this sequence of algorithms can be performed on one-, two-, and multi-layer networks, and that it provides a more reliable identification of the network than gradient descent methods.
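To make the first step concrete, the following is a minimal NumPy sketch of how the span of the matrices $a_i \otimes a_i$ might be approximated from active queries. It is an illustration under stated assumptions, not the algorithm of the talk: the toy network $f(x) = \sum_i \tanh(a_i \cdot x + b_i)$, the central finite-difference Hessian, the Gaussian sampling points, the known number of neurons `m`, and all function names (`network`, `fd_hessian`, `approximate_span`, `residual`) are hypothetical choices made for this example. The idea it relies on is that the Hessian of such a network at any point is a linear combination of the $a_i \otimes a_i$, so the leading right singular vectors of a stack of vectorized Hessians span approximately the same space.

```python
import numpy as np

def network(x, A, b):
    """Toy shallow network f(x) = sum_i tanh(a_i . x + b_i) (assumed model).
    The rows of A are the weight vectors a_i."""
    return np.tanh(A @ x + b).sum()

def fd_hessian(f, x, h=1e-3):
    """Approximate the Hessian of f at x by central finite differences,
    using only point queries of f (a stand-in for active sampling)."""
    d = x.size
    E = h * np.eye(d)
    H = np.empty((d, d))
    for j in range(d):
        for k in range(d):
            H[j, k] = (f(x + E[j] + E[k]) - f(x + E[j] - E[k])
                       - f(x - E[j] + E[k]) + f(x - E[j] - E[k])) / (4 * h * h)
    return 0.5 * (H + H.T)  # symmetrize to damp rounding noise

def approximate_span(f, d, m, n_samples=50, seed=0):
    """Stack vectorized Hessians at random points and keep the m leading
    right singular vectors as an orthonormal basis of the approximate span."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, d))
    M = np.stack([fd_hessian(f, x).ravel() for x in X])
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[:m], s

def residual(basis, a):
    """Distance of the normalized matrix a (x) a from the span of `basis`."""
    v = np.outer(a, a).ravel()
    v = v / np.linalg.norm(v)
    return np.linalg.norm(v - basis.T @ (basis @ v))

# Hypothetical example: d = 5 inputs, m = 3 non-orthogonal unit weights.
A = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],
              [0.6, 0.8, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.5, 0.5]])
b = np.array([0.1, -0.3, 0.2])

basis, s = approximate_span(lambda x: network(x, A, b), d=5, m=3)
residuals = [residual(basis, a) for a in A]
print("leading singular values:", s[:5])
print("residuals of a_i (x) a_i:", residuals)
```

In this sketch the singular values should drop sharply after the first `m`, and each normalized $a_i \otimes a_i$ should lie close to the recovered subspace. The subsequent whitening and the spectral-norm maximization over the unit Frobenius sphere are not implemented here.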