Abstract:
We address the uniform approximation of (the structure of) a shallow feed-forward neural network, from a small
number of query samples, under mild smoothness assumptions on the activation functions and weights $a_i$. Our general approximation
strategy is developed as a sequence of algorithms, each performing an individual sub-task. We consider both active and passive sampling, which
allow strong and weak differentiation of the network, respectively. From first- and second-order differential information, we first approximate
the span of rank-1 matrices $a_i \otimes a_i$ generated by the weights. Then we perform a whitening procedure to near-orthogonalize the
weights. The core of the construction is the approximation of ridge directions expressed in terms of near-orthonormal rank-1 matrices
$a_i \otimes a_i$, realized by formulating their individual identification as a suitable nonlinear program that maximizes the spectral norm of
certain competitors constrained to the unit Frobenius sphere. We show how this sequence of algorithms can be applied to one-, two-,
and multi-layer networks, and that it provides more reliable identification of the network than gradient descent methods.
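
The pipeline summarized above can be illustrated on a toy problem. The following is a minimal sketch under simplifying assumptions, not the algorithms of the paper: the network is $f(x) = \sum_i \tanh(a_i \cdot x)$ with exactly orthonormal weights $a_i$ (so no whitening is needed), Hessians are computed in closed form rather than by active or passive sampling, the number of neurons $m$ is assumed known, and the nonlinear program is approached with a simple projected ascent iteration. The function names `hessian`, `estimate_span`, and `recover_direction`, and the step size `gamma`, are hypothetical.

```python
import numpy as np

def hessian(A, x):
    """Exact Hessian of f(x) = sum_i tanh(a_i . x); the rows of A are the a_i."""
    t = np.tanh(A @ x)
    g2 = -2.0 * t * (1.0 - t**2)      # second derivative of tanh at a_i . x
    return (A.T * g2) @ A             # sum_i g2_i * (a_i outer a_i)

def estimate_span(A, n_samples, rng):
    """Orthonormal basis (m x d^2) of the span of the vectorized Hessians."""
    m, d = A.shape
    X = rng.standard_normal((n_samples, d))
    H = np.stack([hessian(A, x).ravel() for x in X])
    _, _, Vt = np.linalg.svd(H, full_matrices=False)
    return Vt[:m]                     # assumption: m is known in advance

def recover_direction(W, d, rng, gamma=2.0, iters=100):
    """Ascent on the spectral norm over the unit Frobenius sphere of span(W)."""
    M = (rng.standard_normal(W.shape[0]) @ W).reshape(d, d)
    M /= np.linalg.norm(M)            # Frobenius normalization
    for _ in range(iters):
        M = 0.5 * (M + M.T)           # remove rounding asymmetry
        vals, vecs = np.linalg.eigh(M)
        j = np.argmax(np.abs(vals))   # eigenvalue of largest magnitude
        u = vecs[:, j]
        step = M + gamma * np.sign(vals[j]) * np.outer(u, u)
        M = (W.T @ (W @ step.ravel())).reshape(d, d)   # project back onto span
        M /= np.linalg.norm(M)
    vals, vecs = np.linalg.eigh(0.5 * (M + M.T))
    return vecs[:, np.argmax(np.abs(vals))]

rng = np.random.default_rng(0)
d, m = 6, 3
A = np.linalg.qr(rng.standard_normal((d, m)))[0].T    # m orthonormal rows in R^d
W = estimate_span(A, 50, rng)
u = recover_direction(W, d, rng)
print(np.max(np.abs(A @ u)))          # close to 1 when u matches some a_i up to sign
```

In this idealized setting every element of the span has the form $\sum_i c_i \, a_i \otimes a_i$, its Frobenius norm is $\|c\|_2$ and its spectral norm is $\max_i |c_i|$, so the maximizers on the unit Frobenius sphere are exactly $\pm a_i \otimes a_i$. With approximate derivatives and only near-orthonormal weights the recovery is approximate, which is the regime the abstract refers to.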