Theory of Computing
-------------------
Title     : Online Row Sampling
Authors   : Michael B. Cohen, Cameron Musco, and Jakub Pachocki
Volume    : 16
Number    : 15
Pages     : 1-25
URL       : https://theoryofcomputing.org/articles/v016a015

Abstract
--------

Finding a small spectral approximation for a tall $n \times d$ matrix
$A$ is a fundamental numerical primitive. For a number of reasons,
one often seeks an approximation whose rows are sampled from those of
$A$. Row sampling improves interpretability, saves space when
$A$ is sparse, and preserves structure, which is important, e.g.,
when $A$ represents a graph.

However, correctly sampling rows from $A$ can be costly when the
matrix is large and cannot be stored and processed in memory. Hence, a
number of recent publications focus on row sampling in the streaming
setting, using little more space than what is required to store the
returned approximation (Kelner--Levin, Theory Comput. Sys. 2013,
Kapralov et al., SIAM J. Comp. 2017).

Inspired by a growing body of work on online algorithms for machine
learning and data analysis, we extend this work to a more restrictive
_online_ setting: we read rows of $A$ one by one and immediately
decide whether each row should be kept in the spectral approximation
or discarded, without ever retracting these decisions. We present an
extremely simple algorithm that approximates $A$ up to
multiplicative error $1+\epsilon$ and additive error $\delta$ using
$O(d \log d \log (\epsilon\norm{A}_2^2/\delta) / \epsilon^2)$
online samples, with memory overhead proportional to the cost of
storing the spectral approximation. We also present an algorithm that
uses $O(d^2)$ memory but only requires
$O(d \log (\epsilon\norm{A}_2^2/\delta) / \epsilon^2)$ samples, which
we show is optimal.

Our methods are clean and intuitive, allow for lower memory usage than
prior work, and expose new theoretical properties of leverage score
based matrix approximation.