Current Projects
Approximate Hypothesis Testing
In Approximate Hypothesis Testing (AHT), the goal is to estimate an unknown distribution based on observed IID samples . Unlike in classical hypothesis testing, the estimate need not be exact—it suffices to produce some that is "close" to the sample-generating distribution . Together with my doctoral student Nicolas Le Gouic, we establish the sample complexity of AHT, i.e., the smallest number of samples required to reliably produce such .
Problem Statement. Consider samples that are IID according to a distribution only known to reside in a finite hypothesis class . Based on , we seek to produce an estimate —not necessarily in !—that is -close to as measured by some distance . Our aim is to determine the smallest number of samples that are required to produce such an estimate with probability at least for every sample-generating distribution . We refer to as the sample complexity of AHT and formally define it as
Clustered Hypothesis Testing. We determine by means of the auxililiary Clustered Hypothesis Testing (CHT) problem. In the latter, each hypothesis is assigned to at least one cluster . The aim of the problem is to determine, with probability at least , a cluster that contains the sample-generating distribution . The sample complexity of CHT is denoted and formally defined as
We show that for a judiciously constructed cluster family .
Specifically, we construct so that the intersection of every -ball with is the subset of some cluster ; this corresponds to the key fact that, in AHT, we need not distinguish between all hypotheses . In the figure below, we present an example of a cluster family for and , where
By choosing and as the Euclidean distance between the first two components of and ,
we obtain the cluster family for (depicted here is its -projection.)
Observe the overlap between the clusters: if the data-generating distribution was, say, , then both and would constitute a correct answer in the CHT problem.
Main Result. We show that scales as
where the minimum is taken over all "confusable" subsets of (in the example above, ; in general, we give a definition of "confusability" pertaining to the cluster family ), and where denotes a new "multivariate Bhattacharyya distance":
For detailed results and more examples, we shall publish a preprint shortly.