After finishing and submitting the JASA paper (May 7), write an extension that uses the asymptotic approximation of Hankel and Toeplitz matrices by circulant matrices. This would allow for an asymptotic-equivalence-type argument; see the sketch below.
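For reference, here is a minimal numerical sketch of the idea (assuming NumPy/SciPy; the absolutely summable symbol {t_k = \rho^{|k|}} is just an illustrative choice): approximate a symmetric Toeplitz matrix by the circulant obtained by wrapping its diagonals around, and compare spectra.

```python
import numpy as np
from scipy.linalg import toeplitz, circulant

# Illustrative absolutely summable symbol: t_k = rho^|k|
n, rho = 256, 0.5
t = rho ** np.abs(np.arange(n))       # first column of the Toeplitz matrix
T = toeplitz(t)

# A standard circulant approximant: wrap the off-diagonals around,
# c_k = t_k + t_{n-k} for k = 1, ..., n-1
c = t.copy()
c[1:] += t[1:][::-1]
C = circulant(c)

# Circulant eigenvalues are the DFT of the first column; for symbols like
# this they track the Toeplitz spectrum as n grows (asymptotic equivalence)
eig_C = np.sort(np.fft.fft(c).real)
eig_T = np.sort(np.linalg.eigvalsh(T))
print(np.max(np.abs(eig_C - eig_T)))  # small for moderate n, shrinks with n
```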

Also, do kernel match + detection problem (June 15).

Finish the NIPS paper with Dan (June 1).

  • Algorithmic stability in IPs
  • Chris: funding
  • Lucky Imaging: projection estimator
  • Find the minimax risk for the sequential problem (asymptotic in n). Adapt Belitser and Levit.
  • Bound the ratio of the risks of Beran's and Bertero's estimators (and lucky imaging).
  • Read Chapter 10 of Cesa-Bianchi:  Online Inverse Problem?
  • Convex optimization for monotone estimator
  • Read minimax detection in inverse problems paper.

Suppose {dY(t) = A\theta(t)\,dt + \epsilon\, dW(t)} and let {(\psi_i)} be a basis (frame?) for the space to which {\theta} belongs. Acting this observation functional on the {\psi_i}, we get a sequence space representation {Y_i = x_i + \epsilon \kappa_i^{-1} W_i}, where the {(W_i)} form a non-independent, but nearly independent, set if {A} is sufficiently well behaved (think homogeneous and dilation invariant, or a convolution with polynomial decay). A simulation sketch follows the list below.

  1. Ask James about his DS for correlated noise
  2. We can apply this to WVD by the above formulation.
  3. Fundamental Question: Can we use the adaptive sampling framework to estimate {\theta} better?
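As a sanity check on this formulation, here is a minimal simulation sketch (assuming NumPy; the polynomially decaying symbol {\kappa_i = (1+i)^{-3/2}} and the choice of {\theta} are illustrative). The Fourier basis is used because it diagonalizes a convolution exactly, so here the {W_i} come out exactly independent; a non-diagonalizing basis such as wavelets would give the nearly independent case described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 512, 0.05

# Convolution operator A with polynomially decaying symbol kappa_i
freqs = np.fft.rfftfreq(n, d=1.0 / n)             # frequencies 0, 1, ..., n/2
kappa = (1.0 + freqs) ** -1.5                     # illustrative polynomial decay

# A smooth truth theta and its blurred, noisy observation Y
t = np.linspace(0, 1, n, endpoint=False)
theta = np.sin(2 * np.pi * t) + 0.5 * np.sin(6 * np.pi * t)
Y = np.fft.irfft(kappa * np.fft.rfft(theta), n) + eps * rng.standard_normal(n)

# Acting the observation on the Fourier basis and dividing by kappa_i gives
# the sequence-space form Y_i = x_i + eps * kappa_i^{-1} W_i
x_true = np.fft.rfft(theta)
Y_i = np.fft.rfft(Y) / kappa                      # = x_i + eps kappa_i^{-1} W_i
print(np.abs(Y_i - x_true)[:4])                   # small noise at low frequencies
print(np.abs(Y_i - x_true)[-4:])                  # amplified noise at high frequencies
```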

Introduction

Suppose we are attempting to match two images with differing amounts of seeing and noise. The first, the reference image {R}, is obtained with no noise. The second, the science image {S}, generally has more severe seeing and is noisy. We wish to find some transformation that maps {R} to {S} such that the difference is noise-like when there are no additional sources.

In particular, we posit a family of operators {(K_{\lambda})_{\lambda \in \Lambda}} and a {\sigma > 0} such that there exists a {K_{\lambda_0}} with {\mathbb{E}[S] = \mathbb{E}[K_{\lambda_0}R + \sigma Z] = K_{\lambda_0}R}. Our goal is to choose {\hat\lambda \in \Lambda} based only on the data.

1. Result

We propose a multiresolution noise-like statistic, based on an idea in Davies and Kovac (2001), defined as:

\displaystyle  NL(\lambda, \mathcal{I}) := \sup_{I \in \mathcal{I}}\frac{1}{\sqrt{|I|}} \left| \sum_{i \in I} (K_\lambda R - S)_i \right|

where {\mathcal{I}} is a multiresolution analysis of the pixelized grid and {\lambda \in \Lambda}. In what follows we suppress the multiplicative factor {\frac{1}{\sqrt{|I|}}} whenever it is not explicitly needed.
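To make the definition concrete, here is a minimal one-dimensional sketch (assuming NumPy/SciPy; the reference signal, the Gaussian-blur family {K_\lambda}, and the dyadic-interval family standing in for {\mathcal{I}} are all illustrative choices, not part of the note):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(1)
n, sigma, lam0 = 256, 0.1, 2.0

# Illustrative 1-d "images": reference R and science frame S = K_{lam0} R + sigma Z
t = np.linspace(0, 1, n)
R = np.exp(-0.5 * ((t - 0.3) / 0.05) ** 2) + 0.7 * np.exp(-0.5 * ((t - 0.7) / 0.03) ** 2)

def K(lam, img):
    return gaussian_filter1d(img, np.sqrt(lam))   # Gaussian kernel with variance lam

S = K(lam0, R) + sigma * rng.standard_normal(n)

def dyadic_intervals(n):
    """All dyadic sub-intervals of {0, ..., n-1}: a simple multiresolution family."""
    out, size = [], n
    while size >= 1:
        out += [(k * size, (k + 1) * size) for k in range(n // size)]
        size //= 2
    return out

def NL(lam, intervals=tuple(dyadic_intervals(n))):
    resid = K(lam, R) - S
    return max(abs(resid[a:b].sum()) / np.sqrt(b - a) for a, b in intervals)

for lam in (0.5, 1.0, 2.0, 4.0):
    print(lam, NL(lam))                           # smallest near the true lam0
```

One would then take {\hat\lambda} to be, say, the minimizer of {NL(\lambda)} over a grid in {\Lambda}.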

Observe that we can write this as

\displaystyle  NL(\lambda, \mathcal{I}) := NL(\lambda)= \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} (K_\lambda - K_{\lambda_0})R + \sigma Z_i \right|  \ \ \ \ \ (1)

by adding and subtracting {K_{\lambda_0}R}. Note that the pixel index {i} is suppressed on the first term for notational clarity. Also, when {\mathcal{I}} is fixed, we suppress that argument.

Now, one quality we would like this statistic to have is the ability to distinguish asymptotically between competing hypotheses. Here low-noise asymptotics ({\sigma \rightarrow 0}) makes more sense than large-sample asymptotics, so we choose this regime.

Our goal is to study the power of this statistic to distinguish among hypotheses asymptotically. It is known (DasGupta 2008) that asymptotics under a fixed alternative hypothesis lead to trivial results, such as the power always tending to 1.

Hence, we wish to look at an analogue of the Pitman slope. This can be phrased as follows. Let {\tau > 0} be given. Then we want to look at

\displaystyle  \lim_{\sigma \rightarrow 0} \mathbb{P} \left( \frac{NL(\lambda_0 + \Delta C(\sigma))}{NL(\lambda_0)} > \tau \right)  \ \ \ \ \ (2)

where {C(\sigma)} is a function going to zero with {\sigma} and {\Delta} is a constant. We look at the ratio of the statistic under the alternative and null hypotheses as a way of rescaling. Alternatively, we could make {\tau} a function of {\sigma}; we will see in what follows that the ratio in effect chooses that function.

Lemma 1 We can rewrite (2) as

\displaystyle  \begin{array}{rcl}  \lim_{\sigma \rightarrow 0} \mathbb{P} \left( \frac{NL(\lambda_0 + \Delta C(\sigma))}{NL(\lambda_0)} > \tau \right) & = & \lim_{\sigma \rightarrow 0} \mathbb{P} \left( \frac{ \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} (K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R + \sigma Z_i \right| } { \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} Z_i \right| } > \sigma \tau \right) \end{array}

Proof: By (1), {NL(\lambda_0) = \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} \sigma Z_i \right| = \sigma \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} Z_i \right|}, so the {\sigma} can be pulled out of the denominator and moved to the right-hand side of the inequality. \Box

This probability seems difficult to compute, even asymptotically. Hence, noting that the denominator inside the probability no longer involves {\sigma}, we use the limit as a heuristic and consider the following instead

\displaystyle  \lim_{\sigma \rightarrow 0} \mathbb{P} \left( \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} (K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R + \sigma Z_i \right| > \sigma \tau \right)

Before continuing, we need a result for exchanging {\sup} and {\mathbb{P}}:

Lemma 2 Let {(X_t)_{t \in T}} be a collection of random variables over some index set {T} such that {\sup_t X_t = X_{t_*}} for some {t_* \in T}. Then for any {\tau > 0}

\displaystyle  \mathbb{P} ( \sup_t X_t > \tau) \geq \sup_t \mathbb{P}( X_t > \tau)

Proof: Write {\mathbb{P} ( \sup_t X_t > \tau) = \mathbb{E} \mathbf{1}(\sup_t X_t > \tau)}. Now, since {\mathbf{1}(\sup_t X_t > \tau) = \sup_t \mathbf{1}(X_t > \tau)}, we see that

\displaystyle  \mathbb{P} ( \sup_t X_t > \tau) = \mathbb{E} \sup_t \mathbf{1}( X_t > \tau) \geq \sup_t \mathbb{E} \mathbf{1}( X_t > \tau)

where for the last inequality we use that {\sup_t \int f_t \leq \int \sup_t f_t} whenever the supremum is measurable, as it is here for indicator functions. \Box
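A quick Monte Carlo illustration of the lemma (assuming NumPy; the small equicorrelated Gaussian family is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)
tau, m = 1.0, 200_000

# A small family of correlated Gaussians (X_t), t = 1, ..., 5
cov = 0.5 * np.ones((5, 5)) + 0.5 * np.eye(5)     # unit variance, correlation 1/2
X = rng.multivariate_normal(np.zeros(5), cov, size=m)

lhs = np.mean(X.max(axis=1) > tau)                # P(sup_t X_t > tau)
rhs = np.max(np.mean(X > tau, axis=0))            # sup_t P(X_t > tau)
print(lhs, rhs)                                   # lhs >= rhs, as Lemma 2 asserts
```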

Using Lemma 2, we can write

\displaystyle  \mathbb{P} \left( \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} (K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R + \sigma Z_i \right| > \sigma \tau \right) \geq \sup_{I \in \mathcal{I}} \mathbb{P} \left( \left| \sum_{i \in I} (K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R + \sigma Z_i \right| > \sigma \tau \right).  \ \ \ \ \ (3)

Now, we would like to find the {C(\sigma)} such that the RHS of (3) {\stackrel{\sigma \rightarrow 0}{\rightarrow} 1}. First, we compute the probability on the RHS of (3). Define {\mu_{I,\sigma} := \sum_{i \in I}\left[(K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R\right]_i}. Then

\displaystyle  \mathbb{P} \left( \left| \sum_{i \in I} (K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R + \sigma Z_i \right| > \sigma \tau \right) = 1 + \Phi\left( -\sqrt{|I|}\left( \tau + \frac{\mu_{I,\sigma}}{\sigma} \right) \right) - \Phi\left( \sqrt{|I|}\left( \tau - \frac{\mu_{I,\sigma}}{\sigma} \right) \right)  \ \ \ \ \ (4)

by noticing that the sum inside the absolute value is a {N\left(\mu_{I,\sigma},\frac{\sigma^2}{|I|}\right)} random variable.
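As a quick numerical check of the algebra in (4), taking the stated {N\left(\mu_{I,\sigma},\frac{\sigma^2}{|I|}\right)} distribution as given, we can compare a Monte Carlo estimate against the closed form (a sketch assuming NumPy/SciPy, with arbitrary illustrative values for {\mu_{I,\sigma}}, {\sigma}, {\tau}, and {|I|}):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma, tau, mu, size_I = 0.1, 1.5, 0.12, 64        # illustrative values

# Monte Carlo draw of the N(mu, sigma^2/|I|) variable appearing in (4)
T = rng.normal(mu, sigma / np.sqrt(size_I), size=1_000_000)
mc = np.mean(np.abs(T) > sigma * tau)

# Closed form on the right-hand side of (4)
exact = (1.0
         + norm.cdf(-np.sqrt(size_I) * (tau + mu / sigma))
         - norm.cdf(np.sqrt(size_I) * (tau - mu / sigma)))
print(mc, exact)                                   # agree up to Monte Carlo error
```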

Using that {\liminf_m \sup_n x_{m,n} \geq \sup_n \liminf_m x_{m,n}} for any doubly indexed sequence {x_{m,n}}, we see from (3) and (4) that

\displaystyle  \begin{array}{rcl} \lim_{\sigma \rightarrow 0} \mathbb{P} \left( \sup_{I \in \mathcal{I}} \left| \sum_{i \in I} (K_{\lambda_0 + \Delta C(\sigma)} - K_{\lambda_0})R + \sigma Z_i \right| > \sigma \tau \right) & \geq & \sup_{I \in \mathcal{I}} \lim_{\sigma \rightarrow 0} \left[ 1 + \Phi\left( -\sqrt{|I|}\left( \tau + \frac{\mu_{I,\sigma}}{\sigma} \right) \right) - \Phi\left( \sqrt{|I|}\left( \tau - \frac{\mu_{I,\sigma}}{\sigma} \right) \right)\right]. \end{array}

Now, we see that this probability tends to 1 when {\mu_{I,\sigma}/\sigma \rightarrow \infty}. We carried out this calculation for the case where {K_{\lambda}} is convolution with a non-normalized Gaussian kernel of variance {\lambda}, for each {\lambda \in \Lambda}.

The result was as follows (writing {C'} for the derivative of {C}):

\displaystyle  \frac{\mu_{I,\sigma}}{\sigma} \rightarrow \infty \quad \textrm{if} \quad C'(\sigma) \rightarrow \infty

and

\displaystyle  \frac{\mu_{I,\sigma}}{\sigma} \rightarrow 0 \quad \textrm{if} \quad C'(\sigma) \rightarrow 0

However, the second of the two results is not informative. If we additionally assume that {C(\sigma) = \sigma^\alpha} for some {\alpha > 0}, we get

\displaystyle  \frac{\mu_{I,\sigma}}{\sigma} \rightarrow \infty \quad \textrm{if} \quad \alpha \in (0,1)

and

\displaystyle  \frac{\mu_{I,\sigma}}{\sigma} \rightarrow 0 \quad \textrm{if} \quad \alpha > 1.

We’re not sure what happens if {\alpha = 1}.
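These rates have a quick numerical sanity check: to first order {\mu_{I,\sigma} \approx \Delta C(\sigma) \cdot \frac{\partial}{\partial \lambda} \sum_{i \in I} (K_\lambda R)_i \big|_{\lambda_0}}, so {\mu_{I,\sigma}/\sigma} scales like {\sigma^{\alpha - 1}}. A sketch follows (assuming NumPy/SciPy; the reference signal, the interval {I}, and the constants are illustrative, with {K_\lambda} again a discrete Gaussian blur of variance {\lambda}):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

n, lam0, Delta = 256, 2.0, 1.0
t = np.linspace(0, 1, n)
R = np.exp(-0.5 * ((t - 0.5) / 0.05) ** 2)   # illustrative reference signal
I = slice(112, 144)                          # a fixed interval I around the peak

def mu_over_sigma(sigma, alpha):
    """mu_{I,sigma}/sigma for C(sigma) = sigma^alpha and Gaussian-blur K."""
    lam = lam0 + Delta * sigma ** alpha
    diff = gaussian_filter1d(R, np.sqrt(lam)) - gaussian_filter1d(R, np.sqrt(lam0))
    return diff[I].sum() / sigma

for alpha in (0.5, 2.0):
    print(alpha, [mu_over_sigma(s, alpha) for s in (1e-1, 1e-2, 1e-3, 1e-4)])
# alpha = 0.5: |mu/sigma| grows like sigma^(-1/2); alpha = 2: it decays like sigma
```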

Risk Estimation

Sep 01 2010 — this post is password protected.

As everyone who has shared an office with me knows, I spend a lot of time thinking about inverse problems, in particular statistical inverse problems. I think this is important because, fundamentally, statistics is an inverse problem. Consider the following very general statistical problem. Suppose {\mathcal{T}} is a separable Banach space and {\Theta \subseteq \mathcal{T}} is a model space. Let {(\phi_j) \subset \mathcal{T}^*} be a sequence of elements in the dual space of continuous linear functionals on {\mathcal{T}}. Lastly, suppose {n \in \mathbb{N}}, {\sigma > 0}, and {W} is a random variable taking values in {\mathbb{R}^n}. Then we can define the statistical problem as a mapping {(\theta,\sigma,n) \mapsto \mathbb{P}_{\theta,\sigma,\phi}} where the observations are {\phi_j(\theta) + \sigma W_j} for each {1 \leq j \leq n}, or equivalently {\phi(\theta) + \sigma W}. Our goal is to make some inference about {\theta} given an observation {Y \sim \mathbb{P}_{\theta,\sigma,\phi}}.

Generally speaking, there is usually another layer of formalization that encapsulates what we mean by ‘some inference.’ In particular, suppose we have a space {\mathcal{B}} and define a set of mappings {\mathcal{G} = \{g : \Theta \rightarrow \mathcal{B} \}}. Then we consider the elements of {\Theta} as models and either {\mathcal{G}} or {\mathcal{B}} as the parameter space.

Now the goal is to take a model {\theta \in \Theta}, a set of functionals {(\phi_j) \subset \mathcal{T}^*}, a parameter of interest {g \in \mathcal{G}}, and an observation {Y \sim \mathbb{P}_{\theta,\sigma,\phi}}, and develop an estimator {\hat{g}} that is a {\mathbb{P}_{\theta,\sigma,\phi}}-measurable function from {\mathbb{R}^n} to {\mathcal{B}}.
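A toy instantiation may help fix ideas (a sketch assuming NumPy; the point-evaluation functionals {\phi_j}, the sinusoidal {\theta}, and the quadratic parameter of interest {g(\theta) = \int \theta^2} are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 200, 0.1

# One illustrative model theta in Theta (functions on [0, 1])
def theta(t):
    return np.sin(2 * np.pi * t)

# Functionals phi_j in the dual space: here, point evaluations at t_j
t_j = np.linspace(0, 1, n, endpoint=False)

# The mapping (theta, sigma, n) -> P_{theta,sigma,phi}: one draw Y
Y = theta(t_j) + sigma * rng.standard_normal(n)

# A parameter of interest g : Theta -> B, here g(theta) = int theta^2,
# with a plug-in estimator g_hat that is a measurable function of Y alone
g_true = np.mean(theta(t_j) ** 2)            # ~ 0.5 for this theta
g_hat = np.mean(Y ** 2) - sigma ** 2         # E[Y_j^2] = theta(t_j)^2 + sigma^2
print(g_true, g_hat)
```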

Inverse Problem Formalization (Working)

Aug 27 2010 — this post is password protected.