Notes on image coaddition, PSF homogenization, and matched filters

Table of Contents

1 Introduction

As seen from the Earth, stars appear as point sources; the diameter of the star is usually too small to be measured. On images from telescopes, however, they appear slightly blurry due to turbulance in the atmosphere and (in space or in small telescopes) limitations in the optics of the instrument. The apparent morphology of a point source such as a star in an image called the point spread function (PSF), or sometimes the point response function, of the image. The PSF of different exposures from the same telescope may vary significantly, and the PSF can also vary across a single image.

In the approximation that the PSF is uniform across an image, the effect of the atmosphere and instrument on the image is that of the application of a low-pass Fourier filter.

There is a significant body of work dedicated to altering the point spread function of images, particularly in approximating the true sky from available images. Much of this work has been done specifically to address the early optical problems of the Hubble Space Telescope; see the STScI conference on restoration of HST images. Good reference papers include the review paper by Hunt and Digital Image Processing by Gonzales and Woods.

Sky surveys have an additional motivation for studying PSF alteration. When multiple exposures are combined into a single image of the sky, and the exposures do not all have the same PSF, the PSF in the combined image will jump discontinuously at the edge of each exposure. In a large survey with many exposures at differing offsets, this is a significant problem.

These notes were prepared for a meeting to discuss the impact of PSF homogenization in such a survey. The plots and examples below were all generated using the R statistical computing software package, and the R code accompanies the plots.

2 A Matched Filter on a single exposure

2.1 The special case of an isolated point source in given location

For an image of a single point source, the total number of counts recorded in each pixel of an image is: \[ b(x) = f h(x,x_0) \] Note that h can be a point spread function (scaled by the photometric solution), but could apply to more arbitrary mappings.

The final image will also contain noise: \[ g(x) = b(x) + n(x) = f h(x,x_0) + n(x) \]

When we are concerned with objects near the limits of detection, the photon noise from the sky will dominate; the noise will be uniform over the image.

If we wanted to measure the flux from the point source, we might simply estimate it from the brightest pixel: \[ {\hat f}_{\text{max}} = \frac{g(x_{\text{max}})}{h(x_{\text{max}},x_0)} \]

We are not limited to using just that single pixel; we can get such an estimate for any pixel with flux: \[ {\hat f}_{x} = \frac{g(x)}{h(x,x_0)} \] but the uncertaitly will not be uniform: \[ \sigma_{\hat f}(x) = \frac{\sigma_{n}}{h(x,x_0)} \]

We can do a weighted mean of these different estimates to get an estimate with minimal uncertainty: \[ {\hat f} = \frac{\sum_x \frac{1}{\sigma_f^2(x)} \frac{g(x)}{h(x,x_0)} }{\sum_x \frac{1}{\sigma_f^2(x)}} = \frac{\sum_x \frac{h^2(x,x_0)}{\sigma^2_{n}} \frac{g(x)}{h(x,x_0)} }{ \sum_x \frac{h^2(x,x_0)}{\sigma^2_{n}}} = \frac{1}{ \sum_x h^2(x,x_0) } \sum_x h(x,x_0) g(x) \]

Note that this is the value of the convolution of the image by its point spread function, evaluated at the position of the point source, and scaled by constant that depends on the point spread function and photometric solution: \[ {\hat f} = \sum_x \left(\frac{h(x,x_0)}{\sum_{x_1} h^2(x_1,x_0)}\right) g(x,x_0) = \sum_x m(x,x_0) g(x, x_0) \]

Because the noise variance is independent of position, the uncertainty in the estimate of the flux is the the standard deviation of the sky noise in the image, after convolution by the point spread function. Because we know this particulary weighting minimizes the uncertainty in the flux relative to the flux, we can conclude that this weighting maximizes the value of the flux relative to the sky noise. This is precisely what we want for object detection; the point spread function is our optimal matching filter.

2.2 Example

First, I define a Moffat PSF:

moffat <- function(r,fwhm,beta=4.765) {
  hwhm <- (fwhm/2.0)
  alpha <- hwhm*(2^(1.0/beta)-1)^(-0.5)
  p <- (1.0 + (r/alpha)^2.0)^(-1.0*beta)

Plot the raw signal, with the sky sigma equal to the square root of the sky and the flux in a Moffat profile with a total sky flux 5 times the sky sigma:

npoints <- 64
sigma <- 4.0
flux <- 5*sigma
x <- (-1*npoints/2):(npoints/2)
h <- moffat(x,4.25)
n <- rnorm(x,sd=sigma)
b <- sigma^2+flux*h
g <- b+n

mtext("Data with noise",4,0,las=1)
mtext("Data w/o noise",4,0,las=1,padj=2,col="red")


Our representation of the pixel transfer function is centered on the center of the array rather than the edge, so the default R convolution shifts everything. Define a new convolution to center it on the 0 index.

recenter <- function(x) {
  npts <- length(x)
  hnpts <- ceiling(npts/2)
  z <- c(x[(hnpts+1):npts],x[1:(hnpts)])

Now, apply the convolution

fhat <- recenter(convolve(g,h,type="circular"))

mtext("Data with noise",4,0,padj=-2,las=1,col="gray")
mtext("Matched filtered",4,0,padj=0,las=1)
mtext("Data w/o noise",4,0,las=1,padj=2,col="green")


2.3 The Matched filter in Fourier space

Frequency filtering alters both the signal and noise of an image, but because the power spectrum of the signal and noise differ, the filtering will alter the signal and noise differently. To detect an image, we need the signal from a given object to be as high as possible relative to the noise; in other words, we need to apply a filter that maximizes the signal to noise of the objects near the detection limit.

Near the detection limit, the noise will be approximately uniform, and the power spectrum of the noise will be flat; the Fourier transform of the noise will be of equal magnitude at all frequencies, and have a mean of zero. The Fourier transform of the signal, however, will be the Fourier transform of the object, multiplied by the Fourier transform of the PSF. In the case of a point source, the power spectrum will be the shape of the PSF, with a magnitude that varies with the flux of the object.

So, our goal is to derive a matched filter that maximizes the signal to noise. In other words, our filtered image:

\[ \hat{f}(x) = \sum_{x'} f(x') m(x-x') \]

should be a least squares fit, minimizing:

\[ \chi^2 = \sum_x (f(x)-\hat{f}(x))^2 \]

This is the Wiener filter, taking considering only low signal to noise objects as part of the signal. In Fourier space

\begin{eqnarray*} M(\nu) & = & \frac{1}{H(\nu)} \frac {1} {1 + \frac{|N(\nu)|^2}{|B(\nu)|^2}} \\ & = & \frac{ |F(\nu)|^2 H^\ast(\nu) }{|B(\nu)|^2 + |N(\nu)|^2} \end{eqnarray*}

If we are looking at low signal to noise objects \( (|B|^2 < < |N|^2 ) \) and we are looking at a random distribution of point sources (F(ν) is flat), then \[ M(\nu) \propto H^\ast(\nu) \] and the matched filter is a mirror reflection of the point spread function, normalized by some factor (which doesn't affect the effectiveness of the matched filter).

2.4 A Fuzzy-minded explanation

What, then, does this mean? A pixel in the "true" image f(x) has contributions from each F(ν) Fourier components. Convolution by the point spread function by the instrument, atmosphere, etc. reduces this amount to H(ν)F(ν), and noise N(ν) is added to the result, giving us our observed image g(x), whose Fourier transform is: \[ G(\nu) = H(\nu)F(\nu)+ N(\nu) \] Our best estimate of F(ν) will be \[ \hat{F}(\nu) = \frac{G(\nu)}{H(\nu)}. \] Our image g(x), however, does not have the same signal to noise in each frequency component. If we weight each frequency component by its \( (S/N)^2 \)rather than doing a direct inverse transform, then we get

\begin{eqnarray*} \hat{f}(x) & \propto & \sum_{\nu} \frac{|F(\nu)H(\nu)|^2}{|N(\nu)|^2} \frac{G(\nu)}{H(\nu)} e^{\frac{2 \pi i \nu x}{N}}\\ & \propto & \sum_{\nu} \frac{|F(\nu)|^2}{|N(\nu)|^2} H(\nu) H^*(\nu) \frac{G(\nu)}{H(\nu)} e^{\frac{2 \pi i \nu x}{N}}\\ & \propto & \sum_{\nu} H^*(\nu) G(\nu) e^{\frac{2 \pi i \nu x}{N}} \end{eqnarray*}

which is just the inverse transform of our match filtered image.

2.5 An example

Going back to our previous example,

GG <- abs(fft(g))^2
BB <- abs(fft(b))^2
NN <- abs(fft(n))^2
FFhat <- abs(fft(fhat))^2
k <- 1:(npoints/2)

leglabels<-c("Original data","Matched filtered data","Noise-free data", "Noise")


Of course, when we are doing our fit, we do not know specifically what all the noise values are, so we use our expectation value for our magnitude of the noise:

NNe <- rep(0,times=npoints+1)+npoints*(sigma^2)
GGe <- rep(0,times=npoints+1)+BB+NNe

leglabels<-c("Original data","Matched filtered data","Expected signal+noise", "Noise-free signal", "Expected noise")


3 Application of matched filters to resampled images

Interpolation by a sinc function that is critically sampled at a higher frequency than the point spread function, such that it doesn't smooth the original PSF in physical space, is benign with respect to the matched filter; the optimal matched filter for point sources is still the PSF (but in the image space of the interpolated image).

Recall that the Fourier transform of a sinc function is a square; interpolation by a sinc is equivalent to taking the fourier transform, possibly replacing frequencies higher that some threshhold with zero, padding the result with zeros to get an array of the desired size, and inverting the Fourier transform. For example, in the above example, the power spectro of the intrepolated images would look like this:

snpoints <- 100
sk <- 1:(snpoints/2)
sGG <- rep(0,times=snpoints+1)
sFFhat <- rep(0,times=snpoints+1)
sGGe <- rep(0,times=snpoints+1)
sBB <- rep(0,times=snpoints+1)
sNNe <- rep(0,times=snpoints+1)
sGG[k] <- GG[k]
sFFhat[k] <- FFhat[k]
sGGe[k] <- GGe[k]
sBB[k] <- BB[k]
sNNe[k] <- NNe[k]

leglabels<-c("Original data","Matched filtered data","Expected signal+noise", "Noise-free signal", "Expected noise")


The works because, although there is significant correlation in the noise in the interpolated image, if a sinc function with a reasonable width was used, all of the correlation is in frequencies where the matched filter has no power, and therefore these frequencies have zero weight in the filtered image.

Spline intrepolation gives nearly the same answer as sinc interpolation, so it is nearly benign as well:

spg <- spline(g,n=(snpoints+1))$y
sph <- spline(h,n=(snpoints+1))$y
spfhat <- recenter(convolve(spg,sph,type="circular"))
spGG <- abs(fft(spg))^2
spFFhat <- abs(fft(spfhat))^2

leglabels<-c("Sinc data","Filtered sinc data","Spline data","Filtered spline data")


Note that, while the high frequency power in the spline interpolated version does not go precisely to zero as it does in the sinc interpolated version, the contribution from these frequencies is still many orders of magnitude less than in lower frequencies.

4 Consequences of using a different filter for matching

Consider the situation where, instead of using a filter that matches our PSF, we use a different filter, H0, for object detection, perhaps one that matches the global average PSF of many observations.

Instead of using an optimal image for object detection: \[ \hat{F}(\nu) = H^*(\nu)G(\nu) \] we get a slightly different image:

\begin{eqnarray*} \hat{F}_0(\nu) & = & H^*_0(\nu)G(\nu) \\ & = & \frac{H^*_0(\nu)}{H^*(\nu)} \hat{F}(\nu) \end{eqnarray*}

Because the H0 is probably close to H, this is likely to be fairly benign. However, recall one important assumption we made in deriving this matched filter: that the image was a random distribution of point sources. Different matched filters will match objects with different morphologies. If, for example, H0 is slightly wider than H, the filter may not be optimally suited for detecting point sources, but it may be better suited for detecting slightly extended sources. If H0 has a different ellipticity than H, then it will be better at detecting objects with some ellipticities, and worse at detecting objects with others. So, if we use the same matched filter for all images, ignoring variations is PSF, then our limiting magnitude will depend on object morphology in some complicated way.

4.1 The Wiener Filter

Using Parseval's theorem (see Bracewell, The Fourier Transform and its Applications, p112): \[ \sum_x |f(x)|^2 = \frac{1}{N} \sum_\nu |F(\nu)|^2 \] and the addition theorem (see Bracewell, The Fourier Transform and its Applications, p104): \[ {\mathcal F}(f(x)+g(x)) = {\mathcal F}(f(x)) + {\mathcal F}(g(x)) \] we know we can do the minimization in Fourier space: \[ \chi^2 = \frac{1}{N} \sum_\nu |F(\nu)-\hat{F}(\nu)|^2 \] where the convolution by the filter can be expressed more conveniently:

\begin{eqnarray*} \hat{F}(\nu) & = & M(\nu) G(\nu) \\ & = & M(\nu) [ H(\nu) F(\nu) + N(\nu) ] \\ & = & M(\nu) H(\nu) F(\nu) + M(\nu) N(\nu) \end{eqnarray*}

We can now minimize: \[ \chi^2 \propto \sum_\nu |F(\nu)-M(\nu) H(\nu) F(\nu) - M(\nu) N(\nu)|^2 \]

We then need to differentiate the above equation with respect to M, and set the derivative to 0 to find the minimum. If we assume the signal and noise are not correlated, this ultimately reduces to \[ M(\nu) = \frac { |F(\nu)|^2 H^\ast(\nu) } { |B(\nu)|^2 + |N(\nu)|^2 } \] If we are looking at low signal to noise objects, \[ |B(\nu)|^2 << |N(\nu)|^2 \] so \[ M(\nu) = \frac { |F(\nu)|^2 H^\ast(\nu) } { |N(\nu)|^2 } \] If the noise is uniform, \[ M(\nu) \propto |F(\nu)|^2 H^\ast(\nu) \] and if the signal is a uniform distribution of point sources, \[ M(\nu) \propto H^\ast(\nu) \]

5 PSF Homogenization

PSF homogenization is the process of convolving an image by some kernel that results in an image with a desired PSF. This is easist to understand in Fourier space. Consider a starting image with PSF H: \[ G(\nu) = H(\nu) F(\nu) + N(\nu) \] If we want an image Gh(ν) such that \[ G_h(\nu) = H_0(\nu) F(\nu) + N_h(\nu) \] then we must multiply G(ν) by \[ \frac{H_0(\nu)}{H(\nu)} \] to get

\begin{eqnarray*} G_h(\nu) & = & \frac{H_0(\nu)}{H(\nu)} G(\nu) \\ & = & \frac{H_0(\nu)}{H(\nu)} H(\nu) F(\nu) + \frac{H_0(\nu)}{H(\nu)} N(\nu) \\ & = & H_0(\nu) F(\nu) + \frac{H_0(\nu)}{H(\nu)} N(\nu) \\ & = & H_0(\nu) F(\nu) + N_h(\nu) \\ \end{eqnarray*}

where \[ N_h(\nu) = \frac{H_0(\nu)}{H(\nu)} N(\nu) \]

Note that the point spread function is in the denominator of the homogenization filter: the filter amplifies spatial frequencies where there is little power in the actual PSF relative to the desired PSF, while it suppresses frequencies where there is more power.

Consider the example we have been using:

GG <- abs(fft(g))^2
BB <- abs(fft(b))^2
NN <- abs(fft(n))^2
NNe <- rep(0,times=npoints+1)+npoints*(sigma^2)
k <- 1:(npoints/2)

leglabels<-c("Original data","Noise-free data", "Expected noise")


Now, let us construct and apply a homogenization filter. Start with the case where the target PSF is wider (has weaker high frequency spatial components):

h0 <- moffat(x,6.0)
GGh <- abs(fft(h0)*fft(g)/fft(h))^2
NNh <- NNe*abs(fft(h0)/fft(h))^2
BBh <- abs(fft(h0)*fft(b)/fft(h))^2

leglabels<-c("Original data","Homogenized data","Homogenized signal", "Homogenized expected noise")


Now we move to the case where the target PSF is sharper (has stronger high frequency spatial components):

h0 <- moffat(x,2.5)
GGh <- abs(fft(h0)*fft(g)/fft(h))^2
NNh <- NNe*abs(fft(h0)/fft(h))^2
BBh <- abs(fft(h0)*fft(b)/fft(h))^2

leglabels<-c("Original data","Homogenized data","Homogenized signal", "Hmg. expected noise")


Note that the high frequency noise gets greatly amplified. This effect can be reduced by a more careful choice for the form for the target PSF (for example by using a sinc instead of a gaussian), but to at least some extent amplification of high frequency noise is inherent in the the technique.

6 Naive matching filtering of a PSF homogenized image

Our derivation of the matching filter in previous sections assumed that the noize was uniform, at least over the range for the frequency range has finite values. This is clearly not the case for the PSF homogenized image. What happens if we ignore this, and pretend that the noise is indeed uniform after homogenization? The final, matched filtered image would look like this:

\begin{eqnarray*} \hat{F}_h(\nu) & = & H^*_0(\nu) G_h(\nu) \\ & = & H^*_0(\nu) \frac{H_0(\nu)}{H(\nu)} G(\nu) \\ & = & \frac{|H_0(\nu)|^2}{H(\nu)} G(\nu) \\ & = & \frac{|H_0(\nu)|^2}{|H(\nu)|^2} H^*(\nu) G(\nu) \\ & = & \left(\frac{|H_0(\nu)|}{|H(\nu)|}\right)^2 \hat{F}(\nu) \end{eqnarray*}

The fractional difference between applying an optimal matched filter and naively filtering a homogenized image is the square of the difference between the optimal filter and simple application of a standardized filter on non-homogenized images. When using a standardized PSF, homogenization pushes the weighting further from optimal, and exagerates the sensativity of the detection limit to object morphology.

What, then, is the optimal filter for an homogenized image?

\begin{eqnarray*} \hat{F}(\nu) & = & H^*(\nu) G(\nu) \\ & = & H^*(\nu) \frac{H(\nu)}{H_0(\nu)} G_h(\nu) \\ & = & \frac{|H(\nu)|^2}{H_0(\nu)} G_h(\nu) \\ \end{eqnarray*}

So the optimal filter is \[ \frac{|H(\nu)|^2}{H_0(\nu)} \]

7 Coaddition of PSF homogenized images

7.1 Coaddition in frequency space

The effects of PSF homogenization are much clearer in frequency space than in image space, so in this discussion I will make use of the addition theorem (see Bracewell, The Fourier Transform and its Applications, p104):

\begin{eqnarray*} {\mathcal F}(g_1(x)+g_2(x)) & = & {\mathcal F}(g_1(x)) + {\mathcal F}(g_2(x)) \\ f(x)+g(x) & = & {\mathcal F}^{-1}[G_1(\nu) + G_2(\nu)] \end{eqnarray*}

7.2 Direct coaddition

What happens if we coadd images? If we use the traditional approach, adding the images:

\begin{eqnarray*} G_1(\nu) & = & H_1(\nu) F(\nu) + N_1(\nu) \\ G_2(\nu) & = & H_2(\nu) F(\nu) + N_2(\nu) \end{eqnarray*}

with weights w1 and w2, perhaps setting w2 = 1 - w1, then we get \[ G_{a}(\nu) = [ w_1 H_1(\nu) + w_2 H_2(\nu) ] F(\nu) + w_1 N_1(\nu) + w_2 N_2(\nu) \]

7.3 Coaddition after homogenization

After homogenization, we have:

\begin{eqnarray*} G_{1h}(\nu) & = & \frac{H_0(\nu)}{H_1(\nu)} G_1(\nu) \\ & = & H_0(\nu) F(\nu) + \frac{H_0(\nu)}{H_1(\nu)} N_1(\nu) \\ G_{2h}(\nu) & = & \frac{H_0(\nu)}{H_2(\nu)} G_2(\nu) \\ & = & H_0(\nu) F(\nu) + \frac{H_0(\nu)}{H_2(\nu)} N_2(\nu) \end{eqnarray*}

and after coaddition: \[ G_{ah}(\nu) = w_1 \frac{H_0(\nu)}{H_1(\nu)} G_1(\nu) + w_2 \frac{H_0(\nu)}{H_2(\nu)} G_2(\nu) = H_0(\nu) F(\nu) + \frac{H_0(\nu)}{H_1(\nu)} w_1 N_1(\nu) + \frac{H_0(\nu)}{H_2(\nu)} w_2 N_2(\nu) \]

Note that for any given ν, the coadded Gah(ν) is still a weighted average of G1(ν) and G2(ν), but this time the weights are a function of ν. Note that the signal to noise G1(ν) at a given ν is proportional to H1(ν), but that H1(ν) appears in the denominator of the varying weights: the weight of a given pixel in frequency space is inversly proportional to its signal to noise! At frequencies where G1 has a high signal to noise but G2 has a low one, the G2 will dominate in the coadded image, and vice versa. This is exactly the opposite of the weighting one wants to maximize the signal to noise in the coadded image.

If you examine the weights of N1(ν) and N2(ν), you can see that the weights of either can become arbitrarily large, depending on the corresponding H. Astronomical PSFs are approximately of the form: \[ \mathcal{F}(PSF) \propto e^{-(\nu b)^\gamma} \] (see Saglia etal 1993), where b determines the scale of the PSF and γ the shape. Note that γ=2 is a gaussian PSF. Atmospheric models give a \(\gamma=\frac{5}{3}\).

Using this model, the coefficient for the noise term from exposure i in frequency ν is: \[ \frac{H_0(\nu)}{H_i(\nu)} \propto e^{-\nu^\gamma(b_0^\gamma-b_i^\gamma)} \]

If the target PSF is shaper than one of the exposure PSFs \((b_i > b_0)\), the noise from that exposure rises exponenitally with ν. Note that this contribution cannot be compensated by other exposures that have sharper PSFs, because the sum of a rising exponential and a falling one will always approach the rising one as the operand rises.

Note this exponential rise is not possible in direct coaddition, where the weight is independent of frequency.

7.4 Coaddition after matched filtering

After application of the matched filter, we have

\begin{eqnarray*} G_{1m}(\nu) & = & H_1^*(\nu) G_1(\nu) \\ & = & |H_1(\nu)|^2 F(\nu) + H^*_1(\nu) N_1(\nu) \\ G_{2m}(\nu) & = & H_2^*(\nu) G_2(\nu) \\ & = & |H_2(\nu)|^2 F(\nu) + H^*_2(\nu) N_2(\nu) \end{eqnarray*}

and after coaddition: \[ G_{am}(\nu) = w_1 H_1^*(\nu) G_1(\nu) + w_2 H_2^*(\nu) G_2(\nu) = [ w_1 |H_1(\nu)|^2 + w_2 |H_2(\nu)|^2 ] F(\nu) + w_1 H^*_1(\nu) N_1(\nu) + w_2 H^*_2(\nu) N_2(\nu) \]

Now, the relative weights of G1 and G2 still vary with ν, but in a more apprepriate way: frequencies with more signal get weighted more heavily.

At frequencies where a given exposure has a small value for H(ν), note that the noise contribution to the coadded image is suppressed.

8 A Matched Filter on coadded, PSF homogenized exposures

Going back to the step in the matched filter derivation before we assume uniform noise, and substituting in the noise from above, we get: \[ M(\nu) = \frac{H_0^*(\nu)}{ | \frac{H_0(\nu)}{H_1(\nu)} w_1 N_1(\nu) + \frac{H_0(\nu)}{H_2(\nu)} w_2 N_2(\nu) |^2} \]

Note that, for an ν for which H(ν) < < H0(ν), that term is likely to dominate the denominator, reducing the weight for the frequency. For coadded homogenized images, the image with the worst seeing determines which frequencies can be used for detection.

If we make our target PSF such that it is wider than the input images, this will not be a problem. Instead, the numerator will drop high frequency components to small values.

Author: Eric H. Neilsen, Jr.

Created: 2014-06-16 Mon 15:31

Emacs 24.3.1 (Org mode 8.2.5c)