[ Basic Info | References | User Guide ]
Basic Information on sfind
Task: sfind
Purpose: Automatically or interactively find sources in images
Categories: plotting
SFIND has been updated to incorporate a new statistically
robust method for detecting source pixels, called FDR (which
stands for "False Discovery Rate"), as an alternative to
simple sigma-clipping, which formed the basis of the original
implementation. Details of the FDR method can be found in
Hopkins et al 2001, (astro-ph/0110570) and references
therein. The original implementation of sfind has been
preserved, and can be applied by specifying the option "oldsfind".
In addition, when using SFIND in the new 'FDR' mode,
several new features are available. SFIND now provides the
option of outputing (1) a 'normalised' image (options=normimg)
created by subtracting a background and dividing by sigma
(the standard deviation); (2) a 'sigma' image (options=sigmaimg),
created by sigma-clipping the normalised image at a user-specified
sigma value; (3) an 'fdr' image (options=fdrimg) created similarly
to the sigma-image by clipping the normalised image at the FDR-
defined threshold.
In 'FDR' mode (the default), no interactive source detection
is possible, as is the case with the original version. Instead,
the detected sources are drawn from a distribution of pixels
with a robustly known chance of being falsely drawn from the
background, thus more reliably characterising the fraction of
expected false sources than is possible with a sigma-clipping
criterion.
The process of source detection and measurement is slightly
different in 'FDR' mode compared to the original SFIND
implementation (see below). In 'FDR' mode, the following steps
are performed:
1. The image is first normalised by estimating the background
(mean) and standard deviation (sigma) for the whole image in
uniformly distributed regions of size 'rmsbox' (a user input).
This is done by fitting a gaussian to the pixel histogram,
(as is done in the task 'imsad') - if the fit is poor, an
interative method is used instead. With these values known,
the image has the mean subtracted and is divided by sigma to
create a normalised image. This is the same as saying the
normalised image has a gaussian mean of 0 and sigma of 1.
The normalised image is output as a miriad image called
sfind.norm if 'options=normimg' is set. The rms noise measured
over the image can also be output as sfind.rms if
'options=rmsimg' is set.
2. From the normalised image a sigma-clipped image (called
sfind.sig) may be output (if 'options=sigmaimg'). This is
simply an image with pixel values set to 100 if the pixel
value in the normalised image is greater than the user
specified value of 'xrms,' or 0 otherwise.
3. The FDR method is implemented using the normalised image.
Each pixel is assigned a p-value, a probability that it was
drawn from the background, and a cutoff p-value is established
based on the percentage of false rejections (source pixels)
that the user specifies with the parameter 'alpha'. If
'options=fdrimg' is set, this cutoff p-value threshold is
used to create an 'fdr' image (called sfind.fdr) in the same
way as the sigma image above is created.
4. With the FDR cutoff threshold established, sources may now
be detected and measured. Each pixel with a p-value lying
*below* the cutoff p-value (i.e. a low chance of being drawn
from the background) may be part of a source. For each such
'FDR-detected' pixel, a hill-climbing routine finds a local
peak from adjacent FDR-selected pixels. This is then used as
the starting point for a routine which selects contiguous
monotonically decreasing adjacent pixels from the FDR-selected
ones, and to which a 2-D elliptical gaussian is fit in the
same way as the original SFIND (see below). The same parameters
are returned as in the original implementation, and the
logfile has the same format (see below).
Also, in FDR mode 'option=auto' from the original implementation
is assumed automatically, regardless of user input. This means
the inputs for 'type,' 'range,' 'device' etc are not relevant
and are ignored.
In the original implementation, SFIND displays an image via
a contour plot or a pixel map representation on a PGPLOT device.
The user is then provided with the opportunity to interactively
flag sources as real or not (indicated by a Y or N flag in a
log file).
Source positions are calculated by an algorithm which searches for
pixels brighter than the surrounding 24 pixels and then
bi-parabolically fitting positions and flux densities.
Once a source such as this is detected, SFIND checks to see whether
it is brighter than the user set multiple of the background rms.
If so, a 2D elliptical gaussian fit is performed (using the same
routine as IMFIT) and the source parameters are displayed on the
terminal (and written to a log file after user input to determine
a flag, Y or N, to attach). The source parameters are (in order):
Quantity Notes
-------------- -----------
Position RA and Dec. in standard miriad
hms,dms format
Formal errors in RA and Dec. (arcsec; treat judiciously)
Peak flux density (mJy)
Formal error in peak flux in mJy (generally not a good
density estimate of the true error)
Integrated flux density (mJy)
Major and minor axes and (arcseconds for axes, degrees for PA)
position angle of source Warning: these are not deconvolved
from the synthesized beam
Local background rms (sigma) (mJy) calculated from a gaussian fit
to the pixel histogram, as per imsad
rms of gaussian fit
Manipulation of the device colour lookup table is available
when you display with a pixel map representation.
Key: in
The input image.
Key: type
Specifies the type of the image in the IN keyword. Minimum match
is supported. Choose from:
"contour" (contour plot)
"pixel" (pixel map)
It is strongly suggested that pixel maps be used for source finding,
as contour plots may be deceiving. Default is "pixel".
Ignored in 'FDR' mode (the default).
Key: region
Region of interest. Choose only one spatial region (bounding
box only supported), but as many spectral regions (i.e.,
multiple IMAGE specifications) as you like. If you display a
3-D image, the cursor options are activated after each sub-plot
(channel or group of channels; see CHAN below) is drawn.
Default is full image.
Key: xybin
Upto 4 values. These give the spatial increment and binning
size in pixels for the x and y axes to be applied to the selected
region. If the binning size is not unity, it must be equal
to the increment. For example, to bin up the image by 4 pixels in
the x direction and to pick out every third pixel in the y
direction, set XYBIN=4,4,3,1
Defaults are 1,XYBIN(1),XYBIN(1),XYBIN(3)
Key: chan
2 values. The first is the channel increment, the second is
the number of channels to average, for each sub-plot. Thus
CHAN=5,3 would average groups of 3 channels together, starting
5 channels apart such as: 1:3, 6:8, 11:13 ... The channels
available are those designated by the REGION keyword. A new
group of channels (sub-plot) is started if there is a
discontinuity in the REGION selected channels (such as
IMAGE(10,20),IMAGE(22,30).
Defaults are 1,1
Key: slev
2 values. First value is the type of contour level scale
factor. "p" for percentage and "a" for absolute. Second
value is the level to scale LEVS by. Thus SLEV=p,1 would
contour levels at LEVS * 1% of the image peak intensity.
Similarly, SLEV=a,1.4e-2 would contour levels at LEVS * 1.4E-2
Default is no additional scaling of LEVS.
Ignored in 'FDR' mode (the default).
Key: levs
Levels to contour for first image, are LEVS times SLEV
(either percentage of the image peak or absolute).
Defaults try to choose something sensible
Ignored in 'FDR' mode (the default).
Key: range
3 values. The pixel map range (background to foreground), and
transfer function type. The transfer function type can be one
of "lin" (linear), "log" (logarithmic), "heq" (histogram equal-
ization), and "sqr" (square root). See also OPTIONS=FIDDLE which
is in addition to the selections here.
Default is linear between the image minimum and maximum
If you wish to just give a transfer function type, set
range=0,0,heq say.
Ignored in 'FDR' mode (the default).
Key: cutoff
Flux density below which possible sources are ignored.
Default is zero.
Ignored in 'FDR' mode (the default).
Key: rmsbox
In 'FDR' mode (the default) this is the size of the 'smoothing'
box used when estimating the background and standard deviation
of the image. It is suggested that this be several to many times
the beam size to prevent sources from artificially skewing the
background estimates. This may require some experimentation,
and 'options=normimg' may be useful in determining the
effectiveness of a particular rmsbox size.
In the original implementation (options=oldsfind), it is the
size (in pixels) of a box around each source within which the
background rms is calculated.
Default is 20 pixels.
Key: alpha
This (real) number is the *percentage* of false *pixels* which can
be accepted when applying the FDR method. Alpha determines the
threshold set in selecting pixels which belong to sources (as an
alternative to a simple sigma-cut), prior to the source-fitting
and measuring step.
Must be a positive number. Default is 2.0 percent.
Ignored if options=oldsfind.
Key: xrms
In 'FDR' mode (the default) this parameter defines the
sigma-cutoff for the creation of the image sfind.sig if
'options=sigmaimg' is set. If not, it is ignored. It has no
role in the detection or measurement of sources. No default.
In the original implementation (options=oldsfind), it is the
multiple of the background rms value above which a source must be
before the user is given the choice of verifying it.
No default.
Key: device
The PGPLOT plot device, such as plot.plt/ps. No default.
Ignored in 'FDR' mode (the default).
Key: nxy
Number of sub-plots in the x and y directions on the page.
Defaults choose something sensible
Ignored in 'FDR' mode (the default).
Key: labtyp
Two values. The spatial label type of the x and y axes.
Minimum match is active. Select from:
"hms" the label is in H M S (e.g. for RA)
"dms" the label is in D M S (e.g. for DEC)
"arcsec" the label is in arcsecond offsets
"arcmin" the label is in arcminute offsets
"absdeg" the label is in degrees
"reldeg" the label is in degree offsets
The above assume the pixel increment is in radians.
"abspix" the label is in pixels
"relpix" the label is in pixel offsets
"abskms" the label is in Km/s
"relkms" the label is in Km/s offsets
"absghz" the label is in GHz
"relghz" the label is in GHz offsets
"absnat" the label is in linear coordinates as defined by
the header you might call this the natural axis label
"relnat" the label is in offset natural coordinates
All offsets are from the reference pixel.
Defaults are "abspix", LABTYP(1) unless LABTYP(1)="hms"
whereupon LABTYP(2) defaults to "dms" (for RA and DEC).
Ignored in 'FDR' mode (the default).
Key: logfile
Log file name, default 'sfind.log'.
Key: options
Task enrichment options. Minimum match is active.
"fiddle" means enter a routine to allow you to interactively change
the display lookup table. You can cycle through b&w and colour
displays, as well as alter the transfer function by the cursor
location, or by selecting predefined transfer functions such as
histogram equalization, logarithmic, & square root.
"wedge" means that if you are drawing a pixel map, also draw
and label a wedge to the right of the plot, showing the map
of intensity to colour
"3value" means label each sub-plot with the appropriate value
of the third axis (e.g. velocity or frequency for an xyv ordered
cube, position for a vxy ordered cube).
"3pixel" means label each sub-plot with the pixel value of the
the third axis. Both "3pixel" and "3value" can appear, and both
will be written on the plot. They are the average values when
the third axis is binned up with CHAN. If the third axis is
not velocity or frequency, the units type for "3VALUE" will be
chosen to be the complement of any like axis in the first 2.
E.g., the cube is in vxy order and LABTYP=ABSKMS,ARCSEC the units
for the "3VALUE" label will be arcsec. If LABTYP=ABSKMS,HMS the
"3VALUE" label will be DMS (if the third [y] axis is declination).
"grid" means draw a coordinate grid on the plot rather than just ticks
"noerase" Don't erase a snugly fitting rectangle into which the
"3-axis" value string is written.
"unequal" means draw plots with unequal scales in x and y. The
default is that the scales are equal.
"mark" When source has been found, and user has agreed that it is
real, mark it with a cross.
"nofit" Prevents the program from fitting elliptical gaussians to each
source. The data given on each source will be that from a
bi-parabolic fit, as per the earlier version of sfind. Note that
flux densities from this fit are bi-parabolically fitted *peak*
flux densities, and the positions are to the peak flux density
position (which will always be within 1 pixel of the brightest
pixel in the source). This option is useful for providing a starting
point for groups of sources which the gaussian fitting procedure
hasn't taken a liking to.
"asciiart" During the interactive section of the program, an ascii
picture of each source is displayed, showing which pixels have
been used in the gaussian fitting procedure. The brightest pixel
in the source is symbolised by a "O", the rest by asterisks.
This option is ignored if "nofit" is being used.
"auto" The interactive section of the program is bypassed, and
all detected sources are flagged as real. The image is not
displayed.
This is set automatically in 'FDR' mode (the default) and it is
only necessary to select it manually if using 'options=oldsfind'
(see below).
"negative" The map is inverted before source detection and fitting,
ie, positive pixels become negative and vice versa. This is
to enable detection of negative sources without recourse to MATHS.
This feature may be used for detecting sources in polarisation
maps.
"pbcorr" Corrects the flux density value calculated for each source
for the effect of the primary beam attenuation. This is dealt with
correctly for mosaics as well as single pointings.
"oldsfind" Use this to run SFIND as the original implementation
for the interactive interface, or just consistency with earlier
measurements.
"fdrimage" An output image called sfind.fdr will be created
with pixel values of 100, if their p-values are below the FDR
threshold, or 0 otherwise.
Ignored if 'oldsfind' is present.
"sigmaimg" An output image called sfind.sig will be created
with pixel values of 100, if their sigma-values are above
the user specified threshold from 'xrms,' or 0 otherwise.
Ignored if 'oldsfind' is present.
"rmsimg" An output image called sfind.rms will be created
where the pixel values correspond to the rms noise level
calculated when normalising the image.
Ignored if 'oldsfind' is present.
"normimg" An output image called sfind.norm will be created
by subtracting a background mean from the input image and
dividing by the standard deviation. The mean and sigma are
calculated in regions of size 'rmsbox' tiled over the image.
Ignored if 'oldsfind' is present.
"kvannot" As well as the regular log file (always written)
create a kview format annotation file, called 'sfind.ann'
containing one ellipse per object, with the appropriate
location, size, and position angle.
"fdrpeak" The default for source measurement is to use only pixels
above the FDR threshold in measuring the properties of
sources. (This is analogous, in SExtractor, for example,
to having the detect and analyse thresholds at the same level.)
In some cases it may be desirable to allow fitting of sources
where the peak pixel is above the FDR threshold, but other
source pixels are not required to be. This is the case for
obtaining reasonable measurements of sources close to the
threshold. Selecting 'fdrpeak' allows this. If 'fdrpeak'
is selected, source pixels are still required to be contiguous
and monotonically decreasing from the peak pixel, but not
necessarily to be above the FDR threshold.
"allpix" Rather than selecting pixels for the gaussian fitting
by requiring they be monotonically decreasing away from the
peak pixel, this option allows all FDR-selected pixels
contiguous with the peak pixel to be fit for a source.
If this option is selected, the fdrpeak option is ignored.
"psfsize" Restricts the minimum fitted size of a detected
source to the size of the sythesised beam, i.e., the PSF.
Any source fitted to have a smaller size than this has its
FWHMa and PA set to those of the synthesised beam, and is refit
for the position and amplitude only.
Some common combinations of options I have used (for examples):
options=kva,fdri,norm,sig
options=old,mark,ascii
options=old,auto
Key: csize
Two values. Character sizes in units of the PGPLOT default
(which is ~ 1/40 of the view surface height) for the plot axis
labels and the velocity/channel labels.
Defaults choose something sensible.
Ignored in 'FDR' mode (the default).
Bugs:
In FDR mode the code has problems with very large images. I
think this is dependent on the available memory of the machine
running Sfind, but need to do more testing to be certain. I
have confirmed that on a machine with 256MB of memory and 256MB
swap space, an image of 3600x3600 pixels will be analysed correctly.
For larger images, the code will halt unceremoniously at the
first call to memfree in subroutine fdr. I don't understand
why this happens, although I am guessing it may have to do
with the call to memfree trying to free more memory than is
available.
The output is designed to print source fluxes in FORTRAN format
f8.3 and f9.3 for peak and integrated flux densities respectively.
This means that if your source's peak flux is > 9999.999 mJy, (ie
10 Jy) or its integrated flux is > 99999.999 mJy (ie, 100 Jy),
then it will not be displayed properly. People detecting very bright
sources - you have been warned!
If 'options=fdrimg', 'sigmaimg', or 'normimg' are used with a subregion
of the image (specified by the 'region' keyword), the output images
are made the size of the *full* input image. This retains the original
masking information, has zeroes outside the regular bounding box
of the selected region of interest, and the relevant output within.
This does not affect the analysis (which is all performed within
the bounding box of the regular region of interest), only the
output images.
The following comments refer to the original SFIND implementation.
The FDR implementation is much more robust to finding faint sources
close to bright sources, however since the gaussian fitting process
is the same, the comments about noise or morphology are still relevant.
The gaussian fitting procedure can at times be temperamental. If the
source lies in a noisy region of the map, or close to another bright
source, or is simply of a morphology poorly suited to being fit by
gaussians, firstly the source may not be detected at all, and if it
is, the quoted errors on position and flux density can be extremely
high (often displayed in the output as a row of asterisks due to the
vagaries of FORTRAN).
In many of these cases, the given values of flux density and position
are still quite reasonable, (perhaps with errors an order of magnitude
larger than would otherwise be typical), but user discretion is
advised. No responsibility is taken by the programmer(s) for loss
of life or property caused by taking the results of this program
under these conditions too seriously, or by frustration generated
by the use of this program under any conditions.
Additionally, for unresolved sources, the "integrated" flux
density quoted may be less than the peak flux density. (This occurs if
the fitted size of the source, proportional to bmaj x bmin, is a
smaller gaussian volume than that of the beam.) In this situation it
is suggested that the peak flux density be used.
ggestions for believing in a source or not:
If a source is close to being indistinguishable by eye from the
background there are a few rules of thumb to help determine whether
the gaussian fit is telling the truth about a source, or whether the
source is even real.
1) If the pixels used in the fit are widely scattered (as opposed to
comprising a nice contiguous group) the fit will probably not be
very good and/or will not be a good description of the source.
2) Check the fwhms and the position angle, and compare it to the
pixels used in the fit. (Remember these values are in arcsec for
the fwhm and degrees for the pa, while the ascii picture is in
pixels). If these obviously do not agree, then the fit was poor
and the source is probably not real.
3) Check the rms of the background. If this is high then firstly
the fit may not be good (as per 1), and secondly the source is in
a noisy area and should be treated with caution anyway.