Tuesday, 7 July 2020

Thoughts on new, but potentially misclassified, classical Cepheids (I+20 sample)

On simple lightcurve classification for classical Cepheids in the context of the Inno+20 draft


Starting point:

We have ~800(?) new classical Cepheid candidates from I+20. A good number of them have been recognized as variables in other surveys, but classified there as other types of variables. The main other classifications are:

a) eclipsing binaries (EB)

HWR's 'by-eye' distillation of lightcurve characteristics that look non-Cepheid-like:
  • rise and fall are symmetric; i.e. flipping the time axis would leave the light curve invariant
  • the downward deviations (from the median) are larger than the upward deviations
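
One possible way to put a number on the first point (time-reversal symmetry), as a minimal sketch: sample the fitted light curve on a uniform phase grid and look for the symmetry axis about which the odd (antisymmetric) part is smallest. The function name `time_reversal_asymmetry` and the normalization are hypothetical choices for illustration, not something taken from the draft.

```python
import math

def time_reversal_asymmetry(curve):
    """Smallest rms of the odd part of the curve about any grid phase,
    normalized by the rms of the curve about its mean.

    `curve` holds magnitudes on a uniform phase grid covering one period.
    The result is ~0 if the curve is mirror-symmetric about some phase
    and grows for sawtooth-like shapes. (Hypothetical statistic.)
    """
    n = len(curve)
    mean = sum(curve) / n
    rms = math.sqrt(sum((m - mean) ** 2 for m in curve) / n)
    best = float("inf")
    for s in range(n):  # candidate symmetry axis at phase s / n
        odd_sq = sum(
            (0.5 * (curve[(s + i) % n] - curve[(s - i) % n])) ** 2
            for i in range(n)
        ) / n
        best = min(best, math.sqrt(odd_sq))
    return best / rms
```

The loop is O(n^2), so a few hundred phase bins are plenty for a first look.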



b) W Vir (CWA & CWB)



HWR's 'by-eye' distillation of lightcurve characteristics that look non-Cepheid-like:


  • rise-time longer than fall-time (only CWA's); opposite to DCeph
  • have not discerned lightcurve characteristics in CWB that look different from DCeph

c) rotators (ROT)


HWR's 'by-eye' distillation of lightcurve characteristics that look non-Cepheid-like:


  • Fourier decomposition "wiggly", i.e. much power in the 4th - 7th order components
  • scatter of individual points from Fourier fit large, compared to Fourier amplitude; i.e. basic periodicity + much "noise"
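
To make "much power in the 4th - 7th order components" concrete, here is a small sketch that computes the fraction of the Fourier power carried by the high orders. It assumes the fit is available as a list of amplitudes A_1 ... A_7 and takes "power" to mean the sum of squared amplitudes; the function name and the default split at order 4 are illustrative assumptions.

```python
def high_order_power_fraction(amps, first_high=4):
    """Fraction of the Fourier power (sum of A_k^2, k >= 1) in orders >= first_high.

    amps[0] is the amplitude of the fundamental (k = 1), amps[6] that of k = 7.
    """
    power = [a * a for a in amps]
    total = sum(power)
    if total == 0.0:
        raise ValueError("all Fourier amplitudes are zero")
    return sum(power[first_high - 1:]) / total
```

A smooth curve dominated by the first three orders gives a value near 0; a "wiggly" fit gives a value approaching 1.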

HWR's Basic Impression

  • Indeed, many of the "new candidates" are different types of variables.
  • There should be a few simple lightcurve criteria to add/modify that should greatly reduce the contamination. These are spelled out below.

Possible steps to implement:

Diagnosis:

  • Let's re-check whether the 'contaminants' lie in odd corners of our current lightcurve-shape space (A21, A31, phi21, phi31) that can be cut at little loss to the DCeph completeness.
  • The plot below is from an OGLE paper, and says, e.g., that most CWA's should really lie elsewhere in period-phi21 and period-phi31 space.

  • Can you make plots a la Figure 3 in the draft, with all the externally confirmed DCeph as pale grey points, and the "new candidates" that are classified as "others" by other surveys as colored points (in separate colors, or in separate plots for EB, ROT, CWA & CWB)?
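
For reference when making these plots, a minimal sketch of how the four shape parameters could be computed from the Fourier coefficients. It assumes the fit has the form F(p) = m0 + sum_k A_k cos(2 pi k p + phi_k) and that A21, A31 denote the amplitude ratios A_2/A_1 and A_3/A_1; the draft's actual convention (sine vs. cosine, phase range) may differ, which would shift phi21 and phi31 by constants.

```python
import math

def fourier_shape_parameters(amps, phis):
    """Amplitude ratios and phase differences relative to the fundamental.

    amps[k-1], phis[k-1] are A_k and phi_k of an assumed cosine series.
    Phase differences phi_k1 = phi_k - k * phi_1 are wrapped into [0, 2 pi).
    """
    a1, p1 = amps[0], phis[0]
    out = {}
    for k in (2, 3):
        out[f"A{k}1"] = amps[k - 1] / a1
        out[f"phi{k}1"] = (phis[k - 1] - k * p1) % (2 * math.pi)
    return out
```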

Possible simple new light curve criteria

Here is a proposal for how to use the 7th-order Fourier representation of the light curve to calculate a few other statistics that may be very effective at weeding out contaminants (at little completeness loss).
Let's call that function F7(p), where p=phase within [0,1].

Let me suggest to calculate from the analytic form F7(p) the following quantities:
  • mM: the median magnitude, so that according to F7(p) the source is brighter than mM for 50% of the period
  • sig-mM: the variance < (F7(p)-mM)^2 > calculated over the part of the light curve where F7(p) is fainter than mM; and 
  • sig+mM: the variance < (F7(p)-mM)^2 > calculated over the part of the light curve where F7(p) is brighter than mM. [I think in all cases a "primitive" splitting of the period in say 1000 bins, and doing all of this brute force, should be fine.]
  • f-mM: the fraction of the period (according to F7(p)) where the magnitude is fainter than mM
  • f+mM: the fraction of the period (according to F7(p)) where the magnitude is brighter than mM
  • f_rise: that is the fraction of the period in which (according to F7(p)) the light curve rises; and 
  • f_fall: .. where it falls
  • scatter: the rms of the deviation of the data points from F7(p) (in mag), normalized by the rms of F7(p) itself.
  • n_max: the number of maxima that F7(p) has within a period: this should be 1 for simple light curves, but 3-5 for wiggly ones.
So, that seems like a lot; but let's just explore them right now.
I think this is better than "machine learning" classification, because it depends less on the "data quality", the sampling rate, etc.
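
The quantities listed above can be sketched with the brute-force 1000-bin approach. The following is a rough illustration, not a definitive implementation; it assumes F7(p) = m0 + sum_k A_k cos(2 pi k p + phi_k), that "rise" means brightening (magnitude decreasing), and that maxima are brightness maxima. All function names and dictionary keys are hypothetical.

```python
import math

def f7(p, m0, amps, phis):
    """Fourier model magnitude at phase p (assumed cosine convention)."""
    return m0 + sum(a * math.cos(2 * math.pi * (k + 1) * p + ph)
                    for k, (a, ph) in enumerate(zip(amps, phis)))

def shape_statistics(m0, amps, phis, n_bins=1000):
    """Brute-force shape statistics of F7 on n_bins equal phase bins."""
    curve = [f7(i / n_bins, m0, amps, phis) for i in range(n_bins)]
    srt = sorted(curve)
    m_med = 0.5 * (srt[(n_bins - 1) // 2] + srt[n_bins // 2])  # mM
    # Magnitudes: a larger value means a fainter source.
    faint = [(m - m_med) ** 2 for m in curve if m > m_med]
    bright = [(m - m_med) ** 2 for m in curve if m < m_med]
    nxt = curve[1:] + curve[:1]    # next bin, wrapping around the period
    prv = curve[-1:] + curve[:-1]  # previous bin, wrapping around
    # Assumption: "rise" = brightening, i.e. the magnitude decreases.
    n_rise = sum(1 for a, b in zip(curve, nxt) if b < a)
    n_fall = sum(1 for a, b in zip(curve, nxt) if b > a)
    # Brightness maxima are local minima in magnitude.
    n_max = sum(1 for a, b, c in zip(prv, curve, nxt) if b < a and b <= c)
    return {
        "mM": m_med,
        "sig_minus": sum(faint) / len(faint),    # sig-mM (fainter side)
        "sig_plus": sum(bright) / len(bright),   # sig+mM (brighter side)
        "f_minus": len(faint) / n_bins,          # f-mM
        "f_plus": len(bright) / n_bins,          # f+mM
        "f_rise": n_rise / n_bins,
        "f_fall": n_fall / n_bins,
        "n_max": n_max,
    }

def scatter_ratio(obs_phases, obs_mags, m0, amps, phis, n_bins=1000):
    """rms of data residuals about F7, divided by the rms of F7 about its mean."""
    resid = [m - f7(p, m0, amps, phis) for p, m in zip(obs_phases, obs_mags)]
    rms_resid = math.sqrt(sum(r * r for r in resid) / len(resid))
    curve = [f7(i / n_bins, m0, amps, phis) for i in range(n_bins)]
    mean = sum(curve) / n_bins
    rms_model = math.sqrt(sum((m - mean) ** 2 for m in curve) / n_bins)
    return rms_resid / rms_model
```

With mM taken as the median, f-mM and f+mM both come out close to 0.5 by construction, so the discriminating power in this sketch sits mostly in the ratio sig-mM / sig+mM, f_rise, n_max and scatter.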

Here's my proposal for what to look for in diagnostic plots:

  • weeding out EBs:  plot  ( sig-mM / sig+mM ) vs f-mM, for verified Cepheids and verified (or externally classified) EBs. I would suspect that for EB's ( sig-mM / sig+mM )  is greater than for DCeph, and f-mM is smaller; this does not yet capture the time-symmetry of EB lightcurves
  • weeding out rotators:   plot scatter vs n_max (as defined above); I would suspect that both scatter and n_max are larger for rotators than for DCeph
  • weeding out CWA: plot f_rise vs phi21  for DCeph vs CWA/CWB