Tuesday, 25 December 2018

finding Cepheids through GDR2

Goal:

Can one use GDR2 alone to select a type I Cepheid sample for targeting? The basic idea is:
they must be luminous (absK<0), they must vary by >0.3mag peak-to-peak, and they are blueish.
Dust extinction is the bane, of course.

How can one define 'photometric variability' in GDR2?

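Via the photometric noise across the many epochs: sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error

Here phot_g_mean_flux_over_error is the mean flux divided by the error on that mean, so multiplying its inverse by sqrt(N_obs) recovers the fractional rms scatter of a single epoch: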
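
As a minimal sketch (plain Python, not part of the original analysis), this quantity can be computed per source as follows. The function names are illustrative, and the conversion to magnitudes uses the small-amplitude factor 2.5/ln(10), which is an added assumption rather than something the queries here apply.

```python
import math


def variability(n_obs, flux_over_error):
    """Fractional rms scatter per epoch implied by the error on the mean flux.

    n_obs           -- number of G-band observations (phot_g_n_obs)
    flux_over_error -- mean flux / error on the mean (phot_g_mean_flux_over_error)
    """
    if n_obs <= 0 or flux_over_error <= 0:
        raise ValueError("n_obs and flux_over_error must be positive")
    return math.sqrt(n_obs) / flux_over_error


def variability_mag(n_obs, flux_over_error):
    """Same quantity in magnitudes (small-amplitude approximation, an assumption)."""
    return 2.5 / math.log(10.0) * variability(n_obs, flux_over_error)


if __name__ == "__main__":
    # 100 epochs and a mean-flux S/N of 125 give a fractional rms of 0.08
    print(variability(100, 125.0), variability_mag(100, 125.0))
```

Note that this measures total scatter, so photon noise contributes as well; it only traces intrinsic variability where the photon noise is small.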




This depends on G-band magnitude, but for G<18 the "actual photon noise" is small:





There are many sources with excess noise, which at the bright end turns out to be most
commonly intrinsic variability:


What does Gaia DR2 "variability" quantify?



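that looks good! For a sinusoid: 2*sqrt(2)*rms = peak-to-peak, i.e. about 2.8*rms; the factor of 4 used in the rationale below (0.08 rms -> 0.32mag) is only a rough conversion.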
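
A quick numerical check of the rms to peak-to-peak conversion for a pure sinusoid; this is a sketch in plain Python with a hypothetical function name. An evenly sampled sinusoid gives peak-to-peak/rms = 2*sqrt(2). Real Cepheid light curves are asymmetric rather than sinusoidal, so the factor is only indicative.

```python
import math


def sinusoid_rms_and_p2p(amplitude, n_samples=10000):
    """Return (rms, peak-to-peak) of one evenly sampled period of a sinusoid."""
    values = [amplitude * math.sin(2.0 * math.pi * i / n_samples)
              for i in range(n_samples)]
    mean = sum(values) / n_samples
    rms = math.sqrt(sum((v - mean) ** 2 for v in values) / n_samples)
    p2p = max(values) - min(values)
    return rms, p2p


if __name__ == "__main__":
    rms, p2p = sinusoid_rms_and_p2p(1.0)
    print("peak-to-peak / rms =", p2p / rms)
```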

Query:

SELECT top 70000 * , sqrt(phot_g_n_obs)/phot_g_mean_flux_over_error as variability
FROM gaiadr2.gaia_source
WHERE
bp_rp < 2.
and
sqrt(phot_g_n_obs)/phot_g_mean_flux_over_error > 0.08
and
phot_g_mean_mag < 17.
and
b between -10. and 10.
and
phot_g_mean_mag - 1.75*bp_rp < 13.
and
parallax+parallax_error < power(10.,(10.-(phot_g_mean_mag - 1.75*bp_rp))/5.)

Rationale: 
-- vary by >0.32mag peak-to-peak
-- have a 'predicted' W1mag (== phot_g_mean_mag - 1.75*bp_rp) < 13.
-- have a W1 abs mag < 0 (from M = m + 5*log10(parallax[mas]) - 10): parallax+parallax_error < power(10.,(10.-(phot_g_mean_mag - 1.75*bp_rp))/5.)
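
The cuts above can be sketched as a Python predicate, e.g. for re-applying them to a downloaded table. This is an illustration only: the function names and the dict-style row are assumptions, while the thresholds are copied from the query. The parallax limit follows from M = m + 5*log10(parallax[mas]) - 10, so M < 0 is equivalent to parallax < 10^((10 - m)/5).

```python
def w1_proxy(g_mag, bp_rp):
    """'Predicted' W1 magnitude as used in the query: G - 1.75*(BP-RP)."""
    return g_mag - 1.75 * bp_rp


def parallax_limit_mas(app_mag, abs_mag_limit=0.0):
    """Parallax (mas) below which a source is brighter than abs_mag_limit."""
    return 10.0 ** ((10.0 + abs_mag_limit - app_mag) / 5.0)


def passes_cepheid_cuts(src):
    """Apply the GDR2-only candidate cuts to one row (dict keyed by column name)."""
    w1 = w1_proxy(src["phot_g_mean_mag"], src["bp_rp"])
    return (
        src["bp_rp"] < 2.0
        and src["variability"] > 0.08          # sqrt(n_obs)/flux_over_error
        and src["phot_g_mean_mag"] < 17.0
        and -10.0 <= src["b"] <= 10.0
        and w1 < 13.0
        and src["parallax"] + src["parallax_error"] < parallax_limit_mas(w1)
    )


if __name__ == "__main__":
    example = {"phot_g_mean_mag": 12.0, "bp_rp": 1.0, "variability": 0.1,
               "b": 2.0, "parallax": 0.5, "parallax_error": 0.1}
    print(passes_cepheid_cuts(example))
```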

That yields 27000 candidates, of which 6000 have good astrometry...
Of the 700 Gaia DR2 Cepheids, 95% get picked up that way; the rest are all type II Cepheids.

The query output is named Cepheid_searches_GDR2only_v3-result.fits
and can be found here:
https://www.dropbox.com/s/7xwsnvruk8an58w/Cepheid_searches_GDR2only_v3-result.fits?dl=0

This is what the distribution looks like  [NB: we can do better with the X-axis by taking a spectroscopic-survey-trained estimate of Teff, derived from Gaia, 2MASS & WISE photometry]


In this plot, I have used (J-K) - 0.25*(G-K) as a self-dereddened color; ideally, I'd like to have Teff as the X-axis. YST to the rescue?
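
For reference, the self-dereddened color is a one-liner; the sketch below (function name hypothetical) just spells out the combination of magnitudes. It is reddening-free only to the extent that E(J-K) is about 0.25*E(G-K), which is the assumption built into the coefficient.

```python
def self_dereddened_color(j_mag, ks_mag, g_mag):
    """(J-K) - 0.25*(G-K), the approximately reddening-free color used here."""
    return (j_mag - ks_mag) - 0.25 * (g_mag - ks_mag)


if __name__ == "__main__":
    # Example magnitudes are made up for illustration
    print(self_dereddened_color(j_mag=10.0, ks_mag=9.5, g_mag=12.0))
```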

Now we need to look at the contamination by other types of variable stars (RV Tau, W Vir, RRL):


We do this by looking at "dereddened color" vs "abs. mag." (however lousy):



and compare this to the Cepheids from Gaia DR2 (in blue):


If we then plot the lump centered on (0.2,-5), 825 sources, they look like this


And the possible contaminants' sky distribution looks like this:


are these stars (at (0.1,0) in the color-absmag plane) where the instability strip crosses the main sequence? (Delta Scuti?)






Update January 25, 2019

I have done a broader candidate selection; Yuan-Sen Ting has then estimated their T_eff as follows: we train a neural net on all APOGEE stars to predict T_eff(APOGEE) from BP,G,RP,J,H,K,W1,
and apply it to the candidates. Initial cross-validation indicates a precision of ~250-300K in the range 4000K to 8000K.

With this, one gets a candidate set that looks like this:


This shows the different classes of variables even more nicely: luminous red variables at 3500K, (presumed) RV Tau (at 4600K), and (presumed) RRL (& beta Ceph, whatever) at (6200K,0).



Comparison with the Gaia DR2 paper Cepheids (black) shows where the classical Cepheids should lie (and shows Gaia DR2's misclassification rate).

That suggests selecting (in a more stringent fashion) like this:


which leads to an on-sky distribution like that:

This sample selection includes >90% of the Gaia DR2 type I Cepheids, and basically all type I Cepheids selected by variability from WISE in a recent 2018 paper (incl. de Grijs, check reference).
It will be interesting to see what the stars near the GC (|l|<45) are.



Addendum (Feb 18, 2019)

HWR is discovering that there are analogous variability measures in BP and RP. Taking the 
"Cepheid candidate take 7" sample, it's fun to look at the distribution of the ratio of variability
in BP and RP. This should be followed-up.



Note that this is a funky X-axis -- the sqrt(phot_rp_n_obs) is missing. The unmarked stripe at (0.015,1.5) is RV Tau (what the f...).
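
A sketch of the ratio with the sqrt(N_obs) factor included in both bands, assuming the per-band analogues of the G-band columns (phot_bp_n_obs, phot_bp_mean_flux_over_error and the RP equivalents). The function names and the dict-style row are illustrative.

```python
import math


def band_variability(n_obs, flux_over_error):
    """Fractional rms scatter per epoch in one band."""
    return math.sqrt(n_obs) / flux_over_error


def bp_rp_variability_ratio(src):
    """Ratio of BP to RP variability for one row (dict keyed by column name)."""
    var_bp = band_variability(src["phot_bp_n_obs"],
                              src["phot_bp_mean_flux_over_error"])
    var_rp = band_variability(src["phot_rp_n_obs"],
                              src["phot_rp_mean_flux_over_error"])
    return var_bp / var_rp


if __name__ == "__main__":
    row = {"phot_bp_n_obs": 16, "phot_bp_mean_flux_over_error": 100.0,
           "phot_rp_n_obs": 25, "phot_rp_mean_flux_over_error": 250.0}
    print(bp_rp_variability_ratio(row))
```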

hot-star selection (for SDSS-V) addendum

This is a revised attempt to select massive stars, based on the dereddened color
(J-K) - 0.25*(G-K); which should be < 0.03, along with absK<0.

The query is here:

SELECT        top 1500000 g.source_id,g.ra,g.dec,g.l,g.b,g.parallax,g.parallax_error,g.pmra,g.pmra_error,g.pmdec,
              g.pmdec_error,g.astrometric_n_good_obs_al,g.astrometric_excess_noise,g.astrometric_chi2_al
              ,g.phot_g_mean_mag,g.phot_bp_mean_mag,g.phot_rp_mean_mag,g.phot_variable_flag,g.teff_val,g.a_g_val,
              tm.j_m,tm.j_msigcom,tm.h_m,tm.h_msigcom,tm.ks_m,tm.ks_msigcom,g.phot_g_mean_mag - tm.h_m AS g_min_h,
 sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error as variability,
 sqrt( g.astrometric_chi2_al / ( g.astrometric_n_good_obs_al - 5)) as astrometric_quality
FROM gaiadr2.gaia_source AS g
INNER JOIN gaiadr1.tmass_best_neighbour AS xmatch
 ON g.source_id = xmatch.source_id
INNER JOIN gaiadr1.tmass_original_valid AS tm
 ON tm.tmass_oid = xmatch.tmass_oid
WHERE  b between -15 and 15
and
( (tm.j_m - tm.ks_m - 0.25*(g.phot_g_mean_mag - tm.ks_m) < 0.03) or (g.phot_g_mean_mag - tm.ks_m < 1.5) )
and
g.phot_g_mean_mag < 18.
and
g.parallax+g.parallax_error < power(10.,(10.-tm.ks_m)/5.)
and
tm.ks_m < 12.




the basic sample file is here:
  https://www.dropbox.com/s/ayryeaqfo5jrz8l/massive_star_selection_2MASS.fits?dl=0

and the color-selection is presented below

the (observed) color vs. absmag distribution


shows that we are still missing nearby blue objects (when compared to the literature); this must be a query problem:



I cleaned via: good_astrometry (eliminates many), non-variable (at <0.02) and parallax SNR > 2
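
The cleaning step can be sketched as below. The variability and parallax S/N thresholds are the ones quoted above; the astrometric-quality limit of 2 is an assumption borrowed from the YSO queries in the other posts, and the function names are hypothetical.

```python
import math


def astrometric_quality(chi2_al, n_good_obs_al):
    """sqrt(chi2 / (N - 5)), as computed in the query."""
    if n_good_obs_al <= 5:
        raise ValueError("need more than 5 good along-scan observations")
    return math.sqrt(chi2_al / (n_good_obs_al - 5))


def is_clean(src, max_quality=2.0, max_variability=0.02, min_parallax_snr=2.0):
    """Good astrometry, non-variable, and parallax S/N above the limit."""
    if src["astrometric_n_good_obs_al"] <= 5 or src["parallax_error"] <= 0:
        return False
    quality = astrometric_quality(src["astrometric_chi2_al"],
                                  src["astrometric_n_good_obs_al"])
    return (
        quality < max_quality                   # assumed threshold
        and src["variability"] < max_variability
        and src["parallax"] / src["parallax_error"] > min_parallax_snr
    )


if __name__ == "__main__":
    row = {"astrometric_chi2_al": 95.0, "astrometric_n_good_obs_al": 100,
           "variability": 0.01, "parallax": 1.0, "parallax_error": 0.1}
    print(is_clean(row))
```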


Some plots:





Let's look at the distribution in (observed) color vs distance?


compared to the literature (in blue)


What are the reddened stars in the bulge region?

B.t.w. there are some funny variables:



that only occur in the bar/bulge?


this smells like a data artifact; some other time...

and another footnote:

there's an artifact for stars at G=11mag




Sunday, 23 December 2018

(variability-based) YSO selection for SDSS-V

Starting point

In the previous post I made a proposal on how to select (low-contamination) samples of YSOs for SDSS-V targeting, through a combination of WISE W1-W2 excess, variability, and a parallax cut to eliminate backgrounds. This was based on:
YSOs can be discerned by (any combination of) the following observational properties:
1)  their SEDs (0.5-20mum) are not-just-a-simple-photosphere (..disks, accretion, etc..)
2)  they lie off the(ir) main sequence
3)  many (most?) of them show some flux variability
4)  they are clustered in position and velocity space.

This (at first glance) seems to do very well at selecting YSOs (Class 0,I,II) that a) have a mid-IR excess (W1-W2>0.25), and b) are bright enough to show up in the Gaia catalog (G<18). But that leaves out later YSO phases (no W1-W2 excess), and thereby leaves out objects (Class III) that a portion of the SDSS-V YSO group care about.

Here I propose a considerably broader YSO selection for SDSS-V (which encompasses the above approach). It is largely based on optical variability, but still seems to yield low-contamination samples, though these are more dominated by more 'mature' (low-mass) YSOs. To keep background contamination in check, the sample distance needs to be limited (e.g. to ~1kpc). Which subset of these YSOs is interesting enough to get targeted in the SDSS-V context needs to be sorted out.

The overall approach can be summarized as:

[ Gaia-detection (var>0.0x mag)  or  WISE-color-excess-Gaia-non-detections ] AND H~<12

where Gaia-detection may mean G<18 and parallax>x mas (x=0.3-1.5).

Variability Selection

Selection "Philosophy"

We are seeking YSOs that a) are bright enough to be well within Gaia's flux limit (G<18), b) are bright enough in H-band to be sensibly observed within the SDSS-V context, and c) are YSOs in the sense that they have not yet reached the main sequence for their initial mass (M_init).
"Variability" in Gaia DR2 is defined as sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error  i.e. via the photometric excess noise.

Query

I run this on ESA's Gaia DR2 server, with a distance cut at 1kpc (TBD; g.parallax - g.parallax_error > 1):

SELECT  * , sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error as variability
FROM gaiadr2.gaia_source AS g
INNER JOIN gaiadr2.allwise_best_neighbour as xaw
    ON xaw.source_id = g.source_id
INNER JOIN gaiadr1.allwise_original_valid as allwise
    ON xaw.allwise_oid = allwise.allwise_oid
WHERE
g.phot_g_mean_mag < 18.
and
sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error > 0.02
and
sqrt( g.astrometric_chi2_al / ( g.astrometric_n_good_obs_al - 5)) < 2.
and 
allwise.w1mpro < 12.5
and
g.parallax - g.parallax_error > 1.

See  here for an explanation of the query "philosophy".  I subsequently make a cut to H<12.5, and excise sources with W1_error or W2_error > 0.05 (TBD: is this necessary?).  I also run -- on a subset of the sky -- the exact same query, just insisting that variability < 0.02.
This yields a set of 93,000 stars; the query output (i.e. "the sample") is here as a fits file.

The following Figure shows the distribution of this sample in the mid-IR excess (W1-W2) vs variability space. There are three regimes: W1-W2>0.25 (Class I,II objects), a group of low-variability objects at W1-W2~0.1, and the dominant plume at W1-W2=0 (stellar photosphere only?).

W1-W2 vs variability (for H<12.5, D<1kpc). Three regimes are apparent: sources with mid-IR excess (W1-W2>0.3); the dominant subset of sources with simple "photospheric" colors (W1-W2~0); and the (low-mass) sources with W1-W2~0.17.

CMD Distribution of Variability Selected Stars

Obviously, youth or YSO-ness is not the only source of >0.02mag variability among stars within 1kpc: stars also pulsate, eclipse, etc. I certainly had no idea or preconceived notion of which fraction of the H<12.5, D<1kpc, var>0.02mag stars may be YSOs. The availability of parallaxes allows us to put these stars onto a CMD.

Let's start by looking at the complementary set: (a subset of) stars with H<12.5 within D<1kpc that do NOT vary (at the rms 0.02mag level):

Non-varying stars (at <0.02mag level) with H<12.5 (G<18) and D<1kpc(plx>1mas). Overplotted are (Padova) isochrones of log(t)=6.6 -- 9.6 in steps of 0.5 dex (4Myrs,12Myrs,40Myrs, etc..); this all looks nice and "boring".


One can see a hint of the binary sequence, but otherwise this looks like a 3-10Gyr old population (to me). As expected, the combination of geometric survey volume and magnitude limit prefers a certain stellar luminosity (those still bright enough to make the magnitude cut at the maximal distance).

Now let's look at the analogous query, but for ALL stars with variability>0.02mag.

Stars H<12.5,D<1kpc and variability > 0.02. The vast majority of them lie above the MS!
The following is the same plot as above, increasing point size to show that all the very reddened YSOs are still in here...

As above, but showing the sparse parts of CMD space.


Remarkably, the vast majority of these stars lie above the (old) main sequence, around the 4-40Myr isochrones.  That pattern becomes even more distinct if we look at the stars that vary by at least 0.05mag rms, as shown here; insisting on larger variability also selects against low-mass YSOs (?).

As above, but restricted to stars varying (rms) >0.05mag.


If that is true, then the majority of stars (H<12,D<1kpc) that vary by 0.02mag (or certainly by 0.05mag) are YSOs (??!!??). Can someone educate me whether that can be true?

Obviously, this sample includes all kinds of other variable stars, but they seem to be a modest fraction.

-------------------------------------------------------------

Aside: Variability Selection in the Orion Region

If I select all stars from Gaia (G<18.5) within 190<l<215 and -26<b<-8 and 2<parallax<3,
I get this



I can now split this into the non-variable sources (variability < 0.02mag; 53,000 sources)



and variable ones (variability > 0.02mag; 7000 sources)


and those 2000 with variability > 0.05

And now an aside on the aside: this is all for PMS stars (in Orion absG>4-ish). The more massive stars in Orion (absG<3, parallax and PM selected)  that have presumably reached their MS show no variability:  shown as larger symbols (color-coded by variability) on top of the lower-mass var > 0.02 background.



Seems all pretty neat.

[end of aside] -----------------------------------------------------


On-Sky (and Parallax) Distribution of these Stars

If the interpretation is correct that variability >0.02 is an efficient (both reasonably complete and pure) YSO selector, then this should be reflected in the sky distribution.

Let's start with the "easy" case: 2300 YSOs with distinct W1-W2 excess (>0.2) (and H<12.5 and D<1kpc), shown as a sky map with parallax as color-coding.

Stars with W1-W2>0.2, H<12.5 and D<1kpc.

As above with larger dot sizes to show the correlation between position and distance.

This seems to give a very clean sample, as before here , just restricted to D<1kpc.

Let's now contrast that with the control sample that has no (<0.02mag) variability.



.. a nice smooth on-sky distribution, with most stars near the geometric sample limit, 1kpc.

Now what about the stars that have no W1-W2 excess (actually all variable stars, most of which have no W1-W2 excess)? Their sky distribution looks like this (var > 0.02mag)

All stars (H<12.5, D<1kpc) that vary at >0.02mag, color-coded by distance. The two vertical features must be data artifacts.


and like this when restricting to var > 0.05mag, or like this

Question: what is that warped configuration? Gould's belt? I have no idea... Tell me what paper to read.


when restricting to D<500pc.   If we take the subsample with small mid-IR excess (W1-W2~0.18) the sky distribution looks like this:


Again showing the stronger spatial clustering of younger objects. (my conjecture)

Next steps: verification

My current conclusion is that variability alone is very effective at picking out objects too young to have settled on the MS of their mass. If this is a useful definition of YSOs, then most of them are YSOs. The ones with strong mid-IR excess (W1-W2) are very tightly clustered. The ones with W1-W2=0 and low-variability (0.02-0.05mag) have a considerably smoother sky distribution. If many of them are 30Myr+ old, this may not be surprising.

Questions to all: 

  1. What needs to be done to verify this?
  2. What's the interest in young (PMS) "field" stars?
  3. Please play with the query results, as a proposed sample file to draw from. ( https://www.dropbox.com/s/de832x5p78q6y8h/GDR2_var%3E0.02_H%3C12.5_G%3C18.fits?dl=0 )
  4. What's the best way to augment all of this by WISE-selected (Class I) sources, that don't show up in Gaia, but have H<12? Of the MANY towards the Galactic center, which are interesting? 
==============================================================

Another aside on: which sources does a simple criterion W1-W2>0.25, H<12 select, and of these, which sources is a Gaia variability selection missing? 

And, is it enough to get all the ones that have H<11 through the GalacticGenesis program anyway? [Should those get a priority flag?] The plots below show ALL W1-W2>0.25, H<12. sources.


Here is an approximate map of those sources that are NOT in Gaia



and here is their galactic latitude distribution quantified: they are almost ALL exactly in the Galactic plane.

I.e. there is a modest number of such missed sources (not in Gaia, 11<H<12) in Orion, but the vast majority of them are in the inner disk (within 1deg of the Galactic plane). What should we do about them?

Let's look at the Orion region, defined as



There are 1127 sources that pass H<12 and W1-W2>0.25. Of those, 1085 (96%) are in Gaia, and 979 (87%) are bright enough to be included in the variability selection. Do we need to address those?

end of aside
====================================================================

Implications for SDSS-V target selection:

This picks out nearly 100k YSO/young-star targets, which is more than we can target.
My proposal for SDSS-V YSO targeting: let's make the sample definition:
-- all stars with H<12.5, and W1-W2>0.25 and D<5kpc
and
-- all stars with H<12.5, variability>0.02mag  and D<1kpc

Or put differently:
-- all stars with H<12.5, variability>0.02mag  and D<1kpc
augmented by 
-- all stars with H<12.5, and W1-W2>0.25 and 1kpc<D<5kpc

Implicit is G<18, and we then need to set a targeting priority, where priority decreases as  W1-W2 decreases. E.g. if "second priority" are the targets with slight W2 excess, we still zoom in on clusters.
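
The proposed sample definition can be sketched as a predicate plus a priority ordering. This is illustrative only: the distance cuts are expressed as parallax - parallax_error > 1/D (as in the queries), the column names follow the Gaia, 2MASS and AllWISE tables, and the function names and the simple sort on W1-W2 are assumptions.

```python
def w1_w2(src):
    """Mid-IR color from the AllWISE profile-fit magnitudes."""
    return src["w1mpro"] - src["w2mpro"]


def in_yso_sample(src):
    """H<12.5, G<18 and (variable within 1kpc OR W1-W2 excess within 5kpc)."""
    if src["h_m"] >= 12.5 or src["phot_g_mean_mag"] >= 18.0:
        return False
    plx_lo = src["parallax"] - src["parallax_error"]   # mas, 1-sigma lower bound
    nearby_variable = src["variability"] > 0.02 and plx_lo > 1.0
    ir_excess = w1_w2(src) > 0.25 and plx_lo > 0.2
    return nearby_variable or ir_excess


def rank_by_ir_excess(sources):
    """Order selected targets so that priority decreases as W1-W2 decreases."""
    selected = [s for s in sources if in_yso_sample(s)]
    return sorted(selected, key=w1_w2, reverse=True)


if __name__ == "__main__":
    stars = [
        {"h_m": 11.0, "phot_g_mean_mag": 15.0, "variability": 0.03,
         "parallax": 2.5, "parallax_error": 0.1, "w1mpro": 10.0, "w2mpro": 9.95},
        {"h_m": 11.5, "phot_g_mean_mag": 16.5, "variability": 0.01,
         "parallax": 0.5, "parallax_error": 0.1, "w1mpro": 10.0, "w2mpro": 9.2},
    ]
    for s in rank_by_ir_excess(stars):
        print(round(w1_w2(s), 2), s["parallax"])
```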


Saturday, 22 December 2018

YSO selection for SDSS-V

Towards defining YSOs targets for SDSS-V

Science goals:

The previous post dealt with identifying samples of stars that are massive (>5M_sun). I'd like to explore what can be done in terms of an "algorithmic, all-sky" sample definition for (lower-mass) YSOs.

YSO's can be discerned by (any combination of) the following observational properties:
1)  their SEDs (0.5-20mum) are not-just-a-simple-photosphere (..disks, accretion, etc..)
2)  they lie off the(ir) main sequence
3)  many (most?) of them show some flux variability
4)  they are clustered in position and velocity space.
5)  dominant selection contaminants are dust-reddened giants of various sorts.


To start, I took a simple stab at combining aspects 1), 3) and 5). Basically, I tried to find the objects that have YSO-like WISE colors (i.e. W1-W2 significantly > 0), that vary on year-timescales by more than a few percent, and are within 5kpc (to cut out background).
What I do requires some detection in Gaia (for starters I took G<18);
this clearly will miss seriously embedded sources.

This approach uses two aspects of Gaia that are perhaps non-obvious:
-- already now, Gaia is a high-precision, all-sky variability survey: at the moment one gets only the rms variability amplitude on timescales of a year, encoded in variability == sqrt(N_obs)*flux_error/flux (needs correction at the faint end).
-- and while the parallax measurements for such WISE-color-selected samples will be insignificant for much of the sample, plx/plx_error < 1, this is still very informative: e.g. at plx_error < 0.2mas, plx/plx_error < 1 means you are NOT nearby.
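
The parallax argument can be sketched in a few lines: a distance limit D (in kpc) corresponds to a parallax of 1/D mas, and a source counts as "within D" at 1 sigma if parallax - parallax_error exceeds that value. The function name is hypothetical.

```python
def nearer_than(parallax_mas, parallax_error_mas, distance_kpc):
    """True if the source lies within distance_kpc at >1 sigma confidence."""
    return parallax_mas - parallax_error_mas > 1.0 / distance_kpc


if __name__ == "__main__":
    # An insignificant parallax (plx/plx_error ~ 1) with a small error
    # still rules out a nearby source:
    print(nearer_than(0.15, 0.15, 5.0))
    # A well-measured 0.5 mas parallax passes the 5kpc cut:
    print(nearer_than(0.5, 0.1, 5.0))
```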

Having said that, I queried GaiaDR2 x WISE for
 variability>0.05mag, W1-W2>0.2, Gaia_G<18 and no astrometry flags.
When plotted in WISE color space (excising bad W1 and W2 photometry,
left-hand panel) this selection yields 26,000 objects:

Stars selected by variability>0.05mag, W1-W2>0.2, Gaia_G<18
These stars have an interesting distribution in variability amplitude.

Then I sub-select the objects that lie within 5kpc from the Sun (at least at >1sigma confidence),
i.e. parallax - parallax_error > 0.2mas; this is to eliminate background. Indeed, this removes the large majority of objects, leaving only slightly over 5000.  In WISE color space, this leaves a distinct population (the blob around (1.7,0.6)):

Subsample from above, selected to be within 5kpc

If I then plot the objects surviving this simple procedure on the sky, they look pleasing to my eye
(color-coding is the distance), when trying to isolate a sample of YSOs within 5kpc:

This is the on-sky distribution of the stars, selected as above, color-coded by distance.
with a zoom-in on the Orion region (different distance color-scheme)



What seems good to me is:
1) the prominent nearby SF regions are there;
2) there is no apparent bulge/disk contamination;
3) the parallaxes indicate little contamination (i.e. patches in (l,b) have very similar parallaxes); note
that except for requiring parallax - parallax_error > 0.2mas, parallaxes have not entered the selection.

The query that created the sample is:

SELECT * , sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error as variability
FROM gaiadr2.gaia_source AS g
INNER JOIN gaiadr2.allwise_best_neighbour as xaw
    ON xaw.source_id = g.source_id
INNER JOIN gaiadr1.allwise_original_valid as allwise
    ON xaw.allwise_oid = allwise.allwise_oid
WHERE
g.phot_g_mean_mag < 18  /* magnitude cut, so that variability at 0.05mag can be established */
and
sqrt(g.phot_g_n_obs)/g.phot_g_mean_flux_over_error > 0.05/* variability selection rms > 0.05mag */
and
sqrt( g.astrometric_chi2_al / ( g.astrometric_n_good_obs_al - 5)) < 2. /* weed out bad astrometry */
and
allwise.w1mpro - allwise.w2mpro > 0.25  /* stay away from the boring-star-locus in WISE */
and
g.parallax - g.parallax_error > 0.2  /* stars that are within 5kpc; to weed out background */

The comments (/* ... */, shown in blue in the original post) must be removed before the query runs
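
One way to strip the comments before pasting the query is a small regular-expression helper; this is a sketch in plain Python, and the function name is hypothetical.

```python
import re


def strip_block_comments(query):
    """Remove /* ... */ comments and trailing whitespace from each line."""
    without_comments = re.sub(r"/\*.*?\*/", "", query, flags=re.DOTALL)
    return "\n".join(line.rstrip() for line in without_comments.split("\n"))


if __name__ == "__main__":
    annotated = (
        "g.phot_g_mean_mag < 18  /* magnitude cut */\n"
        "and\n"
        "g.parallax - g.parallax_error > 0.2  /* within 5kpc */"
    )
    print(strip_block_comments(annotated))
```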

[I did do poor-WISE-photometry-cleaning afterwards]
The resulting output file can be found at:
https://www.dropbox.com/s/1ylqx7c9f4kmkbd/YSO_subset_v2_of_var%3E0.05%3EW1-W2%3E0.25.fits?dl=0

Next steps and action items for the working group:

Obvious limitations:

By construction, the above procedure only selects YSOs with W1-W2>0.25, G-band variability > 0.05mag, and G<18. YSOs that do not show significant "W2 excess" and lie near the normal stellar locus will not be captured, nor will those that don't vary (by 0.05mag) or those with Gaia G > 18. The first aspect means that Class III objects and more mature PMS phases (sorry about the term) will be missed; the G<18 limit means severely reddened Class 0 objects will also be absent. This may, will and can lead to an augmentation of the above query.

For such objects, their variability and their position in the CMD must play a larger role; I have started to explore this, and will post next.

Verification steps:

-- do known YSOs, whose established measurements should satisfy the above criteria, show up in the sample? Does the query pick what it purports to? E.g. what falls through the cracks because of crowding, but would still make a good fiber target?
-- what is the overlap/mismatch with established catalogs?
-- can we use existing spectral surveys (incl. SDSS-IV) to verify the purity of the selection?