Starting point
Maosheng Xiang and Sven Buder identified a GALAH DR2 - LAMOST DR4 overlap sample that allowed us to attempt a many-label transfer, using Yuan-Sen Ting's data-driven, physical-prior-constrained Cannon-Payne hybrid. This resulted in 6.3M LAMOST objects with assigned labels:
Initial diagnostic plots
HWR made some initial plots that basically showed that
- in many cases everything looks plausible
- there are many spectra where the assigned labels (even the stellar parameters) must be spurious, for a number of reasons. The two main ones are:
- there are LAMOST data problems, such as 'no blue spectra', etc.
- the assigned labels fall outside of the training range
- Action on Maosheng: put a flag on all objects where there are serious data problems or where the labels fall outside of the training range. This is illustrated here (6.3M LAMOST labels in grey; training set as colored dots)
The immediate reaction is to restrict all subsequent science analyses, for the time being, to objects that fall within the training range. [Neige Frankel raised the important point that not all LAMOST objects whose labels fall within the training range have 'true' labels within that range.] This may be illustrated with the following diagnostic plot from the low-metallicity regime:
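A minimal sketch of such a within-training-range restriction, assuming the 'training range' is defined as a per-label bounding box (a convex-hull test in label space would be stricter, and would catch some of the cases Neige raised):

```python
import numpy as np

def in_training_range(labels, train_labels, pad=0.0):
    """Flag objects whose labels all fall inside the per-label
    min/max range spanned by the training set.

    labels       : (N, K) array of transferred labels
    train_labels : (M, K) array of training-set labels
    pad          : optional margin to shrink the accepted range
    """
    lo = train_labels.min(axis=0) + pad
    hi = train_labels.max(axis=0) - pad
    return np.all((labels >= lo) & (labels <= hi), axis=1)

# toy example with two labels (e.g. Teff, FeH); values are illustrative
train = np.array([[4500., -1.0], [6500., 0.4], [5500., -0.3]])
survey = np.array([[5000., 0.0],    # inside the training range
                   [7000., 0.0],    # Teff too hot
                   [5000., -2.5]])  # FeH too low
print(in_training_range(survey, train))  # → [ True False False]
```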
Seemingly, most giants with FeH<-2 have poor chi2, possibly because the label-transfer interpolator cannot predict spectra well that are essentially featureless. That instantly brings up the question of whether the poor chi2 at FeH<-2 systematically distorts their FeH estimates.
Next Steps
On the verification side, most of the next steps are actions on Maosheng. Those steps include:
- describe how (before the label transfer was done) the input labels were pushed to the isochrones, using Gaia information. Was that done? Is the approach published? If not, sketch it.
- devise and implement the flagging (or excising) of
- 'bad data'
- find algorithmic ways to identify spectra whose label estimates may be distorted as a consequence of label-transfer extrapolation. There may be two ways: either split the training sample in half, make two predictions for extrapolated objects (i.e. pseudo-crossvalidation beyond the training-set regime), and ask how consistent the estimates are; or use chi2, which works in some ways, as this plot shows [color = log(chi2)?]
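The split-half consistency idea above can be sketched with a toy interpolator; here a cubic polynomial fit of one label on one spectral feature stands in for the actual Cannon/Payne-style model, and all numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy training set: one spectral feature x, one label y, nonlinear relation
x = rng.uniform(0.0, 1.0, 400)
y = np.sin(3.0 * x) + 0.05 * rng.normal(size=400)

# split the training sample in half; fit two independent interpolators
# (cubic polynomials stand in for the actual label-transfer model)
p1 = np.polyfit(x[:200], y[:200], 3)
p2 = np.polyfit(x[200:], y[200:], 3)

# evaluate both fits on new objects: one inside the training range,
# one far outside it (forcing extrapolation)
x_new = np.array([0.5, 2.0])
delta = np.abs(np.polyval(p1, x_new) - np.polyval(p2, x_new))

# the two half-sample estimates agree in-support and diverge off-support,
# so a large |delta| flags potentially extrapolated label estimates
print(delta)
```

The design choice here is that disagreement between the two half-sample models is used as an empirical extrapolation indicator, without needing to know the true labels of the survey objects.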
- Think about enlarging the training set. Specific proposals include:
- boost the overlap sample, e.g. by requiring that only logg,Teff,FeH,MgFe,(a 'few' others) are well-measured in GALAH
- Augment the training sample with synthetic, ab initio spectra (conceptually troubling, eh?), especially in the regime of hot stars (Kurucz ATLAS models) or very metal-poor stars with FeH<-2 (see issue above).
- Think about where we can get training labels for very cool MS stars (<4200 K).
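The first proposal, boosting the overlap sample by requiring that only the key labels are well-measured in GALAH, might be sketched as follows; the flag column names here are purely illustrative, not the actual GALAH catalogue schema:

```python
import pandas as pd

# hypothetical GALAH-like catalogue with per-label quality flags
# (0 = clean; the column names are illustrative only)
galah = pd.DataFrame({
    "teff":      [5777., 4800., 6100.],
    "flag_teff": [0, 0, 0],
    "flag_logg": [0, 0, 1],
    "flag_feh":  [0, 0, 0],
    "flag_mgfe": [0, 1, 0],
    "flag_bafe": [1, 1, 0],   # a 'minor' label, allowed to be bad
})

# strict cut: every label flag must be clean -> small training set
strict = galah[(galah.filter(like="flag_") == 0).all(axis=1)]

# relaxed cut: only Teff, logg, FeH, MgFe (and a 'few' others) must be clean
key = ["flag_teff", "flag_logg", "flag_feh", "flag_mgfe"]
relaxed = galah[(galah[key] == 0).all(axis=1)]

print(len(strict), len(relaxed))  # → 0 1 : the relaxed cut keeps more stars
```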