"Modelling" the Gaia DR3 Astrometric Orbits (a.k.a. learning astrophysics from complicated catalogs)
The Basic Idea
Gaia has produced many catalogs with constraints on interesting astrophysical phenomena. In particular, Gaia DR3 has produced a catalog of >130,000 "astrometric orbit solutions", where many-epoch observations of a sources have lead to an "orbit". Specifically, the light-centroid position of the source on the sky can be (well) fit by a combination of parallax, proper motion and a Keplerian orbit.
This catalog (lovingly dubbed gaiadr3.nss_two_body_orbit) should be a treasure trove on questions on how stars deal with their angular momentum during formation (binarity, or planetary system), on stellar evolution (what if one of the stars in a binary dies to become a WD, NS or BH)?
But it turns out that it is not easy to go from "What's (not) in the catalog?" to astrophysical questions such as "How often do low-mass stars have even lower mass companions?" "How common are giant gas planets at, say, 3 year periods, around stars of different mass?", "How common are brown dwarfs are companions?", or "How often are stars orbiting stellar-mass black holes?"
This catalog<-->question link is not easy for a variety of reasons. First, and foremost, no catalog contains everything. It usually (and sensibly) contains what the underlying experiment can detect or measure well. Second, acknowledging this incompleteness does not yet help. To learn from the catalog one must build a model that predicts the ensemble of catalog entries, and this model must incorporate the selection function (e.g. Rix et al 2022; and references therein for a recent exposition of this issue). The selection function specifies with what probability (often 0 or 1) an object of any properties (flux, temperature, distance, direction, etc..) would have ended up in the catalog or not. The selection function should be a function of variables that a) are "observables" and b) can be predicted by the model.
For "literature" samples, the selection function is hopefully specified in terms that lend themselves to modelling; else it may need to be (painstakingly) (re-)constructed. By comparison, specifying a "model" is often easy. E.g. for the scienc e theme discussed above, the model could be given by a single number: what is the probability pJ that a solar-type star in the Galactic neighbourhood has (as dominant companion) a Jupiter-mass object in a 3-year orbit.
To get specific, let's make a plot of the catalog in which the Gaia mission speaks about (astrometric) binary orbits. There are many things to be plotted about this sample, but the following Figure (showing all 134,000 sample members) cuts most to the chase.
The X-axis shows the inferred period, between 0.1 years and 10 years, with the prominent 'aliasing gap' at 1 year. The Y-axis shows the 'Astrometric Mass-Ratio Function' (AMRF), a clever quantify devised by Shahaf et al. ( 2019 ). It links the light-centroid motion (observable) to the motions of the primary and secondary. If the primary dominates, then AMRF becomes an approximate measure of the mass ratio, q.
So, interesting companions are either at very high AMRF (black holes, neutron stars ??) or at very low values (planets, etc..). And, the plot shows very little / nothing at extreme AMRF. What does this mean for the incidence of BH's and planets orbiting stars in the pertinent period range....?
That's the basic issue to be addressed.
The Basic Mathematics
Let's take as (interesting) toy case that we want to constrain is the rate at which solar-type stars have Jupiter-mass objects MJ in ~2AU orbits.
This is a special case of the following general model that constrains the probability "p" that a stars of a c ertain mass and metallicity has a planet of Mp in an orbit size "a" and eccentricity "e".
This model is now to be compared to the "data" (which is in essence the set of catalog entries in the "orbit catalog" above).
In addition to the light-centroid parameters ("lc"), we need the parallax (to convert milli-arcseconds to AU), the apparent magnitudes \vec{m} (which set the S/N; see below), and an estimate of the primary star's mass (which comes from other information), as Kepler's law needs that. The Gaia catalog contains such information for N_*=134,000 stars (see plot above).
The nest (and hard and crucial) step is to understand under which circumstances a system might have ended up in the catalog. Why this is crucial, we'll show below; let's for now just go with it.
For Gaia DR3 the following conditions have to be fulfilled (Halbwachs+2022).
These are S/N conditions on the parallax (varpi), and on the quality of the light-centroid semi-major axis (a_0). There is also a criterion on the eccentricity uncertainty. For operational reasons all these criteria depend on period.
The art, and bain, is to express the selection function S correctly. This function S is the probability that a system with certain physical characteristics -- stellar, and planetary mass, distance, period, etc.. -- would have passed the selection criteria. We have to be able to go to any (even just envisioned) star of certain mass distances etc., and answer: if that star had a planet of certain mass in a certain orbit, would it have resulted in a catalog entry.
Initial version of such an approach was worked out in El-Badry+2023, for the regime of massive dark companions); there it seemed that the parallax/S/N criterion was the most stringent. If we assume that the parallax uncertainty is (mostly) a function of parallax, brightness of the stars and semi-major axis of the orbit, then we can make the selection function a relatively simple function of "model-predictable observables" [More to say here].
This then leads to the following prediction of how many systems we should see in the catalog:
How to interpret this? To predict the number of catalog entries that imply a planet of mass M_p in an orbit of size a -- for a given model p(M_p,a) -- we need to sum over all stars (yes, ALL stars) and sum up their selection function. Of course the sum over all stars is silly, as for almost all stars in the Milky Way the selection function probability is S==0. So, in practice, it boils down to counting all stars in the catalog for which S==1 (in a given dM_P x da ) and comparing that to the number of actual catalog entries (using the binomial statistic).
The delightful aspect is that -- once the selection function is specified -- the exercise becomes "trivial".
Selection Function Nitty-Gritty
<here we'll eventually spell out how to determine and then apply a suitable selection function>