how MOSAIKS works

The basic idea of MOSAIKS is to seperate users from the costly and difficult process of transforming imagery into inputs (called “features”) for a machine learning algorithm (images → X). We do that part, so users never have to download or manage imagery. Users then download a table of MOSAIKS features (X), link them to their own geocoded data on the outcome (Y) they are interested in (called “labels”) and run a linear regression to predict their labels using MOSAIKS features (Y = Xβ).

All users use the same MOSAIKS features and just match them to their labels based on location. Users can run their analysis on any statistical software they are comfortable with. For most applications, the computing demands will not require users to work with specialized machines, since desktops and laptops work.

MOSAIKS works because MOSAIKS features captures a huge amount of information about the colors, patterns and textures that show up in satellite imagery. We don’t know what patterns/colors/textures will be important for the application that users have (since we don’t know what applications users will try), so we just try to capture all of them. The purpose of the regression step is to teach the model which patterns/colors/textures predict the labels, and then to use that understanding to make predictions in locations where users don’t have labels. In addition, MOSAIKS encodes image information in a way that allows for nonlinear relationships between labels and images, even though the regression that users implement is a linear regression.

For users, the procedure for using MOSAIKS has five steps (corresponding figure from Rolf et al. is below):

  1. Download MOSAIKS features (X) in the areas where you have labels.

  2. Merge the features with your labels (Y) based on location (so features at position P are linked to labels at position P).

  3. Run a ridge regression of your labels on the MOSAIKS features (Y = Xβ).

  4. Evaluate performance.

  5. Use the results of the regression model (β) to make predictions () in a new region of interest where you do not have labels, using only the MOSAIKS features that correspond with those new locations.

We’ve found that MOSAIKS works well across diverse prediction tasks (e.g. forest cover, house price, population, road length, elevation, income) and it achieves accuracy competitive with deep neural networks at orders of magnitude lower computational cost.

If you are interested in using MOSAIKS, you can also see our tutorial and slide deck. And if you have a little bit of time, we recommend reading the paper we wrote when we introduced the system. We wrote it with users in mind, so we tried to make it as clear and accessible as possible.

If you use MOSAIKS, please reference: Rolf et al. “A generalizable and accessible approach to machine learning with global satellite imagery.” Nature Communications (2021).