MetaLearning Challenges

MetaLearn 2022

We are grateful to Microsoft and Google for generous cloud unit donations and to 4Paradigm for donating prizes. This project is also supported by ChaLearn and the HUMANIA chair of AI grant ANR-19-CHIA-00222 of the Agence Nationale de la Recherche (France). Researchers and students from Université Paris-Saclay, Universiteit Leiden, TU Eindhoven, and 4Paradigm have contributed. The challenge is hosted by Codalab (Université Paris-Saclay).

Machine learning has successfully solved many single-task problems, but at the expense of long, wasteful training times. Meta-learning promises to leverage the experience gained on previous tasks to train models faster, with fewer examples, and possibly with better performance. Approaches include learning from algorithm evaluations, from task properties (or meta-features), and from prior models.

Following the AutoDL 2019-2020 challenge series and past meta-learning challenges and benchmarks we have organized, including MetaDL@NeurIPS'21, we are organizing two competitions in 2022:

  • Meta-learning from learning curves (accepted to WCCI2022).

  • Cross-domain meta-learning (to be submitted to NeurIPS'22).

We are also planning to organize a workshop at ICML'22 (proposal to be submitted).

Contact us if you would like to join the organizing team.

Meta-learning from learning curves challenge

In this challenge, which is part of the WCCI 2022 competition program, we focus on meta-learning from algorithm evaluations. However, we consider multiple evaluations made in the course of learning, i.e. “learning curves”. Learning curves record an algorithm's incremental performance improvement as a function of training time, number of iterations, and/or number of examples. The aim of this challenge is to uncover meta-learning strategies that leverage information on partially trained algorithms, hence reducing the cost of training them to convergence. To facilitate benchmarking, we offer pre-computed learning curves as a function of time. Meta-learners must “pay” a cost emulating computational time (a negative reward) to reveal the next value of a learning curve. Hence, meta-learners are expected to learn the exploration-exploitation trade-off between:

  • exploitation = continuing “training” an already tried good candidate algorithm (requesting its next learning curve point) and

  • exploration = checking new candidate algorithms (requesting their first learning curve point).
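
The trade-off above can be illustrated with a minimal sketch. It assumes a hypothetical setting in which each algorithm's pre-computed learning curve is a list of scores, revealing one point costs a fixed amount of budget, and the agent follows a simple epsilon-greedy rule. The function name, parameters, and the epsilon-greedy rule itself are illustrative assumptions, not part of the challenge's actual interface.

```python
import random

def run_epsilon_greedy(curves, budget, cost_per_point=1.0, epsilon=0.2, seed=0):
    """Reveal learning-curve points until the budget is spent.

    curves: dict mapping algorithm name -> list of scores (a hypothetical
            pre-computed learning curve, one score per revealed point).
    Returns (best_algorithm, best_score, history of (algorithm, score)).
    """
    rng = random.Random(seed)
    next_index = {name: 0 for name in curves}  # points revealed so far
    last_score = {}                            # latest score per tried algorithm
    history = []
    spent = 0.0

    while spent + cost_per_point <= budget:
        # Tried algorithms that still have unrevealed points.
        tried = [a for a in last_score if next_index[a] < len(curves[a])]
        # Algorithms never queried so far (empty curves are ignored).
        untried = [a for a in curves if a not in last_score and curves[a]]
        if not tried and not untried:
            break  # every curve is fully revealed

        explore = untried and (not tried or rng.random() < epsilon)
        if explore:
            choice = rng.choice(untried)                      # exploration
        else:
            choice = max(tried, key=lambda a: last_score[a])  # exploitation

        score = curves[choice][next_index[choice]]
        next_index[choice] += 1
        last_score[choice] = score
        history.append((choice, score))
        spent += cost_per_point  # the "payment" for revealing one point

    best = max(last_score, key=last_score.get) if last_score else None
    return best, last_score.get(best), history
```

A stronger agent would replace the fixed epsilon with a policy meta-learned from the learning curves of previous datasets; the sketch only shows the mechanics of paying for each revealed point.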

Data: We created a meta-dataset from the 30 datasets of the AutoML challenge by running 20 algorithms with different hyperparameters, from which we obtained learning curves for both the validation sets and the test sets.

Protocol: This challenge is the first of a series on the problem of meta-learning from learning curves in which we are testing a novel two-phase competition protocol. During a development phase, participants submit agents to be meta-trained and meta-tested on all data, except the test learning curves of each task. During a final test phase, a scoring program computes the agent’s performance on the test learning curves, based on pre-recorded agent suggestions. Furthermore, the ingestion program runs a hold-out procedure: in each split, we hold out 5/30 datasets for meta-testing, and use the rest for meta-training.
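
As a rough illustration of the hold-out procedure, the sketch below partitions a list of dataset names into splits that each hold out 5 datasets for meta-testing. It assumes the held-out groups are disjoint, k-fold style, which the text above does not state; the function name and the seeded shuffle are likewise assumptions.

```python
import random

def holdout_splits(dataset_names, n_test=5, seed=0):
    """Yield (meta_train, meta_test) pairs, holding out n_test datasets each time.

    Assumption: held-out groups are disjoint folds of a shuffled list.
    """
    names = list(dataset_names)
    random.Random(seed).shuffle(names)
    for start in range(0, len(names), n_test):
        meta_test = names[start:start + n_test]
        meta_train = names[:start] + names[start + n_test:]
        yield meta_train, meta_test
```

With 30 datasets and n_test=5, this yields 6 splits of 25 meta-training and 5 meta-testing datasets each.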

Evaluation: The agent is evaluated by two metrics: 1) the terminal score on the agent’s learning curve, and 2) the Area under the agent’s Learning Curve (ALC). The values will be averaged over all meta-test datasets and shown on the leaderboards. The final ranking will be made according to the average test ALC.
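
To make the two metrics concrete, here is a minimal sketch of how they could be computed for one dataset. It assumes the agent's learning curve is a step function (the score is 0 before the first point and held constant between points) and that time is normalized linearly by the total budget. The official scoring program may use a different interpolation or time scaling, so this is only an illustration.

```python
def area_under_learning_curve(times, scores, total_time):
    """Normalized area under a step-wise learning curve.

    times:  increasing timestamps at which a new score became available.
    scores: the score obtained at each timestamp.
    Assumptions: score is 0 before the first point, piecewise constant
    afterwards, and time is normalized linearly by total_time.
    """
    if len(times) != len(scores):
        raise ValueError("times and scores must have the same length")
    if total_time <= 0:
        raise ValueError("total_time must be positive")
    area = 0.0
    current = 0.0      # score before any point is revealed (assumption)
    previous_t = 0.0
    for t, s in zip(times, scores):
        if t > total_time:
            break      # points beyond the budget do not count
        area += current * (t - previous_t)
        current, previous_t = s, t
    area += current * (total_time - previous_t)
    return area / total_time

def terminal_score(scores):
    """Last score on the agent's learning curve (0 if the curve is empty)."""
    return scores[-1] if scores else 0.0
```

For example, with points at times 2 and 5, scores 0.5 and 1.0, and a budget of 10, the area is 0.5 * 3 + 1.0 * 5 = 6.5, giving a normalized ALC of 0.65.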

Cross-domain Meta-learning challenge

In this challenge, we focus on end-to-end meta-learning: meta-learning algorithms are exposed to a meta-dataset consisting of several tasks from a number of domains, and must return a learning machine ready to cope with a new learning task, from a new domain.

Data: We are working hard to extend the MetaAlbum benchmark we started putting together last year. It will consist of 30 datasets from 10 domains. They are all image classification datasets, uniformly formatted as 128x128 RGB images, carefully resized with anti-aliasing, cropped manually, and annotated with various metadata, including super-classes. Ten of those datasets will be revealed to the public in their entirety for practice purposes, ten will be used in the challenge feedback phase, and ten in the challenge final test phase.

Protocol (tentative): We will introduce a novel challenge protocol. We are currently considering several possibilities. In each phase, out of n=10 datasets, perform either:

  • a hold-out validation procedure at the meta-level, by bulk meta-training on k datasets, leaving (n-k) datasets out for meta-testing (k is fixed to a given value); or

  • a continual learning procedure by incrementally meta-training on datasets j=1 to k, and meta-testing on the remaining (n-k) datasets (k varies from 1 to n).

In either case, the procedure would be repeated for multiple dataset orders, each time resetting the memory of the meta-learning algorithm; the average and standard deviation of the results would then be computed.
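
The two candidate procedures can be sketched as follows. The helper names and the `evaluate` callback (which stands for meta-training a freshly reset learner and returning its meta-test score) are hypothetical. The continual variant stops at k=n-1 so that at least one dataset remains for meta-testing, which is an assumption on our part.

```python
import random
import statistics

def holdout_split(order, k):
    """Bulk meta-train on the first k datasets, meta-test on the other n-k."""
    return order[:k], order[k:]

def continual_splits(order):
    """For k = 1..n-1, meta-train on datasets 1..k and meta-test on the rest."""
    for k in range(1, len(order)):
        yield order[:k], order[k:]

def repeat_over_orders(datasets, evaluate, k, n_orders=5, seed=0):
    """Repeat the hold-out procedure over several random dataset orders.

    evaluate(meta_train, meta_test) is a hypothetical callback that resets
    the meta-learner, meta-trains it, and returns a meta-test score.
    Requires n_orders >= 2 so that a standard deviation can be computed.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_orders):
        order = list(datasets)
        rng.shuffle(order)
        meta_train, meta_test = holdout_split(order, k)
        results.append(evaluate(meta_train, meta_test))
    return statistics.mean(results), statistics.stdev(results)
```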

We are also considering organizing two tracks:

  • A model-centric track in which participants submit algorithms/agents capable of meta-training and returning a meta-trained learning machine (which can then learn a new task). The "data loader" would be provided and fixed.

  • A data-centric track in which participants submit a data loader supplying training examples in any way they like, based on available training data. The algorithms/agents capable of meta-training would be supplied and fixed.

In any case, we would be meta-testing on n-k datasets, in the following way. Each dataset/task has C classes (C>=20) with Nc=40 examples per class:

For each dataset in the meta-test set, split the data into training and test sets. Always keep the same nt=20 examples per class for testing, and

  • vary the number of training samples (shots), ns<=20 (any-shot testing setting): ns=[1, 2, 5, 10, 20];

  • vary the number of classes (ways), nc<=C (any-way testing setting): nc=[2, 4, 8, 16, min(32, C)].
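
The any-way, any-shot setting above can be sketched as an episode sampler. The sketch assumes each dataset is given as a dictionary from class label to an ordered list of 40 example identifiers, and that the fixed test examples are the last 20 of each class. Both are illustrative assumptions, as are the function names.

```python
import random

SHOTS = [1, 2, 5, 10, 20]  # values of ns listed above

def way_settings(num_classes):
    """Values of nc for a dataset with C = num_classes classes."""
    return [2, 4, 8, 16, min(32, num_classes)]

def sample_episode(examples_by_class, n_way, n_shot, n_test=20, seed=0):
    """Draw one n_way-way, n_shot-shot task with a fixed test set per class.

    examples_by_class: dict label -> ordered list of example ids
    (hypothetical format). The last n_test examples of each class are
    always used for testing (assumption); shots are sampled from the rest.
    """
    rng = random.Random(seed)
    classes = rng.sample(sorted(examples_by_class), n_way)
    train, test = [], []
    for label in classes:
        examples = examples_by_class[label]
        fixed_test = examples[-n_test:]   # same test examples every time
        pool = examples[:-n_test]         # candidates for training shots
        train += [(x, label) for x in rng.sample(pool, n_shot)]
        test += [(x, label) for x in fixed_test]
    return train, test
```

Looping over every pair in `way_settings(C)` and `SHOTS` would produce the grid of experiments over which performance is averaged.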


  • For each domain in the final test phase, performances will be averaged over all experiments and a ranking will be made using this average score.

  • The overall ranking will be made from the average rank of individual domain rankings.
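
The two ranking rules above amount to averaging scores within each domain, ranking participants per domain, and then averaging those ranks. A minimal sketch follows, assuming a hypothetical nested dictionary of scores and that higher scores are better; ties are not treated specially here.

```python
from statistics import mean

def overall_ranking(scores):
    """Rank participants by their average rank across domains.

    scores: dict domain -> dict participant -> list of experiment scores
    (hypothetical format; higher is assumed to be better).
    Returns (participants ordered best first, dict of average ranks).
    """
    ranks = {}
    for per_participant in scores.values():
        # Average over all experiments within the domain.
        averages = {p: mean(v) for p, v in per_participant.items()}
        ordered = sorted(averages, key=averages.get, reverse=True)
        for rank, participant in enumerate(ordered, start=1):
            ranks.setdefault(participant, []).append(rank)
    average_rank = {p: mean(r) for p, r in ranks.items()}
    return sorted(average_rank, key=average_rank.get), average_rank
```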

Congratulations to the NeurIPS'21 MetaDL winners [slides]

About Meta Learning

For a comprehensive overview of Meta-learning, we refer to the following resources:

ChaLearn (USA)

University Paris-Saclay (France)

Codalab, UPSaclay (France)

Leiden University (the Netherlands)

Universidad de la Sabana (Colombia)

4Paradigm (China)