Best Paper Award for “Automatic Termination for HPO” (12 Aug 2022)
https://2022.automl.cc/best-paper-award-for-automatic-termination-for-hpo/

At the first Conference on Automated Machine Learning (AutoML), the paper “Automatic Termination for Hyperparameter Optimization” won a Best Paper Award for a new way to decide when to stop Bayesian optimization, a widely used hyperparameter optimization method. In this blog post, we, the authors of the paper, give an intuitive explanation of its main ideas.

The performance of machine learning models crucially depends on their hyperparameters, but setting them correctly is typically a tedious and expensive task. Hyperparameter optimization (HPO) requires iteratively retraining the model with different hyperparameter configurations in search of a better one. This raises the question of how to trade off compute time against the accuracy of the resulting model. The work on Automatic termination for hyperparameter optimization proposes a new, principled stopping criterion for this problem. At its core, it accounts for the discrepancy between the generalization error (i.e., the true but unknown optimization objective) and its empirical estimator (i.e., the objective actually optimized by HPO). This leads to an intuitive stopping criterion that is triggered once further optimization is unlikely to improve the generalization error.

Hyperparameter optimization as black-box optimization

HPO can be seen as black-box optimization in which we sequentially interact with a fixed unknown objective \(f(\gamma)\) and, for a selected input \(\gamma_t\), observe its noise-perturbed evaluation \(y_t\). In HPO, the black-box function to be minimized is the population risk, i.e., the performance of the model \(\mathcal{M}_\gamma\) trained on some data \(\mathcal{D}\) (e.g., via SGD) and evaluated on unseen data from the distribution \(P\). Each evaluation is noise-perturbed, with the stochasticity coming from model training.
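
To make this concrete, here is a minimal sketch of such a noisy black-box evaluation, assuming a scikit-learn random forest and a single train/validation split; the dataset, model and hyperparameters are illustrative and not taken from the paper.

```python
# A minimal sketch of HPO as noisy black-box optimization (illustrative setup):
# evaluating hyperparameters gamma means training a model and scoring it on held-out data.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

def evaluate(gamma, seed=0):
    """Noisy black-box evaluation y_t of the hyperparameters gamma.

    The stochasticity comes from model training (here: bootstrap sampling
    and feature subsampling inside the random forest)."""
    model = RandomForestClassifier(
        n_estimators=int(gamma["n_estimators"]),
        max_depth=int(gamma["max_depth"]),
        random_state=seed,   # different seeds give different y_t for the same gamma
    )
    model.fit(X_train, y_train)
    return 1.0 - model.score(X_val, y_val)   # validation error to be minimized

y_t = evaluate({"n_estimators": 100, "max_depth": 5})
```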

Bayesian optimization (BO) is a powerful framework for optimizing such black-box functions. At its core, for a function \(f\) with optimizer \(\gamma^*\), BO searches for better predictive performance by (i) training a probabilistic model (e.g., a Gaussian process (GP) or a Bayesian neural network) on the evaluations and (ii) acquiring the most promising next hyperparameter candidate. This iterative search continues until, for example, a user-defined budget is exhausted. The goal is to minimize the gap between the optimum \(\gamma^*\) and the best configuration \(\gamma_t^*\) found within the budget. This gap, \(f(\gamma_t^*) - f(\gamma^*)\), quantifies the convergence of the HPO algorithm.
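
The following is a minimal sketch of such a BO loop for minimization over a single normalized hyperparameter, using a scikit-learn GP surrogate and a simple lower-confidence-bound acquisition on a toy objective; the toy objective, kernel, budget and acquisition are illustrative assumptions, not the exact setup used in the paper.

```python
# A minimal sketch of a Bayesian-optimization loop for minimization (illustrative only).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def black_box(gamma):
    # stands in for the noisy validation error of a model trained with gamma
    return (gamma - 0.3) ** 2 + 0.01 * np.random.randn()

candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)   # normalized search space
gammas, values = [[0.5]], [black_box(0.5)]               # one initial evaluation

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True)
for t in range(20):                                       # user-defined budget
    gp.fit(np.array(gammas), np.array(values))            # (i) fit the probabilistic model
    mu, sigma = gp.predict(candidates, return_std=True)
    acq = mu - 2.0 * sigma                                 # (ii) acquisition: lower confidence bound
    gamma_next = candidates[int(np.argmin(acq))]           # most promising next candidate
    gammas.append(gamma_next.tolist())
    values.append(black_box(gamma_next[0]))

gamma_best = gammas[int(np.argmin(values))]                # best configuration found in the budget
```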

The discrepancy between the true objective and the computable HPO target

In the paper, we address the question of whether \(f\) is actually the objective we optimize in practical BO. The answer is no, since the data distribution is unknown in practice. What we actually optimize (and evaluate) is an estimator of the population risk. That is, given some data (split into training, validation and test sets), we evaluate a validation-set-based estimate. As a result, the evaluation \(y_t\) inherits stochasticity not only from (1) the model training but also from (2) the realization of the data.
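
To make the discrepancy explicit, one can write the two objectives as follows (introducing a loss function \(\ell\) and a validation set \(\mathcal{D}_{\mathrm{val}}\) purely for illustration; the paper states this more generally):

\[ f(\gamma) = \mathbb{E}_{(x, y) \sim P}\big[\ell(\mathcal{M}_\gamma(x), y)\big], \qquad \widehat f(\gamma) = \frac{1}{|\mathcal{D}_{\mathrm{val}}|} \sum_{(x, y) \in \mathcal{D}_{\mathrm{val}}} \ell(\mathcal{M}_\gamma(x), y). \]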

This highlights the discrepancy between what we would like to optimize and what we can optimize. As a result, optimizing the empirical estimator too aggressively may actually lead to sub-optimal solutions for the population risk.

Convergence criterion for hyperparameter optimization

Due to the discrepancy in the objectives, the quality of configurations is judged according to the empirical estimator \(\widehat f\) instead of the true objective \(f\). Namely, it is quantified by the regret \(\widehat r_t = \widehat f(\gamma_t^*) - \widehat f(\gamma_D^*)\), computed for the optimizer \(\gamma_D^*\) of the empirical estimator \(\widehat f\) and the best solution \(\gamma_t^*\) found so far.

Our termination criterion is based on the observation that the evaluation of a particular hyperparameter configuration inherits the statistical error of the empirical estimator. By the statistical error we mean, for example, the estimator's variance, which indicates how far, on average, the evaluations lie from their expected value. If the statistical error dominates the regret, further reduction of the regret may not notably improve the generalization error. However, neither the statistical error nor the true regret is known. In the paper, we show how to compute estimates of both quantities.

Building blocks of the termination criterion

Bounding the regret

We derive the regret bounds from the probabilistic model used in Bayesian optimization. The key idea behind the bound is that, as long as the model, namely the Gaussian process, is well-calibrated, the function \(\widehat f\) lies between a lower and an upper confidence bound (\(\mathrm{lcb}\) and \(\mathrm{ucb}\); see the sketch below). These bounds follow from theoretically studied concentration inequalities that hold under common assumptions used in BO and HPO (see the paper). Thus, a regret bound can be obtained from the distance between the upper and lower bounds on the function.
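
As a rough, self-contained illustration of this idea, the sketch below fits a GP to toy evaluations and bounds the regret from its confidence bounds; the confidence parameter \(\beta\), kernel and toy data are assumptions, and the exact constants and conditions are given in the paper.

```python
# A minimal sketch of a GP-based regret bound (illustrative, not the paper's exact bound).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
gammas = rng.uniform(0.0, 1.0, size=(15, 1))                          # evaluated configurations
values = (gammas[:, 0] - 0.3) ** 2 + 0.01 * rng.standard_normal(15)   # noisy evaluations of f-hat

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True).fit(gammas, values)
candidates = np.linspace(0.0, 1.0, 200).reshape(-1, 1)                # whole (normalized) search space

beta = 2.0                                                            # confidence-bound width (assumed)
mu_all, sigma_all = gp.predict(candidates, return_std=True)
lcb = mu_all - beta * sigma_all                                       # lower confidence bound over the space

mu_obs, sigma_obs = gp.predict(gammas, return_std=True)
ucb_obs = mu_obs + beta * sigma_obs                                   # upper confidence bound at evaluated points

# The best configuration found so far has value at most min(ucb) over evaluated points,
# while the optimum of f-hat is at least min(lcb) over the whole space, so their
# difference upper-bounds the regret.
regret_bound = np.min(ucb_obs) - np.min(lcb)
```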

Bounding the statistical error

We show how to estimate the statistical error for the widely used cross-validation-based estimator. The characteristics of this estimator, namely its bias and variance, are theoretically well studied, which gives us the tools to estimate them.
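
As a simple illustration, the statistical error of a k-fold cross-validation estimate can be gauged from the spread of the fold scores; the standard-error proxy below is an assumption for this sketch, while the paper derives the exact bias and variance estimates it uses.

```python
# A minimal sketch of estimating the statistical error of a k-fold CV estimate (illustrative).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
fold_errors = 1.0 - cross_val_score(
    RandomForestClassifier(max_depth=5, random_state=0), X, y, cv=5
)

cv_estimate = fold_errors.mean()                                          # empirical estimate for this gamma
statistical_error = fold_errors.std(ddof=1) / np.sqrt(len(fold_errors))   # its estimated standard error
```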

Our termination criterion, then, triggers once the statistical error exceeds the maximum plausible improvement, i.e., the regret bound. This termination condition adapts to different algorithms and datasets, and its computation adds negligible cost on top of training the model for a given hyperparameter configuration.
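
Putting the two estimates together, the check itself is simple; the sketch below uses illustrative names for the quantities computed in the snippets above and is not the paper's implementation.

```python
# A minimal sketch of the resulting termination check (illustrative names).
def should_terminate(regret_bound, statistical_error):
    """Stop the HPO loop once the maximum plausible improvement on the
    empirical estimator is dominated by its statistical error."""
    return regret_bound < statistical_error
```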

Evaluation

We evaluate the stopping criterion over a wide range of HPO and neural architecture search problems, covering cases where cross-validation is used and where it is not (or cannot be) used. The results show that our method is problem-adaptive (in contrast to the baselines). Moreover, it provides an interpretable way to navigate the trade-off between compute time and the accuracy of the resulting model. We encourage you to check the experiments and the main paper for more details.

Introducing Reproducibility Reviews (17 Feb 2022)
https://2022.automl.cc/introducing-reproducibility-reviews/

By Frank Hutter, Isabelle Guyon, Marius Lindauer and Mihaela van der Schaar (general and program chairs of AutoML-Conf 2022)

Did you ever try to reproduce a paper from a top ML conference and fail to do so? You’re not alone! At AutoML-Conf (see automl.cc), we’re aiming for a higher standard: with the papers we publish, you shouldn’t have this problem!

Why is this important?
The reproducibility of research papers is key to the sustained progress of a field. Yet, there are many accounts of poor reproducibility in machine learning [Haibe-Kains et al., 2020], reinforcement learning [Henderson et al., 2018; Agarwal et al., 2021] and also AutoML [Yan et al., 2019; Lindauer & Hutter, 2020]. We believe that at AutoML-Conf we can fix this, and if we’re successful then other top ML conferences may follow.

What is the status quo?
Let’s say Alice wants to reproduce the results of Bob’s paper. If Bob does not make code available, Alice has to implement it from scratch. This typically takes substantial time (often weeks) and rarely yields the same results as in the paper. The result? Much reduced impact for Bob, wasted effort for Alice, and a slow-down in progress for the community. Not good.
Fortunately, these days Bob does make code available in some cases. However, Alice may still have to fight with it, finding it incomplete, not runnable, or unable to reproduce the original results. She emails Bob, who initially replies with helpful comments but at some point says he doesn’t have time to help more because he is busy with his next two publications. Again, the same failure as above. This happens far too often.

What can conferences do to improve this?
NeurIPS took a great first step by introducing a reproducibility checklist in 2019. However, NeurIPS neither mandates a code release nor makes the checklist available after the review process, which reduces transparency. Beyond this “official” checklist, individual reviewers who are sensitized to the topic of reproducibility sometimes ask about code during the rebuttal process. Such requests are almost always successful. However, the fact that they only occur for a small fraction of papers increases the randomness in reviewing.

Can we do better?
Yes! The problem with the status quo is that the incentive system is broken. It is substantial work to ensure the reproducibility of results, and while there are many incentives for publishing the next paper (e.g., graduation, tenure reviews, hiring criteria, performance reviews at companies, etc.), the incentives for ensuring reproducibility aren’t comparable. We thus need to turn this incentive system around: authors should have to do the work of ensuring the reproducibility of their results *in order to get their paper published*. This incentive already works when individual reviewers ask about code, and at AutoML-Conf we’ll consistently integrate such discussions about reproducibility into the review process.

How will we achieve this?
To make the papers we publish at AutoML-Conf highly reproducible, we decided to invite dedicated reproducibility reviewers. These reviewers will be asked to check the authors’ answers to the questions of the reproducibility checklist (see the authors’ instructions included in the LaTeX template), and to verify them. For example, a reproducibility reviewer could check whether it is easy to install the software and run the experiments as documented (potentially using intermediate results / checkpoints for compute-intensive steps), and provide feedback on how to improve reproducibility further. Authors have the chance to act on this feedback to improve their work’s reproducibility as part of the rebuttal.

What counts as “reproducible”?
At this point, we only aim for a limited notion of reproducibility also known as “replicability”: when a reviewer repeats the authors’ steps, can she obtain the same results? Are the results exactly the same given the same seeds? Are results similar across seeds, e.g., with overlapping confidence bounds? Broader notions of reproducibility, such as qualitatively similar results on related datasets, etc., would be great to consider in the future.
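
As an illustration of the “similar across seeds” check, here is a small sketch that compares reported and reproduced scores via overlapping confidence intervals; the numbers and the simple normal-approximation interval are purely illustrative.

```python
# A minimal sketch of a seed-based replicability check (illustrative only).
import numpy as np

def confidence_interval(scores, z=1.96):
    # normal-approximation interval around the mean score
    scores = np.asarray(scores, dtype=float)
    half_width = z * scores.std(ddof=1) / np.sqrt(len(scores))
    return scores.mean() - half_width, scores.mean() + half_width

reported = [0.91, 0.92, 0.90, 0.93, 0.91]     # scores from the paper, one value per seed
reproduced = [0.90, 0.91, 0.92, 0.90, 0.92]   # scores obtained when repeating the authors' steps

lo_r, hi_r = confidence_interval(reported)
lo_p, hi_p = confidence_interval(reproduced)
similar = max(lo_r, lo_p) <= min(hi_r, hi_p)  # overlapping intervals -> results count as similar
```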

Won’t this cause a lot of additional work for authors?
It does indeed cost time to make your paper reproducible, but it is dramatically more efficient for the authors to do this than for anyone else. Recalling the example above, it would be quite easy for Bob to document the steps he followed right when submitting the paper. It takes more time to do this when Alice asks about the code a few months later. And it takes yet more time (or is impossible) for Alice and other researchers to figure out Bob’s code by themselves. As a silver lining for Bob, papers with properly released code also have a much higher impact on the community than those without code, and when Bob wants to revisit his ideas a year later himself, he also benefits from having left everything in a clean state.

Can authors get extra time for the code release?
Yes; the supplemental material is only due a week after the main paper. Also, authors are free to continue updating their anonymous repository during the rebuttal.

Do authors have to clean up their code?
We see two possible ways for code releases:

  1. The code dump. A quick yet effective way to boost reproducibility: make code & scripts available, together with the requirements, provide a README detailing how to reproduce the results, etc., but don’t spend (“waste”) time cleaning it up.
  2. The open-source software package. Authors that want to build a community around their code may choose to invest more time into the code they release.

Option 2 tends to achieve greater lasting impact, and we’re thrilled to see such work, but for many papers Option 1 is more efficient; it is perfectly fine in terms of reproducibility.

Who are the reproducibility reviewers?
Like other reviewers, these will be volunteers from the community. We expect reproducibility reviewers to be more junior on average than standard reviewers; in our eyes, if it requires a PhD to reproduce a paper’s results then the authors didn’t do their job fully.

How do I sign up as a reproducibility reviewer?
Glad you asked! This only works if we have enough volunteers, as we’re aiming to have one reproducibility reviewer per submission. Please sign up here: https://forms.gle/mxki3gaSN7jZZykH9

This process is very much an experiment, but we hope that it will work and contribute to improving reproducibility in our field, facilitating its sustained progress 🙂

Announcing the Automated Machine Learning Conference 2022 (3 Dec 2021)
https://2022.automl.cc/announcing-the-automated-machine-learning-conference-2022/

Modern machine learning systems come with many design decisions (including hyperparameters, neural network architectures and the entire data processing pipeline), and the idea of automating these decisions gave rise to the research field of automated machine learning (AutoML). AutoML has been booming over the last decade, with hundreds of papers now published each year in the subfield of neural architecture search alone. At the same time, AutoML has matured considerably, and by now several AutoML systems support thousands of users in their projects.

In 2014, our journey towards an AutoML community started with the first international workshop on AutoML at ICML. Over the years, we co-organized 8 successful AutoML workshops, as well as workshop series on Bayesian optimization, meta-learning and neural architecture search. On top of this, there have been AutoML-related workshops at various recent conferences, including ICML/NeurIPS/ICLR, IJCAI/AAAI, CVPR/ICCV, KDD, etc. This shows the huge interest in AutoML, but it has also led to a fragmentation of the community. We would like to change this and bring the community together.

Towards this end, we are excited to announce the 1st International Conference on AutoML in 2022. It will be co-located with ICML in Baltimore from July 25th to July 27th, 2022. Like ICML, and in the hope that the COVID pandemic will be under control by the summer, we currently plan for an in-person conference that brings the community together physically.

Why yet another conference?

Community building is one of the central pillars and motivations for us to organize this conference. Meeting parts of the community is also possible at other conferences, but AutoML has grown so much that it deserves a home for bringing together all the different subfields of AutoML to exchange views, experiences and requirements. AutoML-Conf will provide this home.

Compared to the many thousands of participants at NeurIPS, CVPR, ICML and ICLR, AutoML-Conf will likely still be small enough that you can get to know a substantial fraction of attendees personally, making it much easier to build strong connections, which in turn facilitates cross-fertilization and collaborations and allows a sense of community to form.

Besides providing researchers in AutoML with a dedicated, on-topic venue (rather than a session here and there at the mainstream conferences), an AutoML conference also comes with the benefit of reducing the noise in the review process. Specifically, compared to conferences like ICML, ICLR, CVPR or NeurIPS, at AutoML-Conf it is much more likely that you get reviewers who are familiar with the topic of your AutoML paper.

What makes AutoML-Conf special?

Since collaborations often organically arise from sharing code, we strongly embrace open source, more so than other ML conferences. Open source also helps tackle large parts of the reproducibility crisis of ML and AutoML. Therefore, making code available is mandatory for publications at AutoML-Conf. We do, however, recognize that sharing certain aspects of the code or certain data sets is not possible; please see the author guidelines for these cases.

Making good open-source code available is hard work: it means going the extra mile to really allow others to reproduce and use your work effectively. However, this is exactly the quality that participants of AutoML-Conf can rely on, and the quality that will boost this great community even further.

In addition, we will also try to rethink how a conference can contribute to building a community. Sitting in a lecture hall most of the day and passively listening to talks helps you learn something new, but does little to help you connect with people. In fact, with papers and recorded talks being online weeks before conferences take place, consuming new content becomes less and less important at the conference itself. We envision a conference that encourages networking, interactions and discussions among attendees. Similar to great places for collaboration such as Dagstuhl (Germany) or NII Shonan (Japan), we plan for a mix of talks, invited poster sessions and small discussion rounds.

We will make recorded talks available before the conference to allow attendees to come prepared and network in a targeted fashion. At the conference itself, most papers will be presented as posters to allow for in-depth discussions. Of course, the recorded talks will remain online after the conference.

To be fair, we strive to accept all top-quality papers submitted to AutoML-Conf. We define “top-quality” similarly to other top-tier ML conferences, such as NeurIPS, ICML and ICLR (with the aforementioned emphasis on open source and reproducibility), but we have no acceptance-rate target that we need to reach; rather, we will consider a paper’s potential impact on the community in our final acceptance decisions. We hope that this will contribute to reducing the noise in the reviewing process, while maintaining or increasing the quality standard of accepted papers.

Owing to climate change, we will not force authors of accepted papers to attend in person (independent of the COVID situation), although we of course recommend it. We have, however, decided not to hold a fully hybrid conference, since it is all the more important for the very first AutoML conference to meet in person, enabling direct interactions without technical barriers, in order to build strong personal relationships and a sense of community.

Keynote Speakers

As you can see on automl.cc, we will have keynotes by six renowned leaders in the field.

  • Anima Anandkumar (Caltech & NVIDIA) was the brain behind the AutoML framework Amazon SageMaker and has made various contributions to network architectures.
  • Jeff Clune (University of British Columbia & OpenAI) is well known for his two Nature papers and his vision of AI-generating algorithms.
  • Chelsea Finn (Stanford University) is without doubt one of the leading experts in the field of meta-learning.
  • Timnit Gebru has shown time and again the risks of AI when it is not carefully designed or is applied to the wrong questions, a central concern the AutoML community must face with increased adoption.
  • Julie Josse (INRIA & École Polytechnique) is a world expert in learning with missing values, a problem that is key to making AutoML applicable in many real-world applications.
  • Alex Smola (Amazon Web Services) has unique experience with AutoML systems from developing AutoGluon and providing a cloud-based AutoML service.

Join us in this journey!

We are super excited about AutoML-Conf and hope that you will join us, the many other organizers and the senior area chairs on this journey! Please visit automl.cc for more details, share widely, and submit by the deadline (February 24, 2022 for abstracts; March 3, 2022 for full papers). We’re looking forward to seeing you at the conference!
