AutoML: Multi-Fidelity Optimization

1. HyperBand

Parameters

| Param | Description |
| --- | --- |
| $B$ | Total computational resource (budget) |
| $R$ | Maximum amount of resource that can be allocated to a single configuration |
| $n$ | Number of initial configurations |
| $r$ | Resource allocated to each configuration |
| $s_{max}$ | Maximum number of Successive Halving steps |
| $\eta$ | A constant $> 1$ (typically $3$ or $4$); the elimination rate |
| Bracket | A combination of $n$ and $r$ |
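
To make the bracket bookkeeping concrete, here is a minimal sketch (not from the original text) that derives $s_{max}$ and the initial $(n, r)$ pair of each bracket from $R$ and $\eta$, following the formulas in the HyperBand paper:

```python
import math

def hyperband_brackets(R, eta=3):
    """Enumerate the (n, r) pair that starts each HyperBand bracket.
    R is the maximum resource per configuration; eta the elimination rate."""
    s_max, power = 0, eta
    while power <= R:                  # exact floor(log_eta(R)) for integers
        s_max, power = s_max + 1, power * eta
    B = (s_max + 1) * R                # total budget assigned to each bracket
    for s in range(s_max, -1, -1):
        n = math.ceil(B / R * eta**s / (s + 1))   # initial configurations
        r = R / eta**s                            # initial resource for each
        yield s, n, r

for s, n, r in hyperband_brackets(R=81, eta=3):
    print(f"s={s}: n={n} configs, r={r:g} resource units each")
```

With $R = 81$ and $\eta = 3$ this yields the familiar five brackets, from $(n, r) = (81, 1)$ down to $(5, 81)$.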

HyperBand Algorithm

Core

  1. Successive Halving (SH): iteratively allocate resources, evaluate the surviving configurations, and discard the least promising ones.

  2. HyperBand extends Successive Halving with an outer loop that explores different trade-offs between the number of configurations ($n$) and the resource allocated per configuration ($B/n$). While SH evaluates only one allocation scheme (a single fixed pair of $n$ and $B/n$), HyperBand systematically explores multiple such pairs (see the sketch after this list).

  • Inner loop: for a fixed allocation (one pair of $n$ and $B/n$), run a single round of Successive Halving.

  • Outer loop: iterate over the different allocation schemes, so that HyperBand experiments with several values of $n$ and $B/n$, from many configurations with little resource each to a few configurations with the full resource.
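
A minimal sketch of this two-loop structure, assuming hypothetical user-supplied helpers `get_random_config()` and `train_and_eval(config, budget)` that returns a validation loss (a simplification, not the paper's exact pseudocode):

```python
import math

def successive_halving(configs, r, eta, train_and_eval):
    """Inner loop: run Successive Halving on `configs`, starting with
    `r` resource units each and keeping the best 1/eta per round."""
    while True:
        losses = [train_and_eval(c, r) for c in configs]
        k = max(1, len(configs) // eta)                 # survivors this round
        order = sorted(range(len(configs)), key=lambda i: losses[i])
        configs = [configs[i] for i in order[:k]]
        if k == 1:
            return configs[0]
        r *= eta                                        # promote survivors

def hyperband(R, eta, get_random_config, train_and_eval):
    """Outer loop: sweep brackets, trading many cheap configurations
    against a few fully trained ones."""
    s_max = int(math.log(R, eta) + 1e-9)    # floor(log_eta(R)), float-safe
    best_loss, best_config = float("inf"), None
    for s in range(s_max, -1, -1):          # one bracket per value of s
        n = math.ceil((s_max + 1) * eta**s / (s + 1))
        r = R / eta**s
        configs = [get_random_config() for _ in range(n)]
        winner = successive_halving(configs, r, eta, train_and_eval)
        loss = train_and_eval(winner, R)    # simplification: re-score at full R
        if loss < best_loss:
            best_loss, best_config = loss, winner
    return best_config
```

Each bracket $s$ starts with $n \approx (s_{max}+1)\,\eta^s/(s+1)$ configurations at resource $r = R\eta^{-s}$, so every bracket consumes roughly the same total budget $B = (s_{max}+1)R$.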

Insights

  1. HyperBand performs a geometric search over resource allocations, evaluating configurations with progressively larger budgets.

  2. Optimal resource allocation
  • Ideally, each configuration should receive just enough resource to distinguish it from the best-performing configuration; anything beyond that minimum is wasted budget.

  3. Successive Halving vs. uniform allocation
  • Successive Halving is more efficient than uniform allocation because of its early-stopping strategy: clearly inferior configurations are discarded before they consume their full budget.

  • By progressively concentrating resources on the promising configurations, Successive Halving needs a smaller total budget than uniform allocation (see the worked example below).
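
A worked example with illustrative numbers (not from the source): take $n = 81$ initial configurations, $r = 1$ epoch, and $\eta = 3$. Successive Halving then costs

$$81 \cdot 1 + 27 \cdot 3 + 9 \cdot 9 + 3 \cdot 27 + 1 \cdot 81 = 405 \text{ epochs},$$

while uniform allocation of the same 405-epoch budget gives every configuration only $405/81 = 5$ epochs. At identical total cost, SH trains its finalist for the full $81$ epochs.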

Experiments

HyperBand generalizes to different notions of resource:

  1. Time (e.g., training iterations or epochs)
  2. Dataset sampling
  3. Feature sampling
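
For example, a minimal sketch of a `train_and_eval` (the same hypothetical helper as above) that interprets the budget as a dataset fraction instead of epochs; scikit-learn's `RandomForestClassifier` and the data arguments are chosen here purely for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_and_eval(config, budget, X_train, y_train, X_val, y_val, max_budget=81):
    """Multi-fidelity evaluation where `budget` is a dataset-size fidelity:
    budget == max_budget means training on the full training set."""
    n = max(1, int(len(X_train) * budget / max_budget))
    idx = np.random.default_rng(0).choice(len(X_train), size=n, replace=False)
    model = RandomForestClassifier(**config).fit(X_train[idx], y_train[idx])
    return 1.0 - model.score(X_val, y_val)   # loss = 1 - validation accuracy
```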

Discussion

  1. If training time increases superlinearly with the resource, HyperBand can offer even larger speedups.

  2. HyperBand is better suited to difficult problems, such as those with high-dimensional search spaces.

2. BOHB

Combination is Innovation

  1. BO: effective at finding optimal configurations, but computationally expensive, especially in high-dimensional spaces.

  2. HyperBand: efficiently allocates resources across configurations with a bandit-based strategy and is faster than BO, but it lacks a model to guide the search, which can lead to suboptimal solutions over time.

Algorithm

For simplicity and computational efficiency, BOHB uses a TPE-style model (kernel density estimators fitted to good and bad observations) as the BO component.
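
A minimal sketch of the TPE idea (the generic TPE recipe, not BOHB's exact multivariate-KDE variant): fit one density $l(x)$ to the best observations and another $g(x)$ to the rest, then prefer candidates that are likely under $l$ and unlikely under $g$:

```python
import numpy as np
from scipy.stats import gaussian_kde

def tpe_suggest(X, losses, n_samples=64, gamma=0.15):
    """X: (N, d) evaluated configurations; losses: (N,) observed results.
    Requires enough observations to fit both KDEs (N well above d)."""
    order = np.argsort(losses)
    n_good = max(2, int(np.ceil(gamma * len(X))))   # top-gamma quantile
    l = gaussian_kde(X[order[:n_good]].T)           # density of good configs
    g = gaussian_kde(X[order[n_good:]].T)           # density of the rest
    candidates = l.resample(n_samples)              # shape (d, n_samples)
    ratio = l(candidates) / np.maximum(g(candidates), 1e-12)
    return candidates[:, np.argmax(ratio)]          # max l(x)/g(x) ~ max EI
```

Maximizing the ratio $l(x)/g(x)$ is equivalent to maximizing expected improvement under the TPE model, which is why the algorithm below only needs to rank sampled candidates by this ratio.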

  1. Initialization:
  • Randomly sample a fraction $\rho$ of configurations and evaluate them on a small budget.

  • Store these initial configurations and their performance in a dataset $D$.

  2. Main Optimization Loop:

While the optimization has not converged:

2.1 Successive Halving (SH)

For each budget level between $b_{min}$ and $b_{max}$:

  • Sample configurations to evaluate (selected in step 2.2 below).
  • Evaluate each configuration on the current (small) budget and rank the results.
  • Discard the worst-performing configurations and increase the resource for the best-performing ones.
  • Store the results and update the dataset $D$.

2.2 Bayesian Optimization (BO):

  • With probability $\rho$, pick a random configuration for evaluation (exploration).

  • Otherwise, use the Bayesian model to select the next configuration:

  • Select the best available budget for model fitting: from the dataset $D$, choose the largest budget with at least $N_{min}$ observations.

  • If no budget has enough observations, fall back to a random configuration.

  • Fit a model (e.g., a TPE-style density model) to the observations at that budget.

  • Sample from the model: draw $N_s$ candidate configurations and choose the one that maximizes the expected improvement (EI).

  • Add the selected configuration to the dataset $D$ and evaluate it.

  3. Evaluate Configurations:
  • Evaluate each configuration selected by SH or BO on a larger budget.

  • Update the dataset $D$ with the new evaluation results.

  4. Repeat:
  • Continue refining the search by alternating Successive Halving (SH) with model-guided sampling (BO) until the process converges or a maximum number of iterations is reached (see the sketch below).

  5. Final Configuration:
  • Return the configuration with the best observed performance in the dataset $D$.
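
Putting the steps together, a minimal sketch of this loop under stated assumptions: `tpe_suggest` is the earlier sketch, `random_config(rng)` and `train_and_eval(config, budget)` are hypothetical user-supplied helpers, and BOHB's real bracket schedule is flattened into a single bracket shape:

```python
import numpy as np

def bohb(train_and_eval, random_config, tpe_suggest,
         b_min=1.0, b_max=81.0, eta=3, rho=0.2, n_min=8, iters=20):
    """Simplified BOHB-style loop: repeated Successive Halving brackets whose
    new configurations come either from random sampling (probability rho) or
    from a TPE model fitted at the largest budget with >= n_min observations."""
    rng = np.random.default_rng(0)
    D = []                                   # (config_vector, budget, loss)

    def propose():
        if rng.random() < rho or not D:
            return random_config(rng)        # exploration step
        for b in sorted({b for _, b, _ in D}, reverse=True):
            obs = [(x, l) for x, bb, l in D if bb == b]
            if len(obs) >= n_min:            # enough data to fit a model here
                X = np.array([x for x, _ in obs])
                y = np.array([l for _, l in obs])
                return tpe_suggest(X, y)     # model-based proposal
        return random_config(rng)            # fallback: no budget is populated

    s = int(np.log(b_max / b_min) / np.log(eta) + 1e-9)  # halving steps
    for _ in range(iters):
        configs, b = [propose() for _ in range(eta ** s)], b_min
        while configs:
            losses = np.array([train_and_eval(c, b) for c in configs])
            D.extend(zip(configs, [b] * len(configs), losses))
            if b >= b_max:
                break                        # survivors reached the full budget
            keep = np.argsort(losses)[: max(1, len(configs) // eta)]
            configs, b = [configs[i] for i in keep], b * eta

    return min(D, key=lambda t: t[2])[0]     # best observed configuration
```

In the actual BOHB algorithm, the brackets follow HyperBand's schedule of varying $(n, r)$ pairs and the densities are multivariate KDEs maintained per budget; this sketch keeps only the core interplay between Successive Halving and the model-based proposals.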