AutoML - Multi-Fidelity Optimization
1. HyperBand
Param | Description |
---|---|
$B$ | Total computational budget |
$R$ | Maximum amount of resource that can be allocated to a single configuration |
$n$ | Number of initial configurations |
$r$ | Resource allocated to each configuration initially |
$s_{max}$ | Maximum number of successive halving steps |
$\eta$ | A constant $> 1$ (typically $3$ or $4$) controlling the elimination rate: each round keeps the top $1/\eta$ of configurations |
Bracket | A combination of $n$ and $r$, i.e., one run of Successive Halving |
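For reference, following the original Hyperband paper (Li et al., 2017), these quantities are related by

$$
s_{max} = \lfloor \log_\eta R \rfloor, \qquad B = (s_{max} + 1)\,R,
$$

and each bracket $s \in \{s_{max}, s_{max}-1, \dots, 0\}$ starts Successive Halving with

$$
n = \left\lceil \frac{B}{R} \cdot \frac{\eta^{s}}{s+1} \right\rceil, \qquad r = R\,\eta^{-s}.
$$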
Core
- Successive Halving (SH): iteratively Allocate -> Evaluate -> Discard the least promising configurations.
- Hyperband extends Successive Halving with an outer loop that explores different trade-offs between the number of configurations ($n$) and the resource allocated per configuration ($B/n$). While SH evaluates only a single allocation scheme (one fixed $n$ and $B/n$), Hyperband systematically explores multiple settings of $n$ and $B/n$ (see the sketch below).
- Inner Loop: within each bracket, Hyperband runs one round of Successive Halving with that bracket's values of $n$ and $r$.
- Outer Loop: the outer loop iterates over brackets, i.e., over the different resource allocation schemes, hedging against committing to the wrong trade-off.
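As a concrete reference, here is a minimal Python sketch of the two loops. `get_random_config` and `evaluate` are hypothetical placeholders for the configuration sampler and the (partial) training routine; the bracket formulas follow the description above.

```python
import math
import random

def hyperband(get_random_config, evaluate, R=81, eta=3):
    """Minimal Hyperband sketch.

    get_random_config() -> a configuration (placeholder).
    evaluate(config, resource) -> loss, lower is better (placeholder).
    R: maximum resource per configuration; eta: elimination rate.
    """
    s_max = math.floor(math.log(R, eta) + 1e-9)  # floor(log_eta(R))
    B = (s_max + 1) * R                          # budget per bracket
    best_config, best_loss = None, float("inf")

    # Outer loop: one bracket per (n, r) trade-off.
    for s in range(s_max, -1, -1):
        n = math.ceil(B / R * eta**s / (s + 1))  # initial configurations
        r = R * eta ** (-s)                      # initial resource each
        configs = [get_random_config() for _ in range(n)]

        # Inner loop: Successive Halving within the bracket.
        for i in range(s + 1):
            r_i = r * eta**i
            losses = [evaluate(c, r_i) for c in configs]
            ranked = sorted(zip(losses, range(len(configs))))
            # Keep the top 1/eta of configurations; discard the rest.
            keep = max(1, len(configs) // eta)
            configs = [configs[j] for _, j in ranked[:keep]]
            if ranked[0][0] < best_loss:
                best_loss, best_config = ranked[0][0], configs[0]
    return best_config, best_loss

# Toy usage: configurations are floats; loss improves with more resource.
best, loss = hyperband(
    get_random_config=lambda: random.uniform(-1.0, 1.0),
    evaluate=lambda c, r: c * c + 1.0 / r,
)
```

Note how each bracket trades breadth for depth: a large $s$ starts many configurations on little resource, while $s = 0$ runs only a few configurations at the full budget $R$.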
Insights
- HyperBand performs a geometric search over resource allocations and evaluates configurations with progressively larger budgets.
- Optimal Resource Allocation
  - The ideal strategy gives each configuration just enough resource to distinguish it from the best-performing one, so the minimal budget is set by how hard the configurations are to tell apart.
- Successive Halving vs Uniform Allocation
  - Successive Halving is more efficient than uniform allocation because of its early-stopping strategy.
  - By progressively concentrating resources on promising configurations, Successive Halving needs a much smaller budget than uniform allocation (a worked example follows).
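To see the gap concretely, take a hypothetical run with $n = 81$ configurations, a maximum budget of $R = 81$ epochs per configuration, and $\eta = 3$. Uniform allocation trains every configuration to completion and costs $81 \times 81 = 6561$ epoch-units. Successive Halving spends exactly $81$ epoch-units per rung:

$$
81 \cdot 1 + 27 \cdot 3 + 9 \cdot 9 + 3 \cdot 27 + 1 \cdot 81 = 405,
$$

about a $16\times$ saving, while the surviving configuration is still trained on the full budget.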
Experiments
HyperBand generalizes to different resource types:
- Time (e.g., training iterations)
- Dataset subsampling
- Feature subsampling
Discussion
- If training time grows superlinearly with the resource, Hyperband can offer even higher speedups.
- HyperBand is especially suitable for difficult problems, such as high-dimensional search spaces.
2. BOHB
Combination is Innovation
- BO: effective at finding optimal configurations, but computationally expensive, especially in high-dimensional spaces.
- HyperBand: efficiently allocates resources across configurations using a bandit-based strategy and is faster than BO, but it lacks a model to guide the search, which can lead to suboptimal solutions over time.
Algorithm
For simplicity and computational efficiency, BOHB uses a TPE-style model as its BO component.
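As background on the TPE component (Bergstra et al., 2011): rather than modeling $p(y \mid x)$ directly, TPE splits the observed losses at a quantile threshold $\alpha$ and fits two densities over configurations,

$$
l(x) = p(x \mid y < \alpha), \qquad g(x) = p(x \mid y \ge \alpha),
$$

then proposes the candidate that maximizes the ratio $l(x)/g(x)$, which is equivalent to maximizing expected improvement under this model. BOHB replaces TPE's factorized one-dimensional estimators with a single multivariate kernel density estimate.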
- Initialization:
  - Randomly sample a fraction $\rho$ of configurations and evaluate them on a small budget.
  - Store these initial configurations and their performance in a dataset $D$.
- Main Optimization Loop:
  While the optimization has not converged:
  2.1 Successive Halving (SH)
  For each budget level between $b_{min}$ and $b_{max}$:
  - Sample configurations to evaluate (chosen as described in 2.2).
  - Evaluate each configuration on the current budget and rank them by performance.
  - Discard the worst-performing configurations and increase the resource for the best-performing ones.
  - Store the results and update the dataset $D$.
  2.2 Bayesian Optimization (BO):
  - With probability $\rho$, pick a random configuration for evaluation (exploration).
  - Otherwise, use the Bayesian model to select the next configuration:
    - Select the best available budget for model fitting: from the dataset $D$, choose the largest budget with at least $N_{min}$ observations.
    - If no budget has enough observations, fall back to a random configuration.
    - Fit a model (e.g., a TPE-style kernel density estimate) to the observations at that budget.
    - Sample from the model: draw $N_s$ candidate configurations and keep the one that maximizes the expected improvement (EI).
  - Add the selected configuration to the dataset $D$ and evaluate it.
- Evaluate Configurations:
  - Evaluate each configuration selected by SH or BO on a larger budget.
  - Update the dataset $D$ with the new evaluation results.
- Repeat:
  - Continue alternating Successive Halving (SH) with model-guided sampling from Bayesian Optimization (BO) until the optimization converges or a maximum number of iterations is reached.
- Final Configuration:
  - Return the configuration with the best recorded performance in the dataset $D$.
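To make step 2.2 concrete, here is a minimal Python sketch of a TPE-style proposal in the spirit of BOHB, not the paper's exact implementation: the unit-hypercube search space, the helper name `propose`, and the constants `rho`, `gamma`, `n_samples`, and the $N_{min}$ heuristic are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

def propose(D, dim, rho=0.1, n_samples=64, gamma=0.15, rng=np.random):
    """One BOHB-style proposal step (sketch; parameters are illustrative).

    D: dict mapping budget -> list of (config, loss) pairs, where each
       config is an np.ndarray of shape (dim,) in the unit hypercube.
    Returns the next configuration to evaluate.
    """
    n_min = dim + 2  # minimum points per KDE (cf. BOHB's N_min = d + 1)

    # Exploration: with probability rho, sample uniformly at random.
    if rng.rand() < rho:
        return rng.uniform(0.0, 1.0, size=dim)

    # Model fitting uses the largest budget with enough observations.
    budgets = [b for b, obs in D.items() if len(obs) >= 2 * n_min]
    if not budgets:
        return rng.uniform(0.0, 1.0, size=dim)  # fall back to random
    obs = sorted(D[max(budgets)], key=lambda t: t[1])  # sort by loss

    # TPE-style split: "good" = best gamma-fraction, "bad" = the rest.
    n_good = max(n_min, int(np.ceil(gamma * len(obs))))
    good = np.array([c for c, _ in obs[:n_good]]).T  # shape (dim, n_good)
    bad = np.array([c for c, _ in obs[n_good:]]).T

    l, g = gaussian_kde(good), gaussian_kde(bad)

    # Draw candidates from the "good" density and maximize l(x) / g(x),
    # which under the TPE model is equivalent to maximizing EI.
    cand = l.resample(n_samples)                 # shape (dim, n_samples)
    ratio = l(cand) / np.maximum(g(cand), 1e-32)
    return cand[:, np.argmax(ratio)]
```

A real implementation (e.g., the authors' HpBandSter) also handles mixed discrete/continuous spaces and KDE bandwidth tuning, which this sketch deliberately omits.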