Empirical Risk Minimization (ERM)
It is a principle in statistical learning theory that defines a family of learning algorithms and is used to give theoretical bounds on their performance.
The idea is that we don’t know exactly how well an algorithm will work in practice (its true “risk”) because we don’t know the true distribution of the data it will operate on. As an alternative, we can measure its performance on a known set of training data.
We assume that our samples come from this unknown distribution and use our dataset as an approximation of it.
When we compute the loss using the data points in our dataset, it is called the empirical risk. It is “empirical” and not “true” because our dataset is only a sample of the whole population.
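To make the distinction concrete, here is a minimal sketch of the two quantities, assuming a hypothesis h, a loss function L, and an unknown data distribution P (the symbols are introduced here for illustration, not taken from the text above):

```latex
% True risk: expected loss under the unknown data distribution P
\[ R(h) = \mathbb{E}_{(x,y)\sim P}\bigl[ L(h(x), y) \bigr] \]

% Empirical risk: average loss over the n data points in our dataset
\[ \hat{R}_n(h) = \frac{1}{n} \sum_{i=1}^{n} L(h(x_i), y_i) \]
```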
When we build our learning model, we have to pick the function that minimizes the empirical risk, i.e., the average discrepancy between the predicted output and the actual output over the data points in the dataset.
The process of finding this function is called empirical risk minimization (ERM).
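As a minimal sketch of ERM in code (an illustration, not a definitive implementation): the toy dataset, the squared loss, and the small class of linear hypotheses below are all assumptions made for the example. We compute the empirical risk of each candidate function on the dataset and keep the one with the lowest value.

```python
import numpy as np

# Toy dataset: samples we assume were drawn from the unknown distribution.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.1, 2.9, 4.2])

def squared_loss(y_pred, y_true):
    """Loss measuring the gap between predicted and actual output."""
    return (y_pred - y_true) ** 2

def empirical_risk(h, X, y):
    """Average loss of hypothesis h over the data points in the dataset."""
    return np.mean([squared_loss(h(x), t) for x, t in zip(X, y)])

# A small hypothesis class: linear functions h(x) = w * x for a few slopes.
candidate_slopes = np.linspace(0.0, 2.0, 21)
hypotheses = [lambda x, w=w: w * x for w in candidate_slopes]

# ERM: pick the hypothesis with the smallest empirical risk on the dataset.
best_h = min(hypotheses, key=lambda h: empirical_risk(h, X, y))

print("empirical risk of chosen hypothesis:", empirical_risk(best_h, X, y))
```

In practice the hypothesis class is usually parameterized continuously (e.g., the weights of a linear model or a neural network) and the minimization is done with an optimizer such as gradient descent rather than by enumerating candidates, but the principle is the same: minimize the average loss on the dataset as a stand-in for the true risk.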