Empirical risk minimization

From Wikipedia, the free encyclopedia

Empirical risk minimization (ERM) is a principle in statistical learning theory which defines a family of learning algorithms and is used to give theoretical bounds on the performance of learning algorithms.

Background

Consider the following situation, which is a general setting of many supervised learning problems. We have two spaces of objects $X$ and $Y$ and would like to learn a function $\! h: X \to Y$ (often called hypothesis) which outputs an object $y \in Y$ , given $x \in X$ . To do so, we have in our disposal a training set of a few examples $\! (x_1, y_1), \ldots, (x_m, y_m)$ where $x_i \in X$ is an input and $y_i \in Y$ is the corresponding response that we wish to get from $\! h(x_i)$ .

To put it more formally, we assume that there is a joint probability distribution $P (x, y)$ over $X$ and $Y$ , and that the training set consists of $m$ instances $\! (x_1, y_1), \ldots, (x_m, y_m)$ drawn i.i.d. from $P (x, y)$ . Note that the assumption of a joint probability distribution allows us to model uncertainty in predictions (e.g. from noise in data) because $y$ is not a deterministic function of $x$ , but rather a random variable with conditional distribution $P (y | x)$ for a fixed $x$ .

We also assume that we are given a non-negative real-valued loss function $L(\hat{y}, y)$ which measures how different is the prediction $\hat{y}$ of a hypothesis from the true outcome $y$ . The risk associated with hypothesis $h (x)$ is then defined as the expectation of the loss function:

$R(h) = \mathbf{E}[L(h(x), y)] = \int L(h(x), y)\,dP(x, y).$

A loss function commonly used in theory is the 0-1 loss function: $L(\hat{y}, y) = I(\hat{y} \ne y)$ , where $I (...)$ is the indicator notation.

The ultimate goal of a learning algorithm is to find a hypothesis $\! h^*$ among a fixed class of functions $\mathcal{H}$ for which the risk $R (h)$ is minimal:

$h^* = \arg \min_{h \in \mathcal{H}} R(h).$

Empirical risk minimization

In general, the risk $R (h)$ cannot be computed because the distribution $P (x, y)$ is unknown to the learning algorithm (this situation is referred to as agnostic learning). However, we can compute an approximation, called empirical risk, by averaging the loss function on the training set:

$\! R_\mbox{emp}(h) = \frac{1}{m} \sum_{i=1}^m L(h(x_i), y_i).$

Empirical risk minimization principle states that the learning algorithm should choose a hypothesis $\hat{h}$ which minimizes the empirical risk:

$\hat{h} = \arg \min_{h \in \mathcal{H}} R_{\mbox{emp}}(h).$

Thus the learning algorithm defined by ERM principle consists in solving the above optimization problem.

[edit] Properties

by 쿠리다쿠리 2010. 12. 9. 19:52

IId(독립 동일 분포:Independent and identically distribution)

공부좀 해야지/Mathmatics

"IID" and "iid" redirect here. For other uses, see IID (disambiguation).

In probability theory and statistics, a sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.

The abbreviation i.i.d. is particularly common in statistics (often as iid, sometimes written IID), where observations in a sample are often assumed to be (more-or-less) i.i.d. for the purposes of statistical inference. The assumption (or requirement) that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods: see mathematical statistics and statistical theory. However, in practical applications of statistical modeling the assumption may or may not be realistic. The generalization of exchangeable random variables is often sufficient and more easily met.

The assumption is important in the classical form of the central limit theorem, which states that the probability distribution of the sum (or average) of i.i.d. variables with finite variance approaches a normal distribution.

from Wiki

by 쿠리다쿠리 2010. 12. 8. 20:01

Drawn

공부좀 해야지/단어가 짧아서....

drawn [drɔːn]
DRAW의 과거분사.

†drawn [drɔːn] a.
① (칼집 따위에서) 빼낸, 뽑은.
② 그어진(선 등).
③ 잡아늘인.
④ 찡그린, 일그러진(얼굴 등).
⑤ 내장[속]을 빼낸(생선·새 등).
⑥ 끌린.
⑦ 비긴, 무승부의.
┈┈•a ∼ game.⑦

by 쿠리다쿠리 2010. 12. 8. 19:55

PREV | 1 ··· 16 17 18 19 20 21 22 ··· 36 | NEXT

		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

마이테이블

글

Empirical risk minimization

설정

From Wikipedia, the free encyclopedia

Background

Empirical risk minimization

[edit] Properties

글

IId(독립 동일 분포:Independent and identically distribution)

설정

글

Drawn

설정

사이드 메뉴

CATEGORY

TAG

RECENT POSTS

RECENT COMMENT

RECENT TRACKBACK

ARCHIVE

NOTICE

CALENDAR

COUNTER

티스토리툴바

마이테이블

글

Empirical risk minimization

설정

From Wikipedia, the free encyclopedia

Background

Empirical risk minimization

[edit] Properties

글

IId(독립 동일 분포:Independent and identically distribution)

설정

글

Drawn

설정

사이드 메뉴

CATEGORY

TAG

RECENT POSTS

RECENT COMMENT

RECENT TRACKBACK

ARCHIVE

NOTICE

MY LINK

CALENDAR

COUNTER

티스토리툴바