# ML : Supervised / Unsupervised Learning

[Review]

Before that, let me revisit something from the previous lecture that I don't think I fully understood.

Tom Mitchell provides a more modern definition:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."

In other words, all three pieces are needed: a task T, a way P to measure how well the task is done, and experience E that makes that measurement improve.

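Taking the game of checkers as an example:

- E: the experience of playing many games of checkers
- T: the task of playing checkers
- P: the probability that the program will win the next game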
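To make E, T and P concrete outside of checkers, here is a small Python sketch. The task, the data and the learner are all hypothetical (not from the lecture). T is deciding whether an integer x in [0, 100) is "large". E is a list of labeled examples. P is accuracy over all 100 inputs. The learner simply places its threshold halfway between the largest "small" example and the smallest "large" example it has seen.

```python
# Hypothetical task T: decide whether an integer x in [0, 100) is "large".
# The hidden rule, unknown to the learner, is x >= 37.
TRUE_THRESHOLD = 37

def true_label(x):
    return 1 if x >= TRUE_THRESHOLD else 0

def learn_threshold(examples):
    """Learn from experience E, a list of (x, label) pairs: put the boundary
    halfway between the largest negative and the smallest positive example."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    low = max(negatives) if negatives else 0
    high = min(positives) if positives else 100
    return (low + high) / 2

def performance(threshold, test_xs):
    """Performance measure P: fraction of test inputs classified correctly."""
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_xs)
    return correct / len(test_xs)

test_xs = range(100)
accuracies = []
for step in (50, 10, 1):  # a smaller step means more training examples (more experience)
    experience = [(x, true_label(x)) for x in range(0, 100, step)]
    theta = learn_threshold(experience)
    accuracies.append(performance(theta, test_xs))
    print(f"{len(experience):3d} examples -> threshold {theta}, P = {accuracies[-1]:.2f}")
```

With 2, 10 and 100 examples, P goes from 0.88 to 0.98 to 1.0. This is the "improves with experience E" part of the definition.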

1. Supervised Learning

- We are given a data set and already know what our correct output should look like

- In other words, for each input in the data set we know what the correct output is (the "right answer" is given)

1.1. Regression problem

- Predicts results within a continuous output

- Maps input variables to some continuous function

- Example: for each 'Size in feet²', the correct answer 'Price' is given

- By learning from these known answers, we can predict the Price of a house of any particular Size
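Here is a minimal regression sketch in Python. It assumes a straight-line model, price = intercept + slope × size, fitted by ordinary least squares. The four (size, price) pairs are made up for illustration and are not the data from the lecture plot. Prices are in $1000s.

```python
# Made-up training set (not the lecture's data): size in feet^2, price in $1000s.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 270, 310, 390]

def fit_line(xs, ys):
    """Ordinary least squares for a single input: returns (intercept, slope)
    of the line y = intercept + slope * x that minimizes the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(sizes, prices)

def predict_price(size):
    """Continuous output: any real-valued price, not one of a fixed set of labels."""
    return intercept + slope * size

print(round(slope, 3), round(intercept, 1))  # 0.122 79.0
print(round(predict_price(1250), 1))         # 231.5
```

The output is a real number, so the model can answer for sizes that never appeared in the training set, such as 1250 feet².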

1.2. Classification problem

- Predicts results in a discrete output

- Maps input variables into discrete categories

- Example: judging from a tumor's size whether it is malignant
- 1 (Y) if malignant, 0 (N) if not
- The output is predicted as one of the distinct values 0 or 1
- The output can also be split into more than two values (e.g. 1: stage 1, 2: stage 2, 3: stage 3, and so on)
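Here is a classification sketch in Python. To keep it short, it uses a nearest-centroid rule. That rule is my own choice for illustration, not the method from the lecture. Each class is summarized by the mean tumor size of its training examples, and a new tumor gets the label of the closest mean. The sizes and labels below are invented. The same code handles both the two-class case (0/1) and the multi-class staging case (1/2/3).

```python
def train_nearest_centroid(examples):
    """examples: list of (tumor_size, label). Returns {label: mean size of that class}."""
    sums, counts = {}, {}
    for size, label in examples:
        sums[label] = sums.get(label, 0.0) + size
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(centroids, size):
    """Discrete output: the label whose class mean is closest to the input size."""
    return min(centroids, key=lambda label: abs(size - centroids[label]))

# Invented data: 0 = benign, 1 = malignant.
binary = train_nearest_centroid([(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 0),
                                 (3.5, 1), (4.0, 1), (4.8, 1), (5.1, 1)])
print(classify(binary, 2.5), classify(binary, 3.2))  # 0 1

# Invented data with three categories: stage 1, 2 and 3.
stages = train_nearest_centroid([(1.0, 1), (1.4, 1), (2.8, 2), (3.2, 2), (5.0, 3), (5.6, 3)])
print(classify(stages, 0.9), classify(stages, 3.1), classify(stages, 4.8))  # 1 2 3
```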

1.3. Examples

- Given data about the size of houses on the real estate market, try to predict their price.
: Price as a function of size is a continuous output, so this is a regression problem.
- What if we instead ask whether the house sells for more or less than the asking price?
: The houses are split based on price into two discrete categories (above or below the asking price), so this is a classification problem.
- Given a picture of a person, we have to predict their age on the basis of the given picture
: Predicting a continuous age from the picture = Regression
- Given a patient with a tumor, we have to predict whether the tumor is malignant or benign.
: Sorting the patient's tumor into malignant or benign = Classification
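The first two examples use the same houses. What changes is the output we ask for. This small Python sketch uses made-up listings (size in feet², asking price, sale price; prices in $1000s). It shows how one data set yields a continuous target for regression and a discrete 0/1 target for classification.

```python
# Made-up listings: (size_ft2, asking_price, sale_price), prices in $1000s.
listings = [
    (1000, 210, 200),
    (1500, 250, 270),
    (2000, 320, 310),
    (2500, 380, 390),
]

# Regression: the target is the sale price itself, a continuous value.
regression_targets = [sale for _, _, sale in listings]

# Classification: the target is one of two categories,
# 1 if the house sold for more than its asking price, otherwise 0 (ties count as 0 here).
classification_targets = [1 if sale > asking else 0 for _, asking, sale in listings]

print(regression_targets)      # [200, 270, 310, 390]
print(classification_targets)  # [0, 1, 0, 1]
```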

2. Unsupervised Learning

- Unsupervised learning allows us to approach problems with little or no idea what our results should look like. We can derive structure from data where we don't necessarily know the effect of the variables.
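- The data is not clearly divided into categories by any fixed rule to begin with (there are no labels)
- Structure is derived by clustering the data based on relationships among the variables in the data
- There is no definitive answer, so there is no feedback based on the prediction results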

2.1. Clustering

- Take a collection of 1,000,000 different genes, and find a way to automatically group these genes into groups that are somehow similar or related by different variables, such as lifespan, location, roles, and so on.

: No one tells the algorithm which group each gene belongs to; it finds the groups on its own = Clustering
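Clustering can be sketched with k-means (Lloyd's algorithm), one common clustering method. The code below is my own minimal version. Each "gene" is a hypothetical point with two made-up numeric features, standing in for variables such as lifespan or location. No labels are given: the algorithm only sees the points and the number of groups k. Centroids start at the first k points so the run is deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Group unlabeled points into k clusters with Lloyd's k-means algorithm.

    Centroids are initialized to the first k points (a simple, deterministic choice;
    real implementations usually use random or k-means++ initialization)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return assignment, centroids

# Hypothetical genes, each described by two made-up numeric features.
genes = [(1.0, 1.1), (5.0, 5.2), (1.2, 0.9), (5.1, 4.8), (0.8, 1.0), (4.9, 5.0)]
groups, centers = kmeans(genes, k=2)
print(groups)  # [0, 1, 0, 1, 0, 1]
```

The output is just group numbers. What each group means is left for a person to interpret, which is the "no feedback" aspect of unsupervised learning.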

2.2. Non-Clustering

- The "Cocktail Party Algorithm" allows you to find structure in a chaotic environment (i.e. identifying individual voices and music from a mesh of sounds at a cocktail party).

: Here the goal is not to form groups but to pull a mixture of sounds apart into its individual sources.
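The cocktail party problem is usually solved with independent component analysis (ICA). Below is a deliberately tiny, pure-Python ICA for exactly two sources and two microphones. It is an illustration under my own assumptions, not the course's implementation. The mixed recordings are first whitened (centered and decorrelated). Then the code tries rotation angles and keeps the one whose outputs are least Gaussian, measured by total absolute excess kurtosis. The "voice" (a sine wave), the "music" (a square wave) and the mixing weights are all synthetic.

```python
import math

def mean(values):
    return sum(values) / len(values)

def whiten_2d(x1, x2):
    """Center two mixed signals and decorrelate them to unit variance
    (Cholesky whitening of their 2x2 covariance matrix)."""
    m1, m2 = mean(x1), mean(x2)
    x1 = [v - m1 for v in x1]
    x2 = [v - m2 for v in x2]
    a = mean([v * v for v in x1])
    b = mean([p * q for p, q in zip(x1, x2)])
    c = mean([v * v for v in x2])
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1 = [v / l11 for v in x1]
    z2 = [(q - l21 * p) / l22 for p, q in zip(z1, x2)]
    return z1, z2

def excess_kurtosis(y):
    """0 for a Gaussian signal; far from 0 for strongly non-Gaussian ones."""
    m2 = mean([v * v for v in y])
    m4 = mean([v ** 4 for v in y])
    return m4 / (m2 * m2) - 3.0

def separate_2d(x1, x2, steps=90):
    """Toy two-source ICA: rotate the whitened signals by angles in [0, pi/2)
    and keep the rotation whose outputs are the least Gaussian."""
    z1, z2 = whiten_2d(x1, x2)
    best_score, best = -1.0, (z1, z2)
    for i in range(steps):
        theta = (math.pi / 2) * i / steps
        c, s = math.cos(theta), math.sin(theta)
        y1 = [c * p + s * q for p, q in zip(z1, z2)]
        y2 = [-s * p + c * q for p, q in zip(z1, z2)]
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_score, best = score, (y1, y2)
    return best

def correlation(u, v):
    mu, mv = mean(u), mean(v)
    cov = mean([(p - mu) * (q - mv) for p, q in zip(u, v)])
    return cov / math.sqrt(mean([(p - mu) ** 2 for p in u]) * mean([(q - mv) ** 2 for q in v]))

# Synthetic sources sampled over two full periods of 2*pi.
n = 2000
ts = [4 * math.pi * i / n for i in range(n)]
voice = [math.sin(2 * t) for t in ts]                        # stand-in for a speaker
music = [1.0 if math.sin(3 * t) >= 0 else -1.0 for t in ts]  # stand-in for the band
# Each hypothetical microphone records a different blend of the two sources.
mic1 = [1.0 * v + 1.0 * m for v, m in zip(voice, music)]
mic2 = [0.5 * v + 2.0 * m for v, m in zip(voice, music)]

est1, est2 = separate_2d(mic1, mic2)
for name, source in (("voice", voice), ("music", music)):
    print(name, round(abs(correlation(source, est1)), 3), round(abs(correlation(source, est2)), 3))
```

ICA recovers sources only up to order, sign and scale. That is why the printout compares absolute correlations against both outputs. The brute-force angle search only works for two sources; practical implementations such as FastICA use fixed-point iterations instead.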

2.3. Examples of unsupervised learning problems

Source lecture: https://www.coursera.org/learn/machine-learning/home/info

Joon's Blog