Splitting Data into Training and Test Sets

Define the features and the label.
Then split the data into a training set and a test set with scikit-learn's train_test_split.
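Because `feature` is a list of column names and `label` is a one-element list, both selections return DataFrames. The sketch below uses a hypothetical five-row toy DataFrame (made-up values, not read from the post's CSV) to show the resulting shapes. It also shows that selecting with a plain string, `train['Survived']`, returns a 1-D Series instead.

```python
import pandas as pd

# Hypothetical toy rows that reuse the Titanic column names (values are illustrative only)
train = pd.DataFrame({
    'Pclass':   [3, 1, 3, 1, 3],
    'Sex':      ['male', 'female', 'female', 'female', 'male'],
    'Age':      [22.0, 38.0, 26.0, 35.0, 35.0],
    'Fare':     [7.25, 71.28, 7.93, 53.10, 8.05],
    'Survived': [0, 1, 1, 1, 0],
})

feature = ['Pclass', 'Sex', 'Age', 'Fare']
label = ['Survived']

X = train[feature]        # list of columns -> DataFrame, shape (5, 4)
y = train[label]          # one-element list -> DataFrame, shape (5, 1)
y_1d = train['Survived']  # plain string -> Series, shape (5,)

print(X.shape, y.shape, y_1d.shape)  # (5, 4) (5, 1) (5,)
```

Many scikit-learn estimators expect a 1-D `y` and emit a `DataConversionWarning` when given a single-column DataFrame. For that reason `train['Survived']` may be the more convenient label form when a model is fitted later.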
Key parameters of train_test_split

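-test_size : the fraction of the data to hold out as the test (validation) set, e.g. 20% -> 0.2. An integer is treated as an absolute number of samples. If neither test_size nor train_size is given, 0.25 is used.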

-shuffle : whether to shuffle the data before splitting (default: True)

-random_state : the random seed that controls the shuffling. Fixing it to an integer produces the same split on every run.
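Here is a minimal sketch of how these three parameters behave. It uses a hypothetical 10-sample NumPy array instead of the Titanic data so that the results are easy to read. Two calls with the same `random_state` return identical splits. With `shuffle=False`, the last 20% of the rows simply become the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)  # 10 toy samples, one feature each
y = np.arange(10)                 # toy labels equal to the row index

# test_size=0.2 of 10 samples -> 2 test samples, 8 training samples
a = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=30)
b = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=30)

# Same random_state -> every returned array is identical
same = all(np.array_equal(p, q) for p, q in zip(a, b))

# shuffle=False -> no randomness; rows keep their order and the tail becomes the test set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)

print(len(a[1]), same, y_te)  # 2 True [8 9]
```

Without shuffling, a dataset sorted by some column (for example, by passenger class) would put only one kind of row in the test set. That is why the default is `shuffle=True`.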
Sample code

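import pandas as pd
from sklearn.model_selection import train_test_split

# Load the Titanic dataset from the short URL used in the original post
train = pd.read_csv('https://bit.ly/fc-ml-titanic')
print(train.head())  # inspect the first five rows (head() alone only displays in a notebook)

# Input feature columns and the target label column
feature = ['Pclass', 'Sex', 'Age', 'Fare']
label = ['Survived']

# Hold out 20% of the rows for validation, shuffling first with a fixed seed for reproducibility
x_train, x_valid, y_train, y_valid = train_test_split(
    train[feature], train[label], test_size=0.2, shuffle=True, random_state=30
)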
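You can sanity-check the split without downloading the CSV. The sketch below assumes the Titanic training set has 891 rows and builds a randomly generated placeholder DataFrame with the post's column names (the values themselves are meaningless). It then confirms three things: the split sizes, that the two parts do not overlap, and that `x_*` and `y_*` stay aligned by index after shuffling.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: 891 random rows (assumed Titanic row count),
# using the same column names as the post
rng = np.random.default_rng(0)
n = 891
train = pd.DataFrame({
    'Pclass': rng.integers(1, 4, n),
    'Sex': rng.choice(['male', 'female'], n),
    'Age': rng.uniform(1, 80, n),
    'Fare': rng.uniform(5, 100, n),
    'Survived': rng.integers(0, 2, n),
})

feature = ['Pclass', 'Sex', 'Age', 'Fare']
label = ['Survived']
x_train, x_valid, y_train, y_valid = train_test_split(
    train[feature], train[label], test_size=0.2, shuffle=True, random_state=30
)

# Test count is ceil(0.2 * 891) = 179; the remaining 712 rows go to training
print(len(x_train), len(x_valid))  # 712 179

# The two parts partition the original rows: no overlap and nothing lost
assert set(x_train.index).isdisjoint(x_valid.index)
assert len(x_train) + len(x_valid) == n
# Features and labels are shuffled together, so their indices still match
assert (x_train.index == y_train.index).all()
```

The sizes follow from how scikit-learn rounds a fractional `test_size`. It rounds the test count up and gives the remainder to the training set.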
