📣 Today's Python 🐣

2021.08.20 14:41 2,422 views

Data Analysis


   👩🏻‍💻 Exploratory Data Analysis (EDA)


          Data Exploration Basics

                 - Loading libraries (import)

                 - Loading a file (read_csv())

                 - Checking the number of rows and columns (shape)

                 - Inspecting the data (head())

                 - Checking for missing values (isnull().sum())

                 - Checking missing values in the data (info())

                 - Viewing numeric data characteristics (describe())

        

          Visualization

                 - Matplotlib line graph (plot())

                 - Matplotlib histogram (hist())

                 - Seaborn histogram (distplot())

                 - Seaborn scatter plot (scatterplot())

                 - Seaborn scatter plot (pairplot())

                 - Seaborn heatmap (heatmap())


   👷 Structured Data Preprocessing


          Handling Missing Values

                 - Dropping and replacing missing values (dropna(), fillna())

                 - Replacing missing values with the mean (fillna({mean}))

                 - Replacing missing values by interpolation (interpolate())


          Handling Outliers

                 - Detecting outliers (Seaborn boxplot())

                 - Removing outliers (IQR)


          Normalization and Encoding

                 - Normalizing numeric data (MinMaxScaler())

                 - One-hot encoding (OneHotEncoder())


          Multicollinearity

                 - Resolving multicollinearity - variable normalization

                 - Resolving multicollinearity - variable removal

                 - Resolving multicollinearity - PCA (1)

                 - Resolving multicollinearity - PCA (2)

                 - Resolving multicollinearity - PCA (3)


          Adding Derived Variables

                 - Transforming continuous variables (1)

                 - Transforming continuous variables (2)

                 - Polynomial Features (1)

                 - Polynomial Features (2)


🤖 Machine Learning


   ✨ Models


          Decision Tree

                 - Model concept (Decision Tree)

                 - Declaring the model (DecisionTreeClassifier())

                 - Training the model (fit())

                 - Predicting on test data (predict())


          Random forest

                 - Defining the model (RandomForestClassifier())

                 - Checking random forest feature importance (feature_importances_)

                 - Model practice


          Cross-Validation

                 - Cross-validation definition (K-Fold)

                 - Cross-validation practice (K-Fold)


   🎛 Tuning


          Grid Search

                 - Hyperparameters and the GridSearch concept (stopping rule)

                 - GridSearch implementation (GridSearchCV())


          Bayesian Optimization

                 - Bayesian Optimization

                 - Grid and random search vs Bayesian Optimization

                 - Bayesian Optimization practice