Pandas data preprocessing practice
·
Data Science/coding pratice
Hands-on data preprocessing practice with real real-estate data.

1. Redefining columns → rename: when column names are complicated, redefine them.
    # my code
    df.columns = ['지역명', '규모구분', '연도', '월', '분양가격']
    >> Assigns new column names by listing every name (region, size class, year, month, sale price).
    # reference solution
    df = df.rename(columns={'분양가격(㎡)':'분양가격'})
2. Converting a column's datatype: astype
    df['분양가격'].astype(int)
3. Removing whitespace from data that contains blanks: strip(). To run strip on the strings in a column, use str.strip().
    df.loc[df['분양가격']==' ']  # check
    df['분양가격'] = df['분양가격']...
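The three steps in this excerpt can be tried end to end with a minimal sketch like the one below. The sample rows are hypothetical (the post's real dataset is not shown in the excerpt), and the use of `pd.to_numeric` and the nullable `Int64` dtype is an assumption added here so that blank cells do not make the conversion fail.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the excerpt.
df = pd.DataFrame({
    "지역명": ["서울", "부산", "대구"],
    "분양가격(㎡)": ["5841", "  ", " 3457 "],
})

# 1. Rename one column with a mapping instead of reassigning df.columns.
df = df.rename(columns={"분양가격(㎡)": "분양가격"})

# 3. Strip surrounding whitespace; blank cells become empty strings.
df["분양가격"] = df["분양가격"].str.strip()

# Assumption: convert text to numbers, turning unparseable cells into NaN.
df["분양가격"] = pd.to_numeric(df["분양가격"], errors="coerce")

# 2. astype returns a new Series, so the result must be assigned back.
#    The nullable "Int64" dtype keeps missing values as <NA>.
df["분양가격"] = df["분양가격"].astype("Int64")
```

Stripping before converting matters here: calling `astype(int)` directly on a column that still contains blank strings raises a `ValueError`.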
paper study 01 - 고객의 특성 정보를 활용한 화장품 추천시스템 개발 (Development of a Cosmetics Recommender System Using Customer Characteristic Information)
·
Data Science/Paper
Today's paper: 고객의 특성 정보를 활용한 화장품 추천시스템 개발 (Development of a Cosmetics Recommender System Using Customer Characteristic Information) - 김효중, 신우식, 신동훈, 김희웅, 김화경

Today I am recording my study notes on the paper I am currently reading. For copyright reasons I cannot explain it in detail; instead, I will study the concepts I get stuck on as I read.

Reference link
https://huidea.tistory.com/263
[Machine learning] A complete summary of recommendation algorithm basics - Collaborative filtering, Matrix Factorization, SVD, Factorization
0. Types of recommendation algorithms https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0 1)..
Numpy 03
·
Data Science/coding pratice
Table of contents
-arange: creates an array of sequential values (a NumPy ndarray rather than a list)
-range: specifies a range inside a loop statement
-sorting (sort)
-argsort, which returns indices
-broadcasting

Understanding arange and range side by side
We often need to generate sequential values.
1. Assigning sign-up information to members (sequentially, starting from member 1)
2. Assigning unique numbers to a product sold in a limited run of 100
Assigning an index for data management is common practice.

1. arange
1-1. How do you generate values in order in a list?
    arr = [1,2,3,4,5,6,7,8,9,10]  # this works, but.. it is tedious
    arr = np.arange(1,11)  # fills in the numbers from 1 (inclusive) up to 11 (exclusive) (the first argument is start, inclusive; the second argument is stop..
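The topics in this table of contents can be compared in one short sketch. The variable names and sample values below are illustrative assumptions, not taken from the post.

```python
import numpy as np

# range: a lazy sequence of ints, typically used to drive a loop.
member_ids = [i for i in range(1, 6)]   # 1 up to, but not including, 6

# arange: same start/stop rule, but the result is a NumPy array.
arr = np.arange(1, 11)                  # 1, 2, ..., 10
evens = np.arange(2, 11, 2)             # an optional third argument is the step

# sort returns a sorted copy; argsort returns the indices that would sort it.
scores = np.array([30, 10, 20])
sorted_scores = np.sort(scores)
order = np.argsort(scores)

# broadcasting: the scalar 2 is stretched to match the array's shape.
doubled = arr * 2
```

Indexing the original array with the `argsort` result (`scores[order]`) gives the same values as `np.sort(scores)`, which is handy when a second array has to be reordered the same way.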
Starting Kaggle study - getting started
·
Data Science/kaggle study
https://unfinishedgod.netlify.app/2020/03/22/%EC%BA%90%EA%B8%80-%EC%9E%85%EB%AC%B8%EC%9E%90%EB%A5%BC-%EC%9C%84%ED%95%9C-%EA%B0%80%EC%9D%B4%EB%93%9C-%EB%AC%B8%EC%84%9C/
A Guide for Kaggle Beginners - 미완성의신
When you study data analysis, there is a name you eventually hear: “Kaggle”. If you search to find out what Kaggle is, you get “a predictive modeling and analytics competition platform founded in 2010, where data and problems to be solved from companies and organizations ...
unfinishedgod.netlify.app
See the link.

Goal for the first week of May: analyze a Titanic notebook
Things to do within May
-Read journals or papers
-Read the Python AI math book ..
Numpy
·
Data Science/coding pratice
https://blog.naver.com/rlawozl96/222652701056
Python - Numpy 01: From now on I should write a table of contents.. there is a lot of content and I get confused about what is in it - numpy overview - numpy array: np.arra... blog.naver.com

Numpy 1
-numpy overview
-numpy array
-numpy dtype
-numpy indexing, slicing
1D array: row vector, column vector
2D array: Matrix (2D tensor)
3D array: tensor

https://blog.naver.com/rlawozl96/222652765082
Numpy 02 table of contents
-Fancy indexing: extract with a set of indices; review this well
-Boolean indexing: by condition (T/F) ..
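As a quick refresher on the items listed in these two posts, here is a minimal sketch of array dimensions, fancy indexing, and Boolean indexing. All values are made up for illustration.

```python
import numpy as np

v = np.array([1, 2, 3])                  # 1D array, shape (3,)
m = np.array([[1, 2, 3], [4, 5, 6]])     # 2D array (matrix), shape (2, 3)
t = np.zeros((2, 3, 4))                  # 3D array (tensor), shape (2, 3, 4)

arr = np.array([10, 20, 30, 40, 50])

# Fancy indexing: pass a list of positions to pick several elements at once.
picked = arr[[0, 2, 4]]

# Boolean indexing: a True/False mask keeps only the matching elements.
mask = arr > 25
large = arr[mask]
```

Both indexing styles return a new array rather than a view, so modifying `picked` or `large` leaves `arr` unchanged.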
Python basics
·
Data Science/Python basics
Backing up to Tistory the notes I had organized on my blog while I left Tistory neglected for a while...

https://blog.naver.com/rlawozl96/222573880972
Python 1: Python again... I was not confident I could manage Tistory as well, so I came back to the blog. I cannot remember the parts I studied before... blog.naver.com
Python 1
-data types, the list / tuple / set / dict concepts, finding the length

https://blog.naver.com/rlawozl96/222587658689
Python 2: 0. Other Calculation a = 10 b = 3 1) % : computes the remainder of a division a % b = 1 (10/3... blog.naver.com
Python 2
-calculation, in..
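The basics named in these two posts can be checked in a few lines. The container values are hypothetical, and `//` and `divmod` are added here for comparison; only `%` with `a = 10`, `b = 3` appears in the excerpt.

```python
# The four built-in containers mentioned above, with made-up values.
a_list = [1, 2, 2, 3]          # ordered, mutable, allows duplicates
a_tuple = (1, 2, 2, 3)         # ordered, immutable
a_set = {1, 2, 2, 3}           # unordered, duplicates collapse
a_dict = {"a": 1, "b": 2}      # key -> value mapping

# len works the same way on all four.
lengths = [len(a_list), len(a_tuple), len(a_set), len(a_dict)]

a = 10
b = 3
remainder = a % b              # remainder of 10 divided by 3
quotient = a // b              # integer quotient (not in the excerpt)
q, r = divmod(a, b)            # both at once (not in the excerpt)
```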
pandas review 3
·
Data Science/coding pratice
Table of contents
- Merging DataFrames (merge): how it differs from concat (left, right, inner, outer), merging when the column names differ
- astype(), dtype: changing and checking data types
- pd.to_datetime
- dt accessor functions
- 3 ways to fill in values
- apply: define a function with def and apply it, e.g. converting gender 남/여 (male/female) to 0, 1
- lambda: lambda x: expression (a simple one-line function)
- map: assign each key to its value using a dict
- Arithmetic on DataFrames (Series)
    operations between two columns
    operations between a column and a number
    compound operations
    mean(), sum() computed along an axis (column totals, row totals)
    operations when NaN values are present
- Data..
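A minimal sketch of the main items in this list follows. The tables, column names, and the helper `encode` are hypothetical; only the 남/여 to 0/1 example comes from the excerpt.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
left = pd.DataFrame({"name": ["A", "B", "C"], "gender": ["남", "여", "남"]})
right = pd.DataFrame({"member": ["A", "B", "D"],
                      "joined": ["2022-01-05", "2022-02-10", "2022-03-15"]})

# merge on differently named keys; how= selects left/right/inner/outer.
merged = pd.merge(left, right, left_on="name", right_on="member", how="left")

# to_datetime, then the .dt accessor to pull out date parts.
merged["joined"] = pd.to_datetime(merged["joined"])
merged["month"] = merged["joined"].dt.month

# Three ways to derive 0/1 from the gender column.
def encode(g):
    return 0 if g == "남" else 1

by_apply = left["gender"].apply(encode)
by_lambda = left["gender"].apply(lambda g: 0 if g == "남" else 1)
by_map = left["gender"].map({"남": 0, "여": 1})

# sum along an axis; NaN is skipped by default.
scores = pd.DataFrame({"kor": [90, 80], "eng": [70, None]})
col_sum = scores.sum(axis=0)   # one total per column
row_sum = scores.sum(axis=1)   # one total per row
```

With `how="left"`, the unmatched row C keeps its place and gets missing values in the columns from `right`, while row D from `right` is dropped.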
pandas review 2
·
Data Science/coding pratice
Table of contents
-group by: per-group statistics (used by appending a function such as count or sum)
-multi index
-converting a multi index into a pivot table: unstack
-reset_index() *
-fillna: fill in missing values
-dropna: remove rows/columns that contain missing values
-drop_duplicates: remove duplicated rows (keep: first/last option available)
-drop: remove rows and columns
-combining DataFrames (side by side with axis=1, stacked vertically with sort=False)

import pandas as pd

1) Group by: per-group statistics
    df.groupby('소속사')  # no statistics are output yet. The statistical function to apply on top..
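The items in this table of contents can be walked through with a small sketch. The data is hypothetical; only the `소속사` (agency) column name comes from the excerpt.

```python
import pandas as pd

# Hypothetical data; only the '소속사' column name is from the excerpt.
df = pd.DataFrame({
    "소속사": ["X", "X", "Y", "Y"],
    "gender": ["F", "M", "F", "F"],
    "height": [160.0, 180.0, None, 170.0],
})

# groupby alone is lazy; an aggregate such as mean() produces the statistics.
mean_height = df.groupby("소속사")["height"].mean()

# Two keys give a MultiIndex; unstack pivots the inner level into columns,
# and reset_index flattens the MultiIndex back into ordinary columns.
multi = df.groupby(["소속사", "gender"])["height"].mean()
pivot = multi.unstack("gender")
flat = multi.reset_index()

# Missing values and duplicates.
filled = df.fillna({"height": df["height"].mean()})
dropped = df.dropna()
deduped = df.drop_duplicates(subset="소속사", keep="first")

# concat: axis=1 places frames side by side; the default stacks them vertically.
extra = df[["height"]].rename(columns={"height": "h2"})
side_by_side = pd.concat([df, extra], axis=1)
stacked = pd.concat([df, df], sort=False, ignore_index=True)
```

`pivot` has one row per agency and one column per gender; combinations that never occur in the data, such as agency Y with gender M, show up as NaN.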