
ruriruriya

[Python] Python Pandas - Creating 1-D Data with Series, Labels and Indexes



루리야ㅑ 2023. 11. 14. 18:18

Photo: shiyang xu on Unsplash

Pandas is a Python library for data manipulation and analysis. It is used mainly for tabular data and time-series data, and it offers a wide range of features for processing and cleaning data.

 

Advantages of Pandas

- Rows and columns can be labeled.
- It provides basic summary statistics.
- It has methods for handling NaN values.
- It reads numeric and string data without extra setup.
- Datasets can be merged.
- It integrates with NumPy and Matplotlib.
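
To make two of these points concrete, here is a minimal sketch of summary statistics and NaN handling. The Series `prices`, the name `filled`, and all the numbers are made up for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value is missing (NaN)
prices = pd.Series([30, 50, np.nan, 10],
                   index=['americano', 'latte', 'cappucino', 'einspenner'])

# Summary statistics; NaN values are skipped by default
print(prices.mean())       # mean of 30, 50 and 10
print(prices.describe())   # count, mean, std, min, quartiles, max

# Replace the missing value with 0 (returns a new Series)
filled = prices.fillna(0)
print(filled)
```

Note that `fillna` returns a new Series here, so `prices` itself still contains the NaN.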

 

Importing the Pandas Library

Importing the pandas library is the very first step. By convention it is imported under the alias pd.
Other libraries are imported the same way.

import pandas as pd

 

Creating Pandas Series Data

A pandas Series is commonly created from a list. Lists are one of the most important concepts in Python, so be sure to remember them!
First, assign lists to variables to prepare the data.

>> index = ['americano','latte','cappucino','einspenner']
>> data = [30,50,'Yes','No']

Typing the index variable at the prompt shows the values it holds.

>> index
['americano','latte','cappucino','einspenner']
>> index[1]
'latte'

 

What is a Series?

One-dimensional data in pandas is called a Series. When a Series is printed, the right-hand column holds the values (data) and the left-hand column holds the index.

 

How to Create a Series

pd.Series(data=<variable holding the values>, index=<variable holding the index labels>)

 

>> pd.Series(data=data)
0	30
1	50
2	Yes
3	No
dtype: object


>> pd.Series(data=data, index=index)
americano	30
latte	50
cappucino	Yes
einspenner	No
dtype: object
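
As a side note, a list is not the only possible input. The sketch below assumes a small made-up dictionary (the name `menu` and its contents are illustrative) and shows that the dictionary keys become the index labels, so no separate index argument is needed.

```python
import pandas as pd

# Hypothetical menu: dict keys become the index labels, dict values become the data
menu = pd.Series({'americano': 30, 'latte': 50})

print(menu)
print(list(menu.index))   # ['americano', 'latte']
```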

Assigning the created Series to a new variable makes it easier to work with.

>> caffee = pd.Series(data = data, index = index)
>> caffee
americano	30
latte	50
cappucino	Yes
einspenner	No
dtype: object
>> caffee.ndim # number of dimensions (1 for a Series, 2 for a DataFrame)
1
>> caffee.size # total number of elements
4
>> caffee.shape # dimensions of the Series or DataFrame, returned as a tuple
(4,)
>> caffee.index
Index(['americano','latte','cappucino','einspenner'], dtype='object')
>> caffee.values
array([30,50,'Yes','No'], dtype=object)
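
The dtype: object in the output above comes from mixing numbers and strings in one list. The following sketch compares that case with an all-integer list; the variable names `mixed` and `numeric` are illustrative.

```python
import pandas as pd

mixed = pd.Series([30, 50, 'Yes', 'No'])   # numbers and strings together
numeric = pd.Series([30, 50, 20, 10])      # integers only

print(mixed.dtype)     # object: a generic dtype that can hold any Python object
print(numeric.dtype)   # an integer dtype, typically int64

# Numeric operations such as sum() are meaningful on the integer Series
print(numeric.sum())
```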

 

Indexes and Labels in Pandas

Elements of a Series can be selected either by index (position) or by label.
An index starts at 0 and accesses an element by its position,
while a label accesses an element through the identifier assigned to each row.

>> caffee
americano	30
latte	50
cappucino	Yes
einspenner	No
dtype: object

>> caffee['americano']
30

>> caffee[['latte','cappucino']] # to select two or more labels, pass a list of labels inside the brackets
latte	50
cappucino	Yes
dtype: object

>> caffee.iloc[-1] # position -1 is the last element; .iloc is used because caffee[-1] on a label-indexed Series is deprecated in recent pandas versions
'No'

>> caffee[0:3] # start position : stop position (the stop position is excluded)
americano	30
latte	50
cappucino	Yes
dtype: object

>> caffee['latte':'einspenner'] # start label : end label (the end label is included)
latte	50
cappucino	Yes
einspenner	No
dtype: object
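
The same selections can be written with the explicit accessors `.iloc` (position) and `.loc` (label), which removes any ambiguity about which kind of lookup is meant. This is a minimal sketch that rebuilds the Series from the example above; the result names `first`, `latte`, `by_pos` and `by_label` are illustrative.

```python
import pandas as pd

caffee = pd.Series([30, 50, 'Yes', 'No'],
                   index=['americano', 'latte', 'cappucino', 'einspenner'])

first = caffee.iloc[0]          # by position: first element
latte = caffee.loc['latte']     # by label

# Positional slices exclude the stop position ...
by_pos = caffee.iloc[1:3]                    # latte, cappucino
# ... while label slices include the end label
by_label = caffee.loc['latte':'cappucino']   # latte, cappucino

print(first, latte)
print(by_pos)
print(by_label)
```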

 

Arithmetic on a Pandas Series

Arithmetic operations can be applied to a pandas Series.

>> index = ['americano','latte','cappucino','einspenner']
>> data = [30,50,20,10]

>> caffee = pd.Series(data=data, index=index)

>> caffee
americano	30
latte	50
cappucino	20
einspenner	10
dtype: int64

>> caffee = caffee + 5 # add 5 to every element of caffee

>> caffee
americano	35
latte	55
cappucino	25
einspenner	15
dtype: int64

>> caffee['americano'] -= 2 # subtract 2 from the americano element only

>> caffee
americano	33
latte	55
cappucino	25
einspenner	15
dtype: int64

>> caffee[['latte','cappucino']] -= 3 # subtract 3 from the latte and cappucino elements only

>> caffee
americano	33
latte	52
cappucino	22
einspenner	15
dtype: int64
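
Arithmetic also works between two Series, in which case pandas matches elements by label rather than by position. The sketch below uses made-up Series (`price`, `extra`, `total` are illustrative names) to show that a label missing from one side produces NaN in the result.

```python
import pandas as pd

price = pd.Series([30, 50, 20], index=['americano', 'latte', 'cappucino'])
# Different order, and no entry for cappucino
extra = pd.Series([5, 10], index=['latte', 'americano'])

# Elements are aligned by label before adding
total = price + extra
print(total)
```

Here americano becomes 30 + 10 and latte becomes 50 + 5, while cappucino has no partner in `extra` and ends up as NaN.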

 
