ruriruriya

[Python] Pandas - Loading Data into a DataFrame

루리야ㅑ 2023. 11. 15. 14:11
Photo: Li Jiangang on Unsplash

How you load data into a DataFrame depends on the file format. Commonly used formats include CSV, Excel, JSON, SQL, and HTML, and pandas provides a reader function for each of them.

Reading a CSV (Comma-Separated Values) File

A CSV file stores tabular data as plain text, with the values in each row separated by commas.
Loading a CSV file with read_csv() splits each line on those commas and returns a DataFrame.

When you run this in Google Colab on a large dataset, only the first and last rows and columns are displayed, and the middle is elided.

>> import pandas as pd
>> df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/데이터분석/data/GOOG.csv')
>> df
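
To try read_csv() without the Drive file above, here is a minimal sketch that parses a small in-memory CSV. The column names and values are made up for illustration (they are not the contents of GOOG.csv), and io.StringIO simply stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV text; the values are illustrative only.
csv_text = """Date,Open,Close
2023-01-02,10.0,10.5
2023-01-03,10.5,10.2
2023-01-04,10.2,10.8
"""

# read_csv() splits each line on commas and uses the first line as the header.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (number of rows, number of columns)
print(df.head(2))  # first two rows
print(df.tail(1))  # last row
```

head() and tail() are handy when the notebook elides the middle of a large DataFrame and you only want a quick look at either end.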

Data can also be loaded from HTML.
Pass read_html() the URL of the web page that contains the table.

If you assign the result of read_html() to a variable and print it,
you get a list, not a DataFrame.
read_html() returns one DataFrame per table on the page, and this page has two tables,
so until you pick one out the result prints as a list.

So how do we view it as a DataFrame?

>> df = pd.read_html('https://www.livingin-canada.com/house-prices-canada.html')
>> df
[                                                City  \
 0                                      Vancouver, BC   
 1                                       Toronto, Ont   
 2                                        Ottawa, Ont   
 3                                       Calgary, Alb   
 4                                      Montreal, Que   
 5                                        Halifax, NS   
 6                                       Regina, Sask   
 7                                    Fredericton, NB   
 8  (adsbygoogle = window.adsbygoogle || []).push(...   
 
                                  Average House Price  \
 0                                         $1,036,000   
 1                                           $870,000   
 2                                           $479,000   
 3                                           $410,000   
 4                                           $435,000   
 5                                           $331,000   
 6                                           $254,000   
 7                                           $198,000   
 8  (adsbygoogle = window.adsbygoogle || []).push(...   
 
                                      12 Month Change  
 0                                           + 2.63 %  
 1                                            +10.2 %  
 2                                           + 15.4 %  
 3                                            – 1.5 %  
 4                                            + 9.3 %  
 5                                            + 3.6 %  
 6                                            – 3.9 %  
 7                                            – 4.3 %  
 8  (adsbygoogle = window.adsbygoogle || []).push(...  ,
                                              Province  \
 0                                    British Columbia   
 1                                             Ontario   
 2                                             Alberta   
 3                                              Quebec   
 4                                            Manitoba   
 5                                        Saskatchewan   
 6                                         Nova Scotia   
 7                                Prince Edward Island   
 8                             Newfoundland / Labrador   
 9                                       New Brunswick   
 10                                   Canadian Average   
 11  (adsbygoogle = window.adsbygoogle || []).push(...   
 
                                   Average House Price  \
 0                                            $736,000   
 1                                            $594,000   
 2                                            $353,000   
 3                                            $340,000   
 4                                            $295,000   
 5                                            $271,000   
 6                                            $266,000   
 7                                            $243,000   
 8                                            $236,000   
 9                                            $183,000   
 10                                           $488,000   
 11  (adsbygoogle = window.adsbygoogle || []).push(...   
 
                                       12 Month Change  
 0                                             + 7.6 %  
 1                                             – 3.2 %  
 2                                             – 7.5 %  
 3                                             + 7.6 %  
 4                                             – 1.4 %  
 5                                             – 3.8 %  
 6                                             + 3.5 %  
 7                                             + 3.0 %  
 8                                             – 1.6 %  
 9                                             – 2.2 %  
 10                                            – 1.3 %  
 11  (adsbygoogle = window.adsbygoogle || []).push(...  ]

 

Elements of a list can be accessed by index.

The list above contains two DataFrames.
The built-in len() function confirms this.

>> len(df)
2

The first DataFrame in the list is df[0], and the second is df[1].
Entering variable[index] at the prompt displays that table as a DataFrame.

>> df[0]

>> df[1]
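
The same behavior can be reproduced offline. The sketch below feeds read_html() a hypothetical HTML snippet with two tables (a few values are borrowed from the output above) instead of a live URL. It assumes an HTML parser that pandas can use is installed, such as lxml, or BeautifulSoup with html5lib.

```python
import io

import pandas as pd

# Hypothetical HTML containing two <table> elements.
html = """
<table>
  <tr><th>City</th><th>Average House Price</th></tr>
  <tr><td>Vancouver, BC</td><td>$1,036,000</td></tr>
  <tr><td>Toronto, Ont</td><td>$870,000</td></tr>
</table>
<table>
  <tr><th>Province</th><th>Average House Price</th></tr>
  <tr><td>British Columbia</td><td>$736,000</td></tr>
</table>
"""

# read_html() returns a list with one DataFrame per table it finds.
tables = pd.read_html(io.StringIO(html))

print(type(tables))  # a plain Python list
print(len(tables))   # number of tables found

cities = tables[0]     # first table
provinces = tables[1]  # second table
print(cities)
```

Wrapping the text in io.StringIO rather than passing the raw string is a deliberate choice here, since recent pandas versions warn when given literal HTML directly.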

In both df[0] and df[1], the last row contains text beginning with adsbygoogle....
This is not real data: a Google ad script sits at the bottom of each table on the page, and read_html() read it in as a row.

How can we get rid of it?

There are two ways: slicing, or the drop() method.

>> city = df[0].loc[0:7, ]  # slicing: .loc is label-based, so rows 0 through 7 are kept

>> provinces = df[1].drop(11, axis=0)  # drop(): axis=0 drops a row by index label (axis=1 would drop a column)
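
The two approaches can be compared on a small stand-in table. This is only a sketch: the DataFrame below is built by hand to imitate the first few rows of df[0] plus the ad row. The third option, a boolean mask, is not from the post; it is an alternative that may be convenient when you do not know the ad row's index in advance.

```python
import pandas as pd

# Small stand-in for df[0]: the last row imitates the ad script row.
ad = "(adsbygoogle = window.adsbygoogle || []).push(..."
city = pd.DataFrame({
    "City": ["Vancouver, BC", "Toronto, Ont", "Ottawa, Ont", ad],
    "Average House Price": ["$1,036,000", "$870,000", "$479,000", ad],
})

# 1) Label-based slicing: .loc includes both endpoints, so 0:2 keeps rows 0, 1, 2.
by_slice = city.loc[0:2]

# 2) drop(): remove the row whose index label is 3 (axis=0 means rows).
by_drop = city.drop(3, axis=0)

# 3) Boolean mask (an alternative, not from the post): keep rows whose City
#    does not contain the ad text. regex=False matches a plain substring.
by_mask = city[~city["City"].str.contains("adsbygoogle", regex=False)]

print(by_slice)
print(by_drop)
print(by_mask)
```

All three leave the same three city rows; none of them modifies the original DataFrame, since each returns a new object.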

As you can see, pandas can load data from many different file formats.

Summary of Loading Methods by File Format

File format      pandas method       Example
CSV              read_csv()          pd.read_csv('filename.csv')
Excel            read_excel()        pd.read_excel('filename.xlsx')
JSON             read_json()         pd.read_json('filename.json')
SQL              read_sql()          pd.read_sql('table_name', engine)
SQL              read_sql_query()    pd.read_sql_query('SELECT * FROM table_name', engine)
HTML             read_html()         pd.read_html('web page URL')[0]
Other formats    varies by format    use the reader specialized for that format
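
As a rough illustration of two rows in the summary, the sketch below reads JSON from an in-memory buffer and runs a query against an in-memory SQLite database. The records and the "prices" table are hypothetical. A plain sqlite3 connection from the standard library is used here in place of the engine shown in the summary, which is an assumption made to keep the example self-contained.

```python
import io
import sqlite3

import pandas as pd

# JSON: a hypothetical list of records, read from an in-memory buffer.
json_text = (
    '[{"city": "Vancouver, BC", "price": 1036000},'
    ' {"city": "Toronto, Ont", "price": 870000}]'
)
from_json = pd.read_json(io.StringIO(json_text))

# SQL: an in-memory SQLite database with a hypothetical "prices" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (city TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("Vancouver, BC", 1036000), ("Toronto, Ont", 870000)],
)
conn.commit()

# read_sql_query() runs the SELECT and returns the result as a DataFrame.
from_sql = pd.read_sql_query("SELECT * FROM prices", conn)
conn.close()

print(from_json)
print(from_sql)
```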

 
