
[pandas] Showing missing-value totals as a bar chart / as a DataFrame

say! 2022. 11. 3. 20:30

Importing the required libraries

import pandas as pd
import numpy as np

๋ฐ์ดํ„ฐ ๋กœ๋“œํ•˜๊ธฐ

  • ํŒ๋‹ค์Šค์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋กœ๋“œํ•  ๋•Œ๋Š” read_csv ์‚ฌ์šฉ
  • ๋ฐ์ดํ„ฐ๋ฅผ ๋กœ๋“œํ•ด์„œ df๋ผ๋Š” ๋ณ€์ˆ˜์— ๋‹ด๊ธฐ
  • shape๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜ ์ฐ๊ธฐ, ๊ฒฐ๊ณผ๋Š” (ํ–‰, ์—ด) ์ˆœ์œผ๋กœ ์ถœ๋ ฅ๋จ
# Load the file with read_csv and store it in a variable named df
# (a forward slash in the path avoids backslash-escape problems and works on every OS)
df = pd.read_csv("data/소상공인시장진흥공단_상가업소정보_의료기관_201909.csv", low_memory=False)
df.shape
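
The loading step can be tried without the original file. The following is a minimal sketch that reads a small CSV from an in-memory string; the column names and values are made up for illustration, and only read_csv and shape come from the steps above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file (made-up columns and values)
csv_text = """name,category,floor
A Clinic,hospital,2
B Pharmacy,pharmacy,
C Dental,,1
"""

# read_csv accepts a file path or any file-like object
sample_df = pd.read_csv(io.StringIO(csv_text))

# shape is an attribute (no parentheses) and returns (rows, columns)
print(sample_df.shape)  # (3, 3)
```

Empty fields in the CSV are read as NaN, which is what the missing-value checks later on will pick up.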

Checking for missing values

df.isnull()

Computing the missing-value totals

# isnull() returns True/False for each cell; sum() counts each True as 1
null_count = df.isnull().sum()
null_count
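
To see why summing the result of isnull() gives a per-column count, here is a small sketch on a made-up DataFrame (the column names and values are assumptions, not taken from the real dataset).

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
sample_df = pd.DataFrame({
    "name": ["A Clinic", "B Pharmacy", "C Dental"],
    "category": ["hospital", "pharmacy", None],
    "floor": [2.0, np.nan, np.nan],
})

# isnull() marks each missing cell (None or NaN) as True
mask = sample_df.isnull()

# sum() adds up each column, counting True as 1 and False as 0
null_count = mask.sum()
print(null_count)
```

The result is a Series indexed by column name, so in this sketch "name" has 0 missing values, "category" has 1, and "floor" has 2.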

Showing the missing-value totals as a bar chart

# Draw the missing-value counts computed above as a horizontal bar chart with .plot.barh
null_count.plot.barh(figsize=(5, 7))
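
One optional variation, not part of the original steps, is to sort the counts before plotting so that the columns with the most missing values are easy to spot. This sketch assumes matplotlib is installed (pandas plotting depends on it) and uses made-up counts; the axis label is also an addition for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display

import pandas as pd

# Made-up counts standing in for df.isnull().sum()
null_count = pd.Series({"floor": 2, "name": 0, "category": 1})

# barh draws the first row at the bottom, so an ascending sort
# puts the column with the most missing values at the top
sorted_count = null_count.sort_values()
ax = sorted_count.plot.barh(figsize=(5, 7))
ax.set_xlabel("Number of missing values")
```

In a Jupyter notebook the chart renders right below the cell, and the backend line can be dropped there.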

Turning the missing-value totals into a DataFrame

# Convert the missing-value counts computed above into a DataFrame with reset_index
# Store the result in the df_null_count variable and preview it with head

df_null_count = null_count.reset_index()
df_null_count.head()
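
After reset_index, the old index becomes a column named "index" and the counts end up in a column named 0, which is awkward to work with. A minimal sketch of renaming them follows; the counts are made up and the new column names are hypothetical choices.

```python
import pandas as pd

# Made-up counts standing in for df.isnull().sum()
null_count = pd.Series({"name": 0, "category": 1, "floor": 2})

# reset_index moves the index into a column called "index";
# the unnamed values column is called 0
raw = null_count.reset_index()
print(list(raw.columns))

# Hypothetical, more readable column names
df_null_count = raw.copy()
df_null_count.columns = ["column_name", "null_count"]
print(df_null_count.head())
```

With named columns, later steps such as sorting by the count can refer to "null_count" instead of the integer label 0.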