[Data Analysis] pandas Basics for Data Manipulation and Analysis

2021. 11. 29. 02:10

1. Series

: A Series holds data together with an index; the values are stored as an ndarray.

Creating a Series

import pandas as pd

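data = pd.Series([1, 3, 0, 6])

# dtype="float" converts the values to floating point
data = pd.Series([1, 3, 0, 6], dtype="float")
print(data)
# 0    1.0
# 1    3.0
# 2    0.0
# 3    6.0
# dtype: float64

# Custom labels can replace the default integer index
data = pd.Series([1, 3, 0, 6], index=['a', 'b', 'c', 'd'])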
data['a'] = 2
print(data)
# a 2
# b 3
# c 0
# d 6

price_dict = {
    'apple': 1000, 
    'grape': 3000, 
    'peach': 1500, 
    'lemon': 900
}
price = pd.Series(price_dict)
print(price)
# apple 1000
# grape 3000
# peach 1500
# lemon 900
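
As a quick check of the definition at the top of this section, the following sketch inspects the two parts of a Series. It reuses the price data from above; the variable names `values` and `labels` are only illustrative.

```python
import numpy as np
import pandas as pd

price = pd.Series({'apple': 1000, 'grape': 3000, 'peach': 1500, 'lemon': 900})

values = price.values          # underlying data as a NumPy array
labels = list(price.index)     # index labels, in insertion order

print(type(values))            # <class 'numpy.ndarray'>
print(labels)                  # ['apple', 'grape', 'peach', 'lemon']

# Label-based and position-based access reach the same element.
print(price['peach'], price.iloc[2])   # 1500 1500
```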

2. DataFrame

: A DataFrame is a set of Series combined into a table of rows and columns.

Creating a DataFrame

quantity_dict = {
    'apple': 200, 
    'grape': 50, 
    'peach': 150, 
    'lemon': 30
}
quantity = pd.Series(quantity_dict)

# Each Series becomes one column; rows are matched by index label
fruit = pd.DataFrame({
    'price': price,
    'quantity': quantity
})

# A DataFrame can also be built from a dictionary of lists
data = {
    'fruit': ['apple', 'grape', 'peach', 'lemon'],
    'price': [1000, 3000, 1500, 900],
    'quantity': [200, 50, 150, 30]
}

fruit = pd.DataFrame(data)
fruit = fruit.set_index('fruit')  # use the 'fruit' column as the index
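
To illustrate that a DataFrame is a collection of Series sharing one index, the sketch below builds a frame from two Series whose labels are in a different order and only partly overlap. The `kiwi` entry is made up for this illustration. pandas matches rows by index label rather than by position, and fills entries missing from one Series with NaN.

```python
import pandas as pd

price = pd.Series({'apple': 1000, 'grape': 3000, 'peach': 1500})
quantity = pd.Series({'peach': 150, 'apple': 200, 'kiwi': 10})  # 'kiwi' is hypothetical

fruit = pd.DataFrame({'price': price, 'quantity': quantity})

# Each column is itself a Series.
col = fruit['price']
print(type(col).__name__)               # Series

# Rows are matched by label; labels missing from one Series become NaN.
print(fruit.loc['apple', 'quantity'])   # 200.0
print(fruit['price'].isna().sum())      # 1 (kiwi has no price)
```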

Inspecting and setting DataFrame attributes

print(fruit.shape)	# (4, 2)
print(fruit.size)	# 8
print(fruit.ndim)	# 2
print(fruit.values)
# [[1000, 200],
#  [3000, 50],
#  [1500, 150],
#  [900, 30]]
   
fruit.index.name = 'Fruit'    # name the index
fruit.columns.name = 'Stock'  # name the column axis

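Saving and loading a DataFrame

# Save the DataFrame (Excel I/O needs an engine such as openpyxl installed)
fruit.to_csv("./fruit.csv")
fruit.to_excel("./fruit.xlsx")

# Load the DataFrame; index_col=0 restores the saved index column as the index
fruit = pd.read_csv("./fruit.csv", index_col=0)
fruit = pd.read_excel("./fruit.xlsx", index_col=0)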
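
The effect of `index_col` when reading a file back can be seen in the following sketch. To avoid touching the disk it writes the CSV to a string and reads it through `io.StringIO`; that detour and the two-row frame are assumptions made for the illustration, not part of the original workflow.

```python
import io
import pandas as pd

fruit = pd.DataFrame(
    {'price': [1000, 3000], 'quantity': [200, 50]},
    index=pd.Index(['apple', 'grape'], name='Fruit'),
)

csv_text = fruit.to_csv()   # no path given: the CSV is returned as a string

# Without index_col the saved index comes back as an ordinary column.
plain = pd.read_csv(io.StringIO(csv_text))
# With index_col=0 the first column is used as the index again.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(plain.columns))      # ['Fruit', 'price', 'quantity']
print(list(restored.index))     # ['apple', 'grape']
print(restored.index.name)      # Fruit
```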

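3. Selecting and modifying data

.loc : indexing/slicing by explicit index labels
.iloc : indexing/slicing by integer position
.query() : extracts the rows that satisfy a condition
.drop() : removes columns (axis = 1: along columns, inplace = True: modify the original object)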

Example

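fruit.loc['lemon']  # indexing
fruit.loc['grape':'lemon', :'quantity']  # slicing
# rows from grape through lemon, columns from the first through quantity (both ends included)

# Extract the same parts using integer positions
fruit.iloc[-1]
fruit.iloc[1:4, :2]  # the end position is excluded, so 1:4 covers grape through lemon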

fruit['price']    # returns a Series
fruit[['price']]  # returns a DataFrame

# Filtering by condition
fruit[fruit['price'] < 1000]
fruit.query('price < 1000')  # query() takes the condition as a string

# Add a column computed with an operator
total_price = fruit['price'] * fruit['quantity']
fruit['total price'] = total_price

# Add rows from a list or a dictionary
df = pd.DataFrame(columns=['fruit', 'price', 'quantity'])
df.loc[0] = ['apple', 1000, 200]
df.loc[1] = {'fruit': 'lemon', 'price': 3000, 'quantity': 50}

df.loc[1, 'fruit'] = 'grape'  # modify a value through its explicit index

import numpy as np  # needed for np.nan
df['color'] = np.nan  # add a new column initialized to NaN
df.drop('color', axis=1, inplace=True)  # drop the 'color' column (axis=1 means columns)
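
The differences between these selection tools can be checked with a short sketch. It rebuilds the fruit table from section 2; names such as `by_label` and `no_quantity` are illustrative only.

```python
import pandas as pd

fruit = pd.DataFrame(
    {'price': [1000, 3000, 1500, 900], 'quantity': [200, 50, 150, 30]},
    index=['apple', 'grape', 'peach', 'lemon'],
)

by_label = fruit.loc['grape':'lemon']   # the end label is included
by_pos = fruit.iloc[1:3]                # the end position is excluded

print(list(by_label.index))   # ['grape', 'peach', 'lemon']
print(list(by_pos.index))     # ['grape', 'peach']

# A boolean mask and query() with the same condition select the same rows.
mask_rows = fruit[fruit['price'] < 1000]
query_rows = fruit.query('price < 1000')
print(mask_rows.equals(query_rows))   # True

# Without inplace=True, drop() returns a new frame and leaves the original alone.
no_quantity = fruit.drop('quantity', axis=1)
print(list(no_quantity.columns), list(fruit.columns))
# ['price'] ['price', 'quantity']
```

Leaving out `inplace=True` and assigning the result to a new name keeps the original frame available, which tends to be easier to follow when several steps are chained.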

