
Getting Started with Open API (3)

by soychoi 2024. 8. 1.
Converting Data from Open API to Excel on data.go.kr
Dataset: 행정안전부_통계연보_인구 규모별 행정구역 (Ministry of the Interior and Safety, Statistical Yearbook: Administrative Districts by Population Size)

 

 

1. To convert the "Ministry of the Interior and Safety - Statistical Yearbook: Administrative Districts by Population Size" dataset, provided as an Open API on the Public Data Portal (data.go.kr), into a data file with Python code on Google Colab, I submitted a "usage application" and was issued a URL and a service key.
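One detail worth noting about the issued key: data.go.kr typically provides it in both an "Encoding" and a "Decoding" form, and the decoded one (which contains characters like /, + and =) is the one to pass through requests' params, since requests percent-encodes parameters for you. A minimal sketch with a dummy key and a hypothetical base URL (neither is the real value):

```python
import requests

# Sketch of how the issued key travels in a request, using a hypothetical
# base URL and a shortened dummy key (not the real values).
api_url = "https://apis.data.go.kr/example/service"
service_key = "abc/def+ghi=="  # decoded-form keys contain '/', '+', '='

prepared = requests.Request(
    "GET", api_url, params={"ServiceKey": service_key, "type": "xml"}
).prepare()

# requests percent-encodes the parameters, so the decoded key is safe to pass
# via `params`; pasting an already-encoded key here would get encoded twice.
print(prepared.url)
```

If the service rejects the key, a doubly-encoded key (an "Encoding" key passed through `params`) is a common cause.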

 

 

 

2. Download the technical document and check the request and response message specs.

Document: Ministry of the Interior and Safety "Administration & Safety Public Data Open API Usage Guide" (행정안전부 행정·안전 공공데이터 Open API 활용가이드)
Section: 1. Service Specification > C. Detailed Function List > b. Request Message Spec & c. Response Message Spec
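As a quick offline check of the parsing approach, the response layout described in the guide (a header with resultCode/resultMsg and <row> items in the body) can be mimicked with a hand-written sample. The field names follow the spec; the values here are invented:

```python
import xml.etree.ElementTree as ET

# Hand-written sample mimicking the response layout in the guide
# (header with resultCode/resultMsg, body with <row> items);
# the values are made up for illustration.
sample = """<response>
  <header><resultCode>INFO-0</resultCode><resultMsg>NORMAL SERVICE</resultMsg></header>
  <body>
    <row><bas_yy>2019</bas_yy><city_smry>75</city_smry></row>
  </body>
</response>"""

root = ET.fromstring(sample)
# Collect each <row>'s child tags and texts into a dict, one dict per row
rows = [{e.tag: e.text for e in row} for row in root.findall(".//row")]
print(rows)  # [{'bas_yy': '2019', 'city_smry': '75'}]
```

This is the same extraction pattern the full script below uses in its parse_xml function.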

 

 

 

3. For the API URL used in Google Colab, use the value in the "Call Back URL" field.

Where to find it: 1. Service Specification > 1.1 Public Data API Service > C. Detailed Function List
* Note that this differs from the End Point URL.

 

Call Back URL

 

 

Result:

 

 

Code:

import pandas as pd
import requests
import xml.etree.ElementTree as ET
from google.colab import files

# API URL and key
api_url = "YOUR_CALLBACK_URL"  # paste the Call Back URL issued for your application
service_key = "kGUKTo8Mh/FFlsfrtw7HOGgfkKwZkXH8TNNzDcOdXpdg4I5RrbSu89yzFp9PLET6xzVDhwQU5VyRCUKIoG2YQg=="

# Function to fetch data from API
def fetch_data(year, page_no, num_of_rows):
    params = {
        'ServiceKey': service_key,
        'type': 'xml',
        'pageNo': page_no,
        'numOfRows': num_of_rows,
        'bas_yy': year
    }
    response = requests.get(api_url, params=params)
    response.raise_for_status()  # Check for request errors
    return response.content

# Parse XML and extract data
def parse_xml(xml_data):
    root = ET.fromstring(xml_data)
    data = []
    for row in root.findall('.//row'):
        record = {}
        for elem in row:
            record[elem.tag] = elem.text
        data.append(record)
    return data

# Loop through years and pages to fetch all data
all_data = []
years = range(2010, 2020)  # Change range as needed
num_of_rows = 1000
for year in years:
    page_no = 1
    while True:
        xml_data = fetch_data(year, page_no, num_of_rows)
        data = parse_xml(xml_data)
        if not data:
            break  # No more data to fetch
        all_data.extend(data)
        page_no += 1

# Create DataFrame and save to Excel
df = pd.DataFrame(all_data)
excel_file = 'AdministrativeDistStatPopSize_all_years.xlsx'
df.to_excel(excel_file, index=False)

# Download the Excel file
files.download(excel_file)
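The year/page loop above can be exercised offline before pointing it at the live service, by swapping the API call for a stub that serves made-up records a page at a time and ends with an empty page:

```python
# Stub standing in for the API call: serves slices of a made-up record list
# a page at a time; an out-of-range page comes back empty, like the real
# service returning no more <row> items.
def fetch_page(records, page_no, num_of_rows):
    start = (page_no - 1) * num_of_rows
    return records[start:start + num_of_rows]

records = list(range(25))  # 25 dummy records
num_of_rows = 10

# Same while-True paging pattern as the script above
all_data, page_no = [], 1
while True:
    page = fetch_page(records, page_no, num_of_rows)
    if not page:
        break  # No more data to fetch
    all_data.extend(page)
    page_no += 1

print(len(all_data), page_no)  # 25 4
```

Three pages (10, 10, 5 records) are collected and the fourth, empty page stops the loop, confirming the termination condition works.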

 

 

 

4. If you would like the column headers in Korean, you can write the code as follows.

 
import pandas as pd
import requests
import xml.etree.ElementTree as ET
from google.colab import files

# API URL and key
api_url = "YOUR_CALLBACK_URL"  # paste the Call Back URL issued for your application
service_key = "kGUKTo8Mh/FFlsfrtw7HOGgfkKwZkXH8TNNzDcOdXpdg4I5RrbSu89yzFp9PLET6xzVDhwQU5VyRCUKIoG2YQg=="

# Column name mapping from English to Korean
column_mapping = {
    "totalCount": "전체 결과 수",
    "numOfRows": "한 페이지결과 수",
    "pageNo": "페이지 번호",
    "type": "수신 문서형식",
    "resultCode": "결과코드",
    "resultMsg": "결과메세지",
    "bas_yy": "기준년도",
    "city_smry": "시 계",
    "city_tths50_mor": "시 50만이상",
    "city_tths30_ut_tths50": "시 30만이상 50만미만",
    "city_tths10_ut_tths30": "시 10만이상 30만미만",
    "city_tths10_lss": "시 10만미만",
    "cnti_smry": "군 계",
    "cnti_tths10_mor": "군 10만이상",
    "cnti_tths5_ut_tths10": "군 5만이상 10만미만",
    "cnti_tths3_ut_tths5": "군 3만이상 5만미만",
    "cnti_tths3_lss": "군 3만미만",
    "atodstri_smry": "자치구 계",
    "atodstri_tths50_mor": "자치구 50만이상",
    "atodstri_tths30_ut_tths50": "자치구 30만이상 50만미만",
    "atodstri_tths30_lss": "자치구 30만미만",
    "eup_smry": "읍 계",
    "eup_tths3_mor": "읍 3만이상",
    "eup_tths2_ut_tths3": "읍 2만이상 3만미만",
    "eup_tths1_ut_tths2": "읍 1만이상 2만미만",
    "eup_tths1_lss": "읍 1만미만",
    "myeon_smry": "면 계",
    "myeon_tths2_mor": "면 2만이상",
    "myeon_tths1_ut_tths2": "면 1만이상 2만미만",
    "myeon_ths5_ut_tths1": "면 5천이상 1만미만",
    "myeon_ths5_lss": "면 5천미만",
    "dong_smry": "동 계",
    "dong_tths3_mor": "동 3만이상",
    "dong_tths2_ut_tths3": "동 2만이상 3만미만",
    "dong_tths1_ut_tths2": "동 1만이상 2만미만",
    "dong_ths5_ut_tths1": "동 5천이상 1만미만",
    "dong_ths5_lss": "동 5천미만"
}

# Function to fetch data from API
def fetch_data(year, page_no, num_of_rows):
    params = {
        'ServiceKey': service_key,
        'type': 'xml',
        'pageNo': page_no,
        'numOfRows': num_of_rows,
        'bas_yy': year
    }
    response = requests.get(api_url, params=params)
    response.raise_for_status()  # Check for request errors
    return response.content

# Parse XML and extract data
def parse_xml(xml_data):
    root = ET.fromstring(xml_data)
    data = []
    for row in root.findall('.//row'):
        record = {}
        for elem in row:
            record[elem.tag] = elem.text
        data.append(record)
    return data

# Loop through years and pages to fetch all data
all_data = []
years = range(2010, 2020)  # Adjust the range as needed
num_of_rows = 1000
for year in years:
    page_no = 1
    while True:
        xml_data = fetch_data(year, page_no, num_of_rows)
        data = parse_xml(xml_data)
        if not data:
            break  # No more data to fetch
        all_data.extend(data)
        page_no += 1

# Create DataFrame and rename columns to Korean
df = pd.DataFrame(all_data)
df.rename(columns=column_mapping, inplace=True)

# Save to Excel
excel_file = 'AdministrativeDistStatPopSize_all_years_kr.xlsx'
df.to_excel(excel_file, index=False)

# Download the Excel file
files.download(excel_file)
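The renaming step itself can be verified on a tiny made-up frame before running the full download; only two of the mapped columns are shown here, with invented values:

```python
import pandas as pd

# Minimal demonstration of the renaming step with two mapped columns
# (the full column_mapping above covers every field in the response spec);
# the row values are made up.
df = pd.DataFrame([{"bas_yy": "2019", "city_smry": "75"}])
df = df.rename(columns={"bas_yy": "기준년도", "city_smry": "시 계"})
print(list(df.columns))  # ['기준년도', '시 계']
```

Any response field missing from the mapping simply keeps its English name, since DataFrame.rename ignores keys that are not present.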

 

Result:

 

 

* Note on errors

 

Since using the End Point URL in this code caused errors, I recommend running the code with the Call Back URL instead.

 

 
