Importing Open API Data from the Public Data Portal into an Excel File Using Python in Google Colab: 외교부_국가·지역별 인구증가 정보 (MOFA population growth information by country and region)
Public Data Portal (data.go.kr)
1. Search for the dataset you need, then apply for Open API use ("활용신청") so that an authentication key is issued to you. In this post, I will import the Open API dataset "외교부_국가·지역별 인구증가 정보" (MOFA population growth information by country/region) into an Excel file.
2. Through the use application, you must be issued the information needed to pull the data into Excel:
- End Point (URL)
- General authentication key (Decoding)
You can find the End Point (1) under Home > My Page > Data Use > Open API > Use Application Status > Development Account Details > Service Info > End Point, or (2) via the reference document's API service deployment info > Service URL. The End Point is simply the request URL.
The service key must be the general authentication key (Decoding). If your service key has been reissued, enter either both the reissued key and the newly issued key together, or the reissued key alone.
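Why the Decoding key matters: `urlencode` (used in the code below) percent-encodes whatever value you pass it. If you paste the already percent-encoded "Encoding" key, it gets encoded twice and authentication fails. A minimal sketch with a made-up key:

```python
from urllib.parse import urlencode, quote

# Hypothetical raw "Decoding" key containing '+', '/', and '=' characters
decoding_key = 'abc+def/ghi=='

# urlencode percent-encodes the raw key exactly once
print(urlencode({'serviceKey': decoding_key}))
# serviceKey=abc%2Bdef%2Fghi%3D%3D

# Simulate pasting the "Encoding" (already percent-encoded) key instead:
# it gets encoded a second time, producing a different, invalid value
encoding_key = quote(decoding_key, safe='')
print(urlencode({'serviceKey': encoding_key}))
# serviceKey=abc%252Bdef%252Fghi%253D%253D
```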
3. Download the "Open API User Guide" from the reference documents to obtain the query parameter (Query parameters) information used for the DataFrame.
4. Run the Python code below in Google Colab.
import urllib.request
import urllib.error
import json
from pandas import json_normalize
from urllib.parse import urlencode
import pandas as pd
from google.colab import files

# Service URL
api = '...'  # paste the End Point (URL) issued for the service here
If your general authentication key has been reissued, you can either (1) enter only the reissued general authentication key (Decoding), or (2) enter both keys together.
(1)
# Service key
service_key = ['RRdu2sm4lP7tX+3YDmpkSRj9Sy1BEyChV+h64EQYYedyvS+163PkCfupS+FWJYSX447gYoadn4Xxq0s50F4I2A==']  # general authentication key (Decoding)
(2)
# Service keys (the list keeps the name service_key so the loop below works in both cases)
service_key = [
    'RRdu2sm4lP7tX+3YDmpkSRj9Sy1BEyChV+h64EQYYedyvS+163PkCfupS+FWJYSX447gYoadn4Xxq0s50F4I2A==',  # reissued authentication key
    'Opv87t5OZt71vaZcn1U+BMh6oN8REiAIoOJ+8z8uV0sfel1fzta8EKsfeCUktff5Xz0GAOOQx6Kpq0AUG3KjSLQ=='   # newly issued authentication key
]
Assuming case (1), running the code below will then download the file.
# Fetch and save data with error handling
for key in service_key:
    try:
        # Initialize an empty list to store DataFrames
        all_data_frames = []

        # Query parameters without country-specific conditions
        queryParams = '?' + urlencode({
            'serviceKey': key,
            'returnType': 'json',
            'numOfRows': '1000',  # adjusted to fetch more rows per request
            'pageNo': '1'
        })

        # Construct URL
        url = api + queryParams

        # Fetch data
        response = urllib.request.urlopen(url).read().decode('utf-8')
        print("Response content:", response)  # debugging line to check the response content
        data = json.loads(response)
        print("Parsed JSON:", data)  # debugging line to inspect parsed JSON

        # Check if 'data' key is present
        if 'data' in data:
            items = data['data']
            df = json_normalize(items)
            all_data_frames.append(df)

            # Concatenate all DataFrames
            all_data = pd.concat(all_data_frames, ignore_index=True)

            # Save to Excel
            all_data.to_excel('country_population_all.xlsx', index=False)
            print("Data has been saved to country_population_all.xlsx")

            # Download the saved Excel file
            files.download('country_population_all.xlsx')
            break  # exit loop after successful fetch
        else:
            print("Key 'data' not found in the response.")
    except urllib.error.HTTPError as e:
        print("HTTP Error:", e.code, e.reason)
    except urllib.error.URLError as e:
        print("URL Error:", e.reason)
    except json.JSONDecodeError as e:
        print("JSON Decode Error:", e.msg)
    except Exception as e:
        print("Unexpected Error:", str(e))
    print(f"Failed with service key: {key}")
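Note that the code above requests only pageNo=1, so any rows beyond numOfRows are dropped. The paging idea can be sketched as follows (the helper fetch_all_pages and the stub fake_page are hypothetical, not part of the original post; it assumes the row list lives under the 'data' key, as the check above does):

```python
import pandas as pd
from pandas import json_normalize

def fetch_all_pages(fetch_page, num_of_rows=1000, max_pages=100):
    """Collect every page of a paginated response into one DataFrame.

    fetch_page(page_no, num_of_rows) must return the parsed JSON dict
    for one page, with the row list under the 'data' key.
    """
    frames = []
    for page_no in range(1, max_pages + 1):
        payload = fetch_page(page_no, num_of_rows)
        items = payload.get('data', [])
        if not items:
            break  # empty page: nothing more to fetch
        frames.append(json_normalize(items))
        if len(items) < num_of_rows:
            break  # short page: this was the last one
    return pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Stub standing in for the live API: two full pages of 3 rows, then 1 row
def fake_page(page_no, num_of_rows):
    count = 3 if page_no < 3 else 1
    return {'data': [{'country': f'c{page_no}-{i}'} for i in range(count)]}

df = fetch_all_pages(fake_page, num_of_rows=3)
print(len(df))  # 7
```

Against the real service, fetch_page would wrap urllib.request.urlopen(api + '?' + urlencode(...)) with the serviceKey, numOfRows, and pageNo parameters, exactly as in the loop above.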
The result is as follows.
[Screenshot of the result: 외교부_국가·지역별 인구증가 정보, '23 upload]
Full code
import urllib.request
import urllib.error
import json
from pandas import json_normalize
from urllib.parse import urlencode
import pandas as pd
from google.colab import files

# Service URL
api = '...'  # paste the End Point (URL) issued for the service here

# Service key
service_key = ['RRdu2sm4lP7tX+3YDmpkSRj9Sy1BEyChV+h64EQYYedyvS+163PkCfupS+FWJYSX447gYoadn4Xxq0s50F4I2A==']  # general authentication key (Decoding)

# Fetch and save data with error handling
for key in service_key:
    try:
        # Initialize an empty list to store DataFrames
        all_data_frames = []

        # Query parameters without country-specific conditions
        queryParams = '?' + urlencode({
            'serviceKey': key,
            'returnType': 'json',
            'numOfRows': '1000',  # adjusted to fetch more rows per request
            'pageNo': '1'
        })

        # Construct URL
        url = api + queryParams

        # Fetch data
        response = urllib.request.urlopen(url).read().decode('utf-8')
        print("Response content:", response)  # debugging line to check the response content
        data = json.loads(response)
        print("Parsed JSON:", data)  # debugging line to inspect parsed JSON

        # Check if 'data' key is present
        if 'data' in data:
            items = data['data']
            df = json_normalize(items)
            all_data_frames.append(df)

            # Concatenate all DataFrames
            all_data = pd.concat(all_data_frames, ignore_index=True)

            # Save to Excel
            all_data.to_excel('country_population_all.xlsx', index=False)
            print("Data has been saved to country_population_all.xlsx")

            # Download the saved Excel file
            files.download('country_population_all.xlsx')
            break  # exit loop after successful fetch
        else:
            print("Key 'data' not found in the response.")
    except urllib.error.HTTPError as e:
        print("HTTP Error:", e.code, e.reason)
    except urllib.error.URLError as e:
        print("URL Error:", e.reason)
    except json.JSONDecodeError as e:
        print("JSON Decode Error:", e.msg)
    except Exception as e:
        print("Unexpected Error:", str(e))
    print(f"Failed with service key: {key}")
For reference, the response received when this code runs is as follows.
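To help read that response: portal file-data APIs commonly wrap the rows in a `data` key, which is exactly what the `if 'data' in data:` check above relies on. A minimal sketch of how such a payload flattens (the field names here are illustrative assumptions, not taken from the actual service):

```python
from pandas import json_normalize

# Illustrative response shape for a data.go.kr file-data Open API;
# field names are made up for this sketch
sample = {
    'page': 1,
    'perPage': 1000,
    'totalCount': 2,
    'currentCount': 2,
    'data': [
        {'country': 'A', 'growthRate': 1.2},
        {'country': 'B', 'growthRate': 0.7},
    ],
}

df = json_normalize(sample['data'])
print(df.shape)  # (2, 2): one row per record, one column per field
```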