Querying AWS Config Resources with Athena from the Payer Account

chronosa 2021. 6. 15. 17:00

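To look up the Config resources of Linked Accounts from the Payer Account, one option is Config Aggregator. However, Config Aggregator has search limitations (searching by tag, for example), so I connected the data to Athena instead. Most of the steps are covered in the documents listed under References; what I added is storing everything in a single S3 bucket in the Payer Account, using a StackSet, and a small change to the Lambda code.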

1. Create an S3 Bucket in the Payer Account

Create an S3 bucket in the Payer Account. The bucket policy grants full access to principals that belong to a specific Organization.
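The policy itself is not reproduced here, so the following is only a minimal sketch of what "full access for one Organization" could look like, built as a Python dictionary. The bucket name and Organization ID are placeholders, and the helper name `build_org_bucket_policy` is hypothetical.

```python
import json


def build_org_bucket_policy(bucket_name: str, org_id: str) -> dict:
    """Return a bucket policy that allows any principal in one Organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOrgFullAccess",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
                # Only principals that belong to this Organization match.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values; replace with the real bucket name and Organization ID.
    policy = build_org_bucket_policy("example-config-bucket", "o-exampleorgid")
    print(json.dumps(policy, indent=2))
```

Depending on how Config delivers to the bucket (for example, through the `config.amazonaws.com` service principal), additional statements may be required, so treat this as a starting point rather than a complete policy.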

2. Configure Config in the Linked Accounts

Set up Config in each Linked Account. A CloudFormation StackSet makes this easy; I applied the sample stack that AWS provides, with a few modifications.
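As a rough illustration of the StackSet approach, the sketch below creates a stack set and deploys it to organizational units through the CloudFormation API. It assumes a service-managed StackSet (which requires trusted access between CloudFormation and Organizations), and the function name, stack set name, OU IDs, and regions are all hypothetical. The `cfn` argument is expected to be a `boto3.client("cloudformation")` object.

```python
def deploy_config_stack_set(cfn, template_body, ou_ids, regions,
                            stack_set_name="enable-aws-config"):
    """Create a service-managed StackSet and roll it out to the given OUs."""
    cfn.create_stack_set(
        StackSetName=stack_set_name,
        TemplateBody=template_body,
        PermissionModel="SERVICE_MANAGED",
        # New accounts that join the OUs receive the stack automatically.
        AutoDeployment={"Enabled": True, "RetainStacksOnAccountRemoval": False},
        # Assumed: the Config template creates IAM resources.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    response = cfn.create_stack_instances(
        StackSetName=stack_set_name,
        DeploymentTargets={"OrganizationalUnitIds": ou_ids},
        Regions=regions,
    )
    # The operation ID can be used to track the rollout.
    return response["OperationId"]
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real Organization.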

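3. Create the Athena Table

Create the Athena table used to query the data stored in S3. The CREATE TABLE query in the first document under References ("How to ~~") can be used as a starting point. In that document the table is partitioned by accountid, dt, and region. Note that the data is not laid out in Hive style (for example, year=2021), so a Lambda function registers the partitions instead.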
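Because the object keys are not Hive-style, the partition values have to be read out of the key itself. The sketch below shows one way to do that, assuming the usual Config snapshot key layout (`AWSLogs/<account>/Config/<region>/<year>/<month>/<day>/ConfigSnapshot/<file>`). The names `SnapshotPartition` and `parse_snapshot_key` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed key layout for Config snapshots delivered to S3.
SNAPSHOT_KEY = re.compile(
    r"AWSLogs/(?P<accountid>\d+)/Config/(?P<region>[\w-]+)/"
    r"(?P<year>\d+)/(?P<month>\d+)/(?P<day>\d+)/ConfigSnapshot/[^/]+$"
)


class SnapshotPartition(NamedTuple):
    accountid: str
    region: str
    dt: str


def parse_snapshot_key(object_key: str) -> Optional[SnapshotPartition]:
    """Extract the partition values from a snapshot key, or None if it is not one."""
    match = SNAPSHOT_KEY.search(object_key)
    if match is None:
        return None
    # Month and day are not zero-padded in the key, so pad them here.
    dt = "{:04d}-{:02d}-{:02d}".format(
        int(match["year"]), int(match["month"]), int(match["day"])
    )
    return SnapshotPartition(match["accountid"], match["region"], dt)
```

Keys that are not configuration snapshots (ConfigHistory objects, for instance) return `None`, so the caller can simply skip them.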

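4. Create the Lambda Function

The code is almost identical to the one in the document. As described there, a trigger fires each time an object is created, and the function registers the partition. The one change I made: in the document dt only goes down to the year, month, and day, and I extended it to include the time as well. (※ Even so, some data is still duplicated when actually running queries.)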
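The modified Lambda is not shown here, so the following is only a sketch of how a dt value that includes the hour could be derived and turned into a partition statement. It assumes the snapshot file name carries a timestamp such as `20180411T054711Z`, and the `YYYY-MM-DD-HH` format is an illustrative choice, not necessarily the one I used. Both function names are hypothetical.

```python
import re
from datetime import datetime
from posixpath import dirname

# Assumed: the file name contains "_ConfigSnapshot_<YYYYMMDD>T<HHMMSS>Z_".
TIMESTAMP = re.compile(r"_ConfigSnapshot_(\d{8}T\d{6}Z)_")


def snapshot_dt_with_hour(object_key: str) -> str:
    """Build a dt partition value that includes the hour of the snapshot."""
    match = TIMESTAMP.search(object_key)
    if match is None:
        raise ValueError(f"no snapshot timestamp in key: {object_key}")
    taken_at = datetime.strptime(match.group(1), "%Y%m%dT%H%M%SZ")
    return taken_at.strftime("%Y-%m-%d-%H")


def add_partition_sql(table, bucket, object_key, accountid, region):
    """Return the ALTER TABLE statement that registers the partition."""
    dt = snapshot_dt_with_hour(object_key)
    # A partition location is the S3 prefix that holds the object.
    location = f"s3://{bucket}/{dirname(object_key)}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION "
        f"(accountid = '{accountid}', dt = '{dt}', region = '{region}') "
        f"LOCATION '{location}'"
    )
```

Since a partition location is an S3 prefix rather than a single object, every snapshot delivered on the same day shares one location. That may be one reason duplicates still show up even with the hour in dt.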

5. Run Athena Queries

All the preparation is now complete.

A plain SELECT * with no additional clauses returns the raw data:

(>> SELECT * FROM "sampledb"."aws_config_configuration_snapshot" limit 10;)

The partitions can be listed as follows:

(>> SHOW PARTITIONS sampledb.aws_config_configuration_snapshot;)

Queries like the following can also be run in Athena.

SELECT DISTINCT ci.awsaccountid,
         ci.resourceId,
         ci.awsRegion,
         ci.tags['name'] as name,         
         json_extract_scalar(ci.configuration, '$.cidrblock') as cidr,
         json_extract(ci.configuration, '$.cidrblockassociationset') as cidr_asso
FROM "sampledb"."aws_config_configuration_snapshot"
CROSS JOIN UNNEST(configurationitems) AS t(ci)
WHERE dt = 'latest'
        AND ci.resourceType = 'AWS::EC2::VPC'
        AND json_extract_scalar(ci.configuration, '$.cidrblock') != '172.31.0.0/16'

- Without DISTINCT, duplicate rows are not removed, so be sure to include it.

- The query above extracts every VPC CIDR, excluding the default VPC (172.31.0.0/16).
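The same query can also be run from a script instead of the console. The sketch below starts a query, polls until it finishes, and returns the first page of results as dictionaries. The function name `run_athena_query` and the output location are assumptions, and the `athena` argument is expected to be a `boto3.client("athena")` object.

```python
import time


def run_athena_query(athena, query, database, output_location, poll_seconds=1.0):
    """Run a query and return the first page of results as a list of dicts."""
    execution_id = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(
            QueryExecutionId=execution_id
        )["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Query {status['State']}: {reason}")

    rows = athena.get_query_results(
        QueryExecutionId=execution_id
    )["ResultSet"]["Rows"]
    table = [[cell.get("VarCharValue") for cell in row["Data"]] for row in rows]
    # For SELECT queries the first row holds the column names.
    header, records = table[0], table[1:]
    return [dict(zip(header, record)) for record in records]
```

Only the first page of results is read here; larger result sets would need the `get_query_results` paginator.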

6. (Supplement) Showing the Account Name Instead of the Account ID

Steps 1 through 5 are enough to extract the data, but one inconvenience remains: only the Account ID is shown, not the Account Name (alias). To work around this, I created a DynamoDB table and had a Lambda function update it daily. (Requires DynamoDB write and AWS Organizations read permissions.)

[Lambda - updateAccountIds]

import json
import os

import boto3

# Name of the DynamoDB table, supplied through the Lambda environment.
table_name = os.environ.get('TableName')


def lambda_handler(event, context):
    dynamodb = boto3.resource('dynamodb')

    # Paginate through every account in the Organization.
    org = boto3.client('organizations')
    paginator = org.get_paginator('list_accounts')

    for page in paginator.paginate():
        for account in page['Accounts']:
            try:
                put_account(account['Id'], account['Name'], dynamodb)
            # The conditional put fails when the account is already stored.
            except dynamodb.meta.client.exceptions.ConditionalCheckFailedException:
                print(f"{account['Id']} already exists...")
                continue
            else:
                print(f"{account['Id']} created.")

    return {
        'statusCode': 200,
        'body': json.dumps('Success')
    }


def put_account(account_id: str, account_name: str, dynamodb=None):
    """
    Simple wrapper around put_item that only inserts accounts not yet stored.
    """
    if not dynamodb:
        dynamodb = boto3.resource('dynamodb')

    table = dynamodb.Table(table_name)
    response = table.put_item(
        Item={
            'id': account_id,
            'name': account_name
        },
        ConditionExpression="attribute_not_exists(id)"
    )
    return response
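The conditional put skips accounts that are already stored, so a renamed account keeps its old name in the table. If names should be refreshed on every run, one possible variation is `update_item`, which creates the item when it is missing and overwrites the name otherwise. This is only a sketch: the function name `upsert_account_name` is hypothetical, and the `table` argument is assumed to be a boto3 DynamoDB `Table` resource.

```python
def upsert_account_name(table, account_id: str, account_name: str):
    """Insert the account, or overwrite its name if it already exists."""
    return table.update_item(
        Key={"id": account_id},
        # "name" is a DynamoDB reserved word, so it is aliased as #n.
        UpdateExpression="SET #n = :name",
        ExpressionAttributeNames={"#n": "name"},
        ExpressionAttributeValues={":name": account_name},
    )
```

Swapping this in for `put_account` would also remove the need to catch `ConditionalCheckFailedException`.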

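Once DynamoDB and the Lambda function are in place, the table has to be connected to Athena as a data source.

The Amazon Athena DynamoDB Connector creation screen appears; only the spill bucket and the catalog name need to be set, and both can be chosen freely.

After the connector function is created, connect the data source again. Once the connection succeeds, the DynamoDB values can be queried from Athena.

7. (Supplement) Running Athena Queries with the Account Alias

Join on the account ID to retrieve the Account Name as well.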

WITH accounts AS (select * from "accountids"."default"."accountids")


SELECT DISTINCT accounts.name as accountName, ci.awsaccountid, ci.resourceId, ci.awsRegion, ci.tags['name'] as name, 
       json_extract_scalar(ci.configuration, '$.instancetype') as instanceType,
       json_extract_scalar(ci.configuration, '$.privateipaddress') as privateIp,
       ci.tags['natip'] as natIp,
       json_extract_scalar(ci.configuration, '$.platform') as platform,
       json_extract_scalar(ci.configuration, '$.state.name') as state,
       ci.tags['mgdlevel'] as managedLevel,
       ci.tags['mgdbackup'] as backupRetention
FROM "sampledb"."aws_config_configuration_snapshot"
CROSS JOIN UNNEST(configurationitems) AS t(ci)
LEFT JOIN accounts ON accounts.id = ci.awsaccountid
WHERE dt = 'latest'
AND ci.resourceType = 'AWS::EC2::Instance'

※ References

1. How to query your AWS resource configuration states using AWS Config and Amazon Athena
2. Visualizing AWS Config data using Amazon Athena and Amazon QuickSight
