[Python] The re Module: Processing Strings with Regular Expressions

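In Python, regular expressions are a powerful tool for searching and processing strings, and the standard library module that provides them is re. This post walks through the main functions of the re module and some practical use cases.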

1. What Is a Regular Expression?

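A regular expression describes a text pattern, which you can then use to find, extract, or replace the matching parts of a string. For example, regular expressions are useful for recognizing patterns such as email addresses or phone numbers, and for preprocessing text data.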
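To make find, extract, and replace concrete, here is a minimal sketch on a made-up string. The phone-number format used here (two or three digits, then three or four, then four, separated by hyphens) is an assumption chosen for illustration, not a general phone-number rule.

```python
import re

# Hypothetical sample text; the phone format is an illustrative assumption
text = "Contact: 010-1234-5678, office 02-987-6543"
phone = r"\d{2,3}-\d{3,4}-\d{4}"

# Find: does the text contain anything shaped like a phone number?
has_phone = re.search(phone, text) is not None

# Extract: collect every non-overlapping match as a list
phones = re.findall(phone, text)

# Replace: keep the first block of digits (group 1) and mask the rest
masked = re.sub(r"(\d{2,3})-\d{3,4}-\d{4}", r"\1-****-****", text)

print(has_phone)  # True
print(phones)     # ['010-1234-5678', '02-987-6543']
print(masked)     # Contact: 010-****-****, office 02-****-****
```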

2. Key Functions of Python's re Module

The re module contains many functions. Here are some of the most frequently used ones:

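re.match(pattern, string): Checks whether the pattern matches at the beginning of the string and returns a Match object, or None if it does not.
re.search(pattern, string): Scans the whole string and returns a Match object for the first location where the pattern matches, or None.
re.findall(pattern, string): Returns all non-overlapping matches as a list. If the pattern contains groups, the group contents are returned instead of the whole match.
re.sub(pattern, repl, string): Replaces every match of the pattern with repl and returns the resulting string.
re.compile(pattern): Compiles the pattern into a reusable Pattern object, which is convenient when the same pattern is used many times. The re module also caches recently used patterns internally, so the speed gain over the module-level functions is usually modest.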
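The following sketch runs the functions listed above on one hypothetical string. It makes the difference between match and search, and the effect of groups on findall, easy to see.

```python
import re

# Hypothetical sample string for illustration
text = "user: alice, id: 42"

# match only tries position 0, so "id" is not found here
no_match = re.match(r"id", text)
print(no_match)                 # None

# search scans the whole string; group(0) is the full match, group(1) the first group
m = re.search(r"id: (\d+)", text)
print(m.group(0), m.group(1))   # id: 42 42

# findall returns whole matches when there are no groups...
whole = re.findall(r"\w+: \w+", text)
print(whole)                    # ['user: alice', 'id: 42']

# ...but only the group contents when the pattern has a group
groups = re.findall(r"(\w+): \w+", text)
print(groups)                   # ['user', 'id']

# sub replaces every match by default
replaced = re.sub(r"\d+", "#", text)
print(replaced)                 # user: alice, id: #

# A compiled Pattern exposes the same operations as methods;
# with two groups, findall returns a list of tuples
pat = re.compile(r"(\w+): (\w+)")
pairs = pat.findall(text)
print(pairs)                    # [('user', 'alice'), ('id', '42')]
```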

3. Usage Examples

The following example shows the basic usage of the re module:

import re

# 1. Find a specific word in a string
text = "Python is fun, and learning Python is great!"
pattern = r"Python"

matches = re.findall(pattern, text)
print(f"'{pattern}' found {len(matches)} times: {matches}")

# 2. Replace text
updated_text = re.sub(r"Python", "Programming", text)
print(updated_text)

# 3. Compile a regex pattern for reuse
compiled_pattern = re.compile(r"\b\w{3}\b")  # whole words of exactly 3 characters
three_letter_words = compiled_pattern.findall(text)
print(f"3-letter words: {three_letter_words}")  # ['fun', 'and']
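Two further options often pair well with the calls in this example: flags such as re.IGNORECASE, and, for sub, the count argument or a replacement function that receives each Match object. The sketch below uses a slightly modified sample string (the second "python" is lowercase) to show them.

```python
import re

# Modified sample text: note the lowercase "python"
text = "Python is fun, and learning python is great!"

# re.IGNORECASE lets one pattern match both "Python" and "python";
# \b word boundaries keep it from matching inside longer words
pattern = re.compile(r"\bpython\b", re.IGNORECASE)
found = pattern.findall(text)

# count=1 limits the substitution to the first match only
first_only = pattern.sub("Programming", text, count=1)

# A function can compute each replacement from the Match object
upper = pattern.sub(lambda m: m.group(0).upper(), text)

print(found)       # ['Python', 'python']
print(first_only)  # Programming is fun, and learning python is great!
print(upper)       # PYTHON is fun, and learning PYTHON is great!
```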

4. Use Case: Validating Email Addresses

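The re module is also useful for checking whether an input value looks like a valid email address. The pattern below is a simplified check, not a full implementation of the email address standard.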

import re

# Email validation function (simplified pattern)
def is_valid_email(email):
    pattern = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
    # fullmatch requires the entire string to match, so no ^/$ anchors are needed
    return re.fullmatch(pattern, email) is not None

# Test
emails = ["test@example.com", "invalid-email@", "hello@world.org"]
for email in emails:
    print(f"{email}: {'Valid' if is_valid_email(email) else 'Invalid'}")

Output:
test@example.com: Valid
invalid-email@: Invalid
hello@world.org: Valid
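A caveat when a pattern is anchored with ^ and $ and checked with re.match: in Python, $ also matches just before a trailing newline, so an input such as "test@example.com\n" is accepted. re.fullmatch (available since Python 3.4) requires the entire string to match. A small demonstration using the email pattern from this section:

```python
import re

anchored = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
unanchored = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
email_with_newline = "test@example.com\n"

# `$` also matches right before a trailing "\n", so this still matches
loose = bool(re.match(anchored, email_with_newline))

# fullmatch must consume the whole string, including the "\n", so it fails
strict = bool(re.fullmatch(unanchored, email_with_newline))

print(loose)   # True
print(strict)  # False
```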

5. Conclusion

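Python's re module is a powerful and flexible tool for string processing. It is used in many areas, including text preprocessing, log analysis, and data validation. Once you understand regular expression patterns and use the re module appropriately, many string-handling tasks can be written far more concisely, which can noticeably improve your productivity.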

Reference

https://docs.python.org/ko/3/library/re.html

