자료구조 및 알고리즘/알고리즘

[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

benjykim 2021. 1. 10. 11:56

[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링 문제를 풀며 아름다운 코드를 발견했습니다. 저만 보기 아까워 기록하려합니다.

url = https://programmers.co.kr/learn/courses/30/lessons/17677

내가 생각하기에 깔끔한 코드

import re
from collections import Counter

pattern = re.compile('[a-z][a-z]')

def preprocessing(content):
    result = []
    for i in range(0, len(content)-1):
        two_letter = content[i:i+2]
        if pattern.search(two_letter):
            result.append(two_letter)
    print(result)
    return result

def solution(str1, str2):
    # 두 개의 문자열에 대해서 소문자로 처리
    str1 = str1.lower()
    str2 = str2.lower()

    if str1 == str2: # 일치한다면 유사도는 1
        return 65536

    str1 = preprocessing(str1)
    str2 = preprocessing(str2)

    counter_1 = Counter(str1)
    counter_2 = Counter(str2)

    intersections = counter_1 & counter_2 # 교집합
    intersections = sum(list(intersections.values()))

    unions = counter_1 | counter_2 # 합집합
    unions = sum(list(unions.values()))
    return int(intersections/unions * 65536)

출처: https://jeongchul.tistory.com/661 [Jeongchul]

문제를 풀며 깔끔함에 감탄했다. collections의 Counter가 앞으로도 굉장히 자주 쓰일 것 같아 정리해보기로 했다.

collections.Counter

예제

>>> li = ['apple', 'banana', 'coconut']
>>> new_li = ['apple', 'banana', 'citron']

>>> from collections import Counter
>>> counter = Counter(li)
>>> print(counter)
Counter({'apple': 1, 'banana': 1, 'coconut': 1})

# 기존 결과에 새로운 리스트를 추가한다. (누적됨)
>>> counter.update(new_li)
>>> print(counter)
Counter({'apple': 2, 'banana': 2, 'coconut': 1, 'citron': 1})

# 가장 많이 나타난 순서대로 3개를 출력한다.
>>> print(counter.most_common(n=3))
[('apple', 2), ('banana', 2), ('coconut', 1)]

# 포함되지 않은 요소에 접근하면 0을 리턴한다.
>>> print(counter['durian']) 
0

# elements(): 해당 요소의 개수(a의 경우 4만큼 해당 요소(a)를 리턴하고, 1보다 작은 요소를 만나면 무시한다.
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> sorted(c.elements())
['a', 'a', 'a', 'a', 'b', 'b']

# subtract([iterable-or-mapping])
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6})

>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> e = Counter(a=1, b=2, e=3, f=-1)
>>> c
Counter({'a': 3, 'f': 1, 'b': 0, 'c': 0, 'd': -2, 'e': -3})

합집합, 교집합

>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d                       # add two counters together:  c[x] + d[x]
Counter({'a': 4, 'b': 3})
>>> c - d                       # subtract (keeping only positive counts)
Counter({'a': 2})
>>> c & d                       # intersection:  min(c[x], d[x]) 
Counter({'a': 1, 'b': 1})
>>> c | d                       # union:  max(c[x], d[x])
Counter({'a': 3, 'b': 2})

저작자표시

'자료구조 및 알고리즘 > 알고리즘' 카테고리의 다른 글

[알고리즘] 10진수를 n진수로 변경하기 (0)	2021.01.12
[알고리즘] 2020 KAKAO BLIND RECRUITMENT - 후보키 (0)	2021.01.11
[알고리즘] 2020 카카오 인턴십 - 수식 최대화 (0)	2021.01.07
[알고리즘] 프로그래머스 2017 팁스타운 - 짝지어 제거하기 (0)	2021.01.06
[알고리즘] 프로그래머스 연습문제 - 다음 큰 숫자 (0)	2021.01.06

현재글[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

benjykim

[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

url = https://programmers.co.kr/learn/courses/30/lessons/17677

collections.Counter

'자료구조 및 알고리즘 > 알고리즘' 카테고리의 다른 글

'자료구조 및 알고리즘/알고리즘'의 다른글

티스토리툴바

[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

[알고리즘] 2018 KAKAO BLIND RECRUITMENT - [1차] 뉴스 클러스터링

url = https://programmers.co.kr/learn/courses/30/lessons/17677

collections.Counter

'자료구조 및 알고리즘 > 알고리즘' 카테고리의 다른 글

'자료구조 및 알고리즘/알고리즘'의 다른글

관련글

티스토리툴바