Data Scientist

[Math] Perplexity-Language_Model

Implementation/Loss 2021. 9. 21. 09:32

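# Perplexity
The reciprocal of the probability of a sentence, normalized by taking the n-th root.
=> The product of the conditional probabilities of the individual words is the probability that the whole sentence occurs under the model.
=> That probability is then averaged, i.e. normalized with the n-th root (a geometric mean over the n words).
=> Therefore, the more likely the sentence is under the model, i.e. the larger P(w1, w2, ..., wn), the larger the normalized value becomes, and its reciprocal, the Perplexity, becomes smaller.
# Chain Rule
# N-gram style
- The example is a bigram.
# Branching factor
=> A perplexity of 10 means that, on average, the model is weighing about 10 candidate words when choosing the next word. In the example below
# Reference: https://wikidocs.net/21697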
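The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `perplexity` and the bigram probabilities in the `bigram` table are made up for the example and do not come from the referenced page.

```python
import math

def perplexity(cond_probs):
    """Perplexity of a sentence from the conditional probability of each word."""
    n = len(cond_probs)
    # By the chain rule, the sum of the log-probabilities is log P(w1, ..., wn).
    log_prob = sum(math.log(p) for p in cond_probs)
    # PPL = P(w1, ..., wn) ** (-1 / n), computed in log space to avoid underflow.
    return math.exp(-log_prob / n)

# Hypothetical bigram probabilities P(current word | previous word).
bigram = {("<s>", "i"): 0.5, ("i", "like"): 0.4, ("like", "tea"): 0.25}
sentence = ["<s>", "i", "like", "tea"]
probs = [bigram[(prev, cur)] for prev, cur in zip(sentence, sentence[1:])]
ppl = perplexity(probs)

# If every word has probability 0.1, the perplexity is close to 10,
# which matches the branching-factor reading of "about 10 candidates".
uniform_ppl = perplexity([0.1] * 5)
```

Working in log space is a design choice for numerical stability; multiplying the probabilities directly gives the same value for short sentences.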

[Math] Entropy

Implementation/Loss 2021. 9. 21. 08:49

- Entropy is the foundation for approaching Perplexity, the performance metric for text models
- Entropy formula
- Example entropy calculations for different probability distributions
-0.5 * np.log2(0.5) - 0.5 * np.log2(0.5) = 1
-0.8 * np.log2(0.8) - 0.2 * np.log2(0.2) = 0.7219280948873623
eps = np.finfo(float).eps
-1 * np.log2(1) - eps * np.log2(eps) = 1.1546319456101628e-14
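The three calculations above can be wrapped in one helper. The following is a minimal sketch, assuming the Shannon entropy in bits, H(p) = sum of p * log2(1 / p); the function name `entropy` is chosen here for illustration and is not from the post.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    p = np.asarray(probs, dtype=float)
    # Drop zero-probability outcomes: p * log2(1 / p) tends to 0 as p -> 0.
    p = p[p > 0]
    # Equivalent to -sum(p * log2(p)).
    return float(np.sum(p * np.log2(1.0 / p)))

h_fair = entropy([0.5, 0.5])    # maximally uncertain two-outcome case
h_skewed = entropy([0.8, 0.2])  # less uncertain, so lower entropy
h_certain = entropy([1.0, 0.0]) # no uncertainty at all
```

Filtering out zeros plays the same role as the `eps` trick in the example: it keeps `np.log2` from being evaluated at 0.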

[Leetcode] 1748. Sum of Unique Elements

Algorithm/LeetCode 2021. 8. 22. 17:02

You are given an integer array nums. The unique elements of an array are the elements that appear exactly once in the array. Return the sum of all the unique elements of nums. Example 1: Input: nums = [1,2,3,2] Output: 4 Explanation: The unique elements are [1,3], and the sum is 4. Example 2: Input: nums = [1,1,1,1,1] Output: 0 Explanation: There are no unique elements, and the sum is 0. Example..
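One possible approach to the problem above is to count occurrences and sum the values that appear exactly once. This is a sketch, not necessarily the solution from the original post; the function name `sum_of_unique` is chosen here for illustration.

```python
from collections import Counter

def sum_of_unique(nums):
    """Sum of the elements that appear exactly once in nums."""
    counts = Counter(nums)
    # Keep only values whose occurrence count is exactly 1.
    return sum(value for value, count in counts.items() if count == 1)

example_1 = sum_of_unique([1, 2, 3, 2])     # unique elements are 1 and 3
example_2 = sum_of_unique([1, 1, 1, 1, 1])  # no unique elements
```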

[Leetcode] 961. N-Repeated Element in Size 2N Array

Algorithm/LeetCode 2021. 8. 22. 16:35

You are given an integer array nums with the following properties: nums.length == 2 * n. nums contains n + 1 unique elements. Exactly one element of nums is repeated n times. Return the element that is repeated n times. Example 1: Input: nums = [1,2,3,3] Output: 3 Example 2: Input: nums = [2,1,2,5,3,2] Output: 2 Example 3: Input: nums = [5,1,5,2,5,3,5,4] Output: 5 Constraints: 2
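Because the array has n + 1 distinct values and exactly one of them is repeated n times, every other value appears once, so the first value seen twice must be the answer. A minimal sketch under that reading, which may differ from the original post's solution (the function name `repeated_n_times` is chosen here for illustration):

```python
def repeated_n_times(nums):
    """Return the element that is repeated n times in an array of size 2n."""
    seen = set()
    for value in nums:
        # Only the repeated element can show up a second time.
        if value in seen:
            return value
        seen.add(value)
    raise ValueError("input does not contain a repeated element")

example_1 = repeated_n_times([1, 2, 3, 3])
example_2 = repeated_n_times([2, 1, 2, 5, 3, 2])
example_3 = repeated_n_times([5, 1, 5, 2, 5, 3, 5, 4])
```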
