14334 - Advanced - TFIDF   

Description

This is an advanced question.

TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). It combines two metrics: term frequency (TF), which counts how often a word appears in a document, and inverse document frequency (IDF), which measures the rarity of the word across the corpus. A higher TF-IDF score indicates that the word is significant in the document but rare in the corpus. This technique helps identify the most relevant words in a document, enhancing text analysis and information retrieval tasks. TF-IDF is widely used in natural language processing and search engine algorithms.
In this problem, we will calculate TF-IDF using a simplified method.

Input

The input consists of multiple lines, each containing one document (a sentence). The number of lines is not given in advance.
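Because the number of lines is unknown, one way to read the input in Python is to iterate over the input stream until it is exhausted. The sketch below is only an illustration: the helper name read_documents is hypothetical, and skipping blank lines is an assumption that the statement does not confirm.

```python
import io


def read_documents(stream):
    """Return one list of words per non-empty line of `stream`."""
    documents = []
    for line in stream:          # stops by itself at end of input
        words = line.split()     # split on any run of whitespace
        if words:                # assumption: blank lines are not documents
            documents.append(words)
    return documents


# In a submission the stream would be sys.stdin; a StringIO stands in here.
sample = io.StringIO("I love cats\nYou like orange cats and black cats\n")
print(read_documents(sample))
```

Passing the stream as a parameter keeps the function easy to try on a small string before submitting.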

  1. TF is defined as the number of times a word appears in the document divided by the total number of words in the document.
  2. IDF is defined as the total number of documents divided by the number of documents containing the word.
  3. Find the word that occurs most frequently across all documents, then print its TF-IDF (TF multiplied by IDF) for each document.
  4. Ignore the difference between upper case and lower case. str.lower() returns the whole string in lower case, where str is the string.
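The four rules above could be sketched in Python roughly as follows. All function names here are illustrative, not part of the problem. The sketch assumes that words are separated by whitespace, that every document contains at least one word, and that a tie for the most frequent word goes to the word seen first; the statement does not specify any of these.

```python
from collections import Counter


def tokenize(document):
    """Lower-case a document and split it into words on whitespace."""
    return document.lower().split()


def most_frequent_word(docs):
    """Word with the highest total count over all tokenized documents."""
    counts = Counter(word for doc in docs for word in doc)
    # Assumption: on a tie, the word encountered first is returned.
    return counts.most_common(1)[0][0]


def tf(word, doc):
    """Occurrences of `word` in `doc` divided by the number of words in `doc`."""
    return doc.count(word) / len(doc)  # assumes `doc` is not empty


def idf(word, docs):
    """Number of documents divided by the number of documents containing `word`."""
    containing = sum(1 for doc in docs if word in doc)
    return len(docs) / containing


docs = [tokenize(s) for s in ["A b a", "B c", "a A a d"]]
word = most_frequent_word(docs)
print(word, [tf(word, d) for d in docs], idf(word, docs))
```

Since the word passed to idf is taken from the documents themselves, it appears in at least one of them, so the division is safe.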

Example:

Input:
I love cats
You like orange cats and black cats
They don't like animals

'cats' is the most frequently occurring word (3 occurrences in total).
The TF of 'cats' in the first document = TF('cats', 1)
The number of occurrences of 'cats' in document 1 is 1
The total number of words in document 1 is 3
TF('cats', 1) = 1/3 = 0.3333333333333333
TF('cats', 2) = 2/7 = 0.2857142857142857
TF('cats', 3) = 0/4 = 0.0

The total number of documents is 3.
The number of documents containing 'cats' is 2.
IDF('cats') = 3/2 = 1.5

The TFIDF of 'cats' in document 1 = TFIDF('cats', 1)
TFIDF('cats', 1) = 0.3333333333333333 * 1.5 = 0.5
TFIDF('cats', 2) = 0.2857142857142857 * 1.5 = 0.42857142857142855 => 0.43
TFIDF('cats', 3) = 0.0 * 1.5 = 0.0
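As a cross-check of the arithmetic above, the same products can be computed exactly with Python's fractions module. This is only a verification sketch, not part of the required solution, and the variable names are illustrative.

```python
from fractions import Fraction

# TF values from the example: occurrences of 'cats' / words in the document.
tf_values = [Fraction(1, 3), Fraction(2, 7), Fraction(0, 4)]
idf_value = Fraction(3, 2)  # 3 documents / 2 documents containing 'cats'

tfidf_exact = [tf * idf_value for tf in tf_values]
tfidf_rounded = [round(float(x), 2) for x in tfidf_exact]

print(tfidf_exact)    # exact products: 1/2, 3/7 and 0
print(tfidf_rounded)  # → [0.5, 0.43, 0.0]
```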

Output

For each document, print the TF-IDF of the most frequently occurring word, rounded to the second decimal place.
We recommend using Python rather than a calculator when computing the values yourself, because floating-point rounding may make the results differ.
For example, to calculate 0.1+0.2, you can use print(0.1+0.2).
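A short illustration of the floating-point behaviour mentioned above, together with two common ways of rounding in Python. Which output style the judge expects is an assumption here: the worked example shows 0.5 and 0.0, which matches round(x, 2) rather than a fixed two-digit format.

```python
total = 0.1 + 0.2
print(total)            # → 0.30000000000000004, not exactly 0.3
print(total == 0.3)     # → False

value = 0.2857142857142857 * 1.5   # TFIDF('cats', 2) from the example
print(round(value, 2))  # → 0.43
print(round(0.5, 2))    # → 0.5 (round() does not pad with zeros)
print(f"{0.5:.2f}")     # → 0.50 (fixed two decimals, shown for contrast)
```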
