This is an advanced question.
TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical measure used to evaluate the importance of a word in a document relative to a collection of documents (corpus). It combines two metrics: term frequency (TF), which counts how often a word appears in a document, and inverse document frequency (IDF), which measures the rarity of the word across the corpus. A higher TF-IDF score indicates that the word is significant in the document but rare in the corpus. This technique helps identify the most relevant words in a document, enhancing text analysis and information retrieval tasks. TF-IDF is widely used in natural language processing and search engine algorithms.
In this problem, we will calculate TF-IDF using a simplified method.
The input consists of multiple lines, each containing one document (a sentence). The number of input lines is not known in advance.
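Because the number of lines is unknown, one option is to read until the end of input. The sketch below is only an illustration: the helper name read_documents is hypothetical, and skipping blank lines is an assumption the problem does not state.

```python
import io

def read_documents(stream):
    """Return one document per non-empty line of `stream`."""
    # Assumption: blank lines are not documents and are skipped.
    return [line.strip() for line in stream if line.strip()]

# In a real submission the stream would typically be sys.stdin:
#     import sys
#     documents = read_documents(sys.stdin)
# Here an in-memory stream stands in for the example input.
sample = io.StringIO(
    "I love cats\n"
    "You like orange cats and black cats\n"
    "They don't like animals\n"
)
documents = read_documents(sample)
print(len(documents))  # 3
```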
Example:
Input:
I love cats
You like orange cats and black cats
They don't like animals
'cats' is the most frequently occurring word: it appears 3 times across all documents.
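A possible way to find the most frequent word is to count every word across all documents. This sketch assumes words are separated by whitespace and compared case-sensitively. The problem does not say how ties are broken, so the example relies on there being none.

```python
from collections import Counter

documents = [
    "I love cats",
    "You like orange cats and black cats",
    "They don't like animals",
]

# Count each whitespace-separated word over the whole corpus.
counts = Counter(word for doc in documents for word in doc.split())

# most_common(1) returns a list holding the single (word, count) pair
# with the highest count.
target, occurrences = counts.most_common(1)[0]
print(target, occurrences)  # cats 3
```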
The TF of 'cats' in document 1 = TF('cats', 1)
The number of occurrences of 'cats' in document 1 is 1
The total number of words in document 1 is 3
TF('cats', 1) = 1/3 = 0.3333333333333333
TF('cats', 2) = 2/7 = 0.2857142857142857
TF('cats', 3) = 0/4 = 0.0
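The TF values above could be reproduced with a small function like the one below. The name term_frequency is hypothetical, and returning 0.0 for an empty document is an assumption made only to avoid division by zero.

```python
def term_frequency(word, document):
    """Occurrences of `word` divided by the number of words in `document`."""
    words = document.split()
    if not words:
        # Assumption: an empty document contributes a TF of 0.0.
        return 0.0
    return words.count(word) / len(words)

documents = [
    "I love cats",
    "You like orange cats and black cats",
    "They don't like animals",
]

for number, doc in enumerate(documents, start=1):
    print(f"TF('cats', {number}) =", term_frequency("cats", doc))
```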
The total number of documents is 3.
The number of documents containing 'cats' is 2.
IDF('cats') = 3/2 = 1.5
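Note that this simplified IDF is a plain ratio, whereas the usual textbook definition applies a logarithm. A minimal sketch of the simplified version follows. The function name is hypothetical, and returning 0.0 when no document contains the word is an assumption, since that case cannot occur for the most frequent word.

```python
def inverse_document_frequency(word, documents):
    """Total number of documents divided by the number containing `word`."""
    containing = sum(1 for doc in documents if word in doc.split())
    if containing == 0:
        # Assumption: a word found in no document gets an IDF of 0.0.
        return 0.0
    return len(documents) / containing

documents = [
    "I love cats",
    "You like orange cats and black cats",
    "They don't like animals",
]

print(inverse_document_frequency("cats", documents))  # 1.5
```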
The TFIDF of 'cats' in document 1 = TFIDF('cats', 1)
TFIDF('cats', 1) = 0.3333333333333333 * 1.5 = 0.5
TFIDF('cats', 2) = 0.2857142857142857 * 1.5 = 0.42857142857142855 => 0.43
TFIDF('cats', 3) = 0.0 * 1.5 = 0.0
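Putting the steps of the example together, one possible end-to-end sketch is shown below. The function name is hypothetical. It assumes at least one document, no empty documents, whitespace tokenization, and no tie for the most frequent word.

```python
from collections import Counter

def tf_idf_of_most_frequent(documents):
    """Return the rounded TFIDF of the corpus-wide most frequent word,
    one value per document."""
    # Assumption: every document contains at least one word.
    tokenized = [doc.split() for doc in documents]
    counts = Counter(word for words in tokenized for word in words)
    target = counts.most_common(1)[0][0]

    # Simplified IDF: total documents / documents containing the word.
    containing = sum(1 for words in tokenized if target in words)
    idf = len(tokenized) / containing

    # TF * IDF per document, rounded to the second decimal place.
    return [round(words.count(target) / len(words) * idf, 2) for words in tokenized]

documents = [
    "I love cats",
    "You like orange cats and black cats",
    "They don't like animals",
]
scores = tf_idf_of_most_frequent(documents)
for score in scores:
    print(score)
```

For the example input this yields 0.5, 0.43 and 0.0. If exactly two digits after the decimal point are required in the output, f"{score:.2f}" could be used when printing instead.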
For each document, print the TFIDF of the most frequently occurring word, rounded to the second decimal place.
We recommend computing the values with Python rather than a calculator, since the results may differ slightly because of floating-point rounding.
For example, to calculate 0.1+0.2, you can use print(0.1+0.2).