Adding Gunning Fog Index and updating documentation

dkhundley · Dec 15, 2024 · 19639a3
1 parent 5f0186b

Showing 2 changed files with 83 additions and 4 deletions.
diff --git a/docs/wiki/metrics/text/readability_metrics.md b/docs/wiki/metrics/text/readability_metrics.md
@@ -26,6 +26,12 @@ This score may be interpreted using the table below:
 
 Generally speaking, writers should aim for a score of 60 or higher, which indicates that the text is easily understood by most adults. Flesch-Kincaid Reading Ease is widely used in the field of education and is often used to evaluate the readability of textbooks and other educational materials. The Flesch-Kincaid Reading Ease metric is also used by software tools, like Microsoft Word, as a metric for readability analysis.
 
+To optimize the Flesch-Kincaid Reading Ease metric, consider applying the following strategies:
+- Simplify sentence structure (e.g., use short sentences and avoid compound sentences).
+- Simplify vocabulary (e.g., avoid jargon and use common words).
+- Use the active voice.
+- Improve sentence and paragraph flow (e.g., group related ideas).
+- Use lists and formatting.
 
 
 ## Flesch-Kincaid Grade Level
@@ -60,3 +66,34 @@ For context, the Flesch-Kincaid Grade Level metric is indeed correlated closely
 | **Target Audience**         | Designed to match text to education levels (e.g., for educators). | Designed to evaluate overall text difficulty for general readers. |
 | **Use Cases**               | Used in education to grade reading material difficulty by age or school grade. | Used in broader contexts (e.g., legal, technical writing) to ensure text is accessible. |
 | **Focus**                   | Focuses on education and school alignment. | Focuses on accessibility and readability for general audiences. |
+
+To optimize the Flesch-Kincaid Grade Level metric, consider applying the following strategies:
+- Apply the strategies listed above for the Flesch-Kincaid Reading Ease metric, since the two metrics overlap heavily.
+- Use concrete and familiar examples.
+- Avoid the passive voice.
+
+
+
+## Gunning Fog Index
+The Gunning Fog Index is a readability metric used to estimate the complexity of English-language text and the level of education required to understand it on a first reading. It was developed by Robert Gunning in 1952 and is widely applied in journalism, business communication, and education to assess whether written material is appropriate for its intended audience.
+
+The formula for the Gunning Fog Index is:
+
+$$
+0.4 \times \left(\text{average words per sentence} + 100 \times \frac{\text{complex words}}{\text{total words}}\right)
+$$
+
+The following table provides an interpretation of the Gunning Fog Index:
+
+| Fog Index Score | Description                              | Target Audience                     |
+|------------------|------------------------------------------|--------------------------------------|
+| **7-8**         | Easy to read                             | Suitable for middle school students |
+| **9-12**        | Moderately difficult                     | Suitable for high school students   |
+| **13-16**       | Difficult                                | Requires college-level reading      |
+| **17+**         | Very complex                             | Suitable for post-graduate level    |
+
+To optimize the Gunning Fog Index, consider applying the following strategies:
+
+- Use shorter sentences.
+- Avoid complex words where simpler alternatives exist.
+- Focus on clarity and brevity.
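
As a worked example of the Gunning Fog formula documented above, the sketch below computes the score from raw counts. The helper name `fog_index_from_counts` and the sample counts are hypothetical and are not part of this commit. The second term is the share of complex words expressed as a percentage.

```python
def fog_index_from_counts(num_words: int, num_sentences: int, num_complex_words: int) -> float:
    """Hypothetical helper: Gunning Fog Index computed from raw counts."""
    # Guard against empty input, mirroring the division-by-zero check in the commit
    if num_words == 0 or num_sentences == 0:
        return 0.0

    avg_sentence_length = num_words / num_sentences             # words per sentence
    percent_complex_words = 100 * num_complex_words / num_words  # already a percentage
    return 0.4 * (avg_sentence_length + percent_complex_words)


# Illustrative counts: 100 words, 5 sentences, 10 complex words
score = fog_index_from_counts(num_words=100, num_sentences=5, num_complex_words=10)
print(round(score, 2))  # 0.4 * (20 + 10) = 12.0
```

A score of 12 falls in the "Moderately difficult" band of the table above.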
diff --git a/whetstone/metrics/text/readability_metrics.py b/whetstone/metrics/text/readability_metrics.py
@@ -1,3 +1,4 @@
+import re
 from typing import Union, List
 from whetstone.utils.text.text_parsers import tokenize_sentence, count_syllables
 
@@ -77,6 +78,45 @@ def calculate_flesch_kincaid_grade_level(texts: Union[str, List[str]]) -> List[f
 
 
 
+def calculate_gunning_fog_index(text: str) -> float:
+    '''
+    Calculates the Gunning Fog Index for the given text.
+    
+    Inputs:
+        - text (str): The input text to analyze.
+    
+    Returns:
+        - gunning_fog_index (float): The Gunning Fog Index score (0.0 if the text contains no words or sentences)
+    '''
+    # Tokenizing sentences (basic method using punctuation)
+    sentences = re.split(r'[.!?]+', text)
+    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]
+
+    # Tokenizing words
+    words = re.findall(r'\b\w+\b', text.lower())
+
+    # Getting the complex words (3+ syllables; simplified, as Gunning's original rules also exclude proper nouns, compound words, and common suffixes)
+    complex_words = [word for word in words if count_syllables(word) >= 3]
+
+    # Getting number of sentences, words, and complex words
+    num_sentences = len(sentences)
+    num_words = len(words)
+    num_complex_words = len(complex_words)
+
+    # Avoiding division by zero
+    if num_sentences == 0 or num_words == 0:
+        return 0.0
+
+    # Performing the Gunning Fog Index calculation
+    avg_sentence_length = num_words / num_sentences
+    percent_complex_words = (num_complex_words / num_words) * 100
+    gunning_fog_index = 0.4 * (avg_sentence_length + percent_complex_words)
+
+    return gunning_fog_index
+
+
+
+
 def calculate_all_readability_metrics(texts: Union[str, List[str]]) -> List[dict]:
     '''
     Calculate all readability metrics for one or multiple texts.
@@ -95,14 +135,16 @@ def calculate_all_readability_metrics(texts: Union[str, List[str]]) -> List[dict
     # Calculating all the respective metrics all metrics as lists
     fk_reading_ease_scores = calculate_flesch_kincaid_reading_ease(texts)
     fk_grade_level_scores = calculate_flesch_kincaid_grade_level(texts)
-
+    gunning_fog_scores = [calculate_gunning_fog_index(text) for text in ([texts] if isinstance(texts, str) else texts)]
+
     # Combining into a list of dictionaries
     results = []
-    for re_score, gl_score in zip(fk_reading_ease_scores, fk_grade_level_scores):
+    for re_score, gl_score, gf_score in zip(fk_reading_ease_scores, fk_grade_level_scores, gunning_fog_scores):
         readability_metrics = {
             'flesch_kincaid_reading_ease': re_score,
-            'flesch_kincaid_grade_level': gl_score
+            'flesch_kincaid_grade_level': gl_score,
+            'gunning_fog_index': gf_score
         }
         results.append(readability_metrics)
 
-    return results
+    return results
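
A quick way to sanity-check the approach taken in `calculate_gunning_fog_index` is to run the same steps on small inputs. The sketch below is self-contained and makes one loud assumption: `naive_count_syllables` is a stand-in that counts runs of vowels, because the implementation of whetstone's `count_syllables` is not shown in this diff and may give different counts. `fog_index_sketch` is likewise a hypothetical name.

```python
import re


def naive_count_syllables(word: str) -> int:
    """Stand-in syllable counter (assumption): counts runs of vowels, minimum 1."""
    return max(1, len(re.findall(r'[aeiouy]+', word.lower())))


def fog_index_sketch(text: str) -> float:
    """Follows the same steps as calculate_gunning_fog_index, using the stand-in counter."""
    # Split on sentence-ending punctuation and drop empty fragments
    sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
    words = re.findall(r'\b\w+\b', text.lower())

    # Empty or punctuation-only input yields 0.0 rather than a division error
    if not sentences or not words:
        return 0.0

    complex_words = [w for w in words if naive_count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))


simple = fog_index_sketch("The cat sat. The dog ran.")
harder = fog_index_sketch("Readability is important. Keep it simple.")
empty = fog_index_sketch("")
print(round(simple, 2), round(harder, 2), empty)  # prints: 1.2 14.53 0.0
```

With the stand-in counter, two of the six words in the second text ("readability" and "important") count as complex, which is what drives its score well above the first text despite the identical sentence length.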