Adding Gunning Fog Index and updating documentation
dkhundley committed Dec 15, 2024
1 parent 5f0186b commit 19639a3
Showing 2 changed files with 83 additions and 4 deletions.
37 changes: 37 additions & 0 deletions docs/wiki/metrics/text/readability_metrics.md
@@ -26,6 +26,12 @@ This score may be interpreted using the table below:

Generally speaking, writers should aim for a score of 60 or higher, which indicates that the text is easily understood by most adults. Flesch-Kincaid Reading Ease is widely used in the field of education and is often used to evaluate the readability of textbooks and other educational materials. The Flesch-Kincaid Reading Ease metric is also used by software tools, like Microsoft Word, as a metric for readability analysis.

To optimize the Flesch-Kincaid Reading Ease metric, consider applying the following strategies:
- Simplify sentence structure. (e.g. Use simple sentences, avoid compound sentences)
- Simplify vocabulary. (e.g. Avoid jargon, use common words)
- Use the active voice.
- Improve sentence and paragraph flow. (e.g. Group related ideas)
- Use lists and formatting.


## Flesch-Kincaid Grade Level
@@ -60,3 +66,34 @@ For context, the Flesch-Kincaid Grade Level metric is indeed correlated closely
| **Target Audience** | Designed to match text to education levels (e.g., for educators). | Designed to evaluate overall text difficulty for general readers. |
| **Use Cases** | Used in education to grade reading material difficulty by age or school grade. | Used in broader contexts (e.g., legal, technical writing) to ensure text is accessible. |
| **Focus** | Focuses on education and school alignment. | Focuses on accessibility and readability for general audiences. |

To optimize the Flesch-Kincaid Grade Level metric, consider applying the following strategies:
- Apply the strategies listed for the Flesch-Kincaid Reading Ease metric, as there is much overlap.
- Use concrete and familiar examples.
- Avoid the passive voice.



## Gunning Fog Index
The Gunning Fog Index is a readability metric used to estimate the complexity of English-language text and the level of education required to understand it on a first reading. It was developed by Robert Gunning in 1952 and is widely applied in journalism, business communication, and education to assess whether written material is appropriate for its intended audience.

The formula for the Gunning Fog Index is:

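$$
\text{Gunning Fog Index} = 0.4 \times \left( \frac{\text{total words}}{\text{total sentences}} + 100 \times \frac{\text{complex words}}{\text{total words}} \right)
$$

Here, a complex word is one with three or more syllables, which is the definition the accompanying implementation uses.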
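
As a rough worked example of the formula, the sketch below applies it to counts that are already known. The helper name `gunning_fog_from_counts` and the sample counts are assumptions made for illustration. The complex-word term is taken as the share of complex words expressed as a percentage.

```python
def gunning_fog_from_counts(num_words: int, num_sentences: int, num_complex_words: int) -> float:
    """Apply the Gunning Fog formula to pre-computed counts (hypothetical helper)."""
    # Guard against empty input, mirroring a divide-by-zero check
    if num_words == 0 or num_sentences == 0:
        return 0.0

    avg_sentence_length = num_words / num_sentences
    percent_complex_words = 100 * num_complex_words / num_words
    return 0.4 * (avg_sentence_length + percent_complex_words)


# 100 words in 5 sentences, 10 of them complex:
# 0.4 * (100 / 5 + 100 * 10 / 100) = 0.4 * (20 + 10) = 12
score = gunning_fog_from_counts(num_words=100, num_sentences=5, num_complex_words=10)
print(round(score, 2))  # 12.0
```

A score of 12 falls in the upper part of the high school range.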

The following table provides an interpretation of the Gunning Fog Index:

| Fog Index Score | Description | Target Audience |
|------------------|------------------------------------------|--------------------------------------|
| **7-8** | Easy to read | Suitable for middle school students |
| **9-12** | Moderately difficult | Suitable for high school students |
| **13-16** | Difficult | Requires college-level reading |
| **17+** | Very complex | Suitable for post-graduate level |
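
The table can be turned into a small lookup, sketched below. The function name `interpret_fog_index` is hypothetical. The table does not say how to treat fractional scores or scores below 7, so this sketch assumes that each band extends up to the start of the next one and that scores below 7 fall outside the table.

```python
from typing import Optional


def interpret_fog_index(score: float) -> Optional[str]:
    """Map a Gunning Fog score to the table's description (hypothetical helper)."""
    # Assumption: scores below 7 are not covered by the table
    if score < 7:
        return None
    # Assumption: fractional scores belong to the band they have reached,
    # so 8.5 is still "Easy to read" and 12.9 is still "Moderately difficult"
    if score < 9:
        return "Easy to read"
    if score < 13:
        return "Moderately difficult"
    if score < 17:
        return "Difficult"
    return "Very complex"


print(interpret_fog_index(12.0))  # Moderately difficult
```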

To optimize the Gunning Fog Index, consider applying the following strategies:

- Use shorter sentences.
- Avoid complex words where simpler alternatives exist.
- Focus on clarity and brevity.
50 changes: 46 additions & 4 deletions whetstone/metrics/text/readability_metrics.py
@@ -1,3 +1,4 @@
import re
from typing import Union, List
from whetstone.utils.text.text_parsers import tokenize_sentence, count_syllables

@@ -77,6 +78,45 @@ def calculate_flesch_kincaid_grade_level(texts: Union[str, List[str]]) -> List[f



def calculate_gunning_fog_index(text: str) -> float:
    '''
    Calculates the Gunning Fog Index for the given text.

    Inputs:
        - text (str): The input text to analyze.

    Returns:
        - gunning_fog_index (float): The Gunning Fog Index score
    '''
    # Tokenizing sentences (basic method using punctuation)
    sentences = re.split(r'[.!?]+', text)
    sentences = [sentence.strip() for sentence in sentences if sentence.strip()]

    # Tokenizing words
    words = re.findall(r'\b\w+\b', text.lower())

    # Getting the complex words (words with 3+ syllables)
    complex_words = [word for word in words if count_syllables(word) >= 3]

    # Getting number of sentences, words, and complex words
    num_sentences = len(sentences)
    num_words = len(words)
    num_complex_words = len(complex_words)

    # Avoiding division by zero
    if num_sentences == 0 or num_words == 0:
        return 0.0

    # Performing the Gunning Fog Index calculation
    avg_sentence_length = num_words / num_sentences
    percent_complex_words = (num_complex_words / num_words) * 100
    gunning_fog_index = 0.4 * (avg_sentence_length + percent_complex_words)

    return gunning_fog_index
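
One caveat about the punctuation-based sentence split: it treats every period as a sentence boundary, so abbreviations and decimals inflate the sentence count and lower the score. The sketch below reuses the same regular expression in a standalone helper (`split_sentences` is a hypothetical name, and the sample text is made up) to show the effect.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split on runs of '.', '!' or '?', dropping empty pieces (same regex as above)."""
    parts = re.split(r'[.!?]+', text)
    return [part.strip() for part in parts if part.strip()]


# Two sentences to a human reader, but the abbreviation and the decimal
# each add an extra split point.
sample = "Dr. Smith paid $3.50. He left."
print(split_sentences(sample))
# ['Dr', 'Smith paid $3', '50', 'He left']
```

If this matters for the texts being scored, a dedicated sentence tokenizer may be a better fit than the regular expression.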




def calculate_all_readability_metrics(texts: Union[str, List[str]]) -> List[dict]:
    '''
    Calculate all readability metrics for one or multiple texts.
@@ -95,14 +135,16 @@ def calculate_all_readability_metrics(texts: Union[str, List[str]]) -> List[dict
     # Calculating all the respective metrics as lists
     fk_reading_ease_scores = calculate_flesch_kincaid_reading_ease(texts)
     fk_grade_level_scores = calculate_flesch_kincaid_grade_level(texts)
-
+    gunning_fog_scores = [calculate_gunning_fog_index(text) for text in texts]
+
     # Combining into a list of dictionaries
     results = []
-    for re_score, gl_score in zip(fk_reading_ease_scores, fk_grade_level_scores):
+    for re_score, gl_score, gf_score in zip(fk_reading_ease_scores, fk_grade_level_scores, gunning_fog_scores):
         readability_metrics = {
             'flesch_kincaid_reading_ease': re_score,
-            'flesch_kincaid_grade_level': gl_score
+            'flesch_kincaid_grade_level': gl_score,
+            'gunning_fog_index': gf_score
         }
         results.append(readability_metrics)
 
-    return results
+    return results
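
The signature accepts `Union[str, List[str]]`, but the new list comprehension iterates over `texts` directly. The hunk does not show whether a single string is wrapped in a list earlier in the function. If it is not, a bare string would be scored one character at a time. A minimal sketch of a guard follows. The helper name `normalize_texts` is an assumption, not part of the commit.

```python
from typing import List, Union


def normalize_texts(texts: Union[str, List[str]]) -> List[str]:
    """Return a list of texts, wrapping a single string (hypothetical helper)."""
    # A bare string is iterable, so iterating over it yields characters,
    # not whole texts; wrap it so each element is one text to score.
    if isinstance(texts, str):
        return [texts]
    return list(texts)


print(normalize_texts("One short text."))  # ['One short text.']
print(normalize_texts(["First text.", "Second text."]))
```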
