
Python Project for Beginners - Post 24: Word Frequency Counter from a File


Let's build a Word Frequency Counter from a File. This project is a great practical application that combines file handling, string manipulation, and the use of dictionaries to store and count data.

First, here is the explanation of how the project is built.


How the "Word Frequency Counter" is Made


  1. The Core Idea: The goal is to create a program that can read any text file, count the occurrences of every unique word, and then display the results to the user. This is a fundamental task in text analysis.

  2. Getting the Filename: The program will start by asking the user for the name of the text file they want to analyze.

  3. Safe File Handling: The most important part is reading the file without crashing. We'll wrap the read in a try...except FileNotFoundError block so that, if the user enters a filename that doesn't exist, the program shows a friendly error message instead of a traceback. We will also use the with open(...) statement, which is the best practice for file handling because it ensures the file is closed automatically, even if an error occurs while reading.
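
As a rough sketch of this step, the read could be wrapped in a small helper. The function name read_text_file, the UTF-8 encoding, and the wording of the error message are assumptions made for illustration, not part of the original project.

```python
def read_text_file(filename):
    """Return the file's contents as one string, or None if the file is missing."""
    try:
        # 'with' closes the file automatically when the block ends.
        # encoding="utf-8" is an assumption; adjust it to match your file.
        with open(filename, "r", encoding="utf-8") as file:
            return file.read()
    except FileNotFoundError:
        # Friendly message instead of a traceback.
        print(f"Sorry, the file '{filename}' was not found.")
        return None
```

It could be called with the name the user types in, for example read_text_file(input("Enter the filename: ")). Returning None on failure lets the rest of the program check whether there is any text to process before going further.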

  4. Reading and Cleaning the Text:

    • Once the file is open, we read its entire content into a single string variable.

    • For an accurate word count, we must clean this text. We'll convert the entire string to lowercase using .lower() so that "The" and "the" are counted as the same word.

    • We also need to remove punctuation. A simple way to do this is to loop through a string of common punctuation marks and replace each one with an empty string, effectively deleting it from our main text.
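
The cleaning step might look something like the sketch below. The name clean_text and the exact set of characters in PUNCTUATION are illustrative choices; extend the string if your text contains other symbols.

```python
# Punctuation marks to strip out (an assumed, non-exhaustive set).
PUNCTUATION = ".,!?;:\"'()[]{}"

def clean_text(text):
    """Lowercase the text and delete common punctuation marks."""
    text = text.lower()
    for mark in PUNCTUATION:
        # Replacing a mark with an empty string deletes it.
        text = text.replace(mark, "")
    return text

print(clean_text("The cat saw the dog. The dog ran!"))
# prints: the cat saw the dog the dog ran
```

One side effect of this simple approach is that apostrophes are removed too, so "don't" becomes "dont". That is usually acceptable for a beginner project, but worth knowing when reading the results.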

  5. The Data Structure: A Dictionary for Counting: A dictionary is the perfect tool for counting word frequencies. The keys will be the unique words (strings), and the values will be their counts (integers).

  6. The Counting Logic:

    • First, we split our large, cleaned string into a list of individual words using the .split() method.

    • We initialize an empty dictionary, for example, word_counts = {}.

    • We then loop through our list of words. For each word in the list:

      • We check if the word is already a key in our word_counts dictionary.

      • If it is, we increment its value by one: word_counts[word] += 1.

      • If it's not in the dictionary yet, we add it as a new key with a value of 1: word_counts[word] = 1.
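
The counting steps above can be sketched as follows. The function name count_words is hypothetical; word_counts is the dictionary name used in the explanation.

```python
def count_words(cleaned_text):
    """Count how many times each word appears in already-cleaned text."""
    word_counts = {}
    # .split() with no arguments splits on any whitespace (spaces, tabs, newlines).
    for word in cleaned_text.split():
        if word in word_counts:
            word_counts[word] += 1   # seen before: add one to its count
        else:
            word_counts[word] = 1    # first time: start its count at 1
    return word_counts

print(count_words("the cat saw the dog the dog ran"))
# prints: {'the': 3, 'cat': 1, 'saw': 1, 'dog': 2, 'ran': 1}
```

The dictionary keeps words in the order they first appear in the text, which is why 'the' comes first in the output.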

  7. Displaying the Results: After the loop has processed all the words, our dictionary will contain the complete frequency count. We can then loop through the dictionary's items using .items() and print each word along with its corresponding count in a neat, readable format.
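
Here is one possible way to print the results. The function name display_counts and the heading text are assumptions for illustration; any readable layout would do.

```python
def display_counts(word_counts):
    """Print each word with its count, one pair per line."""
    print("--- Word Frequencies ---")
    # .items() gives (word, count) pairs we can unpack in the loop.
    for word, count in word_counts.items():
        print(f"{word}: {count}")

display_counts({"the": 3, "dog": 2, "cat": 1})
```

The words appear in the order they were added to the dictionary. If you would rather see the most frequent words first, one option is to loop over sorted(word_counts.items(), key=lambda pair: pair[1], reverse=True) instead.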


