Speller (CS50)

High-Speed Spell-Checking

Description:

A spell-checker implementation in C, focusing on high-performance through a hash table for efficient dictionary word retrieval. Prioritizing real-time speed, it demonstrates the importance of algorithmic efficiency and memory management. The project underlines the significance of selecting optimal data structures to ensure fast processing while minimizing resource usage.

Objectives:

  • To implement a high-efficiency spell-checker with an emphasis on speed optimization.
  • To construct and fine-tune a hash table for rapid word retrieval.
  • To understand and apply the balance between time complexity and efficient resource use.
  • To gain hands-on experience with algorithmic performance and effective memory management.

Detail:

Key Features

  1. Hash Table Structure:
    • Utilized a hash table (node *table[N]) with a prime number (N = 29) of buckets to store words. The prime number reduces collisions and evenly distributes words across the hash table.
  2. Custom Hash Function:
    • Implemented a custom hash function (hash(const char *word)) using the djb2 algorithm. This function converts words into a hash value, ensuring efficient placement and retrieval from the hash table.
    • The function handles word case-insensitivity by converting each character to lowercase.
  3. Dictionary Loading and Word Insertion:
    • In the load function, the dictionary is opened, and words are read in. For each word, a new node is dynamically allocated and inserted into the hash table at the position determined by the hash function.
    • The insertion process involves adding the new node at the beginning of the list in the relevant bucket, with n -> next = table[key]; table[key] = n;.
  4. Word Checking:
    • The check function searches for a word in the hash table. It hashes the input word and traverses the linked list at the corresponding index, comparing with existing words using strcasecmp.
  5. Memory Management:
    • The unload function iterates through each bucket in the hash table, freeing all nodes to prevent memory leaks.
    • The memory for each node is dynamically allocated during the loading process and properly deallocated during unloading.
  6. Dictionary Size Retrieval:
    • The size function returns the total number of words loaded into the dictionary, providing a quick reference to the dictionary’s capacity.

Technical aspects

  • Efficiency in Spell-Checking: The hash table, combined with an effective hash function, allows for rapid word retrieval, which is critical for real-time spell-checking applications.
  • Memory Optimization: Dynamic memory allocation for nodes and careful memory management demonstrate an understanding of efficient memory use, crucial in resource-constrained environments.
  • Algorithm Optimization: The choice of the djb2 hashing algorithm and the use of a prime number for the hash table size reflect a thoughtful approach to minimizing collisions and optimizing search time.
Explore details

Project Status:

Completed

Data sources:

Problem Set 4 from Harvard’s CS50x 2023.

Full Overview