Scourgify (CS50P)

CSV Data Transformation

Description:

A Python script that processes CSV data containing names and houses. It reads an input CSV file, splits each name (stored as "Last, First") into separate first and last names, and writes a reformatted CSV. The script validates its command-line arguments and handles errors that occur during file operations.

Objectives:

  • To read an existing CSV file provided as a command-line argument.
  • To split each name into a first name and a last name.
  • To write the processed data into a new CSV file with the specified format.
  • To handle errors related to command-line arguments and file accessibility.

Detail:

Features and Implementation:

  • Command-Line Argument Validation: The script starts by checking the command-line arguments to ensure the correct number of arguments is provided and that the input file is a CSV.
  • File Handling and Error Management: try-except blocks around the file operations handle cases where the input file does not exist or cannot be read.
  • Data Processing: The script reads the input CSV with csv.DictReader, which yields one dictionary per data row, keyed by the column names in the header row.
  • Name Splitting and Reformatting: The script splits the ‘name’ field, stored as "Last, First", on the comma-and-space separator into separate first and last names. Keeping the two parts in their own columns normalizes the data, so each part of the name can be used on its own.
  • Data Transformation: After splitting and reformatting the names, it creates a new dictionary for each row with the fields ‘first’, ‘last’, and ‘house’, aligning with the desired output format.
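      transformed_data = []  # collects one reformatted dict per input row
      for row in reader:
          # The 'name' field is stored as "Last, First"
          last_name, first_name = row['name'].split(', ')
          transformed_row = {'first': first_name, 'last': last_name, 'house': row['house']}
          transformed_data.append(transformed_row)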
    
  • Output File Creation: Using csv.DictWriter, the script writes these transformed dictionaries to a new CSV file. This involves writing a header row followed by each transformed row, ensuring the output file maintains a consistent and readable format.
      fieldnames = ['first', 'last', 'house']  # output column order
      writer = csv.DictWriter(csv_output, fieldnames=fieldnames)
      writer.writeheader()
      # Each transformed row already has exactly these keys, so it can be written as-is
      writer.writerows(transformed_data)
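
The excerpts cover the transformation and output steps but not the argument validation or file error handling listed first. A minimal sketch of those two steps follows. It assumes the script is invoked with an input path and an output path; the function names validate_args and read_rows and the error messages are hypothetical, not taken from the original script.

```python
import csv
import sys


def validate_args(argv):
    """Return (input_path, output_path), or exit with a message.

    Assumption: the script is run as `script.py input.csv output.csv`.
    """
    if len(argv) < 3:
        sys.exit("Too few command-line arguments")
    if len(argv) > 3:
        sys.exit("Too many command-line arguments")
    input_path, output_path = argv[1], argv[2]
    # Only the file extension is checked here, not the file contents
    if not input_path.lower().endswith(".csv"):
        sys.exit("Input file is not a CSV")
    return input_path, output_path


def read_rows(input_path):
    """Read the input CSV into a list of dicts, exiting if it cannot be opened."""
    try:
        # newline="" is what the csv module documentation recommends for file objects
        with open(input_path, newline="") as csv_input:
            return list(csv.DictReader(csv_input))
    except OSError:
        # FileNotFoundError and PermissionError are both subclasses of OSError
        sys.exit(f"Could not read {input_path}")
```

In this sketch the script would start by calling validate_args(sys.argv) and then pass the returned input path to read_rows. Passing a string to sys.exit prints it to standard error and ends the program with a non-zero exit status.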
    

Project Status:

Completed

Data sources:

Problem Set 6 from Harvard’s CS50P 2023.
