Scourgify (CS50P)

CSV Data Transformation

Description:

A Python script that processes CSV data containing names and houses. It reads an input CSV file, splits each name into a first and a last name, and writes the result to a reformatted CSV. The script validates its command-line arguments and handles errors raised by file operations.

Objectives:

To read an existing CSV file provided as a command-line argument.
To split each name into a first name and a last name.
To write the processed data into a new CSV file with the specified format.
To handle errors related to command-line arguments and file accessibility.

Detail:

Features and Implementation:

Command-Line Argument Validation: The script starts by checking the command-line arguments to ensure the correct number of arguments is provided and that the input file is a CSV.
File Handling and Error Management: It includes try-except blocks for robust error handling, particularly for cases where the input file might not exist or be unreadable.
Data Processing: The script reads the input CSV file with csv.DictReader, which yields one dictionary per row, keyed by the column names in the header.
Name Splitting and Reformatting: The script splits the ‘name’ field, stored as "last, first", on the comma and space to obtain separate first and last names. Storing the two parts in separate columns normalizes the data and makes it easier to sort or filter by either name.
Data Transformation: After splitting and reformatting the names, it creates a new dictionary for each row with the fields ‘first’, ‘last’, and ‘house’, aligning with the desired output format.

transformed_data = []
for row in reader:
    # Names are stored as "Last, First", so the last name comes first
    last_name, first_name = row['name'].split(', ')
    transformed_row = {'first': first_name, 'last': last_name, 'house': row['house']}
    transformed_data.append(transformed_row)
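
As a minimal, self-contained sketch of the reading and splitting steps, the snippet below runs the same loop over an in-memory CSV. The sample rows and the helper name split_rows are assumptions made for illustration, and split(', ', 1) is a defensive variation on the loop above, not necessarily what the original script does.

```python
import csv
import io

# Illustrative input, assumed to use 'name' and 'house' columns
# with names stored as "Last, First".
SAMPLE = 'name,house\n"Abbott, Hannah",Hufflepuff\n"Bell, Katie",Gryffindor\n'


def split_rows(csv_file):
    """Return a list of dicts with 'first', 'last', and 'house' keys."""
    transformed_data = []
    for row in csv.DictReader(csv_file):
        # maxsplit=1 keeps any later ", " inside the first-name part
        last_name, first_name = row['name'].split(', ', 1)
        transformed_data.append(
            {'first': first_name, 'last': last_name, 'house': row['house']}
        )
    return transformed_data


rows = split_rows(io.StringIO(SAMPLE))
print(rows[0])  # {'first': 'Hannah', 'last': 'Abbott', 'house': 'Hufflepuff'}
```

Because csv.DictReader honours the quotes around each name, the comma inside "Abbott, Hannah" stays part of a single field and is only split by the explicit call to split.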

Output File Creation: Using csv.DictWriter, the script writes these transformed dictionaries to a new CSV file. This involves writing a header row followed by each transformed row, ensuring the output file maintains a consistent and readable format.

# fieldnames fixes the column order of the output file
fieldnames = ['first', 'last', 'house']
writer = csv.DictWriter(csv_output, fieldnames=fieldnames)
writer.writeheader()
# Each transformed row already has exactly these keys
writer.writerows(transformed_data)
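
The argument checks and file error handling listed under Features are not shown in the excerpts above. One way they might look is sketched below. The function names check_args, scourgify, and main, the error messages, and the expectation of exactly two file arguments (an input path and an output path) are assumptions made for illustration, not taken from the original script.

```python
import csv
import sys


def check_args(argv):
    """Return an error message for bad arguments, or None if they look fine."""
    # Assumed usage: python scourgify.py input.csv output.csv
    if len(argv) < 3:
        return 'Too few command-line arguments'
    if len(argv) > 3:
        return 'Too many command-line arguments'
    if not argv[1].lower().endswith('.csv'):
        return 'Input file is not a CSV'
    return None


def scourgify(input_path, output_path):
    """Read input_path, split each name, and write the rows to output_path."""
    transformed_data = []
    with open(input_path, newline='') as csv_input:
        for row in csv.DictReader(csv_input):
            last_name, first_name = row['name'].split(', ', 1)
            transformed_data.append(
                {'first': first_name, 'last': last_name, 'house': row['house']}
            )
    with open(output_path, 'w', newline='') as csv_output:
        writer = csv.DictWriter(csv_output, fieldnames=['first', 'last', 'house'])
        writer.writeheader()
        writer.writerows(transformed_data)


def main(argv):
    """Validate the arguments, then convert the file or exit with a message."""
    error = check_args(argv)
    if error:
        sys.exit(error)
    try:
        scourgify(argv[1], argv[2])
    except OSError as exc:
        # Covers a missing or unreadable input file and an unwritable output path
        sys.exit(f'File error: {exc}')
```

A script built this way would end with a call such as main(sys.argv). Passing a string to sys.exit prints it to standard error and exits with a non-zero status, which keeps the error paths short.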

Project Status:

Completed

Data sources:

Problem Set 6 from Harvard’s CS50P 2023.
