Description:
A Python script for processing CSV data that contains names and houses. It reads an input CSV file, splits each name into a first name and a last name, and writes a reformatted CSV. It also validates its command-line arguments and handles errors from file operations.
Objectives:
- To read an existing CSV file provided as a command-line argument.
- To split each name into a first name and a last name.
- To write the processed data into a new CSV file with the specified format.
- To handle errors related to command-line arguments and file accessibility.
Detail:
Features and Implementation:
- Command-Line Argument Validation: The script starts by checking the command-line arguments to ensure the correct number of arguments is provided and that the input file is a CSV.
- File Handling and Error Management: It includes try-except blocks for robust error handling, particularly for cases where the input file might not exist or be unreadable.
- Data Processing: The core functionality involves reading the input CSV file using csv.DictReader, which creates an iterable of dictionaries where each row in the CSV corresponds to a dictionary.
- Name Splitting and Reformatting: The script splits the ‘name’ field from the input CSV into first and last names. This is a crucial feature for data normalization, especially useful when dealing with datasets where name consistency is important.
- Data Transformation: After splitting and reformatting the names, it creates a new dictionary for each row with the fields ‘first’, ‘last’, and ‘house’, aligning with the desired output format.
for row in reader:
    last_name, first_name = row['name'].split(', ')
    transformed_row = {'first': first_name, 'last': last_name, 'house': row['house']}
    transformed_data.append(transformed_row)
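The loop above relies on every name having exactly the form "Last, First"; a value with no comma, or with more than one, makes the tuple unpacking raise a ValueError. As a minimal sketch of a more defensive split, assuming a hypothetical helper named split_name (not part of the original script) and assuming that anything after the first comma belongs to the first name:

```python
def split_name(name):
    """Split a "Last, First" string into (first, last).

    Hypothetical helper for illustration. It splits on the first comma
    only and strips surrounding whitespace, so "Last,First" also works.
    """
    last_name, separator, first_name = name.partition(",")
    if not separator:
        # No comma at all: the input does not follow the expected format.
        raise ValueError(f"expected 'Last, First', got {name!r}")
    return first_name.strip(), last_name.strip()
```

With a helper like this, the loop body could become `first_name, last_name = split_name(row['name'])`. Whether malformed names should raise, be skipped, or be logged is a design choice the description does not specify.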
- Output Writing: Using csv.DictWriter, the script writes these transformed dictionaries to a new CSV file. This involves writing a header row followed by each transformed row, ensuring the output file maintains a consistent and readable format.
writer = csv.DictWriter(csv_output, fieldnames=fieldnames)
writer.writeheader()
for row in transformed_data:
    writer.writerow({'first': row['first'], 'last': row['last'], 'house': row['house']})
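The features described in this section can be put together in one end-to-end sketch. The excerpts do not show the argument handling or file opening, so several details here are assumptions: that the script takes exactly two arguments (input path, then output path), that the CSV check is a simple ".csv" suffix test, and the function names parse_args, transform, and main, which are hypothetical.

```python
import csv
import sys


def parse_args(argv):
    """Return (input_path, output_path), or exit with a message.

    Assumption: exactly two arguments are expected, input then output.
    """
    if len(argv) != 3:
        sys.exit("Usage: python script.py input.csv output.csv")
    input_path, output_path = argv[1], argv[2]
    if not input_path.endswith(".csv"):
        sys.exit(f"Not a CSV file: {input_path}")
    return input_path, output_path


def transform(rows):
    """Convert rows with 'name' ("Last, First") and 'house' keys."""
    transformed_data = []
    for row in rows:
        last_name, first_name = row["name"].split(", ")
        transformed_data.append(
            {"first": first_name, "last": last_name, "house": row["house"]}
        )
    return transformed_data


def main(argv):
    input_path, output_path = parse_args(argv)
    try:
        # newline="" is what the csv module documentation recommends.
        with open(input_path, newline="") as csv_input:
            transformed_data = transform(csv.DictReader(csv_input))
    except OSError:
        # Covers a missing file as well as an unreadable one.
        sys.exit(f"Could not read {input_path}")

    fieldnames = ["first", "last", "house"]
    with open(output_path, "w", newline="") as csv_output:
        writer = csv.DictWriter(csv_output, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(transformed_data)
```

To run this as a script, call `main(sys.argv)` under the usual `if __name__ == "__main__":` guard. Because each transformed row already has exactly the keys listed in fieldnames, `writer.writerows(transformed_data)` writes the same output as the explicit per-row loop shown above.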