Read multiple parquet files in a folder and write to single csv file using python

Here's a way to read multiple Parquet files from a folder and write them into a single CSV file using Python. We'll use the Pandas library for this task.

Prerequisites:

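  • Ensure you have Pandas installed, along with a Parquet engine such as pyarrow or fastparquet, which pd.read_parquet() needs. You can install both using pip install pandas pyarrow.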

Implementation:

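import os

import pandas as pd

# Define the folder path containing the Parquet files
folder_path = 'path/to/folder/'

# Open the output CSV once; newline='' lets pandas control the line endings
with open('combined_data.csv', 'w', newline='') as csv_file:
  write_header = True
  # Iterate through the files in the folder in a stable (sorted) order
  for file in sorted(os.listdir(folder_path)):
    # Check if the file is a Parquet file
    if file.endswith('.parquet'):
      # Read the Parquet file
      df = pd.read_parquet(os.path.join(folder_path, file))

      # Write the column names for the first file only, then data rows
      df.to_csv(csv_file, index=False, header=write_header)
      write_header = False

# Print success message
print("All Parquet files have been combined into a single CSV file.")

Explanation:

  • We import os for directory listing and path handling, and pandas for reading and writing the data.
  • We define the folder_path variable to specify the directory where the Parquet files reside.
  • We open combined_data.csv for writing once, which creates the file or truncates an existing one. Passing newline='' avoids blank lines between rows on Windows.
  • We iterate through the files in the specified folder in sorted order, because os.listdir() returns names in arbitrary order.
  • For each file that ends with .parquet, we read it using pd.read_parquet() and store the data in a Pandas DataFrame df.
  • We write df to the open file using df.to_csv(). We set index=False to exclude the index column, and pass header=write_header so that the column names are written for the first file only. Because every call writes to the same open file handle, each DataFrame lands after the previous one and mode='a' is not needed.
  • Rows are written positionally, so this approach assumes every Parquet file has the same columns in the same order.
  • After processing all Parquet files, we print a success message.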
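
As an alternative to writing file by file, the same steps can be sketched with pd.concat(), which aligns columns by name and writes the header once. This is a minimal sketch, not a definitive implementation: the function name combine_parquet_to_csv and its read_frame parameter are hypothetical names introduced here, and it assumes all files fit in memory together.

```python
from pathlib import Path

import pandas as pd


def combine_parquet_to_csv(folder, output_csv, read_frame=pd.read_parquet):
    """Concatenate every *.parquet file in `folder` into one CSV file.

    Hypothetical helper, not part of pandas. `read_frame` is the function
    used to load each file. Returns the number of data rows written.
    """
    # Sort so the row order is reproducible across runs and platforms
    files = sorted(Path(folder).glob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"No .parquet files found in {folder}")

    # Load every file, then stack them; columns are aligned by name
    frames = [read_frame(path) for path in files]
    combined = pd.concat(frames, ignore_index=True)

    # A single to_csv call writes the header row exactly once
    combined.to_csv(output_csv, index=False)
    return len(combined)
```

It could be called as combine_parquet_to_csv('path/to/folder/', 'combined_data.csv'). For folders too large to hold in memory, writing one file at a time as in the loop above is the safer choice.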

Output:

This script will create a single CSV file named combined_data.csv in the current working directory, which is not necessarily the folder that holds your Parquet files. The CSV file will contain all the data from the individual Parquet files, appended together in the order the files were processed.
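
To check the result, one option is to count the rows in the combined CSV and compare that number with the total you expect from the source files. The sketch below is only an illustration: csv_row_count and its has_header flag are hypothetical names, and has_header should be set to False if the CSV was written without a header row.

```python
import pandas as pd


def csv_row_count(csv_path, has_header=True, chunksize=100_000):
    """Count data rows in a CSV by reading it in chunks.

    Hypothetical helper for sanity-checking the combined file.
    """
    # header=0 treats the first line as column names; None treats it as data
    header = 0 if has_header else None
    chunks = pd.read_csv(csv_path, header=header, chunksize=chunksize)
    # Sum chunk lengths so the whole file is never held in memory at once
    return sum(len(chunk) for chunk in chunks)
```

For example, csv_row_count('combined_data.csv') should equal the sum of the row counts of the individual Parquet files.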
