
Read multiple parquet files in a folder and write to single csv file using python

To read multiple parquet files in a folder and write them to a single CSV file using Python, follow these steps:

  1. Create the CSV file: Open a new file called "csv_file.csv" in write mode ('w') using the open() function. This file will hold the combined data from all the parquet files.
  2. Read the first parquet file and write it to the CSV file: Use pandas.read_parquet() to read the first parquet file ("par_file1.parquet") into a Pandas DataFrame. Then call DataFrame.to_csv() to write the DataFrame to the CSV file. Leave header=True (the default) so the column names appear once at the top, and pass index=False so the row index is not written as an extra column.
  3. Read the remaining parquet files and append them to the CSV file: Loop over the remaining parquet files (from "par_file2.parquet" to "par_file100.parquet"). For each one, read it into a DataFrame with pandas.read_parquet(), reopen the CSV file in append mode ('a'), and write the DataFrame with DataFrame.to_csv() using header=False to avoid repeating the column names. Because the header is written only once, every file must have the same columns in the same order.
  4. Close the CSV file: Opening the file in a with block closes it automatically when the block ends, so no explicit close() call is needed.
Here's the Python code that implements these steps:

```python
import pandas as pd

# Create csv_file.csv and write the first parquet file with headers.
# newline='' stops the csv writer from emitting blank lines on Windows.
with open('csv_file.csv', 'w', newline='') as csv_file:
    print('Reading par_file1.parquet')
    df = pd.read_parquet('par_file1.parquet')
    df.to_csv(csv_file, index=False)
    print('par_file1.parquet appended to csv_file.csv\n')

# Build the remaining file names to look for in the current directory
files = [f'par_file{i}.parquet' for i in range(2, 101)]

# Read each file and append it to csv_file.csv without repeating the header
for f in files:
    print(f'Reading {f}')
    df = pd.read_parquet(f)
    with open('csv_file.csv', 'a', newline='') as file:
        df.to_csv(file, header=False, index=False)
    print(f'{f} appended to csv_file.csv\n')
```

You can customize the code to match your own file naming convention and directory structure. Remember to adjust the file paths and file names accordingly.
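If the file names are not known in advance, the folder can be scanned instead of building the names by hand. The following is a minimal sketch of that variation, not part of the original recipe: the function names `combine_parquet_to_csv` and `natural_key` and the `read` parameter are hypothetical, and it assumes every parquet file shares the same columns in the same order. The natural sort key is there because a plain alphabetical sort would place par_file10 before par_file2.

```python
import re
from pathlib import Path

import pandas as pd


def natural_key(path):
    """Sort key that orders par_file2 before par_file10 (hypothetical helper)."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", path.name)]


def combine_parquet_to_csv(folder, out_csv, read=pd.read_parquet):
    """Write every *.parquet file in `folder` to `out_csv`; return the file count.

    `read` is an assumed hook so a different reader can be swapped in;
    by default it is pandas.read_parquet.
    """
    files = sorted(Path(folder).glob("*.parquet"), key=natural_key)
    for i, path in enumerate(files):
        df = read(path)
        # First file: overwrite and write the header. Later files: append without it.
        df.to_csv(out_csv, mode="w" if i == 0 else "a",
                  header=(i == 0), index=False)
    return len(files)
```

A call such as `combine_parquet_to_csv("data", "csv_file.csv")` would then combine whatever parquet files are in a folder named "data" (an example path). Passing a path to to_csv() with mode="a" lets pandas open and close the file itself, so no separate open() call is needed.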
