To read multiple parquet files in a folder and write them to a single CSV file using Python, follow these steps:
- Create an empty CSV file: Open a new file called "csv_file.csv" in write mode ('w') using the open() function. This file will store the combined data from all the parquet files.
- Read the first parquet file and write it to the CSV file: Use the pandas.read_parquet() function to read the first parquet file ("par_file1.parquet") into a pandas DataFrame. Then use the DataFrame.to_csv() method to write the DataFrame to the open CSV file, specifying the header=True parameter to include the column names in the CSV file.
- Read the remaining parquet files and append them to the CSV file: Iterate through the remaining parquet files (from "par_file2.parquet" to "par_file100.parquet") using a for loop. For each parquet file, read it using pandas.read_parquet(), which already returns a DataFrame, and append it to the CSV file using DataFrame.to_csv() with the header=False parameter to avoid duplicating the column names.
- Close the CSV file: After all the parquet files have been processed and appended to the CSV file, close the CSV file using the close() method.
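
The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function name combine_parquet_to_csv and its read_frame parameter are hypothetical, it picks up every *.parquet file in the folder instead of a fixed par_file1 to par_file100 range, and it assumes all files share the same columns in the same order. It also uses a with block in place of an explicit close() call, and passes index=False so the row index is not written as an extra column. Note that pandas.read_parquet() needs a parquet engine such as pyarrow or fastparquet installed.

```python
from pathlib import Path

import pandas as pd


def combine_parquet_to_csv(folder, csv_path, read_frame=pd.read_parquet):
    """Write every *.parquet file in `folder` into one CSV at `csv_path`.

    `read_frame` is a hypothetical hook (defaulting to pandas.read_parquet)
    that turns a path into a DataFrame. Returns the number of files written.
    """
    # Sorted for a reproducible order. The sort is lexicographic, so
    # "par_file10.parquet" comes before "par_file2.parquet".
    parquet_paths = sorted(Path(folder).glob("*.parquet"))

    # newline="" leaves line endings to the CSV writer, and the with block
    # closes the file even if reading one of the inputs fails.
    with open(csv_path, "w", newline="") as csv_file:
        for position, parquet_path in enumerate(parquet_paths):
            frame = read_frame(parquet_path)
            # Write the column names only for the first file.
            frame.to_csv(csv_file, header=(position == 0), index=False)

    return len(parquet_paths)


# Example call (assumes a folder named "parquet_folder" exists):
# combine_parquet_to_csv("parquet_folder", "csv_file.csv")
```

Reading and writing one file at a time keeps only a single DataFrame in memory, which matters when the combined data is too large to concatenate first.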