---
title: Parquet
---

## Overview

These code blocks demonstrate how to query Parquet file(s): the first is a generic template, and the second fills it in with example values.

```sql
CREATE FOREIGN DATA WRAPPER <wrapper_name>
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

CREATE SERVER <server_name>
FOREIGN DATA WRAPPER <wrapper_name>;

CREATE FOREIGN TABLE <table_name> ()
SERVER <server_name>
OPTIONS (files '<files>');
```

```sql
CREATE FOREIGN DATA WRAPPER parquet_wrapper
HANDLER parquet_fdw_handler
VALIDATOR parquet_fdw_validator;

CREATE SERVER parquet_server
FOREIGN DATA WRAPPER parquet_wrapper;

CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (files 's3://bucket/folder/file.parquet');
```

- `wrapper_name`: Foreign data wrapper name. Can be any string.
- `server_name`: Foreign server name. Can be any string.
- `table_name`: Foreign table name. Can be any string.
- `files`: The path of a single Parquet file or [multiple Parquet files](#multiple-parquet-files). For instance, `s3://bucket/folder/file.parquet` if the file is in Amazon S3, `https://domain.tld/file.parquet` if the file is on an HTTP server, or `/path/to/file.parquet` if the file is on the local file system.

## Parquet Options

There are a number of options that can be passed into the `CREATE FOREIGN TABLE` statement. These are the same [options](https://duckdb.org/docs/data/parquet/overview#parameters) accepted by DuckDB's `read_parquet` function.

```sql
CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
  files 's3://bucket/folder/file.parquet',
  binary_as_string 'true',
  hive_partitioning 'true'
);
```

- `files`: The path of a single Parquet file or [multiple Parquet files](#multiple-parquet-files). For instance, `s3://bucket/folder/file.parquet` if the file is in Amazon S3 or `/path/to/file.parquet` if the file is on the local file system.
- `binary_as_string`: Parquet files generated by legacy writers do not correctly set the `UTF8` flag for strings, causing string columns to be loaded as `BLOB` instead. Set this to `true` to load binary columns as strings.
- `filename`: Whether or not an extra `filename` column should be included in the result.
- `file_row_number`: Whether or not to include the `file_row_number` column.
- `hive_partitioning`: Whether or not to interpret the path as a Hive partitioned path.
- `hive_types`: If `hive_partitioning` is enabled, `hive_types` can be used to specify the logical types of the Hive partitions in a struct, as shown below.

```sql
-- Dollar-quoted strings are used to contain single quotes
CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
  files 's3://bucket/folder/file.parquet',
  hive_partitioning 'true',
  hive_types $${'release': DATE, 'orders': BIGINT}$$
);
```

- `hive_types_autocast`: `hive_types` will be autodetected for the following types: `DATE`, `TIMESTAMP`, and `BIGINT`. To switch off the autodetection, this option can be set to `0`.

```sql
CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
  files 's3://bucket/folder/file.parquet',
  hive_partitioning 'true',
  hive_types $${'release': DATE, 'orders': BIGINT}$$,
  hive_types_autocast '0'
);
```

- `union_by_name`: Whether the columns of multiple schemas should be unified by name, rather than by position.

## Multiple Parquet Files

To treat multiple Parquet files as a single table, their paths should be passed in as a comma-separated string.

```sql
CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
  files '/path/to/file1.parquet, /path/to/file2.parquet'
);
```

To treat a directory of Parquet files as a single table, the glob pattern should be used.

```sql
CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
  files '/folder/*.parquet'
);
```
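All files matched by the pattern are exposed through a single foreign table, so an ordinary query reads rows from every matched file. A minimal sketch, reusing the `parquet_table` defined above:

```sql
-- Counts rows across every Parquet file matched by '/folder/*.parquet'
SELECT COUNT(*) FROM parquet_table;
```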
The glob pattern can also be used to read all Parquet files from multiple directories.

```sql
CREATE FOREIGN TABLE parquet_table ()
SERVER parquet_server
OPTIONS (
  files '/folder1/*.parquet, /folder2/*.parquet'
);
```

## Parquet Schema

Once a foreign table has been created, `parquet_describe` can return the underlying Parquet file's column names and types.

```sql
SELECT * FROM parquet_describe('trips');
```

The `parquet_schema` function returns the internal schema contained within the metadata of a Parquet file.

```sql
SELECT * FROM parquet_schema('trips');
```

## Cloud Object Stores

The [object stores](/integrations/object_stores) documentation explains how to provide secrets and other credentials for Parquet files stored in object stores like S3.
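For orientation, credentials are typically supplied through a user mapping on the foreign server. A minimal sketch, assuming S3 and the option names `type`, `key_id`, `secret`, and `region`; these names are assumptions here, so defer to the object stores documentation for the authoritative options.

```sql
-- Hypothetical credential options; consult the object stores docs
-- for the exact option names supported by your version.
CREATE USER MAPPING FOR CURRENT_USER
SERVER parquet_server
OPTIONS (
  type 'S3',
  key_id '<access_key_id>',
  secret '<secret_access_key>',
  region '<region>'
);
```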