Creating Parquet Datasets with Squid SDK

5 min · SQD Team
Tutorial · Parquet · Data Analytics

Overview

This article explains how to use the Squid SDK to build Parquet datasets from blockchain data for analytics. Parquet is an efficient columnar storage file format widely used in big data analytics.

Key Features of Parquet

The format offers several advantages:

  1. Columnar Storage -- Values from the same column are stored together, so similar values compress well and queries can read only the columns they need
  2. Compression Support -- Supports codecs including Snappy, Gzip, and LZO
  3. Cross-Platform Compatibility -- Readable from Java, Python, R, and other languages
  4. Python Integration -- Loads easily into Python dataframes (for example pandas) for analysis with NumPy and related tools
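To make the columnar idea concrete, here is a small sketch. It pivots row-shaped records into per-column lists and compresses each column separately with zlib, which implements the DEFLATE algorithm that Gzip also uses. The sample records and the helper names `to_columns` and `to_rows` are hypothetical and exist only for illustration. This is not how Parquet encodes data internally; it only shows why grouping values by column gives a compressor long runs of similar data. The printed sizes depend on the data, so none are claimed here.

```python
import json
import zlib

# Hypothetical sample rows, shaped like NFT transfer records (illustrative data only).
rows = [
    {"block": 18_000_000 + i, "contract": "0xabc" if i % 2 else "0xdef", "token_id": i % 10}
    for i in range(1000)
]

def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists (a columnar layout)."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns

def to_rows(columns):
    """Rebuild row dicts from a columnar layout."""
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]

columns = to_columns(rows)

# Row layout: whole records serialized one after another, mixing value types.
row_bytes = json.dumps(rows).encode()
# Columnar layout: each column serialized and compressed on its own.
column_bytes = {name: json.dumps(values).encode() for name, values in columns.items()}

row_size = len(zlib.compress(row_bytes))
column_size = sum(len(zlib.compress(data)) for data in column_bytes.values())
print(f"row-wise compressed: {row_size} bytes, column-wise compressed: {column_size} bytes")
```

The same pivot also explains why a query that touches only `contract` can skip the `block` and `token_id` data entirely.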

Implementation Steps

Converting a squid to write Parquet files to an S3 bucket takes three main steps:

  1. Import the Squid SDK packages for Parquet and S3 operations
  2. Translate the GraphQL schema into a table definition with an appropriate column type for each field
  3. Change the data-saving logic to write rows in Parquet format, saving them in batches
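Steps 2 and 3 can be modeled with a short conceptual sketch. In a real squid they are handled in TypeScript by the Squid SDK's Parquet and S3 packages, so nothing below is the SDK's API. The mapping `GRAPHQL_TO_PARQUET` is a hypothetical, illustrative choice of Parquet type per GraphQL scalar, and so is the decision to store `BigInt` as text to avoid 64-bit overflow. The names `table_from_entity` and `BatchWriter` are also hypothetical. The writer collects "files" in memory instead of uploading them, which keeps the batching logic visible.

```python
from dataclasses import dataclass, field

# Hypothetical GraphQL-scalar -> Parquet-type mapping (illustrative, not the Squid SDK's own).
GRAPHQL_TO_PARQUET = {
    "ID": "STRING",
    "String": "STRING",
    "Int": "INT32",
    "Boolean": "BOOLEAN",
    "DateTime": "TIMESTAMP",
    "BigInt": "STRING",  # assumption: uint256 values may overflow INT64, so keep them as text
}

def table_from_entity(fields):
    """Step 2: turn {field_name: graphql_type} into {column_name: parquet_type}."""
    return {name: GRAPHQL_TO_PARQUET[gql_type] for name, gql_type in fields.items()}

@dataclass
class BatchWriter:
    """Step 3: buffer rows and flush them in fixed-size batches (in-memory stand-in for a file sink)."""
    columns: dict
    batch_size: int
    buffer: list = field(default_factory=list)
    flushed: list = field(default_factory=list)  # each entry stands for one written file

    def write(self, row):
        missing = set(self.columns) - set(row)
        if missing:
            raise ValueError(f"row is missing columns: {sorted(missing)}")
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.flushed.append(self.buffer)
            self.buffer = []

# Hypothetical transfer entity, described the way a schema.graphql type might be.
transfer_columns = table_from_entity(
    {"id": "ID", "block": "Int", "from": "String", "to": "String", "token_id": "BigInt"}
)
writer = BatchWriter(columns=transfer_columns, batch_size=100)
for i in range(250):
    writer.write({"id": str(i), "block": i, "from": "0x0", "to": "0x1", "token_id": str(i)})
writer.flush()  # write the final partial batch
print([len(batch) for batch in writer.flushed])  # [100, 100, 50]
```

Batching trades a little latency for far fewer, larger files, which suits columnar formats: each file carries its own metadata, and many tiny files are slow to list and scan.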

Data Access and Analysis

Once written, the Parquet datasets can be accessed with:

  • The AWS SDK, to list and read the S3 objects
  • DuckDB, which can query Parquet files directly with SQL
  • Python notebooks that use boto3 to download the files
  • Visualization libraries such as Plotly for charts and graphs
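The kind of SQL one might run over the dataset is sketched below. To keep it runnable without DuckDB or S3 credentials, it swaps in Python's built-in `sqlite3` module and an in-memory table of made-up transfer rows. In DuckDB, the `FROM` clause would typically point at the Parquet files instead, for example `FROM read_parquet('transfers/*.parquet')`, where that path is hypothetical. The table and column names are assumptions for illustration.

```python
import sqlite3

# Illustrative NFT transfer rows: (block, contract, token_id). Not real data.
transfers = [
    (100, "0xaaa", 1),
    (101, "0xaaa", 2),
    (101, "0xbbb", 7),
    (105, "0xaaa", 1),
    (110, "0xccc", 3),
]

# sqlite3 stands in for DuckDB here; the SQL shape is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (block INTEGER, contract TEXT, token_id INTEGER)")
conn.executemany("INSERT INTO transfers VALUES (?, ?, ?)", transfers)

# Count transfers per contract, busiest first, ties broken by address.
query = """
    SELECT contract, COUNT(*) AS n_transfers
    FROM transfers
    GROUP BY contract
    ORDER BY n_transfers DESC, contract
"""
top_contracts = conn.execute(query).fetchall()
print(top_contracts)  # [('0xaaa', 3), ('0xbbb', 1), ('0xccc', 1)]
conn.close()
```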

The article illustrates the approach with two examples, tracking NFT transfers and tracking contract deployments, as practical blockchain data analysis workflows.
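For the contract-deployment case, a common preparation step before charting is to bucket events by block range. The sketch below does this with the standard library on made-up block numbers. The helper name `bucket_counts` and the bucket size are hypothetical choices. Empty buckets are filled with zero so the x-axis has no gaps.

```python
from collections import Counter

# Illustrative block numbers at which contracts were deployed (made-up values).
deployment_blocks = [1_000, 1_250, 1_999, 2_000, 2_500, 4_100, 4_200, 4_999]

def bucket_counts(blocks, bucket_size):
    """Count events per block-range bucket, keyed by the bucket's first block."""
    counts = Counter((b // bucket_size) * bucket_size for b in blocks)
    start, end = min(counts), max(counts)
    # Include empty buckets with a count of 0 so the chart shows gaps explicitly.
    return [(k, counts[k]) for k in range(start, end + 1, bucket_size)]

series = bucket_counts(deployment_blocks, bucket_size=1_000)
print(series)  # [(1000, 3), (2000, 2), (3000, 0), (4000, 3)]
```

With Plotly installed, the pairs could be passed to a bar chart, for example `plotly.express.bar(x=[k for k, _ in series], y=[n for _, n in series])`.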