Snowflake Unveils Public Preview of Snowpark Connect for Apache Spark
Snowpark Connect simplifies infrastructure by removing the need for separate Spark environments.

Snowflake has launched the public preview of Snowpark Connect for Apache Spark, a new architecture that allows Apache Spark code to run directly within Snowflake warehouses, eliminating the need to maintain separate Spark clusters.
Built on the client-server model that Spark Connect introduced in Apache Spark 3.4, Snowpark Connect decouples client code from the Snowflake warehouse that executes it, so there are no separate Spark environments to provision or tune. It supports modern Spark DataFrame, Spark SQL, and user-defined function (UDF) code, and Snowflake reports an average of 5.6x faster execution and 41% lower costs compared with managed Spark systems.
"With Snowpark Connect, customers can take advantage of the powerful Snowflake vectorized engine for their Spark code while avoiding the complexity of maintaining or tuning separate Spark environments — including managing dependencies, version compatibility and upgrades. You can now run all modern Spark DataFrame, Spark SQL and user-defined function (UDF) code with Snowflake," the company said in a blog post.
Previously, organisations relied on the Snowflake Connector for Spark to process Snowflake data with Spark, an approach that required moving data out of the platform and added cost, latency, and governance complexity. With Snowpark Connect, Spark workloads can now execute directly in Snowflake, reducing data movement and improving performance while retaining unified governance.
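For contrast, the connector-based pattern looked roughly like the sketch below. Every connection option here is a placeholder, and the source name is the connector's documented identifier; the key point is that each read copies rows out of Snowflake into the Spark cluster before any transformation runs.

```python
from pyspark.sql import SparkSession

# A conventional Spark session on an external cluster; the spark-snowflake
# connector JARs must already be on its classpath.
spark = SparkSession.builder.appName("snowflake-connector-read").getOrCreate()

# Placeholder connection options -- all values are illustrative.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "my_user",
    "sfPassword": "my_password",
    "sfDatabase": "MY_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "MY_WH",
}

# This read pulls the table's rows across the network into the Spark
# cluster before the groupBy below can run.
orders = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .load()
)
orders.groupBy("PRODUCT").count().show()
```

Under Snowpark Connect, the equivalent DataFrame plan is executed inside Snowflake next to the data, so that bulk transfer step disappears.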
Snowpark Connect currently supports Spark 3.5.x in Python environments. Support for Java and Scala, along with broader Spark APIs such as RDD, MLlib, and Streaming, is in development.
The solution is compatible with Apache Iceberg tables and integrates with tools such as Snowflake Notebooks, Jupyter, VS Code, Airflow, and Snowpark Submit, enabling organisations to run Spark code without rewrites or added infrastructure overhead.