Use when building Apache Spark applications or distributed data processing pipelines, or when optimizing big data workloads. Invoke for the DataFrame API, Spark SQL, RDD operations, performance tuning, and streaming analytics.
Initial release of spark-engineer skill.

- Provides expert support for building and optimizing Apache Spark applications, ETL pipelines, and streaming analytics.
- Covers workflows for requirement analysis, pipeline design, implementation, optimization, and validation.
- Includes reference guides for DataFrame API, Spark SQL, RDD operations, partitioning, caching, performance tuning, and streaming.
- Lists critical best practices and anti-patterns for production Spark workloads.
- Supplies structured output templates including code, configurations, partitioning strategies, performance analysis, and monitoring advice.