wire-datafusion-function

Name: Wire Datafusion Function
Author: apache

Use when wiring an existing DataFusion or datafusion-spark function into Comet for a Spark expression. Identifies the right wiring pattern (one-line passthrough, datafusion-spark UDF registration, or custom serde with input massaging / restrictions), applies the Scala serde, registers the UDF in jni_api when needed, and adds SQL file tests. Assumes the function already exists upstream; if not, switch to `implement-comet-expression`.

stars:1,194

forks:321

updated:May 20, 2026 at 14:02

related-skills.json

same repository

bug-triage.md

from "apache/datafusion-comet"

Triage open Comet issues marked `requires-triage` per the project bug triage guide. Applies the recommended priority and area labels, removes `requires-triage`, and files a dated summary issue listing what was done. A human reviews the summary issue and closes it when satisfied.

2026-05-22 · 1.2k

implement-comet-expression.md

from "apache/datafusion-comet"

Use when implementing a new Spark expression in DataFusion Comet. Walks through cloning latest Spark master to study the canonical implementation, checking the upstream datafusion-spark crate before writing native code, building the Comet serde and Rust wire-up from the contributor guide, then running audit-comet-expression to drive a test-coverage iteration loop.

2026-05-01 · 1.2k

review-comet-pr.md

from "apache/datafusion-comet"

Review a DataFusion Comet pull request for Spark compatibility and implementation correctness. Provides guidance to a reviewer rather than posting comments directly.

2026-04-29 · 1.2k

audit-comet-expression.md

from "apache/datafusion-comet"

Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.

2026-04-28 · 1.2k

package.json

"author": "apache"

"repository": "apache/datafusion-comet"

| Pattern | When | What to change |
|---|---|---|
| A (passthrough) | DF built-in (e.g. `acos`), or datafusion-spark UDF already registered in `register_datafusion_spark_function` | One line in the right map of `QueryPlanSerde.scala`: `classOf[Foo] -> CometScalarFunction("foo")` |
| B (register + passthrough) | datafusion-spark UDF, not yet registered, semantics already match Spark | Pattern A line plus `session_ctx.register_udf(ScalarUDF::new_from_impl(SparkFoo::default()));` in `native/core/src/execution/jni_api.rs::register_datafusion_spark_function` |
| C (custom serde) | Inputs need preprocessing (cast, `nullIfNegative`, +0.0 flip), or you need to set return type / `failOnError`, restrict input types via `getSupportLevel`, enforce foldable-only args, or attach `getCompatibleNotes`/`getIncompatibleReasons`/`getUnsupportedReasons` | New `CometXxx` object in the topic file (`math.scala`, `strings.scala`, …); see `CometCeil`, `CometAtan2`, `CometLog`, `CometSha2`, `CometAbs` |
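The choice between Pattern A and Pattern B for a datafusion-spark UDF comes down to whether the registration call is already present. The following is a minimal sketch of that check, not part of the skill itself: `suggest_pattern`, the throwaway stand-in file, and the struct names `SparkFoo` and `SparkBar` are all hypothetical. It does not cover DataFusion built-ins (always Pattern A) or any of the Pattern C conditions.

```shell
# suggest_pattern FILE STRUCT: print "A" if FILE already registers STRUCT, else "B".
# Only meaningful for datafusion-spark UDFs whose semantics already match Spark.
suggest_pattern() {
  if grep -qF "register_udf(ScalarUDF::new_from_impl($2::default()))" "$1"; then
    echo "A"
  else
    echo "B"
  fi
}

# Throwaway stand-in for jni_api.rs; SparkFoo and SparkBar are placeholder names.
demo_rs=$(mktemp)
trap 'rm -f "$demo_rs"' EXIT
cat > "$demo_rs" <<'EOF'
fn register_datafusion_spark_function(session_ctx: &SessionContext) {
    session_ctx.register_udf(ScalarUDF::new_from_impl(SparkFoo::default()));
}
EOF

suggest_pattern "$demo_rs" SparkFoo   # prints A
suggest_pattern "$demo_rs" SparkBar   # prints B
```

Against a real checkout, the file argument would presumably be `native/core/src/execution/jni_api.rs` under the repository root.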

    REPO_ROOT=$(git rev-parse --show-toplevel)
    # Portable regex: BSD awk on macOS does not support \b.
    DF_SPARK_VER=$(awk -F'"' '/^datafusion-spark[ =]/ {print $2; exit}' "$REPO_ROOT/native/Cargo.toml")
    DF_FUNCS_VER=$(awk -F'"' '/^datafusion[ =]/ {print $2; exit}' "$REPO_ROOT/native/Cargo.toml")
    [ -z "$DF_SPARK_VER" ] || [ -z "$DF_FUNCS_VER" ] && { echo "ERROR: version extraction failed"; exit 1; }
    EXPR='$ARGUMENTS'
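To see why the patterns end in `[ =]` rather than a word boundary, the two awk extractions can be run against a small sample. This is a sketch only: the `Cargo.toml` fragment and its version numbers are invented for illustration and do not reflect the versions Comet actually pins.

```shell
# Hypothetical Cargo.toml fragment; the version numbers are made up.
tmp_toml=$(mktemp)
cat > "$tmp_toml" <<'EOF'
[workspace.dependencies]
datafusion = { version = "1.2.3", default-features = false }
datafusion-spark = { version = "1.2.4" }
datafusion-functions = { version = "1.2.5" }
EOF

# Same awk programs as in the skill: split on double quotes and print the
# first quoted field of the first matching line.
spark_ver=$(awk -F'"' '/^datafusion-spark[ =]/ {print $2; exit}' "$tmp_toml")
funcs_ver=$(awk -F'"' '/^datafusion[ =]/ {print $2; exit}' "$tmp_toml")

echo "datafusion-spark: $spark_ver"   # prints datafusion-spark: 1.2.4
echo "datafusion:       $funcs_ver"   # prints datafusion:       1.2.3
rm -f "$tmp_toml"
```

The `[ =]` suffix requires a space or `=` right after the crate name, so the `datafusion` pattern skips the `datafusion-spark` and `datafusion-functions` lines without relying on `\b`.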

    DF_SPARK=$(ls -d ~/.cargo/registry/src/*/datafusion-spark-${DF_SPARK_VER}/ 2>/dev/null | head -1)
    DF_FUNCS=$(ls -d ~/.cargo/registry/src/*/datafusion-functions-${DF_FUNCS_VER}/ 2>/dev/null | head -1)
    # If empty, run `cargo fetch` from native/.
    grep -rin "fn name" "$DF_SPARK/src/function/" 2>/dev/null | grep -i "$EXPR"
    grep -rin "fn name" "$DF_FUNCS/src/" 2>/dev/null | grep -i "$EXPR"
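Because stderr is discarded, an empty `DF_SPARK` or `DF_FUNCS` makes the lookup above return nothing, which looks the same as "function not found". One way to tell the two cases apart is an explicit directory check, sketched below. The helper name `find_fn_name` and the throwaway source tree are hypothetical.

```shell
# find_fn_name DIR EXPR: list `fn name` lines under DIR whose grep output line
# (file path included) mentions EXPR. Returns 2 when DIR is empty or missing.
find_fn_name() {
  dir=$1
  expr=$2
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "source dir not found: run 'cargo fetch' from native/" >&2
    return 2
  fi
  grep -rin "fn name" "$dir" | grep -i "$expr"
}

# Throwaway tree standing in for an unpacked crate; foo.rs is a placeholder.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/function/math"
printf 'impl ScalarUDFImpl for SparkFoo {\n    fn name(&self) -> &str {\n        "foo"\n    }\n}\n' \
  > "$demo/src/function/math/foo.rs"

find_fn_name "$demo/src/function" foo
find_fn_name "" foo || echo "lookup skipped"
```

Note that the second `grep` filters the whole output line, so a hit can come from the file path as well as from the `fn name` line itself.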

    DF_CLONE=<path from memory>
    [ -z "$(git -C "$DF_CLONE" tag -l "$DF_SPARK_VER")" ] && echo "WARNING: tag missing - git fetch --tags or use Option A"
    git -C "$DF_CLONE" grep -in "fn name" "$DF_SPARK_VER" -- 'datafusion/spark/src/function/' | grep -i "$EXPR"
    git -C "$DF_CLONE" grep -in "fn name" "$DF_FUNCS_VER" -- 'datafusion/functions/src/' | grep -i "$EXPR"
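The clone-based lookup only prints a warning when the tag is missing and then runs `git grep` anyway. A stricter variant, sketched here under assumptions, stops before searching. The helper `grep_at_tag`, the throwaway repository, and the tag `1.2.3` are hypothetical, and the sketch uses `git rev-parse --verify` in place of the `git tag -l` test used above.

```shell
# grep_at_tag CLONE TAG PATHSPEC EXPR: search `fn name` lines in the tree at TAG.
# Returns 2 without searching when TAG does not exist in CLONE.
grep_at_tag() {
  clone=$1; tag=$2; pathspec=$3; expr=$4
  if ! git -C "$clone" rev-parse -q --verify "refs/tags/$tag" >/dev/null; then
    echo "WARNING: tag $tag missing in $clone" >&2
    return 2
  fi
  git -C "$clone" grep -in "fn name" "$tag" -- "$pathspec" | grep -i "$expr"
}

# Throwaway repository standing in for a DataFusion clone.
repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
git -C "$repo" init -q
mkdir -p "$repo/datafusion/spark/src/function/math"
printf '    fn name(&self) -> &str {\n        "foo"\n    }\n' \
  > "$repo/datafusion/spark/src/function/math/foo.rs"
git -C "$repo" add .
git -C "$repo" -c user.name=demo -c user.email=demo@example.com commit -q -m "demo"
git -C "$repo" tag 1.2.3

grep_at_tag "$repo" 1.2.3 'datafusion/spark/src/function/' foo
grep_at_tag "$repo" 9.9.9 'datafusion/spark/src/function/' foo || echo "tag lookup skipped"
```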

Wiring patterns

Workflow

1. Study the Spark contract

2. Find the upstream function (at the pinned version)

2a. Decision gate — confirm the source crate

3. Apply the Scala wiring

4. Register the UDF (Pattern B only)

5. Add SQL file tests

6. Update docs

7. Build, audit, finalize

name	wire-datafusion-function
description	Use when wiring an existing DataFusion or datafusion-spark function into Comet for a Spark expression. Identifies the right wiring pattern (one-line passthrough, datafusion-spark UDF registration, or custom serde with input massaging / restrictions), applies the Scala serde, registers the UDF in jni_api when needed, and adds SQL file tests. Assumes the function already exists upstream — if not, switch to `implement-comet-expression`.
argument-hint	<expression-name>
