audit-comet-expression

Name: Audit Comet Expression
Author: apache

Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.

stars:1,194

forks:321

updated: April 28, 2026, 19:45

related-skills.json

Same repository

bug-triage.md

from "apache/datafusion-comet"

Triage open Comet issues marked `requires-triage` per the project bug triage guide. Applies the recommended priority and area labels, removes `requires-triage`, and files a dated summary issue listing what was done. A human reviews the summary issue and closes it when satisfied.

2026-05-22 · 1.2k

wire-datafusion-function.md

from "apache/datafusion-comet"

Use when wiring an existing DataFusion or datafusion-spark function into Comet for a Spark expression. Identifies the right wiring pattern (one-line passthrough, datafusion-spark UDF registration, or custom serde with input massaging / restrictions), applies the Scala serde, registers the UDF in jni_api when needed, and adds SQL file tests. Assumes the function already exists upstream — if not, switch to `implement-comet-expression`.

2026-05-20 · 1.2k

implement-comet-expression.md

from "apache/datafusion-comet"

Use when implementing a new Spark expression in DataFusion Comet. Walks through cloning latest Spark master to study the canonical implementation, checking the upstream datafusion-spark crate before writing native code, building the Comet serde and Rust wire-up from the contributor guide, then running audit-comet-expression to drive a test-coverage iteration loop.

2026-05-01 · 1.2k

review-comet-pr.md

from "apache/datafusion-comet"

Review a DataFusion Comet pull request for Spark compatibility and implementation correctness. Provides guidance to a reviewer rather than posting comments directly.

2026-04-29 · 1.2k

package.json

"author": "apache"

"repository": "apache/datafusion-comet"

$ useful --forSOC

Software Quality Assurance Analysts and Testers · Computer and Mathematical Occupations · 15-1253 · L4

    set -eu -o pipefail
    for tag in v3.4.3 v3.5.8 v4.0.1; do
      dir="/tmp/spark-${tag}"
      if [ ! -d "$dir" ]; then
        git clone --depth 1 --branch "$tag" https://github.com/apache/spark.git "$dir"
      fi
    done

    for tag in v3.4.3 v3.5.8 v4.0.1; do
      dir="/tmp/spark-${tag}"
      echo "=== $tag ==="
      find "$dir/sql/catalyst/src/main/scala" -name "*.scala" | \
        xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
    done

    # Same search over the whole sql/ tree (broader than sql/catalyst)
    for tag in v3.4.3 v3.5.8 v4.0.1; do
      dir="/tmp/spark-${tag}"
      echo "=== $tag ==="
      find "$dir/sql" -name "*.scala" | \
        xargs grep -l "case class $ARGUMENTS\b\|object $ARGUMENTS\b" 2>/dev/null
    done
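Once the expression's source file has been located, the "Compare across Spark versions" step can be partly mechanized. The following is a minimal sketch, not part of the original skill: `compare_versions` and `SPARK_ROOT` are hypothetical names, and the function assumes the checkouts follow the `spark-<tag>` directory layout used by the clone loop. It only reports whether adjacent versions differ; reading the actual diff is still a manual step.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the skill): report whether one source file
# differs between adjacent Spark checkouts under $SPARK_ROOT/spark-<tag>/.
SPARK_ROOT="${SPARK_ROOT:-/tmp}"

compare_versions() {
  local rel="$1"; shift            # path relative to each checkout root
  local prev="" prev_tag="" tag cur
  for tag in "$@"; do
    cur="${SPARK_ROOT}/spark-${tag}/${rel}"
    if [ ! -f "$cur" ]; then
      echo "$tag: missing"         # file moved or expression absent in this version
      prev=""
      continue
    fi
    if [ -n "$prev" ]; then
      if diff -q "$prev" "$cur" >/dev/null; then
        echo "$prev_tag -> $tag: identical"
      else
        echo "$prev_tag -> $tag: differs"
      fi
    fi
    prev="$cur"
    prev_tag="$tag"
  done
}
```

A call such as `compare_versions <relative/path/to/Expr.scala> v3.4.3 v3.5.8 v4.0.1` would print one line per adjacent pair; any pair reported as "differs" can then be inspected with `diff -u`.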

    if [ -n "${DATAFUSION_SRC:-}" ]; then
      grep -r "$ARGUMENTS" "$DATAFUSION_SRC" --include="*.rs" -l 2>/dev/null | head -10
    else
      # Fall back to cargo registry (may include unrelated crates)
      grep -r "$ARGUMENTS" ~/.cargo/registry/src/*/datafusion* --include="*.rs" -l 2>/dev/null | head -10
    fi

    # Find SQL test files for this expression
    find spark/src/test/resources/sql-tests/expressions/ -name "*.sql" | \
      xargs grep -l "$ARGUMENTS" 2>/dev/null

    # Also check if there's a dedicated file (name matched in lowercase)
    find spark/src/test/resources/sql-tests/expressions/ \
      -name "*$(echo "$ARGUMENTS" | tr '[:upper:]' '[:lower:]')*"

| Dimension | Spark tests it | Comet SQL Test | Comet Scala Test | Gap? |
| --- | --- | --- | --- | --- |
| Column reference argument(s) | | | | |
| Literal argument(s) | | | | |
| NULL input | | | | |
| Empty string / empty array / empty map | | | | |
| Array/map with NULL elements | | | | |
| Zero, negative zero, negative values (numeric) | | | | |
| Underflow, overflow | | | | |
| Boundary values (INT_MIN, INT_MAX, Long.MinValue, minimum positive, etc.) | | | | |
| NaN, Infinity, -Infinity, subnormal (float/double) | | | | |
| Multibyte / special UTF-8 (composed vs decomposed, e.g. é U+00E9 vs e + U+0301, non-Latin scripts) | | | | |
| ANSI mode (failOnError=true) | | | | |
| Non-ANSI mode (failOnError=false) | | | | |
| All supported input types | | | | |
| Parquet dictionary encoding (ConfigMatrix) | | | | |
| Cross-version behavior differences | | | | |
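A first pass over the "Comet SQL Test" column can be approximated with keyword searches. The sketch below is an illustration only: `coverage_hints` is a hypothetical function, and the three patterns are assumptions chosen for this example rather than rules from the skill. A keyword hit suggests a dimension is touched; it does not prove the case is actually exercised, so each row still needs a manual read of the test file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the skill): print keyword-based hints for a
# few coverage-matrix rows. Each entry is "label=extended-regex"; the
# patterns are illustrative assumptions.
coverage_hints() {
  local file="$1" entry label pattern
  for entry in \
    "NULL input=NULL" \
    "NaN, Infinity=NaN|Infinity" \
    "Parquet dictionary encoding=ConfigMatrix: *parquet\.enable\.dictionary"; do
    label="${entry%%=*}"      # text before the first '='
    pattern="${entry#*=}"     # text after the first '='
    if grep -Eq -- "$pattern" "$file"; then
      echo "$label: mentioned"
    else
      echo "$label: not found"
    fi
  done
}
```

Running `coverage_hints <path/to/test.sql>` prints one line per checked dimension, which can serve as a starting point when filling in the matrix.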

    -- Licensed to the Apache Software Foundation (ASF) under one
    -- or more contributor license agreements.  See the NOTICE file
    -- distributed with this work for additional information
    -- regarding copyright ownership.  The ASF licenses this file
    -- to you under the Apache License, Version 2.0 (the
    -- "License"); you may not use this file except in compliance
    -- with the License.  You may obtain a copy of the License at
    --
    --   http://www.apache.org/licenses/LICENSE-2.0
    --
    -- Unless required by applicable law or agreed to in writing,
    -- software distributed under the License is distributed on an
    -- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    -- KIND, either express or implied.  See the License for the
    -- specific language governing permissions and limitations
    -- under the License.

    -- ConfigMatrix: parquet.enable.dictionary=false,true

    statement
    CREATE TABLE test_$ARGUMENTS(...) USING parquet

    statement
    INSERT INTO test_$ARGUMENTS VALUES (...), (NULL)

    -- column argument
    query
    SELECT $ARGUMENTS(col) FROM test_$ARGUMENTS

    -- literal arguments
    query
    SELECT $ARGUMENTS('value'), $ARGUMENTS(''), $ARGUMENTS(NULL)
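The template above can be stamped out for a concrete expression name with a small script. This is a sketch under stated assumptions: `render_sql_test` is a hypothetical function, the license header is omitted for brevity and would need to be prepended, and the `(...)` placeholders are left exactly as in the template because the column types and values depend on the expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the skill): print the SQL test skeleton with
# the expression name substituted. The "(...)" placeholders are intentionally
# left unfilled; the ASF license header must be added separately.
render_sql_test() {
  local name="$1"
  cat <<EOF
-- ConfigMatrix: parquet.enable.dictionary=false,true

statement
CREATE TABLE test_${name}(...) USING parquet

statement
INSERT INTO test_${name} VALUES (...), (NULL)

-- column argument
query
SELECT ${name}(col) FROM test_${name}

-- literal arguments
query
SELECT ${name}('value'), ${name}(''), ${name}(NULL)
EOF
}
```

For example, `render_sql_test upper > upper.sql` writes a starter file whose placeholders can then be filled in by hand; the output file name here is only an illustration.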

name: audit-comet-expression
description: Audit an existing Comet expression for correctness and test coverage. Studies the Spark implementation across versions 3.4.3, 3.5.8, and 4.0.1, reviews the Comet and DataFusion implementations, identifies missing test coverage, and offers to implement additional tests.
argument-hint: <expression-name>

Overview

Step 1: Locate the Spark Implementations

Find the expression class in each Spark version

Read the Spark source for each version

Compare across Spark versions

Step 2: Locate the Spark Tests

Step 3: Locate the Comet Implementation

Scala serde

Shims

Rust / DataFusion implementation

Step 4: Locate Existing Comet Tests

Comet SQL Tests

Comet Scala Tests

Step 5: Gap Analysis

Coverage matrix

Implementation gaps

Step 6: Recommendations

High priority

Medium priority

Low priority

Step 7: Offer to Implement Missing Tests

Comet SQL Tests template

Verify the tests pass

Step 8: Update the Expression Support Doc

Output Format

Tone and Style
