Name: Review Comet Pr
Author: apache

// Review a DataFusion Comet pull request for Spark compatibility and implementation correctness. Provides guidance to a reviewer rather than posting comments directly.

updated: April 29, 2026 03:08

name	review-comet-pr
description	Review a DataFusion Comet pull request for Spark compatibility and implementation correctness. Provides guidance to a reviewer rather than posting comments directly.
argument-hint	<pr-number>

Review Comet PR #$ARGUMENTS

Before You Start

Gather PR Metadata

Fetch the PR details to understand the scope:

gh pr view $ARGUMENTS --repo apache/datafusion-comet --json title,body,author,isDraft,state,files
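To size up the PR before reading the diff, the changed paths can be grouped by review area. This is a minimal sketch. `classify_path` and its bucket names are hypothetical, and the patterns only cover the directories this guide mentions. `QueryPlanSerde.scala` is assumed to sit inside the serde package, so it is matched first.

```shell
# Hypothetical helper: map one changed file path to the review area it touches.
# Patterns mirror the directories referenced in this guide. Order matters:
# QueryPlanSerde.scala is assumed to live in the serde directory, so check it first.
classify_path() {
  case "$1" in
    */QueryPlanSerde.scala)                                echo "registration" ;;
    spark/src/main/scala/org/apache/comet/serde/*)         echo "scala-serde" ;;
    native/spark-expr/src/*)                               echo "rust-native" ;;
    spark/src/test/resources/sql-tests/*)                  echo "sql-tests" ;;
    spark/src/test/scala/org/apache/spark/sql/benchmark/*) echo "benchmark" ;;
    spark/src/test/*)                                      echo "scala-tests" ;;
    docs/*)                                                echo "docs" ;;
    *)                                                     echo "other" ;;
  esac
}

# Usage (needs gh and network access):
# gh pr view $ARGUMENTS --repo apache/datafusion-comet --json files --jq '.files[].path' |
#   while read -r p; do classify_path "$p"; done | sort | uniq -c
```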

Review Existing Comments First

Before forming your review:

Read all existing review comments on the PR
Check the conversation tab for any discussion
Avoid duplicating feedback that others have already provided
Build on existing discussions rather than starting new threads on the same topic
If you have no additional concerns beyond what's already discussed, say so
Ignore Copilot reviews - do not reference or build upon comments from GitHub Copilot

# View existing comments on a PR
gh pr view $ARGUMENTS --repo apache/datafusion-comet --comments
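`gh pr view --comments` may not list every inline comment attached to specific lines of the diff. One option is to fetch those from the GitHub REST endpoint for pull request review comments and then drop Copilot's output, as required above. This is a sketch. `drop_copilot` is a hypothetical helper. Matching on "copilot" anywhere in the author login is an assumption about how the bot account is named.

```shell
# Hypothetical helper: drop comments written by GitHub Copilot.
# Input: one comment per line as "author<TAB>path<TAB>body".
# Assumption: Copilot's login contains "copilot" (case-insensitive).
drop_copilot() {
  awk -F '\t' 'tolower($1) !~ /copilot/'
}

# Usage (needs gh and network access). @tsv keeps each comment on one line:
# gh api "repos/apache/datafusion-comet/pulls/$ARGUMENTS/comments" --paginate \
#   --jq '.[] | [.user.login, .path, .body] | @tsv' | drop_copilot
```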

Review Workflow

1. Gather Context

Read the changed files and understand the area of the codebase being modified:

# View the diff
gh pr diff $ARGUMENTS --repo apache/datafusion-comet

For expression PRs, check how similar expressions are implemented in the codebase. Look at the serde files in spark/src/main/scala/org/apache/comet/serde/ and Rust implementations in native/spark-expr/src/.
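A case-insensitive search over both directories is one quick way to find comparable implementations. A Scala serde object and its Rust kernel often spell the same name differently (for example `Hex` versus `hex`). The helper below is hypothetical and assumes it runs from the Comet repo root.

```shell
# Hypothetical helper: list Scala serde and Rust files that mention a name,
# case-insensitively, to see how a similar expression was wired up.
# $1 = name of a similar, already-supported expression; $2 = repo root (default ".")
find_similar_impls() {
  local name=$1 root=${2:-.}
  grep -rli -- "$name" \
    "$root/spark/src/main/scala/org/apache/comet/serde" \
    "$root/native/spark-expr/src" 2>/dev/null | sort
}
```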

2. Read Spark Source (Expression PRs)

For any PR that adds or modifies an expression, you must read the Spark source code to understand the canonical behavior. This is the authoritative reference for what Comet must match.

Clone or update the Spark repo:

# Clone if not already present (use /tmp to avoid polluting the workspace), otherwise update it
if [ ! -d /tmp/spark ]; then
  git clone --depth 1 https://github.com/apache/spark.git /tmp/spark
else
  git -C /tmp/spark pull --ff-only
fi

Find the expression implementation in Spark:

# Search for the expression class (e.g., for "Conv", "Hex", "Substring")
find /tmp/spark/sql/catalyst/src/main/scala -name "*.scala" | xargs grep -l "case class <ExpressionName>"
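The search above also matches longer class names that share a prefix. For example, "case class Hex" would also match a hypothetical `HexString`. Anchoring on the opening parenthesis narrows the result to the exact class. This is a sketch. `find_spark_expr` is a hypothetical helper, and it assumes the class is declared as `case class <Name>(` on a single line.

```shell
# Hypothetical helper: find the Spark file that defines exactly `case class <Name>(`.
# $1 = expression class name; $2 = Spark checkout (default /tmp/spark)
find_spark_expr() {
  local name=$1 spark=${2:-/tmp/spark}
  grep -rlE --include='*.scala' "case class ${name}[[:space:]]*\(" \
    "$spark/sql/catalyst/src/main/scala"
}
```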

Read the Spark implementation carefully. Pay attention to:
- The eval and doGenCode/nullSafeEval methods. These define the exact behavior.
- The inputTypes and dataType fields. These define which types Spark accepts and what it returns.
- Null handling. Does it use nullable = true? Does nullSafeEval handle nulls implicitly?
- Special cases, guards, and require assertions.
- ANSI mode branches (look for SQLConf.get.ansiEnabled or failOnError).
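To jump straight to those spots in a long source file, a line-numbered grep over the relevant names can help. This is a rough sketch. `spark_expr_hotspots` is hypothetical, and it only catches declarations written on one line with these exact names.

```shell
# Hypothetical helper: print the lines of a Spark expression source file that
# usually decide its semantics: evaluation methods, declared types,
# nullability, ANSI/error branches and require assertions.
# $1 = path to the .scala file that defines the expression
spark_expr_hotspots() {
  grep -nE 'def (eval|doGenCode|nullSafeEval)|(def|val) (inputTypes|dataType|nullable)|ansiEnabled|failOnError|require\(' "$1"
}
```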

Read the Spark tests for the expression:

# Find test files
find /tmp/spark/sql -name "*.scala" -path "*/test/*" | xargs grep -l "<ExpressionName>"
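The search above only covers Scala suites. Spark also keeps SQL golden-file inputs under `sql/core/src/test/resources/sql-tests/inputs/`, and these often exercise edge cases through the SQL function name rather than the class name. A sketch covering both follows. `find_spark_tests` is hypothetical, and it searches case-insensitively so that a class name like `Crc32` and a SQL name like `crc32` both match.

```shell
# Hypothetical helper: list Spark test sources that mention an expression.
# Covers Scala files under */test/* and SQL golden-file inputs.
# $1 = expression or SQL function name; $2 = Spark checkout (default /tmp/spark)
find_spark_tests() {
  local name=$1 spark=${2:-/tmp/spark}
  {
    grep -rli --include='*.scala' -- "$name" "$spark/sql" 2>/dev/null | grep '/test/'
    grep -rli --include='*.sql' -- "$name" \
      "$spark/sql/core/src/test/resources/sql-tests/inputs" 2>/dev/null
  } | sort -u
}
```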

Compare the Spark behavior against the Comet implementation in the PR. Identify:
- Edge cases tested in Spark but not in the PR
- Data types supported in Spark but not handled in the PR
- Behavioral differences that should be marked Incompatible
Suggest additional tests for any edge cases or type combinations covered in Spark's tests that are missing from the PR's tests.

3. Spark Compatibility Check

This is the most critical aspect of Comet reviews. Comet must produce identical results to Spark.

For expression PRs, verify against the Spark source you read in step 2:

Check edge cases
- Null handling
- Overflow behavior
- Empty input behavior
- Type-specific behavior
Verify all data types are handled
- Does Spark support this type? (Check inputTypes in Spark source)
- Does the PR handle all Spark-supported types?
Check for ANSI mode differences
- Spark behavior may differ between legacy and ANSI modes
- PR should handle both or mark as Incompatible

4. Check Against Implementation Guidelines

Always verify PRs follow the implementation guidelines.

Scala Serde (`spark/src/main/scala/org/apache/comet/serde/`)

Expression class correctly identified
All child expressions converted via exprToProtoInternal
Return type correctly serialized
getSupportLevel reflects true compatibility:
- Compatible() - matches Spark exactly
- Incompatible(Some("reason")) - differs in documented ways
- Unsupported(Some("reason")) - cannot be implemented
Serde in appropriate file (datetime.scala, strings.scala, arithmetic.scala, etc.)

Registration (`QueryPlanSerde.scala`)

Added to correct map (temporal, string, arithmetic, etc.)
No duplicate registrations
Import statement added
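Duplicate registrations can be spotted mechanically. The sketch below assumes that `QueryPlanSerde.scala` registers serdes as `classOf[SparkExpr] -> CometSerde` map entries. That layout is an assumption, so verify it against the actual file. `dup_registrations` is a hypothetical helper.

```shell
# Hypothetical helper: print Spark expression classes that appear in more than
# one `classOf[X] ->` map entry. Other uses of classOf (e.g. `== classOf[X]`)
# are ignored because they are not followed by "->".
# $1 = path to QueryPlanSerde.scala
dup_registrations() {
  grep -oE 'classOf\[[A-Za-z0-9_.]+][[:space:]]*->' "$1" |
    sed -E 's/^classOf\[([^]]+)].*/\1/' | sort | uniq -d
}
```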

Rust Implementation (if applicable)

Location: native/spark-expr/src/

Matches DataFusion and Arrow conventions
Null handling is correct
No panics. Use Result types.
Efficient array operations (avoid row-by-row)
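As a first pass on "No panics", the PR diff can be scanned for added Rust lines that can panic. This is a heuristic sketch, and `flag_panics` is hypothetical. Some `unwrap()` calls are provably safe, so treat each hit as a question for the author rather than a finding.

```shell
# Hypothetical helper: read a unified diff on stdin and print added lines in
# .rs files that can panic (.unwrap(), .expect(, panic!, unreachable!).
flag_panics() {
  awk '
    /^\+\+\+ / { file = substr($0, 7); next }   # "+++ b/path" -> "path"
    file ~ /\.rs$/ && /^\+/ && /\.unwrap\(\)|\.expect\(|panic!|unreachable!/ {
      print file ": " substr($0, 2)
    }
  '
}

# Usage (needs gh and network access):
# gh pr diff $ARGUMENTS --repo apache/datafusion-comet | flag_panics
```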

Tests - Prefer Comet SQL Tests

Expression tests should use Comet SQL Tests (CometSqlFileTestSuite) where possible. This framework automatically runs each query through both Spark and Comet and compares results. No Scala code is needed. Only fall back to Comet Scala Tests in CometExpressionSuite when Comet SQL Tests cannot express the test. Examples include complex DataFrame setup, programmatic data generation, or non-expression tests.

Comet SQL Test location: spark/src/test/resources/sql-tests/expressions/<category>/

Categories include: aggregate/, array/, string/, math/, struct/, map/, datetime/, hash/, etc.

Comet SQL Test structure:

-- Create test data
statement
CREATE TABLE test_crc32(col string, a int, b float) USING parquet

statement
INSERT INTO test_crc32 VALUES ('Spark', 10, 1.5), (NULL, NULL, NULL), ('', 0, 0.0)

-- Default mode: verifies native Comet execution + result matches Spark
query
SELECT crc32(col) FROM test_crc32

-- spark_answer_only: compares results without requiring native execution
query spark_answer_only
SELECT crc32(cast(a as string)) FROM test_crc32

-- tolerance: allows numeric variance for floating-point results
query tolerance=0.0001
SELECT cos(v) FROM test_trig

-- expect_fallback: asserts fallback to Spark occurs
query expect_fallback(unsupported expression)
SELECT unsupported_func(v) FROM test_table

-- expect_error: verifies both engines throw matching exceptions
query expect_error(ARITHMETIC_OVERFLOW)
SELECT 2147483647 + 1

-- ignore: skip queries with known bugs (include GitHub issue link)
query ignore(https://github.com/apache/datafusion-comet/issues/NNNN)
SELECT known_buggy_expr(v) FROM test_table

Running Comet SQL Tests:

# All Comet SQL Tests
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite" -Dtest=none

# Specific test file (substring match)
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite crc32" -Dtest=none

CRITICAL: Verify all test requirements (regardless of framework):

Basic functionality tested (column data, not just literals)
Null handling tested (SELECT expression(NULL))
Edge cases tested (empty input, overflow, boundary values)
Both literal values and column references tested (they use different code paths)
For timestamp/datetime expressions, timezone handling is tested (e.g., UTC, non-UTC session timezone, timestamps with and without timezone)
One expression per SQL file for easier debugging
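A few of these requirements can be roughly self-checked against a Comet SQL test file. The sketch below only looks for a NULL input, a query reading table columns, and a query with no FROM clause (literal arguments). It cannot judge edge-case quality, timezone coverage, or whether the file is limited to one expression. `check_sql_test` is a hypothetical helper.

```shell
# Hypothetical helper: heuristic checks on a Comet SQL test file.
# Prints one line per missing item and returns non-zero if anything is missing.
# $1 = path to the .sql test file
check_sql_test() {
  local f=$1 ok=0
  grep -qi 'null' "$f" || { echo "no NULL input found"; ok=1; }
  grep -qiE '^[[:space:]]*select .* from ' "$f" || { echo "no query over table columns"; ok=1; }
  # Among SELECT lines, is there at least one without a FROM clause?
  grep -iE '^[[:space:]]*select ' "$f" | grep -qvi ' from ' || { echo "no literal-only query"; ok=1; }
  return $ok
}
```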

If using Comet Scala Tests instead, literal tests MUST disable constant folding:

withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
    "org.apache.spark.sql.catalyst.optimizer.ConstantFolding") {
  checkSparkAnswerAndOperator("SELECT func(literal)")
}

5. Performance Review (Expression PRs)

For PRs that add new expressions, performance is not optional. The whole point of Comet is to be faster than Spark. If a new expression is not faster, it may not be worth adding.

Check that the PR includes microbenchmark results. The PR description should contain benchmark numbers comparing Comet vs Spark for the new expression. If benchmark results are missing, flag this as a required addition.
Look for a microbenchmark implementation. Expression benchmarks live in spark/src/test/scala/org/apache/spark/sql/benchmark/. Check whether the PR adds a benchmark for the new expression.
Review the benchmark results if provided:
- Is Comet actually faster than Spark for this expression?
- Are the benchmarks representative? They should test with realistic data sizes, not just trivial inputs.
- Are different data types benchmarked if the expression supports multiple types?
Review the Rust implementation for performance concerns:
- Unnecessary allocations or copies
- Row-by-row processing where batch/array operations are possible
- Redundant type conversions
- Inefficient string handling (e.g., repeated UTF-8 validation)
- Missing use of Arrow compute kernels where they exist
If benchmark results show Comet is slower than Spark, flag this clearly. The PR should explain why the regression is acceptable or include a plan to optimize.
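When a PR reports raw timings, a speedup ratio makes the comparison explicit. A minimal sketch follows. `speedup` is hypothetical, and both timings must use the same unit (for example the best-time column of a Spark benchmark run for the Spark and Comet cases).

```shell
# Hypothetical helper: turn Spark and Comet timings (same unit) into a speedup
# figure, flagging the case where Comet is slower.
# $1 = Spark time, $2 = Comet time
speedup() {
  awk -v s="$1" -v c="$2" 'BEGIN {
    r = s / c
    printf "%.2fx%s\n", r, (r < 1 ? " (Comet slower than Spark)" : "")
  }'
}
```

For example, `speedup 250 100` prints `2.50x`.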

6. Check CI Test Failures

Always check the CI status and summarize any test failures in your review.

# View CI check status
gh pr checks $ARGUMENTS --repo apache/datafusion-comet

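# List only the failing checks (--json/--jq on pr checks require a recent gh release)
gh pr checks $ARGUMENTS --repo apache/datafusion-comet --json name,bucket,link --jq '.[] | select(.bucket == "fail") | "\(.name)\t\(.link)"'

# View the logs of failed steps for a run (the run ID appears in the check link)
gh run view <run-id> --repo apache/datafusion-comet --log-failed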
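For the CI Status line of the review, a per-outcome count is usually enough. This is a sketch. `summarize_checks` is hypothetical and expects one outcome per line, such as the `bucket` values (pass, fail, pending, skipping, cancel) that recent gh releases expose for `gh pr checks --json`.

```shell
# Hypothetical helper: count checks per outcome, one outcome per input line.
summarize_checks() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage (recent gh, network access):
# gh pr checks $ARGUMENTS --repo apache/datafusion-comet --json bucket --jq '.[].bucket' | summarize_checks
```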

7. Documentation Check

Some user-facing docs are auto-generated from the serde. Others are hand-edited. Treat them differently.

Generated by GenerateDocs — do NOT ask the contributor to edit these by hand. CI regenerates and publishes them on every merge to main:

Compatibility guide pages under docs/source/user-guide/latest/compatibility/expressions/ (math.md, datetime.md, array.md, string.md, aggregate.md, struct.md, map.md, misc.md, cast.md)
Configuration reference at docs/source/user-guide/latest/configs.md

For these, check the source instead. Does the new or modified CometExpressionSerde provide accurate getIncompatibleReasons() and getUnsupportedReasons() strings? Each returned string is rendered as a bullet on the corresponding compat page. Common gaps to flag:

Expression marked Incompatible(Some("...")) in getSupportLevel but getIncompatibleReasons() is empty, so the compat page shows it as supported with no caveats.
Unsupported(Some("...")) for specific data types or argument shapes but no getUnsupportedReasons() to surface the limitation to users.
Reason strings drifting from the notes argument passed to Compatible / Incompatible / Unsupported. They do not have to match exactly, but consistency helps users.
Reason strings that are too terse to be useful in user-facing docs (a single word, no context, no link to a tracking issue when behavior is known to differ).

See docs/source/contributor-guide/adding_a_new_expression.md (section "Documenting Incompatible and Unsupported Reasons") for the contract these methods follow.
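The first two gaps listed above can be caught with a file-level grep. This is a heuristic sketch. `check_reason_methods` is hypothetical. A single serde file may define several serdes, so a reasons method for one expression can mask a missing one for another.

```shell
# Hypothetical helper: flag a serde file that returns Incompatible(Some(...)) or
# Unsupported(Some(...)) but never mentions the matching reasons method, which
# would leave the generated compatibility page without a caveat.
# $1 = path to the serde .scala file
check_reason_methods() {
  local f=$1 ok=0
  if grep -q 'Incompatible(Some(' "$f" && ! grep -q 'getIncompatibleReasons' "$f"; then
    echo "$f: Incompatible(Some(...)) but no getIncompatibleReasons()"; ok=1
  fi
  if grep -q 'Unsupported(Some(' "$f" && ! grep -q 'getUnsupportedReasons' "$f"; then
    echo "$f: Unsupported(Some(...)) but no getUnsupportedReasons()"; ok=1
  fi
  return $ok
}
```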

Hand-edited — PR should update if relevant:

docs/source/user-guide/latest/expressions.md — the supported-expressions list. New expressions belong here.
Other latest/compatibility/ pages such as floating-point.md, operators.md, regex.md, scans.md.
Top-level user-guide pages such as iceberg.md, installation.md, tuning.md when the PR changes user-visible behavior.

If the PR adds a new expression but does not update expressions.md, flag that. If it touches incompatibility behavior, flag that the serde reasons should reflect the change.

8. Common Comet Review Issues

Incomplete type support: Spark expression supports types not handled in PR
Missing edge cases: Null, overflow, empty string, negative values
Wrong return type: Return type must match Spark exactly
Tests in wrong framework: Expression tests should use Comet SQL Tests (CometSqlFileTestSuite) rather than adding to Comet Scala Tests like CometExpressionSuite. Suggest migration if the PR adds Comet Scala Tests for expressions that could use Comet SQL Tests instead.
Stale native code: PR might need ./mvnw install -pl common -DskipTests
Missing getSupportLevel: Edge cases should be marked as Incompatible
Scalar function name collides with a DataFusion built-in: If the PR registers a Spark function whose name is also defined by datafusion-functions (e.g. levenshtein, concat, coalesce, sha2, regexp_replace), check that the serde sets the return type explicitly via scalarFunctionExprToProtoWithReturnType rather than scalarFunctionExprToProto or the bare CometScalarFunction(name) shortcut. Without an explicit return type, the native planner consults DataFusion's UDF registry first for type resolution, and any arity or input-type difference between the Spark and DataFusion versions will fail native execution with Error from DataFusion: Function 'X' expects N arguments but received M. The Comet UDF is only swapped in after DF's signature validation passes. See the "When to set the return type explicitly" section in docs/source/contributor-guide/adding_a_new_expression.md.
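A grep over the changed serde files can surface the risky pattern from the last item. This is a sketch. `flag_builtin_collisions` is hypothetical, and its name list only contains the examples given above, not the full set of datafusion-functions names. Calls through `scalarFunctionExprToProtoWithReturnType` are deliberately not matched.

```shell
# Hypothetical helper: print serde lines that register a function through
# scalarFunctionExprToProto("name", ...) or CometScalarFunction("name") when the
# name is also a DataFusion built-in. The list below is illustrative only.
# $1 = path to the serde .scala file
flag_builtin_collisions() {
  grep -nE '(scalarFunctionExprToProto|CometScalarFunction)\("(levenshtein|concat|coalesce|sha2|regexp_replace)"' "$1"
}
```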

Output Format

Present your review as guidance for the reviewer. Structure your output as:

PR Summary - Brief description of what the PR does
CI Status - Summary of CI check results
Findings - Your analysis organized by area (Spark compatibility, implementation, tests, etc.)
Suggested Review Comments - Specific comments the reviewer could leave on the PR, with file and line references where applicable

Review Tone and Style

Write reviews that sound human and conversational. Avoid:

Robotic or formulaic language
Em dashes. Use separate sentences instead.
Semicolons. Use separate sentences instead.

Instead:

Write in flowing paragraphs using simple grammar
Keep sentences short and separate rather than joining them with punctuation
Be kind and constructive, even when raising concerns
Use backticks around any code references (function names, file paths, class names, types, config keys, etc.)
Suggest adding tests rather than stating tests are missing (e.g., "It might be worth adding a test for X" not "Tests are missing for X")
Ask questions about edge cases rather than asserting they aren't handled (e.g., "Does this handle the case where X is null?" not "This doesn't handle null")
Frame concerns as questions or suggestions when possible
Acknowledge what the PR does well before raising concerns
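Before handing over suggested comments, a draft saved to a file can be checked for the two punctuation rules above. This is a sketch. `lint_review_tone` is hypothetical. It ignores text inside backticks so that code such as `a; b` does not trigger a warning, and it prints offending lines with their line numbers.

```shell
# Hypothetical helper: flag draft review lines containing em dashes or semicolons,
# ignoring anything inside `backticks`.
# $1 = path to the draft review text
lint_review_tone() {
  local dash
  dash=$(printf '\342\200\224')   # U+2014 EM DASH encoded as UTF-8 bytes
  sed -E 's/`[^`]*`//g' "$1" | grep -nE "${dash}|;"
}
```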

Do Not Post Comments

IMPORTANT: Never post comments or reviews on the PR directly. This skill is for providing guidance to a human reviewer. Present all findings and suggested comments to the user. The user will decide what to post.
