remote-benchmark

Name: Remote Benchmark
Author: datafusion-contrib

Deploys the code to a remote EC2 cluster using the commands available in package.json, port-forwards a port from one of the cluster machines, and runs benchmarks against it.


stars:106

forks:48

updated: February 24, 2026 at 15:28

SKILL.md


related-skills.json

same repository

ec2-cluster-provision.md

from "datafusion-contrib/datafusion-distributed"

Uses the code in this repository to provision an EC2 cluster for benchmarking purposes.

2026-02-24, 106 stars

package.json

"author": "datafusion-contrib"

"repository": "datafusion-contrib/datafusion-distributed"


# Method 1: AWS SSO profile commands (preferred)
# How to get these values:
# - aws configure sso
# - aws configure list-profiles
# - aws configure get region --profile <profile>
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SECURITY_TOKEN AWS_CREDENTIAL_EXPIRATION
export AWS_PROFILE=<profile>
export AWS_REGION=${AWS_REGION:-us-east-1}
export AWS_DEFAULT_REGION="$AWS_REGION"
export AWS_SDK_LOAD_CONFIG=1
aws sso login --profile "$AWS_PROFILE"
aws sts get-caller-identity --profile "$AWS_PROFILE" --region "$AWS_REGION"

# Method 2: Command prefix wrapper (example: aws-vault)
# How to get these values:
# - same <profile> discovery as method 1
# - aws-vault list
# - aws-vault exec <profile> -- aws sts get-caller-identity
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN AWS_SECURITY_TOKEN AWS_CREDENTIAL_EXPIRATION
export AWS_REGION=${AWS_REGION:-us-east-1}
export AWS_DEFAULT_REGION="$AWS_REGION"
awscmd() { aws-vault exec <profile> -- "$@"; }
awscmd aws sts get-caller-identity --region "$AWS_REGION"

# Method 3: Explicit environment credentials
# How to get these values:
# - https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html
# - include AWS_SESSION_TOKEN when using temporary credentials
unset AWS_PROFILE
export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
# export AWS_SESSION_TOKEN=<session-token> # when credentials are temporary
export AWS_REGION=${AWS_REGION:-us-east-1}
export AWS_DEFAULT_REGION="$AWS_REGION"
aws sts get-caller-identity --region "$AWS_REGION"
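Since the three methods set and unset overlapping variables, it can help to check which one the current shell looks configured for before going further. The following is a minimal sketch of such a check. The function name `detect_aws_auth_method` and its output labels are hypothetical and not part of the repository. It only inspects the variables and the `awscmd` wrapper named above and does not contact AWS.

```shell
# Hypothetical helper (not from the repository): guess which of the three
# credential methods the current shell is set up for.
detect_aws_auth_method() {
  # Method 3 exports explicit keys. Methods 1 and 2 unset them first,
  # so their presence is checked before anything else.
  if [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]; then
    echo "env-credentials"
  # Method 1 exports AWS_PROFILE.
  elif [ -n "${AWS_PROFILE:-}" ]; then
    echo "sso-profile"
  # Method 2 defines the awscmd wrapper function.
  elif command -v awscmd >/dev/null 2>&1; then
    echo "command-wrapper"
  else
    echo "none"
  fi
}
```

Running `detect_aws_auth_method` after one of the setup blocks should print the matching label. A result of `none` suggests that none of the setups above has been applied in this shell.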

# Method 1: AWS SSO profile commands
ACCOUNT_ID=$(aws sts get-caller-identity --profile "$AWS_PROFILE" --query Account --output text)
npm run bootstrap -- aws://$ACCOUNT_ID/$AWS_REGION
npm run deploy
npm run sync-bucket

# Method 2: Command prefix wrapper (example: aws-vault)
ACCOUNT_ID=$(awscmd aws sts get-caller-identity --region "$AWS_REGION" --query Account --output text)
awscmd npm run bootstrap -- aws://$ACCOUNT_ID/$AWS_REGION
awscmd npm run deploy
awscmd npm run sync-bucket

# Method 3: Explicit environment credentials
ACCOUNT_ID=$(aws sts get-caller-identity --region "$AWS_REGION" --query Account --output text)
npm run bootstrap -- aws://$ACCOUNT_ID/$AWS_REGION
npm run deploy
npm run sync-bucket
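If the `aws sts get-caller-identity` call fails, `ACCOUNT_ID` is left empty and the bootstrap step would receive a malformed target such as `aws:///us-east-1`. A small guard can catch this before anything is deployed. The sketch below assumes a hypothetical helper named `bootstrap_target`, which is not part of the repository. It relies only on the fact that AWS account IDs are 12 digits long.

```shell
# Hypothetical helper (not from the repository): build the
# aws://<account>/<region> argument and refuse obviously bad input.
bootstrap_target() {
  bt_account=$1
  bt_region=$2
  # Reject empty or non-numeric account IDs.
  case "$bt_account" in
    ''|*[!0-9]*)
      echo "invalid account id: '$bt_account'" >&2
      return 1
      ;;
  esac
  # AWS account IDs are exactly 12 digits.
  if [ "${#bt_account}" -ne 12 ]; then
    echo "account id must be 12 digits: '$bt_account'" >&2
    return 1
  fi
  if [ -z "$bt_region" ]; then
    echo "region is empty" >&2
    return 1
  fi
  printf 'aws://%s/%s\n' "$bt_account" "$bt_region"
}
```

One possible use is `TARGET=$(bootstrap_target "$ACCOUNT_ID" "$AWS_REGION") && npm run bootstrap -- "$TARGET"`, so that the bootstrap step is skipped when the account lookup did not return a usable value.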

# Method 1: AWS SSO profile commands
INSTANCE_ID=$(aws cloudformation describe-stacks \
  --stack-name DataFusionDistributedBenchmarks \
  --profile "$AWS_PROFILE" \
  --region "$AWS_REGION" \
  --query "Stacks[0].Outputs[?OutputKey=='WorkerInstanceIds'].OutputValue" \
  --output text | cut -d',' -f1)
aws ssm start-session --target "$INSTANCE_ID" \
  --profile "$AWS_PROFILE" \
  --region "$AWS_REGION" \
  --document-name AWS-StartPortForwardingSession \
  --parameters "portNumber=9000,localPortNumber=9000"

# Method 2: Command prefix wrapper (example: aws-vault)
INSTANCE_ID=$(awscmd aws cloudformation describe-stacks \
  --stack-name DataFusionDistributedBenchmarks \
  --region "$AWS_REGION" \
  --query "Stacks[0].Outputs[?OutputKey=='WorkerInstanceIds'].OutputValue" \
  --output text | cut -d',' -f1)
awscmd aws ssm start-session --target "$INSTANCE_ID" \
  --region "$AWS_REGION" \
  --document-name AWS-StartPortForwardingSession \
  --parameters "portNumber=9000,localPortNumber=9000"

# Method 3: Explicit environment credentials
INSTANCE_ID=$(aws cloudformation describe-stacks \
  --stack-name DataFusionDistributedBenchmarks \
  --region "$AWS_REGION" \
  --query "Stacks[0].Outputs[?OutputKey=='WorkerInstanceIds'].OutputValue" \
  --output text | cut -d',' -f1)
aws ssm start-session --target "$INSTANCE_ID" \
  --region "$AWS_REGION" \
  --document-name AWS-StartPortForwardingSession \
  --parameters "portNumber=9000,localPortNumber=9000"
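The `cut -d',' -f1` step above takes the first entry of the comma-separated `WorkerInstanceIds` stack output. If the stack or that output is missing, the pipeline can yield an empty or non-ID string, and the session command then fails with a less obvious error. The sketch below shows the same selection with a sanity check added. The helper name `first_worker_id` and the sample IDs are hypothetical.

```shell
# Hypothetical helper (not from the repository): pick the first entry of a
# comma-separated instance ID list and check that it looks like an EC2 ID.
first_worker_id() {
  fw_ids=$1
  # Strip everything from the first comma onward (same result as cut -d',' -f1).
  fw_first=${fw_ids%%,*}
  case "$fw_first" in
    i-?*)
      printf '%s\n' "$fw_first"
      ;;
    *)
      echo "no instance id found in: '$fw_ids'" >&2
      return 1
      ;;
  esac
}

# Example with made-up IDs:
first_worker_id "i-0aaa1111,i-0bbb2222,i-0ccc3333"
```

With the made-up list in the example, the function prints `i-0aaa1111`. Passing it the raw `--output text` result before calling `aws ssm start-session` would stop early when no worker ID could be read.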

$ npm run datafusion-bench -- --help
Usage: datafusion-bench [options]

Options:
  --dataset <string>                   Dataset to run queries on
  -i, --iterations <number>            Number of iterations (default: "3")
  --files-per-task <number>            Files per task (default: "8")
  --cardinality-task-sf <number>       Cardinality task scale factor (default: "1")
  --batch-size <number>                Standard Batch coalescing size (number of rows) (default: "32768")
  --shuffle-batch-size <number>        Shuffle batch coalescing size (number of rows) (default: "32768")
  --children-isolator-unions <number>  Use children isolator unions (default: "true")
  --broadcast-joins <boolean>          Use broadcast joins (default: "false")
  --collect-metrics <boolean>          Propagates metric collection (default: "true")
  --compression <string>               Compression algo to use within workers (lz4, zstd, none) (default: "lz4")
  --queries <string>                   Specific queries to run
  --debug <boolean>                    Print the generated plans to stdout
  --warmup <boolean>                   Perform a warmup query before the benchmarks (default: "true")
  -h, --help                           display help for command
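As an illustration of how the options listed above combine, the sketch below assembles a benchmark command line and prints it instead of running it. The helper name `bench_cmd` and the dataset name `tpch_sf1` are assumptions made for this example. The valid dataset names are not listed in the help output, so the real value has to come from the repository. The defaults for iterations and compression mirror the ones shown in the help text.

```shell
# Hypothetical helper (not from the repository): print a datafusion-bench
# invocation using only flags that appear in the --help output.
bench_cmd() {
  bc_dataset=$1
  bc_iterations=${2:-3}     # help output lists "3" as the default
  bc_compression=${3:-lz4}  # help output lists "lz4" as the default
  if [ -z "$bc_dataset" ]; then
    echo "dataset is required" >&2
    return 1
  fi
  # The help output names lz4, zstd and none as the accepted values.
  case "$bc_compression" in
    lz4|zstd|none) ;;
    *)
      echo "unsupported compression: '$bc_compression'" >&2
      return 1
      ;;
  esac
  printf 'npm run datafusion-bench -- --dataset %s --iterations %s --compression %s\n' \
    "$bc_dataset" "$bc_iterations" "$bc_compression"
}

# "tpch_sf1" is a placeholder dataset name, not a confirmed value.
bench_cmd tpch_sf1 5 zstd
```

Printing the command first makes it easy to review the flags before running it against the forwarded port. Under Method 2, the printed command would be run through the `awscmd` wrapper if it needs AWS credentials.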
