aqua-deployment

Name: Aqua Deployment
Author: oracle

Deploy LLM models on OCI using AI Quick Actions (AQUA) - single model, multi-model, stacked (LoRA), with GPU shape selection, vLLM configuration, streaming, and tool calling. Triggered when user wants to deploy, update, or manage model deployments.


Stars: 126

Forks: 65

Updated: 2026-02-28 16:13


related-skills.json

Same repository

aqua-cli.md

from "oracle/accelerated-data-science"

Complete CLI reference for the ADS AQUA command-line interface (ads aqua). Covers all model, deployment, evaluation, and fine-tuning commands with full parameter documentation. Triggered when user asks about CLI commands, wants to run AQUA operations from terminal, or needs command syntax.

Updated: 2026-02-28 | Stars: 126

aqua-evaluation.md

from "oracle/accelerated-data-science"

Evaluate LLM model quality using BERTScore, ROUGE, Perplexity, and Text Readability metrics on OCI AI Quick Actions (AQUA). Covers dataset preparation, evaluation job creation, and report interpretation. Triggered when user wants to evaluate or benchmark a model.

Updated: 2026-02-28 | Stars: 126

aqua-finetuning.md

from "oracle/accelerated-data-science"

Fine-tune LLM models using LoRA on OCI AI Quick Actions (AQUA). Covers dataset preparation (instruction, conversational, multimodal, tokenized formats), hyperparameter tuning, distributed training, and training metrics. Triggered when user wants to fine-tune or customize a model.

Updated: 2026-02-28 | Stars: 126

aqua-metrics.md

from "oracle/accelerated-data-science"

Set up Prometheus and Grafana monitoring for AQUA vLLM model deployments on OCI. Covers the signing proxy, container registry setup, OCI Container Instance deployment, and PromQL dashboards. Triggered when user wants to monitor LLM deployments, view TTFT/latency/throughput metrics, or set up observability for AQUA.

Updated: 2026-02-28 | Stars: 126

aqua-model-lifecycle.md

from "oracle/accelerated-data-science"

Register, list, get, and manage LLM models in OCI AI Quick Actions (AQUA) using the ADS SDK. Triggered when user wants to import models from HuggingFace or Object Storage, browse available models, or manage model catalog entries.

Updated: 2026-02-28 | Stars: 126

aqua-troubleshooting.md

from "oracle/accelerated-data-science"

Diagnose and fix OCI AI Quick Actions (AQUA) issues including deployment failures, OOM errors, authorization problems, capacity issues, container errors, and policy misconfigurations. Triggered when user encounters errors or needs help debugging AQUA workflows.

Updated: 2026-02-28 | Stars: 126

package.json

"author": "oracle"

"repository": "oracle/accelerated-data-science"



name	aqua-deployment
description	Deploy LLM models on OCI using AI Quick Actions (AQUA) - single model, multi-model, stacked (LoRA), with GPU shape selection, vLLM configuration, streaming, and tool calling. Triggered when user wants to deploy, update, or manage model deployments.
user-invocable	true
disable-model-invocation	false

AQUA Model Deployment

Use this skill when the user wants to deploy, manage, or configure LLM model deployments on OCI Data Science using AI Quick Actions.

Deployment Types

Type	Description
Single Model	One model per deployment (most common)
Multi-Model	Multiple LLMs on one instance via LiteLLM routing
Stacked	One base model served together with multiple LoRA fine-tuned weights that share the same inference server

Python SDK Usage

Import

from ads.aqua.modeldeployment import AquaDeploymentApp
deployment_app = AquaDeploymentApp()

Create Single Model Deployment

from ads.aqua.modeldeployment.entities import CreateModelDeploymentDetails

details = CreateModelDeploymentDetails(
    model_id="ocid1.datasciencemodel.oc1.iad.xxx",
    instance_shape="VM.GPU.A10.2",
    display_name="llama-3.1-8b-deployment",
    compartment_id="ocid1.compartment.oc1..xxx",
    project_id="ocid1.datascienceproject.oc1.iad.xxx",
    log_group_id="ocid1.loggroup.oc1.iad.xxx",
    log_id="ocid1.log.oc1.iad.xxx",
    env_var={
        "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/completions",
        "PARAMS": "--max-model-len 4096",
    },
)
deployment = deployment_app.create(create_deployment_details=details)
print(f"Deployment: {deployment.id} | State: {deployment.state}")
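
A new deployment typically takes several minutes to provision, so callers often poll until it settles. The sketch below is one way to do that, assuming the `state` attribute printed above reports OCI lifecycle values such as `CREATING`, `ACTIVE`, and `FAILED`. `wait_for_terminal_state` and its parameters are hypothetical helpers, not part of ADS.

```python
import time
from typing import Callable

# Lifecycle states after which polling stops (assumed OCI model deployment values).
TERMINAL_STATES = {"ACTIVE", "FAILED", "INACTIVE", "DELETED", "NEEDS_ATTENTION"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() repeatedly until it returns a terminal state or time runs out."""
    waited = 0.0
    while True:
        state = str(get_state()).upper()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"deployment still {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the objects from the example above, a call might look like `wait_for_terminal_state(lambda: deployment_app.get(model_deployment_id=deployment.id).state)`. The returned state can still be `FAILED`, so check it before sending traffic.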

Create with Chat Completions Endpoint

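# Same pattern as the single-model example, but served on the chat completions route.
details = CreateModelDeploymentDetails(
    model_id="ocid1.datasciencemodel.oc1.iad.xxx",
    instance_shape="VM.GPU.A10.2",
    display_name="llama-3.1-8b-chat",
    env_var={
        "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
        "PARAMS": "--max-model-len 4096",
    },
)
deployment = deployment_app.create(create_deployment_details=details)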

Create Multi-Model Deployment

from ads.aqua.common.entities import AquaMultiModelRef

details = CreateModelDeploymentDetails(
    models=[
        AquaMultiModelRef(
            model_id="ocid1.datasciencemodel.oc1.iad.model1",
            model_name="llama-3.1-8b",
            gpu_count=1,
        ),
        AquaMultiModelRef(
            model_id="ocid1.datasciencemodel.oc1.iad.model2",
            model_name="mistral-7b",
            gpu_count=1,
        ),
    ],
    instance_shape="VM.GPU.A10.2",
    display_name="multi-model-deployment",
    compartment_id="ocid1.compartment.oc1..xxx",
    project_id="ocid1.datascienceproject.oc1.iad.xxx",
)
deployment = deployment_app.create(create_deployment_details=details)
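
Each entry in `models` reserves `gpu_count` GPUs, so the requested counts have to fit the chosen shape. The check below is a minimal sketch that assumes the total must not exceed the shape's GPU count as listed in the GPU Shape Reference table. `SHAPE_GPUS` and `check_gpu_budget` are hypothetical names; the constraints AQUA actually enforces are described in references/shapes.md.

```python
# GPU counts copied from the GPU Shape Reference table in this document.
SHAPE_GPUS = {
    "VM.GPU.A10.1": 1,
    "VM.GPU.A10.2": 2,
    "BM.GPU.A10.4": 4,
    "BM.GPU.A100-v2.8": 8,
    "BM.GPU.H100.8": 8,
    "BM.GPU.H200.8": 8,
}


def check_gpu_budget(instance_shape: str, models: list[dict]) -> int:
    """Return the number of unallocated GPUs, or raise if the models do not fit."""
    available = SHAPE_GPUS[instance_shape]
    requested = sum(m["gpu_count"] for m in models)
    if requested > available:
        raise ValueError(
            f"{instance_shape} has {available} GPUs but {requested} were requested"
        )
    return available - requested


models = [
    {"model_name": "llama-3.1-8b", "gpu_count": 1},
    {"model_name": "mistral-7b", "gpu_count": 1},
]
print(check_gpu_budget("VM.GPU.A10.2", models))  # 0 GPUs left over
```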

Create Stacked Deployment (Base + LoRA Fine-Tunes)

from ads.aqua.common.entities import AquaMultiModelRef, LoraModuleSpec

details = CreateModelDeploymentDetails(
    models=[
        AquaMultiModelRef(
            model_id="ocid1.datasciencemodel.oc1.iad.base_model",
            model_name="llama-3.1-8b",
            fine_tune_weights=[
                LoraModuleSpec(
                    model_id="ocid1.datasciencemodel.oc1.iad.ft1",
                    model_name="llama-3.1-8b-customer-support",
                ),
                LoraModuleSpec(
                    model_id="ocid1.datasciencemodel.oc1.iad.ft2",
                    model_name="llama-3.1-8b-summarization",
                ),
            ],
        ),
    ],
    instance_shape="VM.GPU.A10.2",
    display_name="stacked-llama-deployment",
    deployment_type="STACKED",
)
deployment = deployment_app.create(create_deployment_details=details)

List Deployments

deployments = deployment_app.list(compartment_id="ocid1.compartment.oc1..xxx")
for d in deployments:
    print(f"{d.display_name} | {d.state} | {d.endpoint}")

Get Deployment Details

deployment = deployment_app.get(model_deployment_id="ocid1.datasciencemodeldeployment.oc1.iad.xxx")

Get Deployment Config (Recommended Shapes)

config = deployment_app.get_deployment_config(model_id="ocid1.datasciencemodel.oc1.iad.xxx")

List Available Shapes

shapes = deployment_app.list_shapes(compartment_id="ocid1.compartment.oc1..xxx")

Shape Recommendation

recommendation = deployment_app.recommend_shape(model_id="ocid1.datasciencemodel.oc1.iad.xxx")

CLI Usage

Create Deployment

ads aqua deployment create \
  --model_id "ocid1.datasciencemodel.oc1.iad.xxx" \
  --instance_shape "VM.GPU.A10.2" \
  --display_name "llama-3.1-8b-deployment" \
  --compartment_id "ocid1.compartment.oc1..xxx" \
  --project_id "ocid1.datascienceproject.oc1.iad.xxx" \
  --log_group_id "ocid1.loggroup.oc1.iad.xxx" \
  --log_id "ocid1.log.oc1.iad.xxx"

Create Multi-Model Deployment

ads aqua deployment create \
  --models '[{"model_id":"ocid1...model1","model_name":"llama-8b","gpu_count":1},{"model_id":"ocid1...model2","model_name":"mistral-7b","gpu_count":1}]' \
  --instance_shape "VM.GPU.A10.2" \
  --display_name "multi-model"
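
Hand-writing the `--models` JSON inside shell quotes is error-prone. The sketch below builds the argument with `json.dumps` and `shlex.quote`, using only the keys shown in the command above. The helper name `models_arg` is hypothetical, and the OCIDs are the same elided placeholders as in the command.

```python
import json
import shlex


def models_arg(models: list[dict]) -> str:
    """Serialize model entries into a shell-safe value for --models."""
    compact = json.dumps(models, separators=(",", ":"))
    return shlex.quote(compact)


models = [
    {"model_id": "ocid1...model1", "model_name": "llama-8b", "gpu_count": 1},
    {"model_id": "ocid1...model2", "model_name": "mistral-7b", "gpu_count": 1},
]
print(f"ads aqua deployment create --models {models_arg(models)}")
```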

Create Stacked Deployment

ads aqua deployment create \
  --models '[{"model_id":"ocid1...base","model_name":"llama-8b","fine_tune_weights":[{"model_id":"ocid1...ft1","model_name":"ft-support"}]}]' \
  --instance_shape "VM.GPU.A10.2" \
  --display_name "stacked-deployment" \
  --deployment_type "STACKED"

List / Get

ads aqua deployment list --compartment_id "ocid1.compartment.oc1..xxx"
ads aqua deployment get --model_deployment_id "ocid1.datasciencemodeldeployment.oc1.iad.xxx"

Invoking a Deployed Model

Python SDK (requests)

import ads
import oci
import requests

ads.set_auth("resource_principal")
endpoint = "https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.xxx"

# Non-streaming
response = requests.post(
    f"{endpoint}/predict",
    json={
        "model": "odsc-llm",
        "prompt": "Write a haiku about clouds",
        "max_tokens": 256,
        "temperature": 0.7,
    },
    auth=oci.auth.signers.get_resource_principals_signer(),
)
print(response.json())
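
The request above waits for the full completion. For streaming, OpenAI-compatible servers such as vLLM send server-sent events when the payload sets `"stream": true`, and OCI model deployments provide a `/predictWithResponseStream` route for streamed responses. The sketch below covers only the parsing side, assuming completion-style chunks (`choices[0].text`) that end with `data: [DONE]`. `iter_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from server-sent event lines of a completions stream."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("text")
            if text:
                yield text


# Assumed usage with requests (not executed here):
#   resp = requests.post(f"{endpoint}/predictWithResponseStream",
#                        json={"model": "odsc-llm", "prompt": "...", "stream": True},
#                        auth=signer, stream=True)
#   for piece in iter_stream_text(resp.iter_lines()):
#       print(piece, end="", flush=True)
```

For a chat completions deployment the fragments would sit under `choices[0].delta.content` instead, so the inner lookup would need adjusting.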

OpenAI-Compatible Client (ADS)

import oci
from ads.aqua.client.openai_client import OpenAI

client = OpenAI(
    model_deployment_url="https://modeldeployment.us-ashburn-1.oci.customer-oci.com/ocid1.datasciencemodeldeployment.oc1.iad.xxx",
    auth={"signer": oci.auth.signers.get_resource_principals_signer()},
)
response = client.chat.completions.create(
    model="odsc-llm",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=500,
)
print(response.choices[0].message.content)

GPU Shape Reference

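Quick sizing rule: for FP16/BF16 weights, GPU_memory_GB ≈ num_params_billions × 2 (2 bytes per parameter), plus about 20% headroom for the KV cache. For example, a 7B model needs roughly 14 GB for weights and 17 GB in total, which fits the 24 GB VM.GPU.A10.1.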

Shape	GPUs	Total GPU Memory	Fits (FP16)
VM.GPU.A10.1	1	24 GB	≤ 7B
VM.GPU.A10.2	2	48 GB	≤ 13B
BM.GPU.A10.4	4	96 GB	≤ 34B, or 70B quantized
BM.GPU.A100-v2.8	8	640 GB	≤ 70B
BM.GPU.H100.8	8	640 GB	≤ 70B (faster)
BM.GPU.H200.8	8	1128 GB	405B+

For the full shape table, per-model recommendations, multi-model GPU count constraints, and quantization options, see references/shapes.md.
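
The sizing rule can be turned into a quick calculation. This is a rough sketch of the rule of thumb only (2 bytes per parameter plus 20% headroom), using the memory totals from the table above. `estimate_gpu_memory_gb` and `smallest_shape` are hypothetical helpers and are not the ADS shape recommender.

```python
from typing import Optional

# (shape, total GPU memory in GB), smallest first, from the table above.
SHAPE_MEMORY_GB = [
    ("VM.GPU.A10.1", 24),
    ("VM.GPU.A10.2", 48),
    ("BM.GPU.A10.4", 96),
    ("BM.GPU.A100-v2.8", 640),
    ("BM.GPU.H100.8", 640),
    ("BM.GPU.H200.8", 1128),
]


def estimate_gpu_memory_gb(params_billions: float, kv_overhead: float = 0.20) -> float:
    """FP16/BF16 estimate: 2 GB per billion parameters, plus KV cache headroom."""
    weights_gb = params_billions * 2
    return weights_gb * (1 + kv_overhead)


def smallest_shape(params_billions: float) -> Optional[str]:
    """Return the first listed shape whose total memory covers the estimate."""
    needed = estimate_gpu_memory_gb(params_billions)
    for shape, memory_gb in SHAPE_MEMORY_GB:
        if memory_gb >= needed:
            return shape
    return None


for size in (7, 13, 70, 405):
    print(f"{size}B -> {estimate_gpu_memory_gb(size):.1f} GB -> {smallest_shape(size)}")
```

The rule ignores activation memory and long context windows, so treat the result as a lower bound and confirm it against the table or the shape recommender.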

vLLM Configuration Parameters

Set via PARAMS environment variable or --params CLI flag:

Parameter	Description	Example
`--max-model-len`	Maximum context length	`4096`, `8192`, `32768`
`--gpu-memory-utilization`	Fraction of GPU memory vLLM may use (weights and KV cache)	`0.9` (default), `0.95`
`--max-num-seqs`	Max concurrent sequences	`256`
`--quantization`	Quantization method	`fp8`, `bitsandbytes`
`--tensor-parallel-size`	Number of GPUs for tensor parallelism	`2`, `4`, `8`
`--trust-remote-code`	Allow custom model code from HF	(no value needed)
`--enable-auto-tool-choice`	Enable function/tool calling	(no value needed)
`--tool-call-parser`	Parser for tool calls	`llama3_json`, `granite`, `hermes`
`--limit-mm-per-prompt`	Limit multimodal inputs	`'{"image": 1}'`
`--task`	Model task override	`embedding`, `transcribe`
`--enforce-eager`	Disable CUDA graphs	(no value needed)
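
`PARAMS` is a single string of vLLM flags. The sketch below assembles it from a dictionary so that boolean switches such as `--enforce-eager` are emitted without a value. `build_params` is a hypothetical helper, not part of ADS.

```python
def build_params(options: dict) -> str:
    """Render a mapping of vLLM option names to values as one flag string."""
    parts = []
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            parts.append(flag)  # boolean switch, no value
        elif value is False or value is None:
            continue  # leave disabled switches out entirely
        else:
            parts.append(f"{flag} {value}")
    return " ".join(parts)


env_var = {
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "PARAMS": build_params(
        {"max-model-len": 8192, "gpu-memory-utilization": 0.95, "enforce-eager": True}
    ),
}
print(env_var["PARAMS"])
# --max-model-len 8192 --gpu-memory-utilization 0.95 --enforce-eager
```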

Tool Calling / Function Calling

Enable during deployment:

env_var={
    "MODEL_DEPLOY_PREDICT_ENDPOINT": "/v1/chat/completions",
    "PARAMS": "--enable-auto-tool-choice --tool-call-parser llama3_json --max-model-len 4096",
}

Supported parsers: llama3_json, llama4_json, granite, hermes, mistral, jamba, pythonic, internlm.
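
Once a deployment is started with these flags, requests can carry tool definitions in the OpenAI chat completions format. The payload below is an illustrative sketch. The `get_weather` tool and its schema are hypothetical, and `odsc-llm` is the model name used in the invocation examples in this document.

```python
import json


def build_tool_request(user_message: str) -> dict:
    """Build a chat completions payload that offers one (hypothetical) tool."""
    return {
        "model": "odsc-llm",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool for illustration
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
        "max_tokens": 256,
    }


def extract_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (function name, parsed arguments) pairs from a chat completion response."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]
```

The payload can be posted to the deployment's `/predict` route in the same way as the completion request shown under Invoking a Deployed Model. In this format `arguments` arrives as a JSON string, which is why `extract_tool_calls` decodes it.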

Advanced Topics

Topic	Reference
Shape recommender CLI + JSON output	`references/shapes.md` → Shape Recommendation Tool section
LMCache (KV cache persistence for multi-turn)	`references/lmcache.md`
Private endpoints (no public internet)	`references/private-endpoints.md`
Batch inferencing (offline Job-based)	`references/batch-inferencing.md`

Key Source Files

ads/aqua/modeldeployment/deployment.py — AquaDeploymentApp (create, list, get, update)
ads/aqua/modeldeployment/entities.py — CreateModelDeploymentDetails, AquaDeployment
ads/aqua/common/entities.py — AquaMultiModelRef, LoraModuleSpec
ads/aqua/client/openai_client.py — OpenAI-compatible client
ads/aqua/shaperecommend/recommend.py — GPU shape recommendation engine
