nvidia-api

NVIDIA API documentation for integrating NVIDIA services. Use for NVIDIA NIM (NVIDIA Inference Microservices), LLM APIs, visual models, multimodal APIs, retrieval APIs, healthcare APIs, and CUDA-X microservices integration.

Updated: November 3, 2025 at 17:58
Source: rish2jain/paperresearchagent (GitHub repository by rish2jain)

NVIDIA API Skill

Comprehensive assistance with NVIDIA API development, focusing on NIM (NVIDIA Inference Microservices) and cloud-hosted AI endpoints for building prototype and production applications.

When to Use This Skill

Trigger this skill for the use cases and technical scenarios below.

Primary Use Cases

LLM Integration: Working with Large Language Models via NVIDIA NIM API (Llama, Mistral, Gemma, etc.)
Chat Completions: Implementing chat interfaces, chatbots, or conversational AI using NVIDIA-hosted models
Code Generation: Using code-specialized models (CodeLlama, StarCoder, Codestral, Granite)
Multimodal AI: Integrating visual design, image understanding, or vision-language models
Retrieval Systems: Building RAG (Retrieval-Augmented Generation) applications with NVIDIA retrieval APIs
Healthcare AI: Implementing medical AI solutions with NVIDIA healthcare-specific APIs
Weather & Simulation: Working with Earth-2 weather prediction APIs

Technical Scenarios

Setting up authentication with NVIDIA API keys
Migrating from OpenAI API to NVIDIA NIM (OpenAI-compatible endpoints)
Choosing between different LLM models for specific tasks
Implementing streaming responses for chat applications
Building production AI applications with NVIDIA cloud endpoints
Prototyping AI features before self-hosting NIMs

Quick Reference

Authentication Setup

Get your API key: visit the NVIDIA API Catalog and generate a key.

# Shell: set the environment variable
export NVIDIA_API_KEY="nvapi-your-key-here"

# Python: read the key from the environment
import os
api_key = os.environ.get("NVIDIA_API_KEY")
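Because `os.environ.get` returns `None` when the variable is unset, a missing key only shows up later as a 401. The sketch below fails early instead. `load_api_key` is a hypothetical helper (not part of any NVIDIA SDK), and the prefix check assumes the `nvapi-*` key format described in this skill.

```python
import os


def load_api_key(env=None, var_name="NVIDIA_API_KEY"):
    """Return the API key from the environment or raise a clear error."""
    env = os.environ if env is None else env
    key = (env.get(var_name) or "").strip()
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    # Assumption: keys start with "nvapi-", as described in this skill.
    if not key.startswith("nvapi-"):
        raise ValueError(f"{var_name} does not look like an NVIDIA API key")
    return key
```

Passing a plain dict as `env` makes the helper easy to exercise without touching the real environment.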

Basic Chat Completion (Python)

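import os
import requests

# Read the key set in Authentication Setup
api_key = os.environ["NVIDIA_API_KEY"]

url = "https://integrate.api.nvidia.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "model": "meta/llama3-70b-instruct",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
}

# A timeout keeps the call from hanging indefinitely on network problems
response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # Surface 4xx/5xx errors before parsing the body
result = response.json()
print(result["choices"][0]["message"]["content"])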

Streaming Chat Response

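import json
import os
import requests

api_key = os.environ["NVIDIA_API_KEY"]

url = "https://integrate.api.nvidia.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

payload = {
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [{"role": "user", "content": "Write a short story"}],
    "stream": True
}

# stream=True on the request makes requests yield the body incrementally
response = requests.post(url, json=payload, headers=headers, stream=True, timeout=60)
response.raise_for_status()

for line in response.iter_lines():
    if not line:
        continue  # Skip blank lines between server-sent events
    line = line.decode("utf-8")
    if not line.startswith("data: "):
        continue
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # End-of-stream sentinel
    chunk = json.loads(data)
    if not chunk.get("choices"):
        continue
    # The delta may carry only a role, or a null content field
    content = chunk["choices"][0]["delta"].get("content") or ""
    print(content, end="", flush=True)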
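The parsing step of the streaming loop can be separated from the network call so it can be checked against canned input. This is a minimal sketch, assuming the OpenAI-style `data: {...}` lines and `[DONE]` sentinel used in the example; `iter_stream_content` is a hypothetical name.

```python
import json


def iter_stream_content(lines):
    """Yield text fragments from an iterable of raw SSE lines (bytes or str)."""
    for raw in lines:
        if not raw:
            continue  # Blank lines separate events
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data.strip() == "[DONE]":
            break  # End-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices") or []:
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With `requests`, the generator would be fed from `response.iter_lines()`, for example `"".join(iter_stream_content(response.iter_lines()))`.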

OpenAI-Compatible Integration

import os
from openai import OpenAI

# Same OpenAI client, pointed at the NVIDIA endpoint with an NVIDIA API key
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"]
)

completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.5,
    max_tokens=100
)

print(completion.choices[0].message.content)

Code Generation Example

# Using code-specialized models
# Reuses requests, url, and headers from Basic Chat Completion above
payload = {
    "model": "bigcode/starcoder2-15b",
    "messages": [
        {"role": "system", "content": "You are an expert programmer."},
        {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}
    ],
    "temperature": 0.2,  # Lower temperature for code generation
    "max_tokens": 500
}

response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()
code = response.json()["choices"][0]["message"]["content"]
print(code)

Multi-Turn Conversation

# Reuses requests, url, and headers from Basic Chat Completion above
model = "meta/llama3-70b-instruct"
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
]

# First response
response1 = requests.post(url, json={"model": model, "messages": conversation}, headers=headers, timeout=60)
response1.raise_for_status()
assistant_reply = response1.json()["choices"][0]["message"]["content"]

# Continue conversation: send the full history, including the assistant's reply
conversation.append({"role": "assistant", "content": assistant_reply})
conversation.append({"role": "user", "content": "Can you give me an example?"})

response2 = requests.post(url, json={"model": model, "messages": conversation}, headers=headers, timeout=60)
response2.raise_for_status()
print(response2.json()["choices"][0]["message"]["content"])
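Since every request resends the whole history, long chats grow without bound. The sketch below keeps the history in a small class and drops the oldest turns. `Conversation` and `max_messages` are hypothetical names, and trimming by message count is a simplifying assumption, since real context limits are measured in tokens.

```python
class Conversation:
    """Keeps the message list for a multi-turn chat."""

    def __init__(self, system_prompt=None, max_messages=20):
        # The system prompt is stored separately so trimming never removes it.
        self.system = (
            {"role": "system", "content": system_prompt} if system_prompt else None
        )
        self.turns = []
        self.max_messages = max_messages

    def add(self, role, content):
        self.turns.append({"role": role, "content": content})
        # Drop the oldest turns once the cap is exceeded.
        overflow = len(self.turns) - self.max_messages
        if overflow > 0:
            del self.turns[:overflow]

    def messages(self):
        """Return the list to send as the "messages" field of a request."""
        head = [self.system] if self.system else []
        return head + list(self.turns)
```

A request would then use `{"model": ..., "messages": chat.messages()}` and call `chat.add("assistant", reply)` after each response.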

JavaScript/Node.js Example

const axios = require('axios');

const url = 'https://integrate.api.nvidia.com/v1/chat/completions';
const apiKey = process.env.NVIDIA_API_KEY;

async function chatCompletion() {
  const response = await axios.post(url, {
    model: "mistralai/mistral-7b-instruct",
    messages: [
      { role: "user", content: "What are the benefits of renewable energy?" }
    ],
    max_tokens: 200
  }, {
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    }
  });

  console.log(response.data.choices[0].message.content);
}

chatCompletion().catch((err) => {
  // Log the HTTP status when the API responded, otherwise the network error
  console.error(err.response ? err.response.status : err.message);
});

cURL Example

curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'

Error Handling

# Reuses requests, url, headers, and payload from Basic Chat Completion above
try:
    response = requests.post(url, json=payload, headers=headers, timeout=60)
    response.raise_for_status()  # Raise exception for 4xx/5xx status codes
    result = response.json()
except requests.exceptions.HTTPError as e:
    status = e.response.status_code  # The failed response is attached to the exception
    if status == 401:
        print("Authentication failed. Check your API key.")
    elif status == 429:
        print("Rate limit exceeded. Please slow down requests.")
    else:
        print(f"HTTP Error: {e}")
except requests.exceptions.RequestException as e:
    # Covers connection errors and timeouts
    print(f"Request failed: {e}")

Available Model Categories

Large Language Models (LLMs)

Access to top language models for various tasks:

Meta Models

meta/llama3-70b-instruct - High-performance instruction-following
meta/llama3-8b-instruct - Efficient smaller model
meta/codellama-70b - Specialized for code generation

Mistral Models

mistralai/mixtral-8x7b-instruct - High-quality mixture-of-experts
mistralai/mistral-large-2-instruct - Latest large model
mistralai/codestral-22b-instruct-v0.1 - Code generation specialist

Google Models

google/gemma-2-27b-it - Instruction-tuned Gemma
google/codegemma-7b - Code understanding and generation
google/shieldgemma-9b - Safety and content filtering

Other Notable Models

nvidia/llama3-chatqa-1.5-70b - Optimized for Q&A
ibm/granite-34b-code-instruct - Enterprise code model
deepseek-ai/deepseek-r1 - Advanced reasoning model

Code Generation Models

bigcode/starcoder2-15b - Open-source code completion
mistralai/codestral-22b-instruct-v0.1 - Instruction-following for code
google/codegemma-7b - Google's code specialist

Other API Categories

Visual Design - Image generation and visual models
Multimodal - Vision-language models
Retrieval - Embedding and retrieval APIs for RAG
Healthcare - Medical AI specialized models
Weather Prediction - Earth-2 climate simulation

Key Concepts

NVIDIA NIM (NVIDIA Inference Microservices)

NIM packages model inference as microservices. The NVIDIA-hosted endpoints covered in this skill provide:

Simple REST API access to leading AI models
OpenAI API compatibility for easy migration
A hosted environment for prototyping, with the same API usable in production
The option to move to downloadable, self-hosted containers

Endpoint Structure

Base URL: https://integrate.api.nvidia.com
Endpoint: POST /v1/chat/completions

API Compatibility

NVIDIA NIM APIs follow OpenAI's API specification, making it easy to:

Migrate existing OpenAI-based applications
Use OpenAI client libraries with minimal changes
Maintain familiar request/response patterns
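To illustrate what "minimal changes" means in practice, the sketch below builds an OpenAI-style chat request with only the standard library. The only NVIDIA-specific inputs are the base URL and the key. `build_chat_request` is a hypothetical helper, and the request is constructed but not sent.

```python
import json
import urllib.request

NVIDIA_BASE_URL = "https://integrate.api.nvidia.com/v1"


def build_chat_request(base_url, api_key, model, messages, **params):
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {"model": model, "messages": messages, **params}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Migrating an existing client then amounts to swapping the `base_url` and `api_key` arguments.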

Authentication

Uses Bearer token authentication
API keys obtained from NVIDIA API Catalog
Key format: nvapi-*

Response Format

Standard OpenAI-compatible response structure:

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta/llama3-70b-instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  }
}
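A minimal sketch of reading this structure defensively, assuming only the fields shown in the sample response. `ChatResult` and `parse_chat_response` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ChatResult:
    text: str
    finish_reason: str
    total_tokens: int


def parse_chat_response(payload):
    """Extract the first choice and token usage from a response dict."""
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    first = choices[0]
    usage = payload.get("usage") or {}
    return ChatResult(
        text=(first.get("message") or {}).get("content") or "",
        finish_reason=first.get("finish_reason") or "",
        total_tokens=usage.get("total_tokens", 0),
    )
```

Checking `finish_reason` is worthwhile: in the OpenAI specification a value of `length` rather than `stop` indicates the reply was cut off by `max_tokens`.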

Reference Files

This skill includes comprehensive documentation in references/:

getting_started.md

Overview of NVIDIA API platform
Getting started guide for cloud endpoints
Authentication and setup instructions
Links to main API categories

other.md

Additional NVIDIA API documentation
Related services and microservices
Integration patterns and best practices

Note: For detailed API specifications, parameter descriptions, and model-specific information, refer to the official documentation at docs.api.nvidia.com

Working with This Skill

For Beginners

Start with authentication: Get your API key from NVIDIA API Catalog
Try simple examples: Use the Quick Reference curl or Python examples above
Test different models: Experiment with various models to understand their strengths
Read the docs: Check references/getting_started.md for foundational concepts

For Intermediate Users

Implement streaming: Add real-time response streaming for better UX
Optimize parameters: Experiment with temperature, max_tokens, and other settings
Build conversations: Maintain context across multiple turns
Handle errors gracefully: Implement robust error handling and retry logic
Choose optimal models: Select models based on task requirements (speed vs accuracy)

For Advanced Users

Production deployment: Move from prototypes to production-ready applications
Batch processing: Implement efficient batch inference patterns
Self-hosted NIMs: Download and deploy container images for local inference
Multi-modal integration: Combine LLM, vision, and retrieval APIs
Performance optimization: Fine-tune requests for latency and cost efficiency
RAG implementation: Build retrieval-augmented generation systems
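This skill does not document a dedicated batch endpoint, so one way to approximate batch processing is to issue ordinary requests concurrently. The sketch below assumes a caller-supplied `send` function (hypothetical) that takes one prompt and returns one result.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(prompts, send, max_workers=4):
    """Send prompts concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises the first exception.
        return list(pool.map(send, prompts))
```

Keeping `max_workers` small is a deliberate choice here, since a large pool is more likely to run into rate limits (HTTP 429).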

Navigation Tips

Model selection: Choose based on task complexity, latency needs, and cost
OpenAI migration: Use the OpenAI-compatible client for seamless migration
API documentation: Access detailed specs at docs.api.nvidia.com/nim
Model catalog: Browse available models at build.nvidia.com

Common Use Cases

Chatbots & Conversational AI

Use models like meta/llama3-70b-instruct or mistralai/mixtral-8x7b-instruct for building intelligent conversational interfaces.

Code Assistants

Use specialized models like bigcode/starcoder2-15b, mistralai/codestral-22b-instruct-v0.1, or ibm/granite-34b-code-instruct.

Question Answering

Use nvidia/llama3-chatqa-1.5-70b optimized specifically for Q&A tasks.

Content Generation

Use creative models like mistralai/mistral-large-2-instruct with higher temperature settings.

RAG Applications

Combine LLM APIs with NVIDIA's Retrieval APIs for knowledge-grounded responses.
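The prompt-assembly half of a RAG pipeline can be sketched without any API calls. In the example below a toy word-overlap scorer stands in for a real embedding-based retriever. It is not the NVIDIA Retrieval API, and `retrieve` and `build_rag_messages` are hypothetical names.

```python
def retrieve(query, documents, top_k=2):
    """Rank documents by shared words with the query (toy stand-in for embeddings)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_rag_messages(query, documents, top_k=2):
    """Build chat messages that ground the answer in the retrieved passages."""
    passages = retrieve(query, documents, top_k)
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return [
        {"role": "system", "content": "Answer using only the context below.\n" + context},
        {"role": "user", "content": query},
    ]
```

The returned list can be sent as the `messages` field of a chat completion request.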

Best Practices

API Key Security

Never commit API keys to version control
Use environment variables for key storage
Rotate keys regularly for security
Monitor usage to detect unauthorized access
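A related habit is keeping keys out of logs and error messages. A minimal sketch, with `redact_key` as a hypothetical helper:

```python
def redact_key(key, visible=4):
    """Return a log-safe form of an API key, keeping only the last few characters."""
    if not key:
        return "<missing>"
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging `redact_key(api_key)` still lets you tell keys apart during rotation without exposing them.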

Parameter Tuning

Temperature: Lower (0.1-0.3) for factual/code, higher (0.7-1.0) for creative
Max tokens: Set appropriate limits to control costs and response length
Top-p: Nucleus sampling, an alternative to temperature for controlling randomness; adjust one or the other rather than both
Streaming: Enable for better user experience in interactive applications
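These guidelines can be captured as presets so that call sites only name the task. The values below are illustrative picks from the ranges listed here, not NVIDIA recommendations, and `SAMPLING_PRESETS` and `build_payload` are hypothetical names.

```python
SAMPLING_PRESETS = {
    "code": {"temperature": 0.2},
    "factual": {"temperature": 0.2},
    "creative": {"temperature": 0.9},
}


def build_payload(model, messages, task="factual", max_tokens=256, stream=False):
    """Build a chat completion payload with task-appropriate sampling settings."""
    if task not in SAMPLING_PRESETS:
        raise ValueError(f"unknown task: {task!r}")
    payload = {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,  # Caps response length and therefore cost
        **SAMPLING_PRESETS[task],
    }
    if stream:
        payload["stream"] = True
    return payload
```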

Error Handling

Implement retry logic with exponential backoff
Handle rate limits gracefully (HTTP 429)
Validate responses before using in production
Log errors for debugging and monitoring
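Retry with exponential backoff can be sketched independently of the HTTP library. The example assumes the caller's `send` function raises `RetryableError` (a hypothetical exception) for HTTP 429 or 5xx responses; the delays and the jitter range are illustrative defaults.

```python
import random
import time


class RetryableError(Exception):
    """Raised by the caller's send function for HTTP 429 or 5xx responses."""


def with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call send() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return send()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller see the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that failed together.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter exists so the waiting can be replaced in tests. Non-retryable errors such as 401 should be raised as a different exception type so they fail immediately.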

Model Selection

Small models (7B-8B): Fast, cost-effective for simple tasks
Medium models (13B-34B): Balance of performance and efficiency
Large models (70B+): Best quality for complex reasoning and generation
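As a rough illustration, the tiers can be turned into a lookup. The model IDs come from the lists in this skill, but the mapping and the rule that latency wins over reasoning depth are assumptions made for the example; `choose_model` is a hypothetical helper.

```python
MODEL_TIERS = {
    "small": "meta/llama3-8b-instruct",
    "medium": "google/gemma-2-27b-it",
    "large": "meta/llama3-70b-instruct",
}


def choose_model(complex_reasoning=False, latency_sensitive=False):
    """Pick a model ID from coarse task requirements."""
    if latency_sensitive:
        return MODEL_TIERS["small"]  # Assumption: speed takes priority when both are set
    if complex_reasoning:
        return MODEL_TIERS["large"]
    return MODEL_TIERS["medium"]
```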

Resources

Official Documentation

API Docs: docs.api.nvidia.com
NIM Reference: docs.api.nvidia.com/nim
Model Catalog: build.nvidia.com

Getting Started

Get API key at NVIDIA API Catalog
Browse available models and try them in the playground
Read getting started guides in references/getting_started.md

Community & Support

Check NVIDIA Developer Forums for community support
Review example applications and integration patterns
Explore NVIDIA AI Enterprise documentation for production deployments

Notes

NVIDIA NIM APIs follow OpenAI API specifications for easy integration
Cloud-hosted endpoints suit prototyping; downloadable NIM containers are available for self-hosted production deployments
API is designed for both prototyping and production workloads
Language support: Works from Python, JavaScript, Java, Ruby, PHP, or any other language with an HTTP client
Streaming support: Real-time response generation for interactive applications
Select models are available as self-hosted NIMs with an NVIDIA AI Enterprise entitlement

Updating

To refresh this skill with updated documentation:

Re-run the scraper with the same configuration
The skill will be rebuilt with the latest model information and API updates
Check for new models and API endpoints in the official documentation