nvidia-api
NVIDIA API documentation for integrating NVIDIA services. Use for NVIDIA NIM (NVIDIA Inference Microservices), LLM APIs, visual models, multimodal APIs, retrieval APIs, healthcare APIs, and CUDA-X microservices integration.
This skill provides assistance with NVIDIA API development. It focuses on NIM (NVIDIA Inference Microservices) and cloud-hosted AI endpoints for building prototype and production applications.
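This skill should be triggered when working with:
- NVIDIA NIM (NVIDIA Inference Microservices)
- LLM, visual, multimodal, retrieval, or healthcare APIs
- CUDA-X microservices integration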
Get your API key: visit the NVIDIA API Catalog to obtain an API key. Keys begin with `nvapi-`.
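```bash
# Set environment variable
export NVIDIA_API_KEY="nvapi-your-key-here"
```

```python
# Python - read the key from the environment variable
import os

# os.environ[...] fails fast with KeyError if the variable is unset,
# instead of silently sending "Bearer None"
api_key = os.environ["NVIDIA_API_KEY"]
```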
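A minimal sketch of loading the key and building request headers in one place, failing early when the key is missing or does not look like an NVIDIA key. The function name `load_nvidia_headers` is hypothetical, and the `nvapi-` prefix check assumes the key format noted in this document.

```python
import os
from typing import Mapping, Optional


def load_nvidia_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers from NVIDIA_API_KEY (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        raise RuntimeError("NVIDIA_API_KEY is not set")
    if not key.startswith("nvapi-"):
        # Assumption: keys issued by the API Catalog start with "nvapi-"
        raise ValueError("NVIDIA_API_KEY does not start with 'nvapi-'")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Passing a mapping explicitly (for example `load_nvidia_headers({"NVIDIA_API_KEY": "nvapi-..."})`) makes the helper testable without touching the real environment.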
```python
import os

import requests

api_key = os.environ["NVIDIA_API_KEY"]
url = "https://integrate.api.nvidia.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
payload = {
    "model": "meta/llama3-70b-instruct",
    "messages": [
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    "max_tokens": 150,
    "temperature": 0.7
}
response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # Surface 4xx/5xx errors instead of failing later on a KeyError
result = response.json()
print(result["choices"][0]["message"]["content"])
```
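```python
import json
import os

import requests

api_key = os.environ["NVIDIA_API_KEY"]
url = "https://integrate.api.nvidia.com/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
payload = {
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [{"role": "user", "content": "Write a short story"}],
    "stream": True
}
# stream=True lets requests yield the server-sent events as they arrive
with requests.post(url, json=payload, headers=headers, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # skip blank keep-alive lines
        line = line.decode("utf-8")
        if line.startswith("data: "):
            data = line[len("data: "):]
            if data == "[DONE]":
                break  # end-of-stream sentinel
            chunk = json.loads(data)
            # content can be absent or null in some chunks (e.g. the first, role-only delta)
            content = chunk["choices"][0]["delta"].get("content") or ""
            print(content, end="", flush=True)
```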
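The streaming loop can be factored into a generator that is easy to test without a network connection. This is a sketch assuming the server-sent-events framing used in the streaming example (`data: ` prefix, `[DONE]` sentinel, text in `choices[0].delta.content`); `iter_stream_text` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator, Union


def iter_stream_text(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style SSE lines (hypothetical helper)."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank keep-alives, comments, or other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(data)
        choices = chunk.get("choices") or []
        if not choices:
            continue  # e.g. a chunk that carries only usage information
        # Role-only or final deltas may have no content, or a null content field
        text = (choices[0].get("delta") or {}).get("content")
        if text:
            yield text
```

With `requests`, usage would be `for text in iter_stream_text(response.iter_lines()): print(text, end="", flush=True)`.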
```python
import os

from openai import OpenAI

# Drop-in replacement for the OpenAI client: only base_url and api_key change
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"]
)
completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.5,
    max_tokens=100
)
print(completion.choices[0].message.content)
```
```python
# Using code-specialized models (reuses url and headers from the basic example)
payload = {
    "model": "bigcode/starcoder2-15b",
    "messages": [
        {"role": "system", "content": "You are an expert programmer."},
        {"role": "user", "content": "Write a Python function to calculate Fibonacci numbers"}
    ],
    "temperature": 0.2,  # Lower temperature for code generation
    "max_tokens": 500
}
response = requests.post(url, json=payload, headers=headers)
code = response.json()["choices"][0]["message"]["content"]
```
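Model replies frequently wrap code in Markdown fences, but this is not guaranteed. A minimal sketch of pulling the code out of the reply, assuming standard triple-backtick fences; `extract_code_blocks` is a hypothetical helper, and falling back to the whole reply when no fence is found is a design assumption.

```python
import re
from typing import List

# Matches a fenced block with an optional language tag; `{3} avoids writing the fence literally
_FENCE_RE = re.compile(r"`{3}[\w+-]*[ \t]*\n(.*?)`{3}", re.DOTALL)


def extract_code_blocks(reply: str) -> List[str]:
    """Return the bodies of fenced code blocks in a model reply (hypothetical helper)."""
    blocks = [m.group(1).rstrip("\n") for m in _FENCE_RE.finditer(reply)]
    # Assumption: if the model returned bare code with no fences, use the whole reply
    return blocks or [reply.strip()]
```

Applied to the example above, `extract_code_blocks(code)[0]` would give the first code block in the model's answer.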
```python
# Multi-turn conversation: the full history is resent on every request
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is machine learning?"},
]
# First response
response1 = requests.post(
    url,
    json={"model": "meta/llama3-70b-instruct", "messages": conversation},
    headers=headers,
)
assistant_reply = response1.json()["choices"][0]["message"]["content"]
# Continue conversation: append the assistant's reply, then the next user turn
conversation.append({"role": "assistant", "content": assistant_reply})
conversation.append({"role": "user", "content": "Can you give me an example?"})
response2 = requests.post(
    url,
    json={"model": "meta/llama3-70b-instruct", "messages": conversation},
    headers=headers,
)
```
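Because the whole `conversation` list is sent on every turn, it grows without bound. One simple approach, sketched here with a hypothetical `trim_history` helper, keeps the system prompt plus the most recent messages. Counting messages rather than tokens is a simplifying assumption: the real limit is the model's context window, measured in tokens.

```python
from typing import Dict, List

Message = Dict[str, str]


def trim_history(messages: List[Message], max_recent: int = 10) -> List[Message]:
    """Keep system messages plus the last `max_recent` other messages (hypothetical helper)."""
    if max_recent < 0:
        raise ValueError("max_recent must be non-negative")
    system = [m for m in messages if m.get("role") == "system"]
    others = [m for m in messages if m.get("role") != "system"]
    # others[-0:] would return the whole list, so handle zero explicitly
    recent = others[-max_recent:] if max_recent else []
    return system + recent
```

The trimmed list would be passed as `"messages": trim_history(conversation)` while the full `conversation` is kept locally.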
```javascript
const axios = require('axios');

const url = 'https://integrate.api.nvidia.com/v1/chat/completions';
const apiKey = process.env.NVIDIA_API_KEY;

async function chatCompletion() {
  const response = await axios.post(url, {
    model: "mistralai/mistral-7b-instruct",
    messages: [
      { role: "user", content: "What are the benefits of renewable energy?" }
    ],
    max_tokens: 200
  }, {
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json'
    }
  });
  console.log(response.data.choices[0].message.content);
}

// Handle rejected requests (HTTP or network errors) instead of leaving them unhandled
chatCompletion().catch((err) => console.error(err.response?.data ?? err.message));
```
```bash
curl https://integrate.api.nvidia.com/v1/chat/completions \
  -H "Authorization: Bearer $NVIDIA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta/llama3-8b-instruct",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "max_tokens": 50,
    "temperature": 0.7
  }'
```
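For environments without `requests`, the same call can be made with only the Python standard library. This sketch builds the `urllib.request.Request` separately from sending it so the request can be inspected before it goes out; `build_chat_request` and `send` are hypothetical names.

```python
import json
import urllib.request

URL = "https://integrate.api.nvidia.com/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str,
                       max_tokens: int = 50,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build (but do not send) a chat-completions POST request (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage mirroring the curl call: `send(build_chat_request(os.environ["NVIDIA_API_KEY"], "meta/llama3-8b-instruct", "Hello, how are you?"))`. Note that `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses.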
```python
import requests

try:
    response = requests.post(url, json=payload, headers=headers, timeout=60)
    response.raise_for_status()  # Raise HTTPError for 4xx/5xx status codes
    result = response.json()
except requests.exceptions.HTTPError as e:
    status = e.response.status_code  # the response that triggered the error
    if status == 401:
        print("Authentication failed. Check your API key.")
    elif status == 429:
        print("Rate limit exceeded. Please slow down requests.")
    else:
        print(f"HTTP error: {e}")
except requests.exceptions.RequestException as e:
    # Connection errors, timeouts, and other transport failures
    print(f"Request failed: {e}")
```
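The 429 branch above only reports the problem; a common remedy is to retry with exponential backoff. The sketch below is transport-agnostic: `send` is any zero-argument callable returning a status code and a body. The retryable status set, attempt count, and delays are assumptions to tune, not documented NVIDIA limits, and any `Retry-After` header is not consulted. `call_with_backoff` is a hypothetical name.

```python
import time
from typing import Any, Callable, Tuple

# Assumption: rate limiting and transient server errors are worth retrying
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


def call_with_backoff(
    send: Callable[[], Tuple[int, Any]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status not in RETRYABLE_STATUS or attempt >= max_attempts:
            return status, body  # success, non-retryable error, or out of attempts
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ... capped at max_delay
        sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```

With `requests`, a suitable `send` is a small function that calls `requests.post(url, json=payload, headers=headers)` and returns `(response.status_code, response)`. Injecting `sleep` keeps the helper testable without real delays.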
The API provides access to language models for a range of tasks:

Meta Models
- `meta/llama3-70b-instruct` - High-performance instruction following
- `meta/llama3-8b-instruct` - Efficient smaller model
- `meta/codellama-70b` - Specialized for code generation

Mistral Models
- `mistralai/mixtral-8x7b-instruct` - High-quality mixture-of-experts
- `mistralai/mistral-large-2-instruct` - Latest large model
- `mistralai/codestral-22b-instruct-v0.1` - Code generation specialist

Google Models
- `google/gemma-2-27b-it` - Instruction-tuned Gemma
- `google/codegemma-7b` - Code understanding and generation
- `google/shieldgemma-9b` - Safety and content filtering

Other Notable Models
- `nvidia/llama3-chatqa-1.5-70b` - Optimized for Q&A
- `ibm/granite-34b-code-instruct` - Enterprise code model
- `deepseek-ai/deepseek-r1` - Advanced reasoning model
- `bigcode/starcoder2-15b` - Open-source code completion
- `mistralai/codestral-22b-instruct-v0.1` - Instruction following for code
- `google/codegemma-7b` - Google's code specialist

NVIDIA NIM cloud-hosted inference endpoints:
Base URL: https://integrate.api.nvidia.com
Endpoint: POST /v1/chat/completions
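NVIDIA NIM APIs follow OpenAI's API specification, so existing OpenAI client libraries and request formats work by pointing the base URL at `https://integrate.api.nvidia.com/v1`.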
API key format: `nvapi-*`

Standard OpenAI-compatible response structure:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "meta/llama3-70b-instruct",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Response text here"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  }
}
```
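A sketch of turning the response body above into typed objects with the standard library. The field names follow the JSON structure shown; the `ChatResult`, `Usage`, and `parse_chat_completion` names are hypothetical, and treating a missing `usage` block as zero counts is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@dataclass(frozen=True)
class ChatResult:
    model: str
    content: str
    finish_reason: str
    usage: Usage


def parse_chat_completion(body: Dict[str, Any]) -> ChatResult:
    """Extract the first choice and token usage (hypothetical helper)."""
    choice = body["choices"][0]
    usage = body.get("usage") or {}
    return ChatResult(
        model=body.get("model", ""),
        content=choice["message"].get("content") or "",
        finish_reason=choice.get("finish_reason") or "",
        usage=Usage(
            prompt_tokens=usage.get("prompt_tokens", 0),
            completion_tokens=usage.get("completion_tokens", 0),
            total_tokens=usage.get("total_tokens", 0),
        ),
    )
```

Checking `finish_reason` is useful: in the OpenAI convention, `"length"` means the reply was cut off by `max_tokens`, while `"stop"` means the model finished naturally.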
This skill includes documentation in `references/`:
- `references/getting_started.md` for foundational concepts

Note: For detailed API specifications, parameter descriptions, and model-specific information, refer to the official documentation at docs.api.nvidia.com.

Chatbots and assistants: Use models like `meta/llama3-70b-instruct` or `mistralai/mixtral-8x7b-instruct` to build conversational interfaces.

Code generation: Use specialized models like `bigcode/starcoder2-15b`, `mistralai/codestral-22b-instruct-v0.1`, or `ibm/granite-34b-code-instruct`.

Question answering: Use `nvidia/llama3-chatqa-1.5-70b`, which is optimized for Q&A tasks.

Creative writing: Use models like `mistralai/mistral-large-2-instruct` with higher temperature settings.

Retrieval-augmented generation: Combine LLM APIs with NVIDIA's Retrieval APIs for knowledge-grounded responses.
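The use-case guidance can be captured as a lookup table of presets. The model IDs come from this document; the temperatures for chat (0.7) and code (0.2) match the earlier examples, while the values for Q&A and creative writing are illustrative assumptions. `TASK_PRESETS` and `build_task_payload` are hypothetical names.

```python
from typing import Any, Dict, List

# Model IDs from this document; qa and creative temperatures are assumptions
TASK_PRESETS: Dict[str, Dict[str, Any]] = {
    "chat": {"model": "meta/llama3-70b-instruct", "temperature": 0.7},
    "code": {"model": "mistralai/codestral-22b-instruct-v0.1", "temperature": 0.2},
    "qa": {"model": "nvidia/llama3-chatqa-1.5-70b", "temperature": 0.2},
    "creative": {"model": "mistralai/mistral-large-2-instruct", "temperature": 0.9},
}


def build_task_payload(task: str, messages: List[Dict[str, str]],
                       max_tokens: int = 500) -> Dict[str, Any]:
    """Build a chat-completions payload from a task preset (hypothetical helper)."""
    try:
        preset = TASK_PRESETS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(TASK_PRESETS)}"
        ) from None
    # Copy the preset so callers cannot mutate the shared table
    return {**preset, "messages": messages, "max_tokens": max_tokens}
```

The resulting dictionary can be sent as the `json=` body of a `requests.post` call to the chat-completions endpoint.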