What can you do with it?

Use Groq for high-speed AI inference with a range of open-source models. It is well suited to applications that require fast response times, high throughput, and efficient processing. The command provides access to multiple models with varying capabilities and context lengths to cover different use cases.

How to use it?

Basic Command Structure

/groq [prompt] [optional-parameters]

Parameters

Required:

  • prompt - Your instructions or questions

Optional:

  • model - Specific model to use (defaults to meta-llama/llama-4-maverick-17b-128e-instruct)

  • system prompt - Override the default system prompt

  • files - File URLs to include in the request (see LLM File Type Support for supported formats)
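
Putting the pieces together, a single request can combine all of these parameters. The following sketch is illustrative only: the model name is taken from the Supported Models list below, and the file URL is a placeholder.

/groq
prompt: Compare the attached report against current best practices
model: llama-3.3-70b-versatile
system prompt: You are a concise technical reviewer
files: https://example.com/report.pdf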

Response Format

The command returns:

{
  "response": "Model's generated response",
  "format": "Response format (JSON/plaintext/markdown/HTML)",
  "metadata": {
    "model": "Model used",
    "context_length": "Available context length"
  }
}
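
A filled-in response might look like the following (all values here are illustrative, not actual output):

{
  "response": "Renewable energy lowers emissions, reduces long-term costs, and improves energy security...",
  "format": "plaintext",
  "metadata": {
    "model": "meta-llama/llama-4-maverick-17b-128e-instruct",
    "context_length": "131072"
  }
}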

Examples

Basic Usage

/groq
prompt: Summarize the key benefits of renewable energy

Gets a quick response summarizing renewable energy benefits.

Advanced Usage

/groq
prompt: Solve this complex mathematical problem step by step
model: deepseek-r1-distill-llama-70b
system prompt: You are a mathematics professor

Uses a model optimized for complex reasoning, together with a custom system prompt.

Specific Use Case

/groq
prompt: Analyze this lengthy document and extract key insights
files: https://example.com/document.pdf
model: meta-llama/llama-4-maverick-17b-128e-instruct

Processes long documents using a model with extended context length.

Notes

The default model is meta-llama/llama-4-maverick-17b-128e-instruct (131,072-token context) and the default response format is JSON. See Supported Models below for the full list of available models.

Supported Models

Choose the appropriate Groq model based on your specific needs:

  • meta-llama/llama-4-scout-17b-16e-instruct - General purpose with balanced performance (16,384 tokens, Very Fast)
  • meta-llama/llama-4-maverick-17b-128e-instruct (default) - Long context tasks and document analysis (131,072 tokens, Fast)
  • deepseek-r1-distill-llama-70b - Advanced reasoning and complex problem solving (8,192 tokens, Fast)
  • llama-3.3-70b-versatile - General purpose with high quality output (32,768 tokens, Fast)
  • llama-3.1-70b-versatile - Long context and versatile tasks (131,072 tokens, Fast)
  • llama-3.1-8b-instant - Quick responses with high throughput (131,072 tokens, Very Fast)
  • mixtral-8x7b-32768 - Multi-task capabilities with balanced performance (32,768 tokens, Fast)
  • gemma2-9b-it - Efficient processing for general tasks (8,192 tokens, Very Fast)
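
As a rough guide, latency-sensitive tasks pair well with one of the Very Fast models above. The prompt in this sketch is only an example:

/groq
prompt: Give a one-sentence definition of quantum entanglement
model: llama-3.1-8b-instant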

All models are optimized for high-speed inference on Groq's specialized hardware.