Learn how to use the Groq slash command for fast LLM inference
Use Groq for high-speed AI inference with various open-source models. Ideal for applications requiring fast response times, high throughput, and efficient processing. The command provides access to multiple models with varying capabilities and context lengths for different use cases.
Required:

- `prompt` - Your instructions or questions

Optional:

- `model` - Specific model to use (defaults to `meta-llama/llama-4-maverick-17b-128e-instruct`)
- `system prompt` - Override the default system prompt
- `files` - File URLs to include in the request (see LLM File Type Support for supported formats)
The command returns the model's response; the default response format is JSON.

Examples:
Get a quick response summarizing renewable energy benefits.
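A minimal sketch of such an invocation, assuming a key:value argument syntax (the exact form depends on your client); only `prompt` is required:

```
/groq prompt:"Summarize the main benefits of renewable energy"
```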
Use a specific model optimized for complex reasoning together with a custom system prompt.
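A hypothetical sketch under the same assumed syntax; the `system_prompt` spelling is illustrative, and `deepseek-r1-distill-llama-70b` is chosen because the model list below recommends it for advanced reasoning:

```
/groq prompt:"Prove that the square root of 2 is irrational" model:"deepseek-r1-distill-llama-70b" system_prompt:"You are a rigorous mathematician. Explain every step."
```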
Process long documents using a model with extended context length.
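Another assumed-syntax sketch; the file URL is a placeholder, and the default model's 131,072-token window makes it the natural fit for long inputs (see LLM File Type Support for which formats the `files` parameter accepts):

```
/groq prompt:"Summarize the key findings of this report" model:"meta-llama/llama-4-maverick-17b-128e-instruct" files:"https://example.com/annual-report.pdf"
```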
Choose the appropriate Groq model based on your specific needs:
- `meta-llama/llama-4-scout-17b-16e-instruct` - General purpose with balanced performance (16,384 tokens, Very Fast)
- `meta-llama/llama-4-maverick-17b-128e-instruct` (default) - Long context tasks and document analysis (131,072 tokens, Fast)
- `deepseek-r1-distill-llama-70b` - Advanced reasoning and complex problem solving (8,192 tokens, Fast)
- `llama-3.3-70b-versatile` - General purpose with high quality output (32,768 tokens, Fast)
- `llama-3.1-70b-versatile` - Long context and versatile tasks (131,072 tokens, Fast)
- `llama-3.1-8b-instant` - Quick responses with high throughput (131,072 tokens, Very Fast)
- `mixtral-8x7b-32768` - Multi-task capabilities with balanced performance (32,768 tokens, Fast)
- `gemma2-9b-it` - Efficient processing for general tasks (8,192 tokens, Very Fast)

Models are optimized for high-speed inference on Groq's specialized hardware.