Version: 4B GPU F16

PyTorch

This version is not quantized and a GPU is recommended.

Install our magic package manager:
```
curl -ssL https://magic.modular.com/ | bash
```
Then run the source command that's printed in your terminal.

Install MAX Pipelines to run this model:
```
magic global install max-pipelines && magic global update
```

Start a local endpoint for c4ai-command-r-plus/4B:

```
max-pipelines serve --huggingface-repo-id=CohereForAI/c4ai-command-r-plus-4bit
```

The endpoint is ready when you see the URI printed in your terminal:

```
Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)
```

Now open another terminal to send a request using curl:

```
curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "CohereForAI/c4ai-command-r-plus-4bit",
    "stream": true,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
}' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n//g'
```
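The grep/sed pipeline above pulls the `content` fields out of the streamed response. The same extraction can be sketched in Python. This is a minimal sketch that assumes the server streams OpenAI-style Server-Sent Events (`data: {...}` lines, ending with `data: [DONE]`); the page does not spell out the wire format, so treat that shape as an assumption. The function name `extract_stream_text` and the sample chunks are illustrative, not real server output.

```python
import json


def extract_stream_text(lines):
    """Join the text deltas from OpenAI-style streaming chat chunks."""
    pieces = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                pieces.append(text)
    return "".join(pieces)


# Hand-written sample chunks for illustration only.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello, "}}]}',
    'data: {"choices": [{"delta": {"content": "world."}}]}',
    "data: [DONE]",
]
print(extract_stream_text(sample))  # prints "Hello, world."
```

Parsing each chunk as JSON also handles quotes and escaped newlines inside the generated text, which the pattern-based shell pipeline strips rather than decodes.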

🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.

Deploy this model to cloud

Model Overview

C4AI Command R+ is a multilingual AI model with 104 billion parameters designed for sophisticated tasks such as Retrieval Augmented Generation (RAG) and tool utilization. It supports multiple languages including English, French, Spanish, Italian, German, Brazilian Portuguese, Japanese, Korean, Arabic, and Simplified Chinese. This model excels in reasoning, summarization, and question answering.

Model Details

Input: Text
Output: Generates text
Architecture: Optimized transformer with supervised fine-tuning for human-aligned behavior.
Languages Supported: Primarily 10 languages, with additional support for 13 others including Russian, Polish, and Dutch.
Context length: 128K

Evaluations

Command R+ scored an average of 74.6 on the Open LLM Leaderboard, slightly ahead of DBRX Instruct (74.5).

Model	Average	ARC (Challenge)	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
C4AI Command R+	74.6	70.99	88.6	75.7	56.3	85.4	70.7
DBRX Instruct	74.5	68.9	89	73.7	66.9	81.8	66.9
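As a quick sanity check on the table, the Average column can be recomputed from the six benchmark columns. The sketch below assumes the average is an unweighted mean rounded to one decimal place; the page does not state this, but the figures are consistent with it.

```python
from statistics import mean

# Benchmark scores copied from the table, in column order:
# ARC (Challenge), HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K
scores = {
    "C4AI Command R+": [70.99, 88.6, 75.7, 56.3, 85.4, 70.7],
    "DBRX Instruct": [68.9, 89, 73.7, 66.9, 81.8, 66.9],
}

averages = {model: round(mean(values), 1) for model, values in scores.items()}
print(averages)  # {'C4AI Command R+': 74.6, 'DBRX Instruct': 74.5}
```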

Advanced Capabilities

Grounded Generation and RAG

The model is specially trained for grounded generation, with the ability to generate text supported by specific document citations.
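One generic way to exercise grounded generation through a chat endpoint is to pack numbered source snippets into the system message and ask for citations by number. The sketch below is only that generic convention: the helper `build_grounded_messages`, the prompt wording, and the sample documents are all assumptions for illustration, and this is not the model's own grounded-generation prompt format, which this page does not describe.

```python
def build_grounded_messages(question, documents):
    """Pack numbered source snippets and a question into chat messages."""
    sources = "\n".join(
        f"[{i}] {doc['title']}: {doc['text']}"
        for i, doc in enumerate(documents)
    )
    system = (
        "Answer using only the sources below. "
        "Cite sources by their bracketed number.\n\n" + sources
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]


# Illustrative documents, not real retrieval output.
docs = [
    {"title": "Tall penguins", "text": "Emperor penguins are the tallest."},
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."},
]
messages = build_grounded_messages("Where do the tallest penguins live?", docs)
```

The resulting `messages` list can be sent as the `messages` field of a chat completions request.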

Single and Multi-Step Tool Use

The model can call external tools such as APIs and chain several calls together to carry out multi-step ("agent") tasks for complex problem solving.
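On the application side, tool use needs a small dispatcher that runs whatever tools the model requests and hands the results back for the next turn. The sketch below shows one step of that loop. The call shape (`tool_name`, `parameters`), the `run_tool_calls` helper, and the `add` tool are hypothetical; the page does not specify how the model formats its tool calls.

```python
def run_tool_calls(calls, registry):
    """Execute each requested tool and collect results for the next model turn."""
    results = []
    for call in calls:
        name = call["tool_name"]
        func = registry.get(name)
        if func is None:
            # Report unknown tools instead of raising, so the model can recover.
            results.append({"tool_name": name, "error": "unknown tool"})
            continue
        output = func(**call.get("parameters", {}))
        results.append({"tool_name": name, "output": output})
    return results


def add(a, b):
    """Toy tool used only for this example."""
    return a + b


registry = {"add": add}
calls = [
    {"tool_name": "add", "parameters": {"a": 2, "b": 3}},
    {"tool_name": "search", "parameters": {"query": "x"}},
]
results = run_tool_calls(calls, registry)
print(results)
```

In a multi-step setup, the results would be appended to the conversation and the model queried again until it stops requesting tools.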

Code Abilities

Optimized for handling code snippets, providing explanations, and generating code, performing well in low-temperature settings.

Contact and Use

For inquiries, contact info@for.ai. Use of the model is subject to the CC-BY-NC license and C4AI's Acceptable Use Policy.

Try the Model

Explore Command R+ capabilities in the playground and dedicated Spaces for experimentation.

Citations

Developed by Cohere and Cohere For AI.

Metadata

architectures.0	CohereForCausalLM
model_type	cohere
quantization_config.load_in_4bit	true
quantization_config.load_in_8bit	false
quantization_config.quant_method	bitsandbytes
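The dotted keys above look like a flattened view of a nested configuration, with `architectures.0` as the first element of a list. A minimal sketch of rebuilding that nesting follows; the helper names `unflatten` and `_listify` are illustrative, and the values are written as Python booleans on the assumption that `true`/`false` in the listing are booleans.

```python
def _listify(node):
    """Turn dicts whose keys are all digit strings into lists, recursively."""
    if not isinstance(node, dict):
        return node
    converted = {key: _listify(value) for key, value in node.items()}
    if converted and all(key.isdigit() for key in converted):
        return [converted[key] for key in sorted(converted, key=int)]
    return converted


def unflatten(flat):
    """Rebuild nested dicts and lists from dotted keys such as 'a.b' or 'a.0'."""
    root = {}
    for dotted, value in flat.items():
        parts = dotted.split(".")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _listify(root)


flat = {
    "architectures.0": "CohereForCausalLM",
    "model_type": "cohere",
    "quantization_config.load_in_4bit": True,
    "quantization_config.load_in_8bit": False,
    "quantization_config.quant_method": "bitsandbytes",
}
config = unflatten(flat)
print(config["quantization_config"])
```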

Version: 4B GPU F16

This code works on compatible Linux machines.
We are actively working on enabling MAX Serve for macOS ARM64 as well.

You can quickly deploy c4ai-command-r-plus-4B to an endpoint using our MAX container. It includes the latest version of MAX with GPU support and our Python-based inference server called MAX Serve.

With the following Docker command, you’ll get an OpenAI-compatible endpoint running c4ai-command-r-plus-4B:

```
docker run --gpus 1 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
    --env "HF_TOKEN=" \
    -p 8000:8000 \
    docker.modular.com/modular/max-openai-api:nightly \
    --huggingface-repo-id CohereForAI/c4ai-command-r-plus-4bit
```

To download the model from Hugging Face, set the HF_TOKEN value to your access token. Models hosted under https://huggingface.co/modularai do not need a token.
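Once the container is running, the endpoint can also be called from Python using only the standard library. The sketch below builds a non-streaming request against the `/v1/chat/completions` route and port 8000 used elsewhere on this page. The helper name `build_chat_request` is illustrative, and the response shape in the commented-out lines assumes the usual OpenAI chat completions format, which the page implies ("OpenAI-compatible") but does not show.

```python
import json
import urllib.request


def build_chat_request(prompt,
                       base_url="http://0.0.0.0:8000",
                       model="CohereForAI/c4ai-command-r-plus-4bit"):
    """Build a POST request for a single, non-streaming chat completion."""
    body = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Who won the World Series in 2020?")

# Sending the request requires the container to be up:
# with urllib.request.urlopen(request) as response:
#     reply = json.load(response)
#     print(reply["choices"][0]["message"]["content"])  # assumed response shape
```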

Learn more

For more information about the container image, see the MAX container documentation.

To learn more about how to deploy MAX to the cloud, check out our MAX Serve tutorials.

Point of Contact: Cohere For AI: cohere.for.ai
License: CC-BY-NC; use also requires adhering to C4AI's Acceptable Use Policy
Model: c4ai-command-r-plus
Model Size: 104 billion parameters
Context length: 128K

DETAILS

Chat

Model class: PyTorch
Hardware: GPU
Quantization: F16
Architecture: PyTorch
MAX GitHub: Modular / MAX
Model: CohereForAI/c4ai-command-r-plus-4bit (CohereForAI)

