Version:

35B GPU: F16

PyTorch

This version is not quantized and a GPU is recommended.

Install our magic package manager:
```
curl -ssL https://magic.modular.com/ | bash
```
Then run the source command that's printed in your terminal.

Install Max Pipelines in order to run this model.

magic global install max-pipelines && magic global update

Start a local endpoint for c4ai-command-r-08-2024/35B:

max-pipelines serve --huggingface-repo-id=CohereForAI/c4ai-command-r-plus-08-2024

The endpoint is ready when you see the URI printed in your terminal:

Server ready on http://0.0.0.0:8000 (Press CTRL+C to quit)

Now open another terminal to send a request using curl:

curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "CohereForAI/c4ai-command-r-plus-08-2024",
    "stream": true,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the World Series in 2020?"}
    ]
}' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '
' | sed 's/\n//g'

🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.

Deploy this model to cloud

C4AI Command R 08-2024

Model Summary

C4AI Command R 08-2024 is an advanced 32 billion parameter language model released for research purposes. It is designed for a variety of applications such as reasoning, summarization, and question answering. The model supports multilingual text generation, trained across 23 languages, and has been evaluated in 10. Cohere and Cohere For AI developed it for highly effective Retrieval-Augmented Generation (RAG) and grounded generation. The model is available under CC-BY-NC License with adherence to the Acceptable Use Policy.

Key Details

Developed By: Cohere and Cohere For AI
License: CC-BY-NC
Model Size: 32 billion parameters
Context Length: 128K

Model Details

Input: Text only
Output: Text only

Model Architecture

This auto-regressive language model employs a transformer architecture. Post-pretraining, it uses supervised and preference fine-tuning for alignment with human preferences on helpfulness and safety. Grouped query attention (GQA) is used for enhanced inference speed.

Languages Covered

Training encompasses 23 languages including English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Arabic, Simplified Chinese, Russian, among others. It is evaluated on 10 languages.

Grounded Generation and RAG Capabilities

C4AI Command R is designed for grounded generation, using document snippets for response formulation inclusive of source citations. Grounded summarization and final step RAG operations are optimized through supervised and preference fine-tuning. Prompt adherence enhances performance.

Single-Step Tool Use Capabilities ("Function Calling")

Facilitates interaction with external tools such as APIs. The model selects, executes, and generates responses using tools, including a default directly_answer tool. Learn more in the tool use documentation.

Multi-Step Tool Use Capabilities ("Agents")

Aimed at executing multi-step processes through multiple tools, this function supports agent-based planning and action sequences. Comprehensive guides are available here.

Code Capabilities

C4AI Command R is optimized for code interaction, whether requesting snippets, explanations, or rewrites, but may require low temperature settings for optimal code generation.

Model Card Contact

Errors or questions about this model can be addressed to info@for.ai.

Terms of Use

This model is available under CC-BY-NC with an acceptable use addendum found in C4AI's Acceptable Use Policy.

Try Chat

Explore the Command-R chat feature in the playground.

Cite

To reference this model:

@misc {cohere_for_ai_2024,
	author       = { {Cohere For AI} },
	title        = { c4ai-command-r-08-2024 },
	year         = 2024,
	url          = { https://cohere.com/c4ai-command-r-08-2024 },
	doi          = { 10.57967/hf/3134 },
	publisher    = { Cohere }
}

Metadata

architectures.0	CohereForCausalLM
model_type	cohere

Version: 35B GPU F16

This code works on compatible Linux machines.
We are actively working on enabling MAX Serve for MacOS ARM64 as well.

You can quickly deploy c4ai-command-r-08-2024-35B to an endpoint using our MAX container. It includes the latest version of MAX with GPU support and our Python-based inference server called MAX Serve.

With the following Docker command, you’ll get an OpenAI-compatible endpoint running c4ai-command-r-08-2024-35B:

docker run --gpus 1 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_HUB_ENABLE_HF_TRANSFER=1" \
    --env "HF_TOKEN=" \
    -p 8000:8000 \
    docker.modular.com/modular/max-openai-api:nightly \
    --huggingface-repo-id CohereForAI/c4ai-command-r-plus-08-2024

In order to download the model from Hugging Face, you just need to fill in the HF_TOKEN value with your access token, unless the model is from https://huggingface.co/modularai.

Learn more

For more information about the container image, see the MAX container documentation.

To learn more about how to deploy MAX to the cloud, check out our MAX Serve tutorials.

Point of Contact: Cohere For AI: cohere.for.ai
License: CC-BY-NC, requires also adhering to C4AI's Acceptable Use Policy
Model: c4ai-command-r-plus-08-2024
Model Size: 104 billion parameters
Context length: 128K

DETAILS

ChatMODEL CLASS

PyTorch

HARDWARE

GPU

QUANTIZATION

F16

ARCHITECTURE

PyTorch

MAX GITHUB

Modular / MAX

MODEL

CohereForAI

CohereForAI/c4ai-command-r-plus-08-2024

QUESTIONS ABOUT THIS MODEL?

Resources & support for
running c4ai-command-r-08-2024-35B

Browse 27+ Tutorials

View Tutorials

Get help using MAX

Modular Forum

Read Documentation

Go to Docs

c4ai-command-r-08-2024-35B

C4AI Command R 08-2024

Model Summary

Key Details

Model Details

Model Architecture

Languages Covered

Grounded Generation and RAG Capabilities

Single-Step Tool Use Capabilities ("Function Calling")

Multi-Step Tool Use Capabilities ("Agents")

Code Capabilities

Model Card Contact

Terms of Use

Try Chat

Cite

Metadata

Learn more

Resources & support for running c4ai-command-r-08-2024-35B

Resources & support for
running c4ai-command-r-08-2024-35B