dolphin-llama3-8b

MAX Model

1 version

Dolphin 2.9 is a new model by Eric Hartford, based on Llama 3 and available in 8B and 70B sizes, with a variety of instruction-following, conversational, and coding skills.

Run this model

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install MAX Pipelines to run this model:

    magic global install max-pipelines
  3. Start a local endpoint for dolphin-llama3/8b:

    max-pipelines serve --huggingface-repo-id cognitivecomputations/dolphin-2.9-llama3-8b

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "dolphin-llama3/8b",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
  5. 🎉 Hooray! You're running Generative AI. Our goal is to make this as easy as possible.
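The curl request above can also be issued from Python. Below is a minimal sketch using only the standard library, assuming the endpoint follows the OpenAI-compatible SSE streaming format; the `build_payload` and `stream_completion` helpers are illustrative, not part of MAX:

```python
import json
import urllib.request

# URL and model name match steps 3 and 4 above.
ENDPOINT = "http://0.0.0.0:8000/v1/chat/completions"


def build_payload(question, stream=True):
    """Build the same OpenAI-compatible chat payload as the curl example."""
    return {
        "model": "dolphin-llama3/8b",
        "stream": stream,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    }


def stream_completion(question):
    """POST the payload and yield content deltas from the SSE stream."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(question)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            line = raw.decode().strip()
            # SSE frames look like "data: {...}"; the stream ends with "data: [DONE]".
            if not line.startswith("data: ") or line == "data: [DONE]":
                continue
            chunk = json.loads(line[len("data: "):])
            delta = chunk["choices"][0]["delta"].get("content")
            if delta:
                yield delta
```

With the endpoint from step 3 running, `for piece in stream_completion("Who won the World Series in 2020?"): print(piece, end="")` prints the reply as it streams, replacing the grep/sed pipeline.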

About

šŸ¬ Dolphin 2.9 Llama 3

Dolphin-2.9 excels in instruction-following, conversational tasks, and coding capabilities. It introduces initial agentic abilities and supports function calling. This model is uncensored, with a dataset filtered to remove alignment and bias, making it more compliant and versatile.
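Since the endpoint speaks the OpenAI-compatible chat API, function calling can be exercised by adding a `tools` array to the request. A hedged sketch follows: the `get_weather` tool is hypothetical, and the exact schema the server accepts may differ from this common OpenAI-style layout:

```python
import json

# Hypothetical tool definition in OpenAI-compatible "tools" format.
# If the model decides to call the tool, the response carries a
# "tool_calls" entry instead of plain text content.
payload = {
    "model": "dolphin-llama3/8b",
    "messages": [
        {"role": "user", "content": "What's the weather in Boston?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize for use as the curl -d body from the steps above.
print(json.dumps(payload, indent=2))
```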

The model was curated and trained by Eric Hartford, Lucas Atkins, Fernando Fernandes, and Cognitive Computations.

Sizes

  • dolphin-llama3-8b
  • dolphin-llama3-70b

256K Context Window

Dolphin Llama 3 features a 256k context window, enabling it to process very long prompts and contexts. Using the full window requires at least 64GB of memory. To extend the context window beyond the default, include the appropriate context-length configuration options when starting the endpoint, up to 256,000 tokens.
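The 64GB figure can be sanity-checked with a back-of-the-envelope KV-cache estimate, using the published Llama 3 8B architecture (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and 16-bit cache entries. These are rough numbers; actual usage depends on the serving stack and batch size:

```python
# Llama 3 8B architecture parameters (from the published model config).
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2  # fp16/bf16

# Both keys and values are cached per layer, hence the factor of 2.
kv_bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE

context_tokens = 256_000
kv_cache_gib = kv_bytes_per_token * context_tokens / 2**30

# ~8B parameters at 2 bytes each for the weights themselves.
weights_gib = 16e9 / 2**30

print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache at 256k tokens: {kv_cache_gib:.1f} GiB")
print(f"Weights + KV cache: {weights_gib + kv_cache_gib:.1f} GiB")
```

The weights plus a full 256k-token KV cache land around 46 GiB before activations and runtime overhead, which is consistent with the "at least 64GB" guidance above.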

References

HuggingFace

DETAILS

MODEL CLASS
MAX Model

MAX Models are highly optimized inference pipelines that deliver state-of-the-art performance on both CPU and GPU. Many of them are the fastest available implementations of their respective models.

Browse 18+ MAX Models

MODULAR GITHUB

Modular

CREATED BY

cognitivecomputations

MODEL

cognitivecomputations/dolphin-2.9-llama3-8b

TAGS

autotrain_compatible
axolotl
base_model:finetune:meta-llama/Meta-Llama-3-8B
base_model:meta-llama/Meta-Llama-3-8B
conversational
dataset:HuggingFaceH4/ultrachat_200k
dataset:Locutusque/function-calling-chatml
dataset:abacusai/SystemChat-1.1
dataset:cognitivecomputations/Dolphin-2.9
dataset:cognitivecomputations/dolphin-coder
dataset:cognitivecomputations/samantha-data
dataset:internlm/Agent-FLAN
dataset:m-a-p/CodeFeedback-Filtered-Instruction
dataset:microsoft/orca-math-word-problems-200k
dataset:teknium/OpenHermes-2.5
endpoints_compatible
generated_from_trainer
license:other
llama
region:us
safetensors
text-generation
text-generation-inference
transformers

© Copyright - Modular Inc - 2024