llava-llama3-8b

PyTorch

1 version

A LLaVA model fine-tuned from Llama 3 Instruct with better scores in several benchmarks.

Run this model

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install Max Pipelines in order to run this model.

    magic global install max-pipelines

  3. Start a local endpoint for llava-llama3/8b:

    max-serve serve --huggingface-repo-id xtuner/llava-llama-3-8b-v1_1-gguf

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "llava-llama3/8b",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
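
The request in step 4 can also be sent from a script. The following is a minimal Python sketch using only the standard library, not an official client. It assumes the endpoint streams OpenAI-style server-sent events (lines of the form `data: {json}`, with text under `choices[0].delta.content`, ending with `data: [DONE]`); the function names `build_payload`, `extract_content`, and `stream_answer` are hypothetical helpers introduced here for illustration.

```python
import json
import urllib.request

# Assumption: the endpoint from step 3 is listening locally on port 8000.
ENDPOINT = "http://0.0.0.0:8000/v1/chat/completions"


def build_payload(question, model="llava-llama3/8b"):
    """Return the same JSON body that the curl command in step 4 sends."""
    return {
        "model": model,
        "stream": True,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    }


def extract_content(line):
    """Return the text fragment carried by one streamed line, or "" if none.

    Assumes OpenAI-style server-sent events: `data: {json}` lines with the
    text under choices[0].delta.content, terminated by `data: [DONE]`.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return ""
    body = line[len("data:"):].strip()
    if not body or body == "[DONE]":
        return ""
    chunk = json.loads(body)
    choices = chunk.get("choices") or [{}]
    delta = choices[0].get("delta") or {}
    return delta.get("content") or ""


def stream_answer(question):
    """POST the request and yield text fragments as they arrive."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_payload(question)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        for raw in response:
            text = extract_content(raw.decode("utf-8"))
            if text:
                yield text
```

With the server from step 3 running, iterating over `stream_answer("Who won the World Series in 2020?")` and printing each fragment should reproduce what the `grep`/`sed` pipeline prints. Parsing each event as JSON, rather than pattern-matching on `"content":"..."`, avoids truncating fragments that contain escaped quotes.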

About

llava-llama3 is a LLaVA model fine-tuned from Llama 3 Instruct and CLIP-ViT-Large-patch14-336 using datasets such as ShareGPT4V-PT and InternVL-SFT. Llama 3 serves as the language model and CLIP supplies the visual backbone, so the model can take both text and images as input and reason over them together.
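
To make the vision side concrete, the encoder name CLIP-ViT-Large-patch14-336 encodes its input geometry: 336-pixel square images split into 14-pixel patches. The short sketch below works out the resulting patch grid. The helper `patch_grid` is hypothetical, and how many of these patch embeddings reach the language model as tokens depends on the projector, which this page does not describe, so treat any token count as an assumption.

```python
# Values read off the encoder name CLIP-ViT-Large-patch14-336.
IMAGE_SIZE = 336  # input resolution in pixels (square)
PATCH_SIZE = 14   # side length of one patch in pixels


def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square ViT input."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of the patch size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, total = patch_grid(IMAGE_SIZE, PATCH_SIZE)
print(f"{per_side} x {per_side} grid, {total} patches per image")
# prints "24 x 24 grid, 576 patches per image"
```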

The model was fine-tuned with XTuner, a framework for training models on multimodal tasks. The ShareGPT4V-PT and InternVL-SFT datasets are intended to give it solid vision-language alignment across a range of scenarios.

References

Hugging Face

GitHub

DETAILS

MODEL CLASS

PyTorch

MODULAR GITHUB

Modular

CREATED BY

xtuner

MODEL

xtuner/llava-llama-3-8b-v1_1-gguf

TAGS

dataset:Lin-Chen/ShareGPT4V
gguf
image-text-to-text
image-to-text
region:us

© Copyright - Modular Inc - 2024