llava-7b

PyTorch


🌋 LLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. Updated to version 1.6.

Run this model

  1. Install our magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install MAX Pipelines to run this model:

    magic global install max-pipelines

  3. Start a local endpoint for llava/7b:

    max-pipelines serve --huggingface-repo-id liuhaotian/llava-v1.5-7b

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "llava/7b",
        "stream": true,
        "messages": [
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "What is in this image?"
                },
                {
                  "type": "image_url",
                  "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/1/13/Tunnel_View%2C_Yosemite_Valley%2C_Yosemite_NP_-_Diliff.jpg"
                  }
                }
              ]
            }
        ]
    
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
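The shell pipeline in step 4 strips the SSE framing from the streamed response and prints only the generated text. A minimal sketch of that extraction on two sample chunks (the payloads below are illustrative, not actual model output):

```shell
# Two sample chunks in the OpenAI-compatible streaming format (illustrative).
chunks='data: {"choices":[{"delta":{"content":"A scenic"}}]}
data: {"choices":[{"delta":{"content":" valley view."}}]}'

# Same extraction as step 4: pull each "content":"..." value, drop the key
# and the quotes, then join the fragments onto one line.
printf '%s\n' "$chunks" \
  | grep -o '"content":"[^"]*"' \
  | sed 's/"content":"//g' \
  | sed 's/"//g' \
  | tr -d '\n'
echo    # prints: A scenic valley view.
```

Note that this quick-and-dirty extraction drops any `content` value containing an escaped quote; for anything beyond a demo, parse the JSON chunks with a proper tool such as `jq`.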

About

🌋 LLaVA: Large Language and Vision Assistant

LLaVA is a multimodal model combining a vision encoder and Vicuna for versatile visual and language understanding, delivering advanced conversational abilities akin to multimodal GPT-4.

New in LLaVA 1.6

  • Higher input resolution, up to 4x more pixels, supporting resolutions such as 672x672, 336x1344, and 1344x336.
  • Improved visual reasoning and OCR through refined visual instruction tuning data.
  • Expanded visual conversation capabilities for diverse scenarios and applications.
  • Enhanced world knowledge and logical reasoning.

References

Website
GitHub
HuggingFace

DETAILS

MODEL CLASS
PyTorch

MODULAR GITHUB

Modular

CREATED BY

liuhaotian

MODEL

liuhaotian/llava-v1.5-7b

TAGS

autotrain_compatible
image-text-to-text
llava
pytorch
region:us
text-generation
transformers

© Copyright - Modular Inc - 2024