minicpm-v-8b

PyTorch

1 version

A series of multimodal LLMs (MLLMs) designed for vision-language understanding.

Run this model

  1. Install our Magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install MAX Pipelines to run this model:

    magic global install max-pipelines
  3. Start a local endpoint for minicpm-v/8b:

    max-pipelines serve --huggingface-repo-id openbmb/MiniCPM-V-2_6

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "minicpm-v/8b",
        "stream": true,
        "messages": [
            {
              "role": "user",
              "content": [
                {
                  "type": "text",
                  "text": "What is in this image?"
                },
                {
                  "type": "image_url",
                  "image_url": {
                    "url": "https://upload.wikimedia.org/wikipedia/commons/1/13/Tunnel_View%2C_Yosemite_Valley%2C_Yosemite_NP_-_Diliff.jpg"
                  }
                }
              ]
            }
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'
  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
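The grep/sed pipeline above strips the streaming response down to plain text, but it is fragile. As an illustrative alternative (a minimal sketch, not part of the MAX tooling), the streamed response can be parsed in Python instead. This assumes the standard OpenAI-compatible streaming format, where each event is a `data: {...}` line carrying `choices[0].delta.content` and the stream ends with `data: [DONE]`:

```python
import json

def extract_stream_text(sse_lines):
    """Concatenate the text deltas from OpenAI-style streaming chunks.

    Each event line looks like 'data: {...json...}'; the final sentinel
    is 'data: [DONE]'. Lines that don't match are ignored.
    """
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        content = delta.get("content")
        if content:
            parts.append(content)
    return "".join(parts)

# Synthetic chunks for illustration (not actual model output):
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"A view of "}}]}',
    'data: {"choices":[{"delta":{"content":"Yosemite Valley."}}]}',
    'data: [DONE]',
]
print(extract_stream_text(sample))  # A view of Yosemite Valley.
```

In a real client you would iterate over the response body line by line (for example with `requests` and `stream=True`) and feed those lines to the same function.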

About

MiniCPM-V 2.6 is the most advanced model in the MiniCPM-V series, built on SigLip-400M and Qwen2-7B with 8B parameters. It demonstrates significant performance improvements over MiniCPM-Llama3-V 2.5, with cutting-edge features for multi-image and video understanding.

🔥 Leading Performance: It achieves a 65.2 average score on OpenCompass, outperforming prominent models like GPT-4o mini, GPT-4V, Gemini 1.5 Pro, and Claude 3.5 Sonnet for single-image understanding.

🖼️ Multi-Image Understanding: The model excels at multi-image reasoning, achieving state-of-the-art performance on benchmarks like Mantis-Eval and Mathverse mv, while showcasing strong in-context learning.

💪 Strong OCR Capability: MiniCPM-V 2.6 processes images up to 1.8M pixels with state-of-the-art OCR accuracy. It surpasses competing models, lowers hallucination rates, and offers multilingual support.

🚀 Superior Efficiency: With efficient token density, only 640 tokens are generated for a 1.8M pixel image—75% fewer than many models. This improves inference speed, latency, and resource usage.
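The arithmetic behind that efficiency claim can be checked directly. Note the 2,560-token baseline below is an assumption implied by "75% fewer", not a figure from the source:

```python
pixels = 1_800_000      # a 1.8M-pixel input image
minicpm_tokens = 640    # visual tokens MiniCPM-V 2.6 encodes it into

# "75% fewer" implies a baseline of ~4x as many tokens (assumed: 2,560).
baseline_tokens = minicpm_tokens * 4
reduction = 1 - minicpm_tokens / baseline_tokens

# Token density: pixels encoded per visual token.
density = pixels / minicpm_tokens

print(f"{reduction:.0%} fewer tokens; {density:.1f} pixels per token")
```

Fewer visual tokens per image directly shrinks the prompt the language model must process, which is where the speed, latency, and memory gains come from.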

For more, visit GitHub or Hugging Face.

DETAILS

MODEL CLASS
PyTorch

MODULAR GITHUB

Modular

CREATED BY

openbmb

MODEL

openbmb/MiniCPM-V-2_6

TAGS

arxiv:2408.01800
conversational
custom_code
dataset:openbmb/RLAIF-V-Dataset
feature-extraction
image-text-to-text
minicpm-v
minicpmv
multi-image
multilingual
ocr
region:us
safetensors
transformers
video
vision

© Copyright - Modular Inc - 2024