llama2-chinese-7b

PyTorch

2 versions

A Llama 2-based model fine-tuned to improve Chinese dialogue ability.

Run this model

  1. Install our Magic package manager:

    curl -ssL https://magic.modular.com/ | bash

    Then run the source command that's printed in your terminal.

  2. Install MAX pipelines to run this model:

    magic global install max-pipelines

  3. Start a local endpoint for llama2-chinese/7b:

    max-pipelines serve --huggingface-repo-id FlagAlpha/Llama2-Chinese-7b-Chat-LoRA

    The endpoint is ready when you see the URI printed in your terminal:

    Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)

  4. Now open another terminal to send a request using curl:

    curl -N http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "llama2-chinese/7b",
        "stream": true,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Who won the World Series in 2020?"}
        ]
    }' | grep -o '"content":"[^"]*"' | sed 's/"content":"//g' | sed 's/"//g' | tr -d '\n' | sed 's/\\n/\n/g'

  5. 🎉 Hooray! You’re running Generative AI. Our goal is to make this as easy as possible.
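The extraction pipeline in step 4 can be tried without a running server. The sketch below feeds two hand-written chunks in the OpenAI-compatible streaming shape (the payloads are fabricated for illustration) through the same grep/sed/tr chain to show how the assistant's text is pulled out of the stream:

```shell
# Two fabricated streaming chunks, shaped like an OpenAI-compatible SSE stream.
printf '%s\n' \
  'data: {"choices":[{"delta":{"content":"Hello"}}]}' \
  'data: {"choices":[{"delta":{"content":" world"}}]}' |
  grep -o '"content":"[^"]*"' |  # keep only the "content":"..." fragments
  sed 's/"content":"//g' |       # drop the key, leaving the value and a quote
  sed 's/"//g' |                 # drop the remaining quote
  tr -d '\n'                     # join the chunks into one line
# prints: Hello world
```

For anything beyond a demo, a JSON-aware tool such as jq is more robust than quote-matching with grep and sed.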

About

A Llama 2 chat model fine-tuned for Chinese dialogue

This model is fine-tuned from the open-source Llama 2 Chat model released by Meta Platforms, Inc. Llama 2 was trained on two trillion tokens, with its supported context length increased to 4096, and its dialogue ability was optimized using one million human-annotated examples.

Because the original Llama 2 adapts poorly to Chinese, the developers fine-tuned it on a Chinese instruction set, significantly strengthening its Chinese dialogue ability. Two Chinese fine-tuned models, with 7B and 13B parameters, have been released so far.

Memory requirements

  • The 7B model requires at least 8GB of RAM
  • The 13B model requires at least 16GB of RAM
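The minimums above are well below the size of the raw 16-bit weights, so they presumably assume a quantized build; the exact quantization is not stated here. A back-of-the-envelope sketch of weight memory at different precisions (plain awk arithmetic, with the parameter count taken as a round 7e9):

```shell
# Weight memory ≈ parameter count × bits per parameter / 8, ignoring
# activations and the KV cache, which add more memory at runtime.
for bits in 16 8 4; do
  awk -v p=7e9 -v b="$bits" \
    'BEGIN { printf "7B at %2d-bit: %.1f GiB\n", b, p * b / 8 / (1024 ^ 3) }'
done
# prints:
# 7B at 16-bit: 13.0 GiB
# 7B at  8-bit: 6.5 GiB
# 7B at  4-bit: 3.3 GiB
```

This is why a 4-bit 7B build fits comfortably in 8GB of RAM while the full-precision weights alone would not.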

References

DETAILS

MODEL CLASS
PyTorch

MODULAR GITHUB

Modular

CREATED BY

FlagAlpha

MODEL

FlagAlpha/Llama2-Chinese-7b-Chat-LoRA

TAGS

en
endpoints_compatible
license:apache-2.0
question-answering
region:us
transformers
zh

© Copyright - Modular Inc - 2024