Build AI apps fast on

Copy, customize, and deploy. The quickest way to get your GenAI app up and running while keeping total control over every layer.

Model Repository

(218)

Model families that can be run using MAX.

MODEL FAMILY

MODALITY

TYPE(S)

HARDWARE

DeepSeek-R1-Distill-Llama

3 variants

DeepSeek-R1 models improve reasoning through reinforcement learning and fine-tuning, performing strongly on major benchmarks.

Chat
MAX Model
CPU
GPU

DeepSeek-R1-Distill-Qwen

3 variants

DeepSeek-R1 models improve reasoning through reinforcement learning and fine-tuning, performing strongly on major benchmarks.

Chat
MAX Model
GPU

Llama-3.2-Instruct

6 variants

Llama 3.2 offers advanced multilingual generative language models for diverse commercial and research applications.

Chat
MAX Model
CPU
GPU

Llama-3.2-Vision-Instruct

1 variant

Llama 3.2-Vision models by Meta integrate advanced image reasoning with text to enhance visual tasks.

Vision
MAX Model
GPU

Llama-3.1-Instruct

3 variants

Meta Llama 3.1 is a suite of multilingual large language models (LLMs), available in 8B, 70B, and 405B sizes and optimized for text.

Chat
MAX Model
CPU
GPU

Llama-Guard-3

2 variants

Llama Guard 3 improves content safety classification with high accuracy and supports multiple languages.

Chat
MAX Model
CPU
GPU

all-mpnet-base-v2

1 variant

all-mpnet-base-v2 is a sentence-transformers model for efficient semantic sentence representation and clustering.

Embedding
MAX Model
GPU

phi-4-llama-distill

1 variant

Transform and enhance conversational AI with Phi-4 for improved language model performance and safety.

Chat
MAX Model
CPU

Mistral-Instruct-v0.3

1 variant

Mistral-7B-Instruct-v0.3 is a fine-tuned model with extended vocabulary and function calling.

Chat
MAX Model
GPU

Mistral-Small-Instruct-2501

1 variant

Mistral Small 3 achieves state-of-the-art performance among small language models with 24 billion parameters.

Chat
MAX Model
GPU

Mistral-Nemo-Instruct-2407

1 variant

Mistral-Nemo-Instruct-2407 outperforms similar models by integrating multilingual data and improved architecture.

Chat
MAX Model
GPU

Ministral-Instruct-2410

1 variant

The Ministral-8B-Instruct-2410 model excels in multilingual tasks and code-related benchmarks, designed for on-device computing.

Chat
MAX Model
GPU

Qwen2.5-Instruct

5 variants

Qwen2.5 language models offer enhanced multilingual support, instruction-following, and long-context capabilities.

Chat
Code
Vision
MAX Model
PyTorch
GPU

Qwen2.5-Instruct-1M

2 variants

Qwen2.5-7B-Instruct-1M, a powerful language model, excels in long-context tasks with enhanced efficiency.

Chat
MAX Model
GPU

Qwen2.5-Coder-Instruct

2 variants

Qwen2.5-Coder excels in code generation, reasoning, and fixing with 128K context support.

Code
MAX Model
GPU

Qwen2.5-Math

2 variants

Qwen2.5-Math improves multilingual math problem-solving with enhanced reasoning and computational accuracy.

Chat
MAX Model
GPU

QwQ-Preview

1 variant

QwQ-32B-Preview is an AI model with potential but requires improvements in reasoning.

Chat
MAX Model
GPU

aya-expanse

2 variants

Aya Expanse 8B is a multilingual model leveraging advanced research breakthroughs for optimized performance.

Chat
PyTorch
GPU

aya-23

2 variants

Aya 23 is a highly capable multilingual language model optimized for 23 languages and research use.

Chat
PyTorch
GPU

aya-101

1 variant

The Aya model is a multilingual AI that supports 101 languages and outperforms rivals, enhancing communication globally.

Chat
PyTorch
GPU

EXAONE-3.5-Instruct

9 variants

Bilingual EXAONE 3.5 language models offer advanced, versatile text generation across device types.

Chat
Code
MAX Model
CPU
GPU

EXAONE-3.0-Instruct

3 variants

EXAONE-3.0-7.8B-Instruct is an advanced bilingual AI model offering competitive benchmark performance.

Chat
Code
MAX Model
CPU
GPU

Mistral-Small-Base-2501

1 variant

Mistral Small 3 redefines small large language models with 24B parameters, innovative capabilities, and multilingual support.

Chat
MAX Model
GPU

Mistral-v0.1

1 variant

This generative text model, Mistral-7B-v0.1, excels compared to similar models on multiple benchmarks.

Chat
MAX Model
GPU

Mistral-Instruct-v0.2

1 variant

Mistral-7B-Instruct-v0.2 offers refined instruction-based capabilities for effective text generation tasks.

Chat
MAX Model
GPU

Mistral-Instruct-v0.1

1 variant

The Mistral-7B-Instruct-v0.1 is a fine-tuned, instruction-following language model.

Chat
MAX Model
GPU

Mistral-Small-Instruct-2409

1 variant

Mistral-Small-Instruct-2409 is a fine-tuned model designed for sequence tasks with 22B parameters.

Chat
MAX Model
GPU

Meta-Llama-3-Instruct

2 variants

Meta Llama 3 models excel at generating text through optimized transformer architecture for diverse applications.

Chat
MAX Model
CPU
GPU

Phi-3.5-mini-instruct

1 variant

Phi-3.5-mini offers a powerful multilingual AI model designed for effective text generation across various use cases with enhanced reasoning capabilities and long-context understanding.

Code
MAX Model
GPU

Phi-3.5-vision-instruct

1 variant

Discover the Phi-3.5-vision model, a state-of-the-art, lightweight multimodal solution for image-text tasks.

Vision
PyTorch
GPU

Llama-2-chat-hf

3 variants

Llama 2 offers scalable generative text models enhancing dialogue applications with large parameter sizes.

Chat
MAX Model
CPU
GPU

llava-v1.5

1 variant

LLaVA is an advanced open-source chatbot designed for multimodal instruction research and language processing.

Vision
PyTorch
GPU

LLaVA-delta-v0

1 variant

LLaVA is a fine-tuned open-source chatbot model using GPT for multimodal instruction-following.

Chat
MAX Model
GPU

CodeLlama-hf

6 variants

Code Llama offers versatile models for code synthesis and understanding, designed for multiple programming needs.

Code
MAX Model
CPU
GPU

TinyLlama-Chat-v1.0

2 variants

TinyLlama efficiently pretrains a compact 1.1B Llama model, optimizing resources and enhancing adaptability.

Chat
MAX Model
CPU
GPU

starcoder2-instruct-v0.1

1 variant

StarCoder2-15B-Instruct is a self-aligned model optimizing code generation without human annotations.

Code
PyTorch
GPU

DeepSeek-Coder-V2-Lite-Instruct

1 variant

DeepSeek-Coder-V2 excels in code tasks, supports 338 languages, and boasts advanced features.

Code
PyTorch
GPU

Dolphin3.0-R1-Mistral

1 variant

Dolphin 3.0 R1, a flexible AI model, enables customized solutions for coding and general tasks.

Chat
MAX Model
GPU

Dolphin3.0-Mistral

1 variant

Dolphin 3.0 Mistral 24B is an adaptable, user-controlled AI model for diverse applications.

Chat
MAX Model
GPU

Dolphin3.0-Llama3.1

3 variants

Dolphin 3.0, a versatile AI model, empowers businesses with customizable and private solutions.

Chat
MAX Model
CPU
GPU

Dolphin3.0-Llama3.2

5 variants

Dolphin 3.0 is an adaptable AI model focused on privacy, control, and customization.

Chat
MAX Model
CPU
GPU

Dolphin3.0-Qwen2.5

3 variants

Dolphin 3.0 is an advanced AI model offering personalized, general-purpose functionalities for diverse applications.

Chat
MAX Model
GPU

WizardLM-2

1 variant

Introducing WizardLM-2, an advanced multilingual model with enhanced performance for complex tasks.

Chat
MAX Model
GPU

OLMo-2-1124-Instruct

1 variant

OLMo-2 models offer diverse language capabilities, suited to state-of-the-art tasks.

Chat
PyTorch
GPU

OLMo-Instruct

1 variant

OLMo 7B Instruct enhances language model science, excels at question answering, with open access.

Code
PyTorch
GPU

OLMo-hf

1 variant

OLMo is an open-source language model series by AI2, optimized for language model research.

Chat
MAX Model
GPU

OLMo-0424

1 variant

OLMo 7B April 2024 improves language model performance using updated training techniques and datasets.

Chat
PyTorch
GPU

OLMo-0724-hf

1 variant

The OLMo 1B July 2024 model is enhanced with advanced dataset training, showing substantial performance improvements.

Chat
MAX Model
GPU

OLMo-2-1124-Instruct-preview

1 variant

OLMo-2 models excel in diverse tasks using advanced training techniques and a versatile dataset.

Chat
PyTorch
GPU

Nous-Hermes

1 variant

Nous-Hermes-13b is a cutting-edge language model that excels at long responses and task accuracy.

Chat
MAX Model
GPU

Nous-Hermes-Llama2

2 variants

Nous-Hermes-Llama2-13b is optimized for long, accurate responses without censorship, outperforming predecessors.

Chat
MAX Model
CPU
GPU

Nous-Hermes-2-Yi

3 variants

Nous Hermes 2 - Yi-34B sets new standards with exceptional benchmark performance and usability.

Chat
MAX Model
CPU
GPU

Nous-Hermes-2-SOLAR

3 variants

Nous Hermes 2 on SOLAR 10.7B achieves enhanced performance on various AI benchmarks, closer to Yi-34B.

Chat
MAX Model
CPU
GPU

Nous-Hermes-2-Mistral-DPO

1 variant

Nous Hermes 2 Mistral 7B DPO model exhibits improved performance across multiple benchmarking tests.

Chat
MAX Model
GPU

Nous-Hermes-llama-2

3 variants

Nous-Hermes-Llama2-7b excels with long responses and reduced hallucination using diverse datasets.

Chat
MAX Model
CPU
GPU

c4ai-command-r-plus

1 variant

C4AI Command R+ is a multilingual, advanced AI model trained for reasoning, summarization, and question answering using 104 billion parameters.

Chat
PyTorch
GPU

c4ai-command-r-v01

2 variants

C4AI Command-R is a 35 billion parameter, multilingual generative model optimized for various tasks.

Chat
Code
PyTorch
GPU

c4ai-command-r-08-2024

1 variant

C4AI Command R 08-2024 is a multidimensional AI model excelling in multilingual tasks, reasoning, and citations.

Chat
PyTorch
GPU

Yi-1.5-Chat

6 variants

Yi-1.5 enhances performance in language tasks, coding, and reasoning using extensive training.

Chat
MAX Model
CPU
GPU

Yi-Coder

2 variants

Yi-Coder offers advanced open-source code models supporting 52 languages with exceptional performance and efficiency.

Chat
Code
MAX Model
CPU
GPU

Yi-Coder-Chat

3 variants

Yi-Coder delivers top coding performance using models under 10 billion parameters for 52 languages.

Chat
MAX Model
CPU
GPU

Yi

5 variants

Explore Yi's next-gen, bilingual open-source models excelling in various benchmarks globally.

Chat
MAX Model
CPU
GPU

Yi-200K

6 variants

Yi's innovative bilingual language models excel in large-scale multilingual benchmarks, offering top-tier performance across tasks.

Chat
MAX Model
CPU
GPU

Yi-Chat

8 variants

Yi series models are powerful bilingual, open-source AI language models, excelling in various NLP tasks.

Chat
MAX Model
CPU
GPU

Codestral-v0.1

1 variant

Codestral-22B-v0.1 excels in code generation, utilizing 80+ programming languages like Python and Java.

Code
MAX Model
GPU

Mamba-Codestral-v0.1

1 variant

Codestral Mamba 7B is a leading open code model, excelling in multiple benchmarks.

Chat
PyTorch
GPU

Sailor2-Chat

3 variants

Sailor2 offers a multilingual language model for 15 Southeast Asian languages, supporting diverse applications.

Chat
MAX Model
GPU

Sailor2

3 variants

Sailor2 develops multilingual language models for Southeast Asia with accessible advanced technology.

Chat
MAX Model
GPU

vicuna-delta-v0

2 variants

Vicuna is a chat assistant fine-tuned from LLaMA, intended primarily for research purposes.

Chat
MAX Model
GPU

vicuna-v1.3

2 variants

Vicuna is an advanced chat assistant model designed for research in NLP, AI, and ML.

Chat
MAX Model
GPU

vicuna-v1.5

2 variants

Vicuna, an AI chat assistant, excels in research on language models and chatbots, benefiting NLP enthusiasts.

Chat
MAX Model
GPU

vicuna-v1.5-16k

1 variant

Vicuna is a chat assistant model, fine-tuned on user-shared conversations for NLP research.

Chat
MAX Model
GPU

vicuna-delta-v1.1

2 variants

Vicuna is a chat assistant trained on LLaMA for research in AI and NLP conversations.

Chat
MAX Model
GPU

vicuna-v1.1

1 variant

Vicuna is a fine-tuned LLaMA model focused on chatbot research using user-shared conversations.

Chat
MAX Model
GPU

Nemotron-Mini-Instruct

1 variant

The Nemotron-Mini-4B-Instruct is an optimized language model for roleplay, Q&A, and function calls.

Chat
PyTorch
GPU

Falcon3-Instruct

7 variants

Falcon3-1B-Instruct is a versatile model excelling in reasoning, language, and inquiry tasks.

Chat
MAX Model
CPU
GPU

falcon-instruct

2 variants

Falcon-7B-Instruct is a powerful, optimized model for chat and instruction tasks.

Code
PyTorch
GPU

WizardCoder-Python-V1.0

2 variants

WizardCoder models significantly advance code language models, reaching state-of-the-art results in evaluations.

Code
MAX Model
GPU

WizardCoder-V1.1

3 variants

Explore WizardCoder, a leading Code Large Language Model with exceptional performance across multiple benchmarks.

Code
MAX Model
CPU
GPU

CodeLlama-Instruct-hf

9 variants

Code Llama offers versatile models for code synthesis and understanding across various programming tasks.

Code
MAX Model
CPU
GPU

DeepHermes-3-Llama-3-Preview

3 variants

DeepHermes 3, enhanced with advanced reasoning, offers flexible AI interaction and improved user alignment.

Chat
MAX Model
CPU
GPU

OpenThinker

2 variants

OpenThinker-7B excels in performance and open-source accessibility across multiple evaluation metrics.

Chat
MAX Model
GPU

DeepSeek-R1-Distill-Qwen-abliterated

1 variant

Uncensored DeepSeek model version created via abliteration to enhance LLM functionalities.

Chat
MAX Model
GPU

DeepSeek-R1-Distill-Qwen-abliterated-v2

2 variants

DeepSeek-R1-Distill-Qwen-7B-abliterated-v2 offers an uncensored, refined language model version.

Chat
MAX Model
GPU

Velvet

2 variants

Velvet-2B, an Italian language model, excels in diverse linguistic applications, promoting ethical AI use.

Chat
MAX Model
GPU

SmolLM2-Instruct

2 variants

SmolLM2 offers efficient, compact language models excelling in diverse tasks and instruction-following.

Chat
MAX Model
CPU
GPU

OpenR1-Qwen

1 variant

Finetuned Qwen2.5-Math model enhances mathematical problem-solving capabilities with extended context length.

Chat
MAX Model
GPU

QVikhr-2.5-Instruct-r

1 variant

QVikhr-2.5-1.5B-Instruct-r is a bilingual language model specialized in Russian math datasets.

Chat
MAX Model
GPU

ReaderLM-v2

1 variant

ReaderLM-v2 efficiently converts HTML into markdown or JSON, supporting 29 languages with 512K token handling.

Chat
MAX Model
GPU

Llama-2-hf

3 variants

Llama 2 offers advanced, scalable language models with enhanced performance and ethical considerations.

Chat
MAX Model
CPU
GPU

Llama-3.1-Tulu-3.1

3 variants

Llama-3.1-Tulu-3 models excel in diverse tasks with advanced training techniques and open-source data.

Chat
MAX Model
CPU
GPU

OREAL

1 variant

OREAL mathematical reasoning models excel with innovative reinforcement learning, achieving significant accuracy improvements.

Chat
MAX Model
GPU

DeepSeek-llama3.1-Bllossom

2 variants

DeepSeek-llama3.1-Bllossom-8B enhances multilingual AI performance, notably improving Korean language inference.

Chat
MAX Model
CPU
GPU

DeepSeek-R1-Distill-Llama-abliterated

1 variant

DeepSeek-R1-Distill-Llama-8B-abliterated is an uncensored model offering advanced AI capabilities.

Chat
MAX Model
GPU

YuE-s1-anneal-en-cot

3 variants

YuE is an innovative music generation model transforming lyrics into complete songs, supporting diverse genres and languages.

Chat
MAX Model
CPU
GPU

DeepSeek-R1-Distill-Qwen-Japanese

1 variant

DeepSeek-R1-Distill-Qwen-32B-Japanese is a Japanese language model designed for advanced text generation.

Chat
MAX Model
GPU

Llama-Krikri-Instruct

1 variant

Llama-Krikri-8B-Instruct enhances Greek text generation through extensive pretraining and bilingual capabilities.

Chat
MAX Model
CPU

Fino1

2 variants

Fino1-8B is a financial reasoning model fine-tuned for enhanced performance on specified tasks.

Chat
MAX Model
CPU
GPU

© Copyright - Modular Inc - 2025