Copy, customize, and deploy: the quickest way to get your GenAI app up and running while keeping total control over every layer.
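Once a model family from the catalog below is being served with MAX, you can talk to it through the server's OpenAI-compatible API. Here is a minimal sketch, assuming a MAX server already running at localhost:8000 and serving a Llama 3.1 variant; the base URL, port, and model name are assumptions, so substitute whatever your deployment actually exposes:

```python
# Minimal sketch: querying a model served by MAX through its
# OpenAI-compatible endpoint. The base_url and model id below are
# assumptions -- adjust them to match your own deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local MAX endpoint
    api_key="EMPTY",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="modularai/Llama-3.1-8B-Instruct-GGUF",  # assumed model id
    messages=[
        {"role": "user", "content": "Summarize what MAX is in one sentence."},
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the OpenAI wire protocol, any OpenAI-compatible client or framework should work the same way.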
Created by our community (95 model families).
Model families that can be run using MAX.
| MODEL FAMILY | DESCRIPTION | MODALITY | TYPE(S) | HARDWARE |
| --- | --- | --- | --- | --- |
| deepseek-r1 (5 variants) | DeepSeek's first generation of reasoning models, with performance comparable to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen. | Chat | MAX Model | CPU, GPU |
| llama3.2-vision (1 variant) | Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes. | Vision | MAX Model | GPU |
| llama3.2 (4 variants) | Meta's Llama 3.2 goes small with 1B and 3B models. | Chat | MAX Model | CPU, GPU |
| llama3.1 (2 variants) | Llama 3.1 is a new state-of-the-art model from Meta, available in 8B, 70B, and 405B parameter sizes. | Chat | MAX Model | CPU, GPU |
| mistral (1 variant) | The 7B model released by Mistral AI, updated to version 0.3. | Chat | MAX Model | GPU |
| llama3 (2 variants) | Meta Llama 3, the most capable openly available LLM to date. | Chat | MAX Model | CPU, GPU |
| qwen (2 variants) | Qwen 1.5 is a series of large language models by Alibaba Cloud, spanning from 0.5B to 110B parameters. | Chat, Code | MAX Model, PyTorch | GPU |
| qwen2 (2 variants) | Qwen2 is a new series of large language models from the Alibaba Group. | Chat, Vision | MAX Model, PyTorch | GPU |
| qwen2.5 (3 variants) | Qwen2.5 models are pretrained on Alibaba's latest large-scale dataset, encompassing up to 18 trillion tokens. The models support up to 128K tokens of context and are multilingual. | Chat, Code | MAX Model | GPU |
| llama2 (4 variants) | Llama 2 is a collection of foundation language models ranging from 7B to 70B parameters. | Chat | MAX Model | CPU, GPU |
| codellama (6 variants) | A large language model that can use text prompts to generate and discuss code. | Code | MAX Model | CPU, GPU |
| qwen2.5-coder (5 variants) | The latest series of code-specific Qwen models, with significant improvements in code generation, code reasoning, and code fixing. | Code | MAX Model | GPU |
| mistral-nemo (1 variant) | A state-of-the-art 12B model with 128K context length, built by Mistral AI in collaboration with NVIDIA. | Chat | MAX Model | GPU |
| starcoder2 (3 variants) | StarCoder2 is the next generation of transparently trained open code LLMs, available in three sizes: 3B, 7B, and 15B parameters. | Code | PyTorch | GPU |
| mixtral (2 variants) | A set of Mixture-of-Experts (MoE) models with open weights by Mistral AI, in 8x7b and 8x22b parameter sizes. | Chat | PyTorch | GPU |
| dolphin-mixtral (1 variant) | Uncensored 8x7b and 8x22b fine-tuned models based on the Mixtral Mixture-of-Experts models that excel at coding tasks. Created by Eric Hartford. | Chat | PyTorch | GPU |
| deepseek-coder-v2 (1 variant) | An open-source Mixture-of-Experts code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. | Code | PyTorch | GPU |
| phi (1 variant) | Phi-2: a 2.7B language model by Microsoft Research that demonstrates outstanding reasoning and language understanding capabilities. | Code | PyTorch | GPU |
| llama2-uncensored (1 variant) | Uncensored Llama 2 model by George Sung and Jarrad Hope. | Chat | MAX Model | CPU |
| deepseek-coder (2 variants) | DeepSeek Coder is a capable coding model trained on two trillion code and natural language tokens. | Chat | MAX Model | GPU |
| wizardlm2 (1 variant) | State-of-the-art large language model from Microsoft AI with improved performance on complex chat, multilingual, reasoning, and agent use cases. | Chat | PyTorch | GPU |
| dolphin-mistral (1 variant) | The uncensored Dolphin model based on Mistral that excels at coding tasks. Updated to version 2.8. | Chat | MAX Model | GPU |
| dolphin-llama3 (2 variants) | Dolphin 2.9 is a new model in 8B and 70B sizes by Eric Hartford, based on Llama 3, with a variety of instruction, conversational, and coding skills. | Chat | MAX Model | CPU, GPU |
| orca-mini (2 variants) | A general-purpose model ranging from 3 billion to 70 billion parameters, suitable for entry-level hardware. | Chat | MAX Model | GPU |
| command-r (1 variant) | Command R is a large language model optimized for conversational interaction and long-context tasks. | Chat | PyTorch | GPU |
| yi (3 variants) | Yi 1.5 is a high-performing, bilingual language model. | Chat | MAX Model | GPU |
| zephyr (1 variant) | Zephyr is a series of fine-tuned versions of the Mistral and Mixtral models that are trained to act as helpful assistants. | Chat | MAX Model | GPU |
| phi3.5 (1 variant) | A lightweight AI model with 3.8 billion parameters whose performance overtakes similarly sized and larger models. | Code | PyTorch | GPU |
| codestral (1 variant) | Codestral is Mistral AI’s first-ever code model, designed for code generation tasks. | Code | MAX Model | GPU |
| starcoder (3 variants) | StarCoder is a code generation model trained on 80+ programming languages. | Code | PyTorch | GPU |
| granite-code (3 variants) | A family of open foundation models by IBM for code intelligence. | Code | MAX Model, PyTorch | GPU |
| smollm (3 variants) | 🪐 A family of small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. | Chat | MAX Model | GPU |
| qwq (1 variant) | QwQ is an experimental research model focused on advancing AI reasoning capabilities. | Chat | MAX Model | GPU |
| codegeex4 (1 variant) | A versatile model for AI software development scenarios, including code completion. | Code | PyTorch | GPU |
| aya (2 variants) | Aya 23, released by Cohere, is a new family of state-of-the-art, multilingual models that support 23 languages. | Chat | PyTorch | GPU |
| codeqwen (1 variant) | CodeQwen1.5 is a large language model pretrained on a large amount of code data. | Chat | MAX Model | GPU |
| smollm2 (3 variants) | SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters. | Chat | MAX Model | GPU |
| nous-hermes2 (2 variants) | The powerful family of models by Nous Research that excels at scientific discussion and coding tasks. | Chat | MAX Model | GPU |
| stable-code (1 variant) | Stable Code 3B is a coding model with instruct and code completion variants, on par with models such as Code Llama 7B that are 2.5x larger. | Code | PyTorch | GPU |
| tinydolphin (1 variant) | An experimental 1.1B parameter model trained on the new Dolphin 2.8 dataset by Eric Hartford and based on TinyLlama. | Chat | MAX Model | GPU |
| glm4 (1 variant) | A strong multilingual general language model with performance competitive with Llama 3. | Code | PyTorch | GPU |
| wizardcoder (1 variant) | State-of-the-art code generation model. | Code | MAX Model | GPU |
| qwen2-math (2 variants) | Qwen2 Math is a series of specialized math language models built upon the Qwen2 LLMs, which significantly outperform the mathematical capabilities of open-source models and even closed-source models (e.g., GPT-4o). | Chat | MAX Model | GPU |
| stablelm2 (2 variants) | Stable LM 2 is a state-of-the-art 1.6B and 12B parameter language model trained on multilingual data in English, Spanish, German, Italian, French, Portuguese, and Dutch. | Chat | PyTorch | GPU |
| moondream (1 variant) | moondream2 is a small vision language model designed to run efficiently on edge devices. | Vision | PyTorch | GPU |
| neural-chat (1 variant) | A fine-tuned model based on Mistral with good domain and language coverage. | Chat | MAX Model | GPU |
| llama3-gradient (2 variants) | This model extends Llama 3 8B's context length from 8K to over 1M tokens. | Chat | MAX Model | CPU, GPU |
| wizard-math (2 variants) | A model focused on math and logic problems. | Chat | MAX Model | GPU |
| llama3-chatqa (2 variants) | A model from NVIDIA based on Llama 3 that excels at conversational question answering (QA) and retrieval-augmented generation (RAG). | Chat | MAX Model | CPU, GPU |
| sqlcoder (2 variants) | SQLCoder is a code completion model fine-tuned on StarCoder for SQL generation tasks. | Chat | MAX Model | GPU |
| deepseek-v2 (1 variant) | A strong, economical, and efficient Mixture-of-Experts language model. | Code | PyTorch | GPU |
| nous-hermes (2 variants) | General-use models based on Llama and Llama 2 from Nous Research. | Chat | MAX Model | GPU |
| phind-codellama (1 variant) | Code generation model based on Code Llama. | Chat | MAX Model | CPU |
| minicpm-v (1 variant) | A series of multimodal LLMs (MLLMs) designed for vision-language understanding. | Vision | PyTorch | GPU |
| hermes3 (4 variants) | Hermes 3 is the latest version of the flagship Hermes series of LLMs by Nous Research. | Chat | MAX Model | CPU, GPU |
| solar (1 variant) | A compact yet powerful 10.7B large language model designed for single-turn conversation. | Chat | MAX Model | GPU |
| starling-lm (1 variant) | Starling is a large language model trained by reinforcement learning from AI feedback, focused on improving chatbot helpfulness. | Chat | MAX Model | GPU |
| yi-coder (2 variants) | Yi-Coder is a series of open-source code language models that delivers state-of-the-art coding performance with fewer than 10 billion parameters. | Chat | MAX Model | GPU |
| internlm2 (4 variants) | InternLM2.5 is a 7B parameter model tailored for practical scenarios, with outstanding reasoning capability. | Code | PyTorch | GPU |
| falcon (1 variant) | A large language model built by the Technology Innovation Institute (TII) for use in summarization, text generation, and chatbots. | Code | PyTorch | GPU |
| mistral-small (1 variant) | Mistral Small 3 sets a new benchmark in the “small” large language model category, below 70B. | Chat | MAX Model | GPU |
| stable-beluga (2 variants) | A Llama 2-based model fine-tuned on an Orca-style dataset. Originally called Free Willy. | Chat | MAX Model | GPU |
| dolphin-phi (1 variant) | 2.7B uncensored Dolphin model by Eric Hartford, based on the Phi language model by Microsoft Research. | Code | PyTorch | GPU |
| llama3-groq-tool-use (2 variants) | A series of models from Groq that represent a significant advancement in open-source AI capabilities for tool use and function calling. | Chat | MAX Model | CPU, GPU |
| meditron (1 variant) | Open-source medical large language model adapted from Llama 2 to the medical domain. | Chat | MAX Model | GPU |
| granite3-dense (2 variants) | The IBM Granite 2B and 8B models are designed to support tool-based use cases and retrieval-augmented generation (RAG), streamlining code generation, translation, and bug fixing. | Chat | PyTorch | GPU |
| nous-hermes2-mixtral (1 variant) | The Nous Hermes 2 model from Nous Research, now trained over Mixtral. | Chat | PyTorch | GPU |
| granite3.1-dense (2 variants) | The IBM Granite 2B and 8B models are text-only dense LLMs trained on over 12 trillion tokens of data, demonstrating significant improvements over their predecessors in performance and speed in IBM’s initial testing. | Chat | PyTorch | GPU |
| smallthinker (1 variant) | A new small reasoning model fine-tuned from the Qwen 2.5 3B Instruct model. | Chat | MAX Model | GPU |
| magicoder (1 variant) | 🎩 Magicoder is a family of 7B parameter models trained on 75K synthetic instruction examples using OSS-Instruct, a novel approach to enlightening LLMs with open-source code snippets. | Chat | MAX Model | GPU |
| falcon2 (1 variant) | Falcon2 is an 11B-parameter causal decoder-only model built by TII and trained on over 5T tokens. | Code | PyTorch | GPU |
| stablelm-zephyr (1 variant) | A lightweight chat model delivering accurate and responsive output without requiring high-end hardware. | Chat | PyTorch | GPU |
| codebooga (1 variant) | A high-performing code instruct model created by merging two existing code models. | Chat | MAX Model | GPU |
| mathstral (1 variant) | MathΣtral: a 7B model designed for math reasoning and scientific discovery by Mistral AI. | Chat | MAX Model | GPU |
| duckdb-nsql (1 variant) | A 7B-parameter text-to-SQL model made by MotherDuck and Numbers Station. | Chat | MAX Model | GPU |
| reader-lm (2 variants) | A series of models that convert HTML content to Markdown content, which is useful for content conversion tasks. | Chat | MAX Model | GPU |
| aya-expanse (2 variants) | Cohere For AI's language models trained to perform well across 23 different languages. | Chat | PyTorch | GPU |
| marco-o1 (1 variant) | An open large reasoning model for real-world solutions by the Alibaba International Digital Commerce Group (AIDC-AI). | Chat | MAX Model | GPU |
| granite3-moe (2 variants) | The IBM Granite 1B and 3B models are the first Mixture-of-Experts (MoE) Granite models from IBM, designed for low-latency usage. | Chat | PyTorch | GPU |
| solar-pro (1 variant) | Solar Pro Preview: an advanced large language model (LLM) with 22 billion parameters, designed to fit on a single GPU. | Code | PyTorch | GPU |
| notux (1 variant) | A top-performing Mixture-of-Experts model, fine-tuned with high-quality data. | Chat | PyTorch | GPU |
| falcon3 (4 variants) | A family of efficient AI models under 10B parameters, performant in science, math, and coding through innovative training techniques. | Chat | MAX Model | GPU |
| nuextract (1 variant) | A 3.8B model fine-tuned on a private, high-quality synthetic dataset for information extraction, based on Phi-3. | Code | PyTorch | GPU |
| bespoke-minicheck (1 variant) | A state-of-the-art fact-checking model developed by Bespoke Labs. | Code | PyTorch | GPU |
| opencoder (2 variants) | OpenCoder is an open and reproducible code LLM family that includes 1.5B and 8B models, supporting chat in English and Chinese. | Chat | MAX Model | GPU |
| llama-guard3 (2 variants) | Llama Guard 3 is a series of models fine-tuned for content safety classification of LLM inputs and responses. | Chat | MAX Model | GPU |
| granite3.1-moe (2 variants) | The IBM Granite 1B and 3B models are long-context Mixture-of-Experts (MoE) Granite models from IBM, designed for low-latency usage. | Chat | PyTorch | GPU |
| tulu3 (1 variant) | Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes by the Allen Institute for AI. | Chat | MAX Model | GPU |
| exaone3.5 (6 variants) | EXAONE 3.5 is a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. | Chat, Code | MAX Model | CPU, GPU |
| olmo2 (2 variants) | OLMo 2 is a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks. | Chat | PyTorch | GPU |
| command-r7b (1 variant) | The smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices. | Chat | PyTorch | GPU |
| granite3-guardian (2 variants) | The IBM Granite Guardian 3.0 2B and 8B models are designed to detect risks in prompts and/or responses. | Chat | PyTorch | GPU |
| sailor2 (3 variants) | Sailor2 is a family of multilingual language models made for Southeast Asia, available in 1B, 8B, and 20B parameter sizes. | Chat | MAX Model | GPU |
| all-mpnet-base-v2 (1 variant) | all-mpnet-base-v2 is a high-performance sentence-transformer model that generates 768-dimensional vector representations of sentences and paragraphs. It excels at NLP tasks like semantic search, clustering, and sentence similarity through extensive pre-training and contrastive fine-tuning (a usage sketch follows this table). | Embedding | MAX Model | GPU |
| mistral-small-24b-instruct-2501 (2 variants) | Mistral Small v24.09 is a cost-effective 22B parameter language model that balances performance and efficiency, excelling in tasks like translation, summarization, and sentiment analysis while offering improved reasoning and code generation capabilities. | Chat | MAX Model | CPU, GPU |
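The catalog also includes an embedding model, all-mpnet-base-v2, which returns 768-dimensional vectors rather than chat completions. Below is a minimal sketch of fetching embeddings through the same OpenAI-compatible client, assuming the server exposes the standard /v1/embeddings route and the model id shown; both are assumptions to verify against your deployment:

```python
# Hedged sketch: requesting embeddings from a locally served
# all-mpnet-base-v2 model. The /v1/embeddings route and model id are
# assumptions -- check what your MAX deployment actually serves.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

result = client.embeddings.create(
    model="sentence-transformers/all-mpnet-base-v2",  # assumed model id
    input=["MAX serves GenAI models.", "Semantic search needs embeddings."],
)
vectors = [item.embedding for item in result.data]
print(len(vectors), len(vectors[0]))  # expect 2 vectors, 768 dimensions each
```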