Here's a chronological breakdown of some of the most interesting open models released between October 1st and 31st, 2025:

October 1st:
- LFM2-Audio-1.5B (Liquid AI): Low-latency, end-to-end audio foundation model.
- KaniTTS-370M (NineNineSix): Fast, open-source TTS for real-time applications.

October 2nd:
- Granite 4.0 (IBM): Hyper-efficient, hybrid models for enterprise use.
- NeuTTS Air (Neuphonic Speech): On-device TTS with instant voice cloning.

October 3rd:
- Agent S3 (Simular): Open framework for human-like computer use.
- Ming-UniVision-16B-A3B (Ant Group): Unified vision understanding, generation, and editing model.
- Ovi (Character.AI / Yale): Open-source text-to-video / image-to-video framework for offline talking avatars.
- CoDA-v0-Instruct (Salesforce AI Research): Bidirectional diffusion model for code generation (the unmasking loop behind such models is sketched after this list).

October 4th:
- Qwen3-VL-30B-A3B-Instruct (Alibaba): Powerful vision-language model for agentic tasks.
- DecartXR (Decart AI): Open-source Quest app for real-time video effects.

October 7th:
- LFM2-8B-A1B (Liquid AI): Efficient on-device mixture-of-experts model (top-k routing is sketched after this list).
- Hunyuan-Vision-1.5-Thinking (Tencent): Multimodal "thinking on images" reasoning model.
- Paris (Bagel Network): Decentrally trained open-weight diffusion model.
- StreamDiffusionV2 (UC Berkeley, MIT, et al.): Open-source pipeline for real-time video streaming.

October 8th:
- Jamba Reasoning 3B (AI21 Labs): Small hybrid model for on-device reasoning.
- Ling-1T / Ring-1T (Ant Group): Trillion-parameter open models in non-thinking (Ling) and thinking (Ring) variants.
- Mimix (Research): Framework for multi-character video generation.

October 9th:
- UserLM-8b (Microsoft): Open-weight model that simulates the "user" role in a conversation.
- RND1-Base-0910 (Radical Numerics): Experimental diffusion language model (30B MoE).

October 10th:
- KAT-Dev-72B-Exp (Kwaipilot): Open-source experimental model for agentic coding.

October 12th:
- DreamOmni2 (ByteDance): Multimodal instruction-based image editing and generation.

October 13th:
- StreamingVLM (MIT Han Lab): Real-time understanding for infinite video streams.

October 14th:
- Qwen3-VL-4B / 8B (Alibaba): Efficient, open vision-language models for the edge.

October 16th:
- PaddleOCR-VL (Baidu): Lightweight document-parsing model covering 109 languages.
- MobileLLM-Pro (Meta): 1B-parameter on-device model with a 128k context window.
- FlashWorld (Tencent): Fast (5-10 second) 3D scene generation.
- RTFM (Real-Time Frame Model) (World Labs): Real-time, interactive 3D world generation.

October 17th:
- LLaDA2.0-flash-preview (Ant Group): 100B MoE diffusion model for reasoning and code.

October 20th:
- DeepSeek-OCR (DeepSeek AI): Open-source model for optical context compression.
- Krea Realtime 14B (Krea AI): 14B open-weight real-time video generation model.

October 21st:
- Qwen3-VL-2B / 32B (Alibaba): Open, dense VLMs for edge and cloud.
- BADAS-Open (Nexar): Ego-centric collision-prediction model for ADAS.

October 22nd:
- LFM2-VL-3B (Liquid AI): Efficient vision-language model for edge deployment.
- HunyuanWorld-1.1 (Tencent): 3D world generation from multi-view images or video.
- PokeeResearch-7B (Pokee AI): Open 7B deep-research agent (search and synthesis).
- olmOCR-2-7B-1025 (Allen Institute for AI): Open-source, single-pass PDF-to-structured-text model.

October 23rd:
- LTX 2 (Lightricks): Open-source 4K video engine for consumer GPUs.
- LightOnOCR-1B (LightOn): Fast, 1B-parameter open-source OCR VLM.
- HoloCine (Research): Model for holistic, multi-shot cinematic narratives.

October 24th:
- Tahoe-x1 (Tahoe Therapeutics): 3B open-source single-cell biology model.
- P1 (PRIME-RL): Model trained with RL to master Physics Olympiad problems.
October 25th:
- LongCat-Video (Meituan): 13.6B open model for long video generation.
- Seed 3D 1.0 (ByteDance): Generates simulation-grade 3D assets from images.

October 27th:
- MiniMax M2 (MiniMax): Open-sourced intelligence engine for agentic workflows.
- Ming-flash-omni-Preview (Ant Group): 100B MoE omni-modal model for perception.
- LLaDA2.0-mini-preview (Ant Group): 16B MoE diffusion model for language.

October 28th:
- LFM2-ColBERT-350M (Liquid AI): Multilingual "late interaction" RAG retriever model (MaxSim scoring is sketched below).
- Granite 4.0 Nano (1B / 350M) (IBM): IBM's smallest open models, for on-device use.
- ViMax (HKUDS): Agentic framework for end-to-end video creation.
- Nemotron Nano v2 VL (NVIDIA): 12B open model for multi-image and video understanding.

October 29th:
- gpt-oss-safeguard (OpenAI): Open-weight reasoning models for safety classification.
- Frames to Video (Morphic): Open-source model for keyframe video interpolation.
- Fibo (Bria AI): SOTA open-source text-to-image model trained on licensed data.

October 30th:
- Emu3.5 (BAAI): Native multimodal model positioned as a world learner.
- Kimi-Linear-48B-A3B (Moonshot AI): Long-context model built on a linear-attention mechanism (a generic form is sketched below).
- RWKV-7 G0a3 7.2B (BlinkDL): Multilingual RNN-based large language model.
- UI-Ins-32B / 7B (Alibaba): GUI grounding agents.
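A few of the techniques that recur in this list are worth unpacking in code.

Several releases above (Granite 4.0, LFM2-8B-A1B, Ling-1T, Ming-flash-omni-Preview, the LLaDA2.0 previews) are sparse mixture-of-experts models: a router activates only a few expert sub-networks per token, so a model with a very large total parameter count pays the compute cost of a much smaller one. Here is a minimal NumPy sketch of top-k routing; all names and shapes are illustrative and not taken from any of these codebases:

```python
import numpy as np

def moe_forward(x, gate_W, experts, k=2):
    """Route one token through the top-k experts of a sparse MoE layer.

    x       : (d,) token representation
    gate_W  : (d, n_experts) router weights
    experts : list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_W                          # router score per expert
    topk = np.argsort(logits)[-k:]               # indices of the k best experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                     # softmax over selected experts only
    # Only k experts run; the rest stay idle -- that is the efficiency win.
    return sum(w * experts[i](x) for i, w in zip(topk, weights))

# Toy demo: 4 experts, each a random linear map.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
gate_W = rng.normal(size=(d, n_experts))
print(moe_forward(rng.normal(size=d), gate_W, experts).shape)  # (8,)
```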
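CoDA, RND1, and the LLaDA2.0 previews are diffusion language models rather than autoregressive ones: generation starts from a fully masked sequence and repeatedly commits the predictions the model is most confident about, in parallel and in no fixed left-to-right order. Below is a toy sketch of that unmasking loop, with random logits standing in for a real denoiser; the fixed commit schedule is a simplifying assumption, and the real decoders are more refined:

```python
import numpy as np

def diffusion_decode(denoiser, length, mask_id=0, steps=4):
    """Iterative-unmasking decoder: commit the surest tokens each step."""
    seq = np.full(length, mask_id)               # start fully masked
    per_step = max(1, length // steps)
    while (seq == mask_id).any():
        logits = denoiser(seq)                   # (length, vocab) scores
        logits[:, mask_id] = -np.inf             # never emit [MASK] itself
        pred = logits.argmax(-1)                 # candidate token per position
        conf = logits.max(-1)                    # confidence proxy per position
        masked = np.flatnonzero(seq == mask_id)  # committed tokens stay fixed
        best = masked[np.argsort(conf[masked])[::-1][:per_step]]
        seq[best] = pred[best]
    return seq

# Toy stand-in for a real denoiser: fresh random logits on every call.
rng = np.random.default_rng(0)
toy = lambda seq: rng.normal(size=(len(seq), 50))
print(diffusion_decode(toy, length=16))
```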
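The "late interaction" in LFM2-ColBERT-350M refers to the ColBERT retrieval scheme: queries and documents are embedded per token, documents offline, and relevance is a cheap MaxSim computed only at query time. That split is what makes the approach attractive for RAG, since the expensive document encoding happens once. A minimal sketch, illustrative rather than Liquid AI's implementation:

```python
import numpy as np

def maxsim_score(Q, D):
    """ColBERT-style late-interaction score.

    Q : (n_q, d) query token embeddings
    D : (n_d, d) document token embeddings
    Each query token is matched to its best document token,
    and the per-token maxima are summed.
    """
    sim = Q @ D.T                 # (n_q, n_d) token-token similarities
    return sim.max(axis=1).sum()  # best match per query token, then sum

# Rank two pre-embedded "documents" against a query.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 16))      # 5 query tokens
docs = {name: rng.normal(size=(n, 16)) for name, n in [("doc1", 40), ("doc2", 60)]}
ranked = sorted(docs, key=lambda name: -maxsim_score(Q, docs[name]))
print(ranked)
```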
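Kimi-Linear-48B-A3B owes its long-context headroom to linear attention. Whatever the specific variant (Moonshot's actual mechanism is not reproduced here), such designs share one trick: replace the (n x n) softmax map with a kernel feature map, so attention reduces to a constant-size running state and cost grows linearly in sequence length. A simplified sketch of that generic recurrent form:

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Causal linear attention in its recurrent O(n) form.

    Keeps a running (d x d_v) state S = sum_i phi(k_i) v_i^T and a
    normalizer z = sum_i phi(k_i) instead of an (n x n) attention map.
    """
    phi = lambda x: np.maximum(x, 0) + eps       # simple positive feature map
    n, d = Q.shape
    S = np.zeros((d, V.shape[1]))
    z = np.zeros(d)
    out = np.empty_like(V)
    for t in range(n):                           # one update per token, no n^2 cost
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        q = phi(Q[t])
        out[t] = (q @ S) / (q @ z + eps)
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(10, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (10, 4)
```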