AI News & Library Index

A curated, link-backed index of AI library releases, model updates, and tooling changes—useful if you are studying modern ML/AI stacks or choosing what to learn next alongside Paath.online tutoring. Each entry points to primary sources; we summarize for context rather than replacing the documentation.

What is this index?

This page tracks new AI and ML library releases and AI news items. Each library entry includes a short summary and a source link; each news item includes a detailed summary and an official source. We update the index regularly to help you stay current with Python AI frameworks, RAG tools, agent libraries, and model releases.

New library

  • 5 Apr 2026

    Elasticsearch — Reciprocal Rank Fusion (RRF) retriever

    Official Elastic docs describe RRF for merging two or more child retrievers (for example kNN vector search plus lexical query) into a single ranking without requiring scores on a common scale. Parameters include rank_constant (default 60) and rank_window_size.

    Source: Elastic documentation
  • 5 Apr 2026

    Pinecone — Hybrid search (dense + sparse vectors)

    Pinecone’s guides explain combining semantic (dense embedding) retrieval with lexical (sparse) signals—either in one hybrid-capable index or separate indexes—plus trade-offs for reranking and operations.

    Source: Pinecone docs
  • 4 Apr 2026

    Weaviate — Hybrid search (vector + BM25)

    Weaviate documents hybrid queries that fuse dense vector similarity with BM25-style keyword relevance, including fusion strategies so teams can balance “meaning match” vs “exact term match.”

    Source: Weaviate docs
  • 3 Apr 2026

    OpenSearch — Neural + lexical search patterns

    OpenSearch documents neural (vector) search alongside traditional text search, used in many RAG stacks that need both semantic recall and exact keyword hits (SKUs, IDs, citations).

    Source: OpenSearch docs
  • 15 Mar 2026

    Unsloth — Run & fine‑tune open models locally (LoRA/QLoRA, Studio)

    Unsloth helps you run and train 500+ open models locally with strong VRAM efficiency. Includes Unsloth Studio UI, LoRA/QLoRA fine-tuning guides, exports (GGUF/safetensors), and training observability.

    Source: Unsloth docs
  • 11 Mar 2026

    NVIDIA Nemotron 3 Super — Open hybrid Mamba‑Transformer MoE for agentic reasoning

    Open weights + recipes for an agent-focused hybrid Mamba‑Transformer MoE with native long context. Built to reduce “thinking tax” and improve throughput for multi-step agent workflows.

    Source: NVIDIA developer blog
  • 9 Mar 2026

    Context Hub — Andrew Ng’s CLI for AI coding agent docs & persistent memory

    Open-source CLI from DeepLearning.AI: curated, versioned API documentation for AI agents. Commands include chub search/get/annotate; language-specific docs (Python, JS); persistent local annotations. MIT-licensed; published on npm as @aisuite/chub.

    Source: GitHub
  • 5 Mar 2026

    OpenAI Symphony — Agentic framework for autonomous AI agents

    Open-source agentic framework (Elixir/BEAM) for orchestrating autonomous AI coding agents with structured, scalable implementation runs. Integrates with issue trackers like Linear and focuses on reliable agent workflows.

    Source: MarkTechPost
  • 5 Mar 2026

    Luma Agents — Creative AI agents on Unified Intelligence models

    End-to-end creative AI agents powered by Luma’s Uni-1 unified intelligence model. Orchestrate multimodal workflows across text, image, video, and audio with persistent context, collaborating with models like Google Veo 3 and ElevenLabs.

    Source: TechCrunch
  • 1 Mar 2026

    Open SWE — Open-source framework for internal coding agents

    LangChain’s Open SWE shares production patterns for internal coding agents: sandboxed execution, curated toolsets, and multi-step agent orchestration with LangGraph.

    Source: LangChain blog
  • 15 Feb 2026

    Recursive Language Models (RLMs) — MIT CSAIL

    Novel inference paradigm for long-context AI: process 10M+ tokens via symbolic recursion and code execution. RLM-Qwen3-8B outperforms its base model by 28.3% on long-context tasks. Paper and code available.

    Source: arXiv paper
  • 1 Feb 2026

    LiteMind v2026.2 — Unified multimodal AI framework

    Unified API for OpenAI, Anthropic, Google Gemini, and Ollama. Agentic ReAct-style framework, built-in RAG, tool integration. Native support for text, images, audio, video, and PDFs. Python 3.10+.

    Source: PyPI
  • 1 Feb 2026

    Helix — Production agent framework with budget limits

    Semantic caching (40–70% API cost reduction), persistent memory, multi-agent teams, YAML pipelines. Supports OpenAI, Anthropic, Gemini, Groq, Mistral, and 8+ providers.

    Source: GitHub
  • 1 Feb 2026

    PageIndex — Vectorless, reasoning-based RAG

    RAG without vector DBs: builds a tree-structured index (table-of-contents) from documents and uses LLM reasoning + tree search for retrieval. 98.7% on FinanceBench; explainable, section-level references. Chat, API, MCP. By VectifyAI.

    Source: GitHub
  • 1 Feb 2026

    LlamaIndex 0.14 — RAG & agent updates

    Security and crash fixes, TokenBudgetHandler for cost control, agent retry logic for empty LLM responses, LangChain 1.x support. RAG and workflow framework for production.

    Source: LlamaIndex changelog
  • 1 Feb 2026

    AIST aiaccel — ML research acceleration

    Toolkit for HPC clusters: PyTorch/Lightning training, hyperparameter optimization, OmegaConf config. For large-scale ML research.

    Source: PyPI
  • 15 Jan 2026

    Voyage 4 — Embedding models & multimodal

    voyage-4-large (RTEB leaderboard), voyage-4-lite, voyage-4-nano (open-weights). Shared embedding space; voyage-multimodal-3.5 with video retrieval. On Azure, AWS, GCP, MongoDB Atlas.

    Source: Voyage AI blog
  • 1 Jan 2026

    RAGdb — Embeddable SQLite RAG (no vector DB)

    Single-file .ragdb SQLite database: ingestion, multimodal extraction, hybrid retrieval (TF-IDF + keyword) in one portable file. No Docker/cloud; ~99.5% smaller than typical RAG stacks. Python 3.9+, pip install ragdb.

    Source: GitHub
  • 1 Jan 2026

    Orca AI SDK — Unified LLM interface

    Provider-agnostic library for OpenAI, Anthropic, Google Gemini, OpenRouter. Full async/sync and streaming. Simplifies multi-provider apps.

    Source: PyPI
  • 1 Jan 2026

    Trinity-RFT — Reinforcement fine-tuning for LLMs

    Framework for training LLMs with reinforcement fine-tuning (RFT). Python 3.10+. For researchers and practitioners scaling RFT.

    Source: PyPI
  • 30 Sep 2025

    Model Context Protocol (MCP) — Agent tool standard

    Open standard for connecting AI assistants to tools and data. Adopted by major vendors. Safer, auditable integrations for agents.

    Source: MCP site
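
Several of the entries above (Elasticsearch, Pinecone, Weaviate, OpenSearch) center on merging a semantic ranking with a keyword ranking. The following is a minimal, engine-agnostic sketch of the RRF formula the Elastic docs describe (each document's fused score is the sum of 1/(k + rank) over the lists it appears in, with the rank_constant k defaulting to 60); it is illustrative only, not Elastic's implementation:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank).

    Ranks are 1-based; k (Elastic's rank_constant, default 60) dampens
    the influence of any single list's top results.
    """
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Fuse a hypothetical dense-vector ranking with a BM25 keyword ranking.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Note how doc_b (ranked 2nd and 1st) overtakes doc_a (1st and 3rd): RRF rewards consistent presence across retrievers without requiring raw scores on a common scale.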

AI news

  • 5 Apr 2026

    Hybrid retrieval for RAG: why teams fuse semantic search, keywords, and RRF

    Retrieval-Augmented Generation quality depends heavily on finding the right chunks. Pure semantic (dense vector) search handles paraphrases well but can miss exact tokens such as error codes, SKUs, or legal clause numbers; pure keyword (BM25-style) search has the opposite failure mode. Hybrid pipelines run both retrievers and merge the results. Rank-based fusion methods like Reciprocal Rank Fusion (RRF), described by Cormack, Clarke, and Buettcher (SIGIR 2009) and documented for production in Elasticsearch’s RRF retriever, assign each document a score equal to the sum of 1/(k + rank) across the result lists, where k is a smoothing constant (Elastic’s default rank_constant is 60). Separately, some engines apply weighted combinations when scores are normalized to a comparable range; Weaviate’s hybrid search documentation discusses balancing the vector vs keyword contribution. These patterns are now mainstream in vector DB and search-engine documentation, not experimental-only.

    Source: Elastic + IR literature
  • 26 Mar 2026

    Google launches Gemini 3.1 Flash Live (audio & real-time voice)

    In March 2026, Google introduced Gemini 3.1 Flash Live as a real-time audio/voice model built for more natural conversations and lower latency. The update emphasizes improved precision in spoken dialogue and better tonal understanding, making it relevant for AI tutoring, live assistants, and voice-driven agent workflows. Developers can use the Gemini Live preview tooling (Google AI Studio/Gemini platform) to build experiences that respond with less delay—especially important for educational and interactive scenarios.

    Source: Google blog
  • 24 Mar 2026

    vLLM KV cache + continuous batching update: scheduling by full input sequence length

    In March 2026, vLLM’s continuous batching and paged KV cache work added scheduler behavior that accounts for full input sequence length. The practical impact is fewer preemptions and better generation throughput because the scheduler doesn’t over-admit requests based only on the first chunk. This directly matters for applications that run long prompts (education, tutoring, RAG transcripts) where stable throughput is crucial for both UX latency and cost efficiency.

    Source: vLLM GitHub PR
  • 17 Mar 2026

    Gemini API tooling updates: tool combos, context circulation & Maps grounding

    Google announced Gemini API tooling updates in March 2026 to improve agentic development patterns. The updates include combining built-in tools (like Google Search and Maps grounding) with developer function calls in a single flow, plus improved context behavior so the model can access tool call results in later steps. For developers building planning + tool-use agents, these changes reduce orchestration complexity and can improve end-to-end latency. It also makes location-aware tutoring assistants and “plan a task using maps + documents” experiences more straightforward to prototype.

    Source: Google blog
  • 15 Mar 2026

    Unsloth gains traction for VRAM‑efficient local LLM fine‑tuning (LoRA/QLoRA)

    Unsloth has become a popular choice for students and developers who want to fine‑tune open LLMs locally without large GPU budgets. The official documentation emphasizes a practical workflow (install → choose LoRA vs QLoRA → train → export → deploy) and provides a detailed LoRA hyperparameters guide covering learning rate, epochs, effective batch size (batch_size × gradient_accumulation_steps), rank (r), alpha, and target modules. This trend aligns with the broader shift toward local-first AI development: lower cost, better privacy, and faster iteration for projects like tutoring assistants, domain chatbots, and RAG pipelines.

    Source: Unsloth docs
  • 15 Mar 2026

    Model Context Protocol (MCP) 2026 roadmap: Working Groups & production focus

    In early 2026, the MCP ecosystem shifted toward a more production-oriented roadmap: instead of only planning around releases, the community focused on working groups that improve core reliability (agent communication), transport scalability, and error handling. The roadmap also highlights governance practices for safe extension evolution so tools do not silently break across client/server upgrades. For builders, this is a strong signal that MCP is maturing into long-lived infrastructure for tool-and-context integration across AI assistants.

    Source: MCP blog
  • 11 Mar 2026

    OpenAI launches new tools for building agents (Responses API + Agents SDK)

    In March 2026, OpenAI announced new tooling aimed at making agentic applications easier to build and operate. The update highlights a new Responses API that unifies tool use with simple request patterns, an Agents SDK for orchestrating single and multi-agent workflows, built-in tools like web search and file search, and tracing/observability features to debug agent runs. For teams shipping AI products, this reflects a broader shift from “prompt-only apps” toward agent systems that need reliable tool calling, guardrails, and production-grade monitoring.

    Source: OpenAI
  • 10 Mar 2026

    NVIDIA OpenShell — safer runtime patterns for autonomous agents

    As agentic systems become more capable, safety and containment become core engineering concerns. NVIDIA’s OpenShell focuses on running autonomous, self-evolving agents more safely by combining sandboxed execution with policy-based restrictions (what tools/commands are allowed) and operational guardrails. This trend matters for developers building coding agents and automation assistants: the “runtime” and permissions model can be as important as the model itself.

    Source: NVIDIA developer blog
  • 9 Mar 2026

    Andrew Ng’s team releases Context Hub: API docs & persistent memory for AI coding agents

    In March 2026, Andrew Ng’s team at DeepLearning.AI released Context Hub, an open-source CLI tool that acts as a “package manager for AI-readable documentation.” AI coding agents often hallucinate API signatures or use outdated endpoints because they are trained on static data; Context Hub lets them search, fetch, and use up-to-date docs (e.g. chub search openai, chub get openai/chat --lang py) and annotate locally with chub annotate so that learnings persist across sessions. The project is MIT-licensed, available as npm package @aisuite/chub, and has seen strong adoption with integrations for Claude Code and other AI coding tools. It underscores the trend toward giving agents accurate, maintainable context instead of relying only on model weights.

    Source: MarkTechPost
  • 6 Mar 2026

    Microsoft releases Phi-4-Reasoning-Vision-15B open-weight multimodal model

    Microsoft announced Phi-4-Reasoning-Vision-15B in March 2026, a compact 15-billion-parameter open-weight multimodal model focused on math, science, and graphical user interface understanding. The model is trained to balance reasoning quality with efficient compute and data usage, making it attractive for teams that want strong performance without frontier-model costs. Because it is open-weight, developers can fine-tune and self-host Phi-4-Reasoning-Vision for AI coding assistants, educational tools, and agentic systems that need reliable tool use and screen understanding. The release continues the trend of high-quality open-weight models that compete closely with proprietary offerings.

    Source: MarkTechPost
  • 5 Mar 2026

    Luma launches creative AI agents on Unified Intelligence models

    In March 2026, Luma introduced Luma Agents, a suite of creative AI agents powered by its new Uni-1 model from the Unified Intelligence family. The agents support multi-step, multimodal workflows across text, images, video, and audio, and maintain persistent context across assets, collaborators, and tools. Luma positions these creative AI agents as production-ready building blocks for studios and brands, integrating with third-party models like Google's Veo 3, ByteDance's Seedream, and ElevenLabs. The launch reflects a broader shift toward agentic AI systems that prioritize reliability, orchestration, and real-world outcomes over single prompt generations.

    Source: TechCrunch
  • 3 Mar 2026

    Gemini 3.1 Flash-Lite — Google’s fast, cost-effective AI model

    Google introduced Gemini 3.1 Flash-Lite in March 2026 as its most cost-effective AI model for high-volume workloads. Priced significantly lower per token than previous generations, Flash-Lite delivers up to 2.5× faster performance than Gemini 2.5 Flash while maintaining similar or better quality on many tasks. It targets use cases like real-time translation, content moderation, UI generation, and large-scale simulations where latency and cost per request matter more than frontier-level reasoning. For developers, this model fits neatly into AI monetization strategies that demand sustainable economics at scale.

    Source: Google DeepMind blog
  • 1 Mar 2026

    AI search trends 2026 — AEO, AI Overviews, and topical authority

    Recent reports on AI search in 2026 highlight how AI Overviews and other generative answer features are reshaping SEO strategy. Instead of only optimizing for blue-link rankings, publishers now focus on AI Engine Optimization (AEO), building topical authority and strong E-E-A-T signals so AI models trust and cite their content. Concepts like semantic relevance, brand visibility, and third-party citations matter more than exact-match keywords, because AI systems fan out from a query to related intents and synthesize answers across multiple trusted sources. For AI news and developer content, this means balancing rich, human-readable explanations with clear keywords around AI coding assistants, agentic systems, open-weight models, and AI search trends so that both users and AI search engines can understand the topic.

    Source: SEO.com blog
  • 19 Feb 2026

    Google Gemini 3.1 Pro — Flagship model release

    Google launched Gemini 3.1 Pro in February 2026 as its most capable model to date. It delivers roughly twice the reasoning performance of Gemini 3 Pro and scores 77.1% on the ARC-AGI-2 benchmark. The model supports a 1 million token context window and can output up to 65K tokens, making it suitable for long-document and code-generation tasks. It ranks first on 12 of 18 tracked benchmarks and excels at software engineering (80.6% on SWE-Bench Verified). Developers can access it via the Gemini API, Google AI Studio, Android Studio, and consumer-facing products.

    Source: Google AI for Developers
  • 17 Feb 2026

    Anthropic Sonnet 4.6 — 1M context, stronger coding & computer use

    Anthropic released Claude Sonnet 4.6 in February 2026 with a doubled context window of 1 million tokens (up from 200K). The model scores 60.4% on ARC-AGI-2, a benchmark aimed at human-like reasoning. Improvements focus on coding, instruction-following, and computer use (screen understanding and control). Sonnet 4.6 became the default model for both Free and Pro plan users on claude.ai and via the API, offering a strong balance of speed and capability for developers and power users.

    Source: Anthropic
  • 16 Feb 2026

    Alibaba Qwen 3.5 — Agentic AI model with vision

    Alibaba unveiled Qwen 3.5 in February 2026, positioning it for the "agentic AI era." The company claims around 60% lower cost and up to 8× better performance on large workloads compared to the previous generation. The model includes visual agentic capabilities, allowing it to understand screens and take actions across applications independently. It targets enterprise and developer use with stronger reasoning and tool use while reducing inference cost, and is available through Alibaba Cloud and open-weight variants.

    Source: Alibaba Cloud
  • 1 Feb 2026

    HyperNova 60B — Compressed open LLM on Hugging Face

    Spanish startup Multiverse Computing released HyperNova 60B 2602 in February 2026, a 50% compressed version of OpenAI's gpt-oss-120B model. Memory footprint drops from 61GB to 32GB using the company's quantum-inspired CompactifAI compression technology. The model shows significant gains in tool-calling and agentic coding, with around 1.5× improvement on the BFCL v4 benchmark. It is freely available on Hugging Face, offering a smaller, faster alternative for teams that need strong reasoning and tool use without the full 120B footprint.

    Source: Hugging Face
  • 20 Jan 2026

    India AI Summit 2026 — $1.1B fund, 7-Sutra governance

    The India AI Summit (India AI Impact Summit) in January 2026 set the tone for India's "AI for All" push. The government announced a $1.1 billion state-backed venture capital fund targeting AI and advanced manufacturing startups, with a goal to attract over $200 billion in AI infrastructure investment within two years. Compute will expand by 20,000 GPUs on top of the existing 38,000. India also released AI Governance Guidelines built around seven principles (the "7 Sutras"): Trust is the Foundation, People First, Innovation over Restraint, Fairness & Equity, Accountability, Understandable by Design, and Safety, Resilience & Sustainability. New institutions include the AI Governance Group, Technology & Policy Expert Committee, and AI Safety Institute. OpenAI will open offices in Bengaluru and Mumbai; Anthropic opened its first Indian office in Bengaluru. Eighty-eight countries signed the New Delhi AI Declaration, and India joined the Pax Silica group for AI infrastructure supply chain resilience.

    Source: PIB
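
The hybrid-retrieval item above contrasts rank-based fusion (RRF) with weighted combinations of normalized scores. Here is a minimal sketch of the weighted approach, assuming min-max normalization and an alpha blending weight; the parameter names are illustrative, and real engines differ in the exact normalization they apply:

```python
def minmax_normalize(scores):
    """Scale raw scores to [0, 1] so dense and keyword scores are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def weighted_hybrid(vector_scores, keyword_scores, alpha=0.5):
    """Blend normalized scores: alpha weights the vector side,
    (1 - alpha) the keyword side. alpha=1 is pure semantic search,
    alpha=0 pure keyword search."""
    v = minmax_normalize(vector_scores)
    k = minmax_normalize(keyword_scores)
    docs = set(v) | set(k)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * k.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With alpha=0.5 the two signals count equally; pushing alpha toward 1 favors semantic matches, toward 0 exact keyword hits.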
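
The Unsloth items above reference its LoRA hyperparameters guide (rank r, alpha, effective batch size). As a reminder of what those knobs control, here is a NumPy sketch of the textbook LoRA update, W + (alpha/r) · B · A; the dimensions are made up for illustration, and this is the standard formulation, not Unsloth's implementation:

```python
import numpy as np

d_out, d_in, r, lora_alpha = 64, 128, 8, 16   # illustrative sizes

W = np.random.randn(d_out, d_in)     # frozen pretrained weight
A = np.random.randn(r, d_in) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))             # trainable, zero init

# Zero-initialized B means the adapted layer starts identical to the
# pretrained one; training only updates the small matrices A and B.
W_adapted = W + (lora_alpha / r) * (B @ A)

# Trainable parameters per adapted matrix:
lora_params = r * (d_in + d_out)     # 8 * (128 + 64) = 1536
full_params = d_in * d_out           # 128 * 64 = 8192

# Effective batch size, per the guide's formula:
per_device_batch, grad_accum = 2, 8
effective_batch = per_device_batch * grad_accum   # 16
```

At r=8 the adapter trains under a fifth of this matrix's parameters; in real multi-billion-parameter models the fraction is far smaller, which is what lets QLoRA-style fine-tuning fit in limited local VRAM.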

Learn Python & AI with us

Stay ahead with live 1:1 classes on Python, ML, RAG, and modern AI. Book a free demo.

Book a Free Demo Session