Attention Mechanism

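In a large language model, the attention mechanism lets the model focus on the most relevant information while it processes text. It is loosely modeled on how people allocate attention when reading, and it helps the model make better use of context.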
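As a rough illustration of what "focusing" means here, the following is a minimal sketch of scaled dot-product attention for a single query, written in plain Python. The function names (`softmax`, `attention`) and the toy vectors are assumptions made for this example; real models use batched tensor operations, multiple heads, and learned projections.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector (illustrative only)."""
    d = len(query)
    # Similarity between the query and every key, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(dim)
    ]
    return weights, output


# Toy data: three tokens with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]

weights, output = attention(query, keys, values)
print(weights)
print(output)
```

Tokens whose keys align with the query receive larger weights, so their values dominate the output. In this toy case the first and third tokens are weighted equally and above the second.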

The Long-Context Problem

Optimization Approaches

DeepSeek published "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention", and Moonshot AI (月之暗面) published "MoBA: Mixture of Block Attention for Long-Context LLMs".
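To give a feel for the general idea behind block-level sparse attention, here is a generic sketch in plain Python. It is not the algorithm of either paper: the function `block_sparse_attention`, the mean-key block scoring, and the parameters `block_size` and `top_k` are assumptions chosen for illustration, and causal masking is left out for brevity.

```python
import math


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend only to the top_k key blocks most similar to the query.

    Generic illustration of block-sparse attention; hypothetical helper,
    not taken from any specific paper or library.
    """
    d = len(query)
    n = len(keys)
    # Split token positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, n)))
        for start in range(0, n, block_size)
    ]

    def block_score(indices):
        # Score a block by the dot product of the query with its mean key.
        mean_key = [
            sum(keys[i][j] for i in indices) / len(indices)
            for j in range(d)
        ]
        return sum(q * m for q, m in zip(query, mean_key))

    ranked = sorted(
        range(len(blocks)),
        key=lambda b: block_score(blocks[b]),
        reverse=True,
    )
    chosen = sorted(ranked[:top_k])
    selected = [i for b in chosen for i in blocks[b]]

    # Ordinary softmax attention, restricted to the selected tokens.
    scores = [
        sum(q * k for q, k in zip(query, keys[i])) / math.sqrt(d)
        for i in selected
    ]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    output = [
        sum(w * values[i][j] for w, i in zip(weights, selected))
        for j in range(dim)
    ]
    return selected, output


# Toy data: six tokens in three blocks of two.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
values = [[1.0], [3.0], [5.0], [7.0], [9.0], [11.0]]
query = [1.0, 0.0]

selected, output = block_sparse_attention(query, keys, values, block_size=2, top_k=1)
print(selected, output)
```

Because only `top_k * block_size` tokens enter the softmax, the per-query cost depends on the number of selected tokens rather than on the full sequence length, plus a cheaper pass over the block summaries.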
