@maycuatroi1
Last active February 6, 2026 07:01
[DeepSeek OCR 2] Pseudo-code from paper concept
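# Pseudo-code from paper concept
import torch

class DeepEncoderV2(torch.nn.Module):
    def forward(self, visual_tokens):
        # visual_tokens: [B, m, d] - m tokens, d=896 dim
        B, m, _ = visual_tokens.shape
        # Append learnable queries (same count as visual tokens)
        queries = self.learnable_queries  # [n, d] where n = m
        # Broadcast the shared queries across the batch before concatenating
        queries = queries.unsqueeze(0).expand(B, -1, -1)  # [B, n, d]
        combined = torch.cat([visual_tokens, queries], dim=1)  # [B, 2m, d]
        # Mixed attention: visual=bidirectional, queries=causal
        # M is the [2m, 2m] mixed attention mask (not defined in this snippet)
        output = self.qwen2_encoder(combined, attention_mask=M)
        # Only return query outputs (causal flow tokens)
        return output[:, m:, :]  # [B, n, d]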
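
The snippet passes a mask `M` to the encoder without defining it. Below is a minimal sketch of one way such a mixed mask could be built, in plain Python so it runs without dependencies. The function name `build_mixed_attention_mask` is hypothetical, and the layout rests on assumptions not stated in the snippet: visual tokens attend to each other bidirectionally and never to queries, while each query attends to every visual token plus itself and earlier queries.

```python
def build_mixed_attention_mask(m, n):
    """Return an (m+n) x (m+n) boolean mask.

    mask[i][j] is True when token i may attend to token j.
    Assumed layout: indices 0..m-1 are visual tokens, m..m+n-1 are queries.
    """
    total = m + n
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        for j in range(total):
            if j < m:
                # Every token (visual or query) may see all visual tokens.
                mask[i][j] = True
            elif i >= m:
                # Query-to-query is causal: a query sees itself and earlier queries.
                mask[i][j] = j <= i
            # Visual-to-query stays False (an assumption of this sketch).
    return mask


if __name__ == "__main__":
    # Print the mask for 3 visual tokens and 3 queries, one row per token.
    for row in build_mixed_attention_mask(m=3, n=3):
        print("".join("1" if allowed else "0" for allowed in row))
    # 111000
    # 111000
    # 111000
    # 111100
    # 111110
    # 111111
```

The top-left block is fully visible (bidirectional), the top-right block is empty, and the bottom-right block is lower triangular (causal). How this boolean mask maps onto the `attention_mask` argument of a real encoder depends on that encoder's conventions, for example whether it expects booleans or additive values, and should be checked before use.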