Name: Lens
Rating: 38.4 (159 reviews)
Author: Microsoft

Lens by Microsoft | AI Market Cap

HF Papers | Google | research | 1mo ago

ICA Lens: Interpreting Language Models Without Training Another Dictionary

Finding interpretable directions in language-model representations is critical for understanding and controlling model behavior. Sparse autoencoders (SAEs) have become the standard tool for this purpose, but using them as the default first lens often requires training, storing, and evaluating large overcomplete dictionaries. This bottleneck limits rapid exploration and raises a fundamental question: how much interpretable structure is already visible from activation geometry before training another neural dictionary? Our intuition is simple: many interpretable directions are selective on tokens, and these directions should look less Gaussian than random directions. We therefore revisit independent component analysis (ICA), a classical method for finding non-Gaussian directions, as a compact lens for language-model interpretability. We find that ICA has been underestimated for LLM interpretability, because prior uses often relied on off-the-shelf ICA implementations that are brittle on LLM activations and lacked systematic tools for inspecting and evaluating the recovered directions. To bridge these gaps, we introduce ICALens, the first practical workflow for stable, efficient, and auditable ICA analysis of LLM representations. It combines an optimized GPU-parallel FastICA pipeline with LLM-specific stability recipes and better fitting diagnostics, enabling efficient and reliable layer-wise analysis. Across GPT-2 Small, Gemma 2 2B, and Qwen 3.5 2B Base, ICALens efficiently recovers compact, human-interpretable directions without per-layer gradient-based dictionary training. On SAEBench, ICA is competitive with public SAEs in sparse probing and outperforms them in targeted probe perturbation under small-to-medium budgets. These results suggest that ICA should not be viewed as a weak baseline, but as an efficient and complementary first lens for exploring language-model representations.
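The intuition in the abstract, that token-selective directions look less Gaussian than random ones, can be illustrated with a small sketch. This is not the ICALens pipeline and does not run FastICA. It only scores two synthetic projections by excess kurtosis, a standard non-Gaussianity measure. The data, the 2% firing rate, and the names `excess_kurtosis`, `random_projection`, and `selective_projection` are assumptions made for illustration.

```python
import random
import statistics


def excess_kurtosis(xs):
    """Sample excess kurtosis: near 0 for Gaussian data, large for sparse data."""
    mean = statistics.fmean(xs)
    centered = [x - mean for x in xs]
    m2 = statistics.fmean([c * c for c in centered])
    m4 = statistics.fmean([c ** 4 for c in centered])
    return m4 / (m2 * m2) - 3.0


rng = random.Random(0)
n_tokens = 20_000

# Hypothetical projection of activations onto a random direction:
# modeled here as plain Gaussian noise.
random_projection = [rng.gauss(0.0, 1.0) for _ in range(n_tokens)]

# Hypothetical projection onto a token-selective direction: it fires
# strongly on roughly 2% of tokens and stays near zero elsewhere.
selective_projection = [
    (5.0 if rng.random() < 0.02 else 0.0) + rng.gauss(0.0, 0.1)
    for _ in range(n_tokens)
]

k_random = excess_kurtosis(random_projection)
k_selective = excess_kurtosis(selective_projection)

print(f"random direction excess kurtosis:    {k_random:.2f}")
print(f"selective direction excess kurtosis: {k_selective:.2f}")
```

Under these assumptions the selective projection scores far higher than the Gaussian one. ICA searches for directions that maximize this kind of non-Gaussianity, typically through a contrast function rather than raw kurtosis.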

#huggingface #daily-papers

HF Papers | OpenAI | research | 1mo ago

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

We introduce Lens, a 3.8B-parameter T2I model that achieves performance competitive with, and in several cases surpassing, state-of-the-art models with more than 6B parameters across various benchmarks, while requiring significantly less training compute. For example, Lens requires only about 19.3% of the training compute used by Z-Image. The training efficiency of Lens stems from two key strategies beyond its compact model size. First, we maximize data information density per training batch by (i) training on Lens-800M, a dataset of 800M densely captioned image-text pairs whose captions are generated by GPT-4.1 and contain approximately 109 words on average, providing richer semantic supervision than conventional short captions, and (ii) constructing each batch from images with multiple resolutions and diverse aspect ratios, thereby enlarging the effective visual coverage of each optimization step. Second, we improve convergence speed through careful architectural choices, including adopting a semantic VAE that provides better latent representations and employing a strong language encoder that accelerates optimization while enabling multilingual generalization from English-only training data. After pre-training, we apply RL with taxonomy-driven prompts (Lens-RL-8K) and structured reward rubrics to suppress artifacts and improve visual quality, a reasoner module with training-free system prompt search to better align user requests with the model, and distillation-based acceleration for 4-step inference. Through efficient training and systematic optimization, Lens generalizes to arbitrary aspect ratios from 1:2 to 2:1 and resolutions up to 1440^2, and supports prompts in several commonly used languages. Thanks to its compact size, Lens generates a 1024^2 image in 3.15 seconds on a single NVIDIA H100 GPU, while its distilled turbo version performs 4-step generation in 0.84 seconds.
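The figures quoted in the abstract can be turned into a few derived ratios. The sketch below is plain arithmetic on the reported numbers (19.3% of Z-Image's training compute, 800M captions of about 109 words each, 3.15 s versus 0.84 s per 1024^2 image). The variable names are illustrative, and the results are only as precise as the rounded inputs.

```python
# Figures as reported in the abstract.
compute_fraction_vs_z_image = 0.193   # Lens uses about 19.3% of Z-Image's training compute
num_pairs = 800_000_000               # Lens-800M image-text pairs
avg_caption_words = 109               # approximate average caption length in words
full_latency_s = 3.15                 # 1024^2 image, single NVIDIA H100
turbo_latency_s = 0.84                # distilled turbo version, 4-step generation

# Derived quantities (approximate, since the inputs are rounded).
compute_reduction = 1.0 / compute_fraction_vs_z_image
total_caption_words = num_pairs * avg_caption_words
turbo_speedup = full_latency_s / turbo_latency_s

print(f"Z-Image uses about {compute_reduction:.2f}x the training compute of Lens")
print(f"Caption corpus is about {total_caption_words / 1e9:.1f}B words")
print(f"Turbo version is about {turbo_speedup:.2f}x faster per image")
```

By this arithmetic, the distilled turbo version is about 3.75x faster per image than the base model, and the caption corpus comes to roughly 87 billion words.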

#huggingface #daily-papers

Social & Blog Posts (8)

Research Papers (10)

Say how you feel. Get a poem that meets you there. Ode connects you with a poem for your moment. Guided by William Sieghart. Powered by Microsoft AI models. Try it out here: https://t.co/egu88NTf9C ht

Thrilled to announce by popular demand MAI-Code-1-Flash is now generally available for GitHub Copilot Business and GitHub Copilot Enterprise - fast, efficient, and custom designed to help you build mo

We shipped a new coding model built for your everyday dev work. MAI-Code-1-Flash is fast, token-efficient, and trained inside real GitHub Copilot environments. It plans, builds, runs, and tests. All f

MAI-Image-2.5 ranked #2 for text-to-image and #3 for image editing on @ArtificialAnlys - showing strong performance across both generation and precise image edits. From rainy-window blur to a clear, u

What happens when speech, transcription, and coding models work together? This prototype demo, built using a VS Code fork, showcases how MAI-Transcribe, MAI-Voice, and MAI-Code-1-Flash can work togeth

Behind every model is a team dedicated to solving difficult challenges, exploring new ideas, and continuously pushing technology forward. Meet some of the people behind Microsoft AI. Watch the full vi

What does it take to build coding models that meet developers where they work? Go behind the scenes with Microsoft AI to explore how we build and optimize code. From training and evaluation to perform

Let’s go a bit deeper into Frontier Tuning launched at Build and see a live demo! Frontier Tuning is how we enable you to develop custom AI by building a reinforcement learning environment (RLE) to hi

Object-Centric Residual RL for Zero-Shot Sim-to-Real VLA Enhancement

FastContext: Training Efficient Repository Explorer for Coding Agents

ICA Lens: Interpreting Language Models Without Training Another Dictionary

POISE: Position-Aware Undetectable Skill Injection on LLM Agents

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

A Cookbook of 3D Vision: Data, Learning Paradigms, and Application

Toward Native Multimodal Modeling: A Roadmap

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

Equilibrium Reasoners: Learning Attractors Enables Scalable Reasoning