2022

Latent Lab header

Throughout history, new ways of seeing information have expanded what humans can think. William Playfair turned trade ledgers into line charts. Watson and Crick converted Rosalind Franklin's 2D photo into a 3D model of DNA. Alan Kay and David Smith turned the command line into a graphical desktop. Each of these representations made complex systems legible in ways their predecessors could not.

History of information visualization: Playfair's line charts (1786), Watson and Crick's 3D DNA model (1953), and Kay and Smith's desktop metaphor (1970)

We have no equivalent for internet-scale unstructured data. Millions of academic papers are published each year. Billions of social media posts appear each day. Generative AI is accelerating this further as the cost of producing content trends toward zero. The dominant interfaces for working with all of this, search and chat, each have fundamental limitations. Search returns a ranked list with no overview and no connections between results, and most users rarely look past the first page. Chat provides synthesis but is opaque, collapses breadth, and shifts agency from the user to the system. Neither lets you see the shape of your information.



The shift from search-based interfaces to AI-powered chat interfaces in the Common AI Era


Conceptual diagram showing how embeddings transform documents into spatial clusters, analogous to a Fourier transform converting time-domain signals to frequency-domain

Latent Lab is a platform I built to explore whether embedding-based spatial interfaces can fill this gap and serve as a "spectrogram for text." It takes large sets of unstructured information (documents, images, media) and, with minimal effort on the user's end, creates interactive 2D and 3D visualizations where meaning is encoded in position. Items that are semantically similar appear near each other. Clusters, outliers, and relationships become visible at a glance.
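As a rough illustration of how meaning can end up encoded in position, here is a minimal sketch, not Latent Lab's actual code. It assumes a toy `embed` function (normalized word counts standing in for a learned embedding model) and a `project_2d` function that uses PCA via SVD in place of whatever dimensionality reduction the platform really uses. Both function names and the sample documents are hypothetical.

```python
import numpy as np

def embed(texts):
    """Toy stand-in for an embedding model: L2-normalized word counts."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for w in t.lower().split():
            vecs[row, index[w]] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.where(norms == 0, 1.0, norms)

def project_2d(vectors):
    """PCA via SVD: keep the two directions of greatest variance."""
    centered = vectors - vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

docs = [
    "neural networks learn image features",
    "deep neural networks classify image data",
    "ocean currents shape coastal climate",
    "coastal climate depends on ocean temperature",
]
coords = project_2d(embed(docs))  # one (x, y) position per document

# Documents on the same theme should land closer together than
# documents on different themes.
same_topic = np.linalg.norm(coords[0] - coords[1])
cross_topic = np.linalg.norm(coords[0] - coords[2])
print(same_topic < cross_topic)
```

A real system would swap in a learned embedding model and likely a nonlinear projection, but the contract is the same: text in, coordinates out, with distance standing in for semantic difference.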

The system supports automated topic labeling at multiple scales, with filtering to isolate specific research areas. Users can explore topics by zooming into dense clusters or filtering by keyword, revealing sub-themes and connections that would be invisible in a traditional list. The topographic contour lines encode density, making it immediately clear where activity is concentrated.
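One plausible way to derive density contours from 2D positions is a kernel density estimate evaluated on a grid. The sketch below is an assumption about the general technique, not a description of Latent Lab's implementation. The function name `density_grid`, the `bandwidth` value, and the sample points are all hypothetical.

```python
import numpy as np

def density_grid(points, grid_size=50, bandwidth=0.5):
    """Evaluate a Gaussian kernel density estimate on a regular grid."""
    points = np.asarray(points, dtype=float)
    xs = np.linspace(points[:, 0].min() - 1, points[:, 0].max() + 1, grid_size)
    ys = np.linspace(points[:, 1].min() - 1, points[:, 1].max() + 1, grid_size)
    gx, gy = np.meshgrid(xs, ys)  # both have shape (grid_size, grid_size)
    # Squared distance from every grid cell to every point.
    d2 = (gx[..., None] - points[:, 0]) ** 2 + (gy[..., None] - points[:, 1]) ** 2
    density = np.exp(-d2 / (2 * bandwidth ** 2)).sum(axis=-1)
    # Normalize so the grid values sum to 1 (relative density).
    return xs, ys, density / density.sum()

# A dense cluster of 20 points near the origin plus 3 scattered points.
dense = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
sparse = [(5.0, 5.0), (5.5, 4.5), (4.5, 5.5)]
xs, ys, density = density_grid(dense + sparse)

# Rows of the grid follow y and columns follow x.
row, col = np.unravel_index(density.argmax(), density.shape)
peak_x, peak_y = xs[col], ys[row]
```

The resulting grid can be handed to any contour plotting routine. The peak falls inside the dense cluster, which is the property the contour lines make visible.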


Latent Lab filtering interface showing interactive topic exploration and cluster isolation
Latent Lab timeline evolution feature showing how a dataset changes over time

Temporal sliders reveal how a dataset evolves over time. By scrubbing through a timeline, users can watch topics emerge, grow, and fade, turning a static snapshot into a living history. This feature has proven particularly useful for understanding how research fields develop and shift focus across years.

Semantic axes allow users to define their own organizational dimensions. Instead of relying solely on the algorithm's 2D projection, users can specify meaningful axes, such as "climate change" on the Y-axis and time on the X-axis, to see how their data distributes along concepts they care about. This transforms the visualization from a fixed map into a flexible analytical tool.
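A simple way to think about a semantic axis is as a similarity score between each item and a concept. The sketch below assumes cosine similarity against a single concept embedding, which may differ from how Latent Lab computes its axes. The function `semantic_axis_scores`, the three-dimensional vectors, and the years are hypothetical illustration data.

```python
import numpy as np

def semantic_axis_scores(item_vectors, concept_vector):
    """Cosine similarity of each item embedding to a concept embedding."""
    items = np.asarray(item_vectors, dtype=float)
    concept = np.asarray(concept_vector, dtype=float)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    concept = concept / np.linalg.norm(concept)
    return items @ concept

# Hypothetical embeddings; a real model would produce hundreds of dimensions.
items = np.array([
    [0.9, 0.1, 0.0],  # strongly about the concept
    [0.2, 0.8, 0.1],  # mostly about something else
    [0.7, 0.6, 0.0],  # in between
])
years = np.array([2015, 2019, 2022])
climate_change = np.array([1.0, 0.0, 0.0])  # stand-in for embed("climate change")

y = semantic_axis_scores(items, climate_change)
layout = np.column_stack([years, y])  # X = time, Y = closeness to the concept
```

Each row of `layout` is a position on the user-defined plane, so items closer to the concept sit higher on the Y-axis regardless of where the default projection placed them.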


Latent Lab semantic axes feature allowing user-defined organization of data along custom dimensions

Latent Lab Visual RAG highlighting which documents inform a generated response

Visual RAG lets users see exactly which items inform a generated response. Rather than a black-box answer, the visualization highlights the source documents spatially, so users can evaluate coverage, identify gaps, and understand the provenance of synthesized information. This bridges the gap between AI-generated summaries and the underlying data.
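The retrieval step behind this kind of highlighting can be sketched as a nearest-neighbor lookup whose results are kept as indices into the map. This is an assumed, simplified version: the functions `retrieve_sources` and `highlight_mask`, the choice of cosine similarity, and the two-dimensional vectors are all hypothetical.

```python
import numpy as np

def retrieve_sources(query_vec, doc_vecs, k=2):
    """Return indices and cosine similarities of the k closest documents."""
    docs = np.asarray(doc_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    sims = (docs @ q) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]  # highest similarity first
    return top.tolist(), sims[top].tolist()

def highlight_mask(n_docs, source_ids):
    """Boolean mask marking which points on the map to highlight."""
    mask = np.zeros(n_docs, dtype=bool)
    mask[source_ids] = True
    return mask

# Hypothetical document embeddings and a query embedding.
doc_vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
query_vec = [1.0, 0.0]

source_ids, scores = retrieve_sources(query_vec, doc_vecs, k=2)
mask = highlight_mask(len(doc_vecs), source_ids)
```

Because the same indices drive both the prompt context and the highlight mask, what the user sees lit up on the map is exactly what the generator was given.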

Dataset overlays enable comparative analysis by layering multiple collections in the same semantic space. Here, MIT Media Lab projects and Microsoft Research projects are overlaid, revealing where the two institutions share focus and where they diverge. The ability to see one organization's work in the context of another's has been valuable for competitive intelligence and strategic planning.

Microsoft Research projects visualized in Latent Lab with MIT Media Lab projects overlaid for comparative analysis
MIT Media Lab projects with Microsoft Research overlay, showing shared research areas and divergent focus

In a study with 94 researchers comparing Latent Lab to a conventional list-based interface, we found significant improvements in insight extraction, mental support, and engagement during open-ended exploration. The tool has also been adopted by sponsors including Dell, HP, Deloitte, and Kenvue for applications ranging from patent analysis to competitive intelligence. The patent landscape shown here illustrates how the platform scales to large, domain-specific datasets like the USPTO corpus.

Latent Lab system architecture: end-to-end pipeline from raw documents through embeddings and dimensionality reduction to interactive visualization, with Filescape and Algorithmic Mirror extensions

Latent Lab processes raw files end-to-end, from drag-and-drop to interactive semantic visualization. The pipeline handles parsing and chunking, embedding generation, dimensionality reduction, storage, interface rendering, and interaction. To my knowledge, this end-to-end capability remains unique among tools in this space. Latent Lab serves as the technical foundation for two derivative projects: Filescape, which applies the platform to personal information management as an alternative to the desktop metaphor, and Algorithmic Mirror, which uses it to help adolescents see and reflect on the content they have consumed across social media platforms.
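To make the parsing and chunking stage of the pipeline concrete, here is a minimal sketch of one common approach: overlapping word windows. It is an assumption for illustration only. The function name `chunk_words` and the `chunk_size` and `overlap` defaults are hypothetical and not taken from Latent Lab.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into word windows that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(12))
chunks = chunk_words(sample, chunk_size=5, overlap=2)
```

Each chunk would then be embedded separately. The overlap keeps a sentence that straddles a boundary from being split across two unrelated vectors.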

Technology & Tools

AI, Visualization, Research, HCI