Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

1TMLR Group, Hong Kong Baptist University, 2Mila - Québec AI Institute, 3Intel Labs, 4Stanford University
(*Equal Contribution)

Abstract

We propose Landscape of Thoughts, a visualization tool that maps reasoning paths in LLMs using perplexity-based features and t-SNE projection, revealing patterns in success and failure cases.

Introduction


Figure 1. Landscape of thoughts for visualizing the reasoning steps of LLMs.


Large language models (LLMs) excel at tasks such as tool use and step-by-step reasoning, but their reasoning processes remain poorly understood. Reading through the generated reasoning text is time-consuming. The landscape-of-thoughts approach instead uses visualization plots to let users analyze and understand the LLM reasoning process intuitively.

Landscape of Thoughts Examples

We show examples of the landscape of thoughts: visualizations of AQuA with Llama-3.1-70B across different reasoning methods.

Chain-of-Thought (CoT)

Correct case
Wrong case
Click on the progress bar to view the landscape at a specific reasoning step.
Hover over a node in the landscape below to see the raw textual reasoning.

Least-to-Most (L2M)

Correct case
Wrong case

Tree-of-Thought (ToT)

Correct case
Wrong case

Monte Carlo Tree Search (MCTS)

Correct case
Wrong case

Landscape of Thoughts

The visualization of the reasoning process is produced in the following steps.

  • Characterizing the states: We characterize intermediate thoughts by measuring their distances to the possible answer choices in a unified feature space.
  • Visualization: We project the high-dimensional feature matrix into 2D and smooth the discrete points into a continuous density map.
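The two steps above can be sketched as follows. This is a minimal, illustrative implementation assuming the thought-to-choice distance matrix has already been computed (e.g., from perplexity-based features); it uses t-SNE for the projection and a Gaussian kernel density estimate for the smoothing, and the function name is our own.

```python
import numpy as np
from sklearn.manifold import TSNE
from scipy.stats import gaussian_kde

def build_landscape(distances, random_state=0):
    """Sketch of the landscape pipeline.

    distances: (num_thoughts, num_choices) matrix, where entry (i, j) is the
    distance from intermediate thought i to answer choice j in the unified
    feature space.
    """
    distances = np.asarray(distances, dtype=float)
    n = len(distances)
    # Step 1: project the high-dimensional feature matrix into 2D.
    tsne = TSNE(n_components=2,
                perplexity=min(30, n - 1),  # t-SNE requires perplexity < n
                random_state=random_state)
    points = tsne.fit_transform(distances)
    # Step 2: smooth the discrete 2D points into a continuous density map.
    density = gaussian_kde(points.T)
    return points, density
```

The returned `density` can then be evaluated on a 2D grid and rendered as the landscape contour plot.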

Visualization Experiments

We qualitatively analyze the landscape of thoughts for different datasets and language models. We also introduce three quantitative metrics that characterize the behavior of the LLM at different reasoning steps.

  • Consistency: whether the LLM knows the answer before generating all thoughts
  • Uncertainty: how confident the LLM is about its predictions at intermediate steps
  • Perplexity: how confident the LLM is about its thoughts

LoT for different algorithms.

Figure 4. Comparing the landscapes and corresponding metrics of four reasoning algorithms (using Llama-3.1-70B on the AQuA dataset).

We can use Landscape of Thoughts to analyze the reasoning process of different algorithms.

  • Observation 1: Faster landscape convergence indicates higher reasoning accuracy.
  • Observation 2: Wrong paths converge quickly, while correct paths progress more slowly.
  • Observation 3: Correct paths show higher consistency between intermediate and final states.


LoT for different datasets.

Figure 5. Comparing the landscapes and corresponding metrics of different datasets (using Llama-3.1-70B with CoT).

We can use Landscape of Thoughts to analyze the reasoning process of different datasets.

  • Observation 4: Similar reasoning tasks exhibit similar landscapes.
  • Observation 5: Different reasoning tasks show different consistency, uncertainty, and perplexity.


LoT for different models.

Figure 6. Comparing the landscapes and corresponding metrics of different language models (with CoT on the AQuA dataset).


We can use Landscape of Thoughts to analyze the reasoning process of different models.

  • Observation 6: The landscape converges faster as the model size increases.
  • Observation 7: Larger models have higher consistency, lower uncertainty, and lower perplexity.

From Visualization to a Lightweight Verifier

Based on the observations above, the landscape of thoughts can be adapted into a model that predicts any property users observe. Here we show an example of using it to predict the correctness of reasoning paths, as illustrated in Figure 1.
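A minimal sketch of such a verifier: train a simple classifier on per-path landscape features (e.g., projected 2D coordinates plus the consistency, uncertainty, and perplexity metrics) with binary correctness labels. The feature choice and classifier here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_verifier(path_features, is_correct):
    """Fit a lightweight verifier.

    path_features: (num_paths, num_features) array of landscape-derived
    features, one row per reasoning path.
    is_correct: (num_paths,) array of 0/1 correctness labels.
    """
    clf = LogisticRegression(max_iter=1000)
    clf.fit(path_features, is_correct)
    return clf

def score_paths(clf, path_features):
    """Return the predicted probability that each reasoning path is correct."""
    return clf.predict_proba(path_features)[:, 1]
```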

Figure 7. Reasoning accuracy averaged across all datasets.


We then investigate the inference-time scaling behavior of the verifier by varying the number of sampled reasoning paths.
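The scaling setup can be sketched as best-of-N selection: sample N reasoning paths, score each with the trained verifier, and keep the highest-scoring one. `sample_paths` and `verifier_score` below are hypothetical stand-ins for the model's sampler and the verifier described above.

```python
def best_of_n(question, n, sample_paths, verifier_score):
    """Pick the most promising of n sampled reasoning paths.

    sample_paths(question, n) -> list of n candidate reasoning paths
    verifier_score(path) -> estimated probability that the path is correct
    """
    paths = sample_paths(question, n)
    scores = [verifier_score(p) for p in paths]
    best = max(range(n), key=lambda i: scores[i])
    return paths[best]
```

Increasing `n` trades extra inference-time compute for a better chance that at least one sampled path is correct and ranked highest by the verifier.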

Figure 8. Demonstration of the inference-time scaling effect of the verifier.


Contact

Please see our paper for more details of this work. If you have any questions, feel free to contact us.

If you find our paper and repo useful, please consider citing:


      @misc{zhou2025landscapethoughtsvisualizingreasoning,
        title={Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models},
        author={Zhanke Zhou and Zhaocheng Zhu and Xuan Li and Mikhail Galkin and Xiao Feng and Sanmi Koyejo and Jian Tang and Bo Han},
        year={2025},
        eprint={2503.22165},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
      }