CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation

1Tsinghua University, 2Radical AI, 3New York University
Manuscript under review

Generative modeling has emerged as a promising approach for crystal structure discovery. However, existing LLM-based generative models struggle with low-level atomic precision, while diffusion-based methods fall short in integrating high-level scientific knowledge. As a result, generated structures are often invalid, unstable, or lacking the desired properties. To address this gap, we propose CrystalReasoner (CrysReas), an end-to-end LLM framework that generates crystal structures from natural language instructions through reasoning and alignment. CrysReas introduces physical priors as thinking tokens: before generating atomic coordinates, it reasons about crystallographic symmetry, local coordination environments, and predicted physical properties. This bridges the gap between natural language and 3D structures. CrysReas then employs reinforcement learning (RL) with a multi-objective, dense reward function to align generation with physical validity, chemical consistency, and thermodynamic stability. For property-conditioned tasks, we design task-specific reward functions and train specialized models for discrete constraints (e.g., space group) and continuous properties (e.g., elasticity, thermal expansion). Empirical results demonstrate that, compared to prior works and to baselines without thinking traces or RL, CrysReas performs better across diverse metrics, triples the S.U.N. (stable, unique, novel) ratio, and improves property-conditioned generation. CrysReas also exhibits adaptive reasoning, producing longer reasoning traces as the number of atoms increases. Our work demonstrates the potential of leveraging thinking traces and RL for generating valid, stable, and property-conditioned crystal structures.

Method Overview

We propose CrystalReasoner, an end-to-end framework that converts high-level textual instructions into high-fidelity low-level crystal structures through reasoning and alignment.

CrystalReasoner pipeline overview

Overview of the CrystalReasoner pipeline. An LLM is finetuned to first generate thinking traces in an abstract-to-concrete manner before outputting atomic coordinates. A multi-objective dense reward is used for RL (GRPO) alignment. The model can be used for formula-conditioned generation and can be further specialized with property-specific rewards for property-conditioned generation.
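The caption names GRPO as the RL algorithm. As a rough illustration of the group-relative idea behind GRPO, the sketch below normalizes the rewards of several structures sampled for the same prompt against that group's mean and standard deviation. The function name, the use of the population standard deviation, and the example rewards are assumptions made for illustration; the page does not describe the training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Assumption: population standard deviation plus a small epsilon to avoid
    division by zero when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four structures sampled from one instruction.
group_rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(group_rewards)
for reward, advantage in zip(group_rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.3f}")
```

Samples that score above the group mean receive a positive advantage and are reinforced; those below the mean are discouraged, so no separate value model is needed.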

Physical Priors as Thinking Tokens

CrysReas is finetuned to generate physical priors as thinking traces before outputting atomic coordinates, following an abstract-to-concrete progression through reasoning about crystallographic symmetry, local coordination environments, and predicted properties (e.g., structure volume, formation energy). By introducing symbolic representations of the 3D structure through text, LLMs can first reason about 3D structure before generating the structure itself, making structure generation more tractable.
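To make the abstract-to-concrete ordering tangible, here is a minimal sketch of how such a trace could be serialized as text ahead of the coordinates. The `<think>` tags, the field names, the `ThinkingTrace` class, and all numeric values are hypothetical placeholders; the actual prompt format used by CrysReas is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class ThinkingTrace:
    """Hypothetical container for the priors named in the text."""
    space_group: str
    coordination: dict  # element -> description of its local environment
    volume_A3: float
    formation_energy_eV_per_atom: float


def render_trace(trace):
    """Serialize the priors from abstract (symmetry) to concrete (properties)."""
    lines = ["<think>"]
    lines.append(f"symmetry: space group {trace.space_group}")
    for element, environment in trace.coordination.items():
        lines.append(f"coordination: {element} is {environment}")
    lines.append(f"predicted volume: {trace.volume_A3:.1f} A^3")
    lines.append(
        "predicted formation energy: "
        f"{trace.formation_energy_eV_per_atom:.2f} eV/atom"
    )
    lines.append("</think>")
    return "\n".join(lines)


# Illustrative placeholder values, not results from the paper.
example = ThinkingTrace(
    space_group="Fm-3m",
    coordination={
        "Na": "octahedrally coordinated by 6 Cl",
        "Cl": "octahedrally coordinated by 6 Na",
    },
    volume_A3=180.0,
    formation_energy_eV_per_atom=-2.0,
)
print(render_trace(example))
```

In this layout the atomic coordinates would follow the closing tag, so every coordinate token is generated with the symmetry and property estimates already in context.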

Thinking trace prompt format

RL Optimization on Validity and Stability

To improve the precision of the generated atomic positions, we apply RL with a carefully designed multi-objective dense reward function covering physical validity, chemical validity, and thermodynamic stability, guiding generation toward valid, low-energy configurations.
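One way to picture a multi-objective dense reward is as a weighted average of graded, per-objective scores rather than a single pass/fail signal. The sketch below is an illustration under stated assumptions: the three scoring functions, the 0.5 Å distance threshold, the 0.1 eV/atom energy scale, and the equal weights are all hypothetical choices, not the reward used in the paper.

```python
import math
from collections import Counter


def physical_validity(min_distance, threshold=0.5):
    """Graded score in [0, 1]; full credit once the closest atom pair is at
    least `threshold` angstroms apart (threshold is an assumed value)."""
    return min(1.0, max(0.0, min_distance / threshold))


def chemical_consistency(generated_elements, target_elements):
    """Fraction of atoms that agree between generated and requested composition."""
    generated, target = Counter(generated_elements), Counter(target_elements)
    overlap = sum((generated & target).values())
    return overlap / max(sum(generated.values()), sum(target.values()), 1)


def stability(e_above_hull, scale=0.1):
    """Smoothly decaying score; 1.0 on the hull, lower as energy rises (eV/atom)."""
    return math.exp(-max(0.0, e_above_hull) / scale)


def dense_reward(min_distance, generated_elements, target_elements,
                 e_above_hull, weights=(1.0, 1.0, 1.0)):
    """Weighted average of the three graded objectives."""
    parts = (
        physical_validity(min_distance),
        chemical_consistency(generated_elements, target_elements),
        stability(e_above_hull),
    )
    return sum(w * p for w, p in zip(weights, parts)) / sum(weights)


# Hypothetical sample: valid geometry, one extra Cl atom, slightly above the hull.
print(dense_reward(2.1, ["Na", "Cl", "Cl"], ["Na", "Cl"], 0.05))
```

Because each term varies continuously, a structure that is nearly valid still receives a partial reward, which gives the policy a gradient to follow even when few samples are fully valid.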

Energy above hull DFT validation

Property-Conditioned Generation

To enable property-conditioned generation, CrysReas employs RL with property-specific rewards, supporting optimization with respect to both discrete constraints (e.g., space group) and continuous properties (e.g., elasticity, thermal expansion) calculated using surrogate MLIPs. By combining stability rewards with property-specific objectives, CrysReas can be specialized for diverse material design scenarios without architectural modifications.
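The distinction between discrete and continuous targets can be sketched as two reward shapes that are then mixed with a stability term. Everything below is an assumption for illustration: the exact-match rule, the exponential decay, the `tolerance` and `property_weight` parameters, and the example numbers. Property values would come from a surrogate model in practice; here they are passed in as plain numbers.

```python
import math


def space_group_reward(generated_sg, target_sg):
    """Discrete constraint: full reward only on an exact space-group match."""
    return 1.0 if generated_sg == target_sg else 0.0


def continuous_property_reward(predicted, target, tolerance):
    """Continuous target: reward decays smoothly with distance from the target.

    `tolerance` sets the error (in the property's own units) at which the
    reward falls to 1/e; it is an assumed hyperparameter.
    """
    return math.exp(-abs(predicted - target) / tolerance)


def combined_reward(stability_reward, property_reward, property_weight=0.5):
    """Convex combination of a stability term and a property-specific term."""
    return ((1.0 - property_weight) * stability_reward
            + property_weight * property_reward)


# Hypothetical specialist for a continuous property target of 100.0 (arbitrary units).
prop = continuous_property_reward(predicted=92.0, target=100.0, tolerance=20.0)
print(combined_reward(stability_reward=0.8, property_reward=prop))

# Hypothetical specialist for a space-group target.
print(combined_reward(0.8, space_group_reward(225, 227)))
```

Since only the reward changes between specialists, the same model and training loop can be reused for each target, which is consistent with the claim that no architectural modification is required.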

Property-conditioned generation results

Experimental Results

Validity and consistency metrics

Comparison with Prior Methods

Comparison of CrysReas with our implementations of prior works, including PLAID++ Wyckoff Base and CrystalTextLLM. CrysReas achieves the best overall performance.

Validity and consistency metrics

Ablation of Thinking and RL

Performance comparison of model variants: CrysReas-Base (SFT baseline), CrysReas-Thinking (SFT + thinking traces), CrysReas-RL (SFT + RL), and full CrysReas. Thinking traces improve instruction following and validity; RL boosts uniqueness and stability; the full model achieves the best overall performance.
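Uniqueness and stability feed into the S.U.N. (stable, unique, novel) ratio. A minimal sketch of how such a ratio could be computed is given below, assuming each generated structure has already been reduced to a hashable key (in practice, structure matching would decide equality) and assigned an energy above hull. The 0.1 eV/atom stability threshold, the key representation, and the sample data are assumptions, not the evaluation protocol of the paper.

```python
def sun_ratio(samples, training_keys, hull_threshold=0.1):
    """Fraction of generated structures that are stable, unique, and novel.

    samples: list of (structure_key, e_above_hull_in_eV_per_atom) pairs.
    training_keys: set of keys for structures seen during training.
    """
    seen = set()
    count = 0
    for key, e_above_hull in samples:
        is_unique = key not in seen          # only the first copy counts
        seen.add(key)
        is_novel = key not in training_keys
        is_stable = e_above_hull <= hull_threshold
        if is_unique and is_novel and is_stable:
            count += 1
    return count / len(samples) if samples else 0.0


# Hypothetical batch: "A" is duplicated, "B" is unstable, "C" is in the training set.
generated = [("A", 0.02), ("A", 0.02), ("B", 0.30), ("C", 0.05), ("D", 0.00)]
print(sun_ratio(generated, training_keys={"C"}))
```

Dividing by the total number of generated samples, rather than by the number of valid ones, means duplicates and unstable structures both lower the ratio.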

Validity and consistency metrics

Thinking improves validity and consistency

Across atom-count and space-group complexity, thinking traces improve structural validity, composition consistency, and symmetry following.

Thinking trace analysis

Reasoning adapts to structure complexity

Trace length grows with the number of atoms, and trace-segment ablations show the importance of hierarchical physical reasoning.

Energy above hull DFT validation

RL improves energy above hull

DFT validation compares generated structures across model variants and shows lower energy distributions after RL alignment.

Property-conditioned generation results

Specialists follow target properties

Task-specific RL improves target adherence for space group, elasticity, and thermal expansion while exposing useful validity trade-offs.

Generated Structures

We include qualitative examples showing generated structures from different space groups, alongside quantitative checks that compare predicted thinking-trace properties with realized structure properties.

Generated crystal structure example for Fm-3m
Generated crystal structure example for Fd-3m
Generated crystal structure example for P3m1
Prediction error comparison for thinking trace properties

Comparison between predicted properties (site, structure volume, bond length) in thinking traces and actual properties of generated structures across different space groups.
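A comparison of this kind can be summarized with a relative error between the value stated in the thinking trace and the value measured on the generated structure. The sketch below uses mean relative error as one plausible choice; the page does not state which error metric the figure reports, and the example numbers are hypothetical.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over paired values."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one pair is required")
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Hypothetical volumes (cubic angstroms): trace prediction vs. generated structure.
predicted_volumes = [100.0, 210.0]
actual_volumes = [100.0, 200.0]
print(mean_relative_error(predicted_volumes, actual_volumes))
```

Grouping the pairs by space group before calling the function would give one error value per group, matching the per-space-group breakdown described in the caption.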

Citation

The manuscript is currently under review; citation metadata will be updated after a public version is available.

@misc{wu2026crystalreasoner,
  title={CrystalReasoner: Reasoning and RL for Property-Conditioned Crystal Structure Generation},
  author={Yuyang Wu and Stefano Falletta and Delia McGrath and Sherry Yang},
  year={2026},
  note={Manuscript under review}
}