Causal & Geometric Reasoning

We build autonomous agents that reason over distributed, high-stakes data (Bio & Finance) without ever centralizing it. We replace trust with cryptography and replace correlation with causal/structural reasoning.

If you are interested in any part of this project, please feel free to contact me. :)


We move from “Probabilistic Token Prediction” to “Structural Reasoning.” Our agents generate solutions constrained by geometric symmetry (Physics) and causal graphs (Logic).

The Structural Inference Engine (SIE)

The Moonshot: Standard LLMs are probabilistic—they ‘guess’ the next token. We are building the Deterministic Structural Engine—an LLM that generates data (molecules, contracts, causal graphs) that is mathematically guaranteed to obey the strict schema and semantic constraints of the real world.

The Hard Problem: In high-stakes domains, “mostly correct” is a failure. A synthetic patient record cannot have disjoint medical codes (e.g., “Viral Pneumonia” without “Infectious Disease”). A financial contract cannot violate regulatory schemas. Current methods (RAG or Fine-tuning) offer “soft” guidance but cannot enforce hard logical consistency.

  • The Blueprint Layer (Factor Graph Priors): Instead of sampling tokens directly, the engine first samples a target subspace (a configuration of attributes) using a Probabilistic Factor Graph.
  • The Constraint Compiler (The Hard Mask): Compile the domain’s rigid rules into Deterministic Finite Automata (DFA). During inference, these automata project a “Hard Mask” onto the LLM’s vocabulary. Any token that would violate the schema is mathematically assigned zero probability.
  • The Semantic Verifier (The Loop): A token-level verifier runs in parallel with the generation. It checks candidate facts (Subject-Relation-Object) against a retrieved Knowledge Graph in real-time.
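The Constraint Compiler step above can be sketched as a logit mask. This is a minimal illustration, not the engine itself: the vocabulary, the DFA, and the function names (`hard_mask`, `step`) are all hypothetical, and a real system would compile the DFA from a schema rather than write it by hand.

```python
# Sketch of a DFA-driven "Hard Mask" over a toy vocabulary.
# Encodes the rigid rule: a record is "{" '"code"' ":" <value> "}".
import math

VOCAB = ["{", "}", '"code"', ":", '"J12.9"', '"A49.9"']

# DFA: state -> {allowed token -> next state}
TRANSITIONS = {
    0: {"{": 1},
    1: {'"code"': 2},
    2: {":": 3},
    3: {'"J12.9"': 4, '"A49.9"': 4},
    4: {"}": 5},
}

def hard_mask(state, logits):
    """Project the DFA onto the logits: any token that would violate the
    schema gets -inf, i.e. exactly zero probability after softmax."""
    allowed = TRANSITIONS.get(state, {})
    return [l if tok in allowed else -math.inf
            for tok, l in zip(VOCAB, logits)]

def step(state, token):
    """Advance the DFA after a token is emitted (raises on violations)."""
    return TRANSITIONS[state][token]
```

The point of the sketch: the mask is applied before sampling, so schema compliance is enforced by construction rather than hoped for via prompting.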

The Inter-Graph Reasoning Protocol (IGRP)

The Moonshot: The “Internet of Knowledge”—A protocol where agents can “walk” across millions of decentralized, private Knowledge Graphs (KGs) to discover hidden causal chains, without ever moving the data to a central server.

The Hard Problem: Knowledge is fragmented. Hospital A has a patient graph. University B has a gene-regulatory graph. Pharma C has a chemical-structure graph. Currently, to discover a cure, you have to centralize all three (Impossible due to Privacy/IP). Standard Agents cannot “reason” across these silos; they hallucinate connections that don’t exist.

  • The “Bridge” Nodes (Entity Alignment): The system runs Privacy-Preserving Entity Alignment (hashed embeddings) to identify “Bridge Nodes” (e.g., Gene TP53 exists in both the Hospital KG and the University KG). It then creates a virtual “Hyper-Edge” connecting these disparate graphs, yielding a unified logical surface without data sharing.
  • The “Walker” Agent (Neural Pathfinding): Instead of RAG (retrieving text), the agent executes a Multi-Hop Reasoning Walk across the bridged graphs.
  • The Compositional Proof: The agent bundles these distributed hops into a Cryptographic Trace. It presents the final answer (“Molecule Y cures Patient X”) along with the verified path of reasoning, which can be audited without revealing the full graphs.
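A minimal sketch of the Bridge Node step, assuming each party exchanges only salted hashes of its entity IDs. This is a simplified stand-in for a real private-set-intersection protocol (a salted hash of a low-entropy ID is not cryptographically private on its own); the names `hashed_ids`, `find_bridges`, and the entity lists are illustrative.

```python
# Bridge-node discovery: only hashes cross the wire, never raw entity lists.
import hashlib

SALT = b"shared-protocol-salt"  # agreed out-of-band in this sketch

def hashed_ids(entities):
    """Map salted SHA-256 hash -> local entity ID."""
    return {hashlib.sha256(SALT + e.encode()).hexdigest(): e for e in entities}

def find_bridges(local_entities, remote_hashes):
    """Return local entities whose hashes also appear in the remote hash set.
    Non-matching entities on either side are never revealed."""
    local = hashed_ids(local_entities)
    return sorted(local[h] for h in local.keys() & remote_hashes)

hospital = ["Patient_042", "TP53", "EGFR"]       # hypothetical Hospital KG nodes
university = ["TP53", "BRCA1", "EGFR"]           # hypothetical University KG nodes

remote = set(hashed_ids(university).keys())      # only hashes are shared
bridges = find_bridges(hospital, remote)         # -> ["EGFR", "TP53"]
```

Each bridge entity becomes an anchor for a virtual hyper-edge; the Walker Agent can then hop from a hospital-side node to a university-side node through it.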

Geometric Equivariant Agents: Physics-Aware Design

The Moonshot: Create an LLM that “thinks” in continuous 3D space, not just discrete text tokens. It should be mathematically incapable of generating a structure that violates the laws of physics.

The Hard Problem: Standard Transformers treat a molecule, or a yield curve, like a string of text. They don’t understand that if you rotate a molecule 90 degrees, it’s still the same molecule (Invariance). This leads to massive data inefficiency and “hallucinated” matter.

  • SE(3) Equivariant Attention: We bake Euclidean symmetry (rotation/translation) directly into the attention mechanism. The agent calculates attention based on relative distances and angles, not absolute positions.
  • Manifold Optimization: Instead of optimizing in “pixel space,” the agent optimizes on the “Riemannian Manifold” of valid structures. It surfs the energy landscape rather than guessing coordinates.
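The attention idea above can be sketched numerically: if scores depend only on pairwise distances, they are unchanged by any rotation or translation of the input. This toy `distance_attention` (a hypothetical name) shows invariance of the scores only; full SE(3)-equivariant feature updates are beyond this sketch.

```python
# Attention weights from relative distances only: pose-independent by construction.
import numpy as np

def distance_attention(coords, length_scale=1.0):
    """coords: (N, 3) positions. Returns row-normalized (N, N) weights
    computed from pairwise distances, so rotating/translating coords
    leaves the output unchanged."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    scores = -dist / length_scale                       # closer points attend more
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    w = np.exp(scores)
    return w / w.sum(axis=-1, keepdims=True)

def random_rotation(rng):
    """QR decomposition of a Gaussian matrix gives a random orthogonal matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    return q
```

Contrast with vanilla attention over raw coordinates, which changes when the frame changes: that frame-dependence is exactly the data inefficiency the bullet points describe.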

The Self-Correcting Knowledge Graph (SCKG)

The Moonshot: A knowledge graph that keeps itself consistent. Scientific literature is growing too fast to read (roughly 2 papers per minute), and much of it is contradictory (Paper A says X, Paper B says Not X). LLMs just hallucinate a middle ground.

  • Conflict Detection: An agent swarm ingests 10,000 arXiv/BioRxiv papers daily, extracting claims into a Triplet Graph.
  • Epistemic Conflict Resolution: When the graph detects a topology conflict (A contradicts B), it flags it.
  • Agent Verification: It spawns a “Judge Agent” to analyze the methodology of both papers and assign a confidence score, or propose a tie-breaking experiment.
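The Conflict Detection step reduces to grouping extracted claims by triple and flagging any triple that is both asserted and negated. A minimal sketch, assuming claims arrive as (paper, subject, relation, object, polarity) tuples; `find_conflicts` and the sample claims are illustrative, not the production extraction pipeline.

```python
# Flag triples that are asserted by one paper and negated by another.
from collections import defaultdict

def find_conflicts(claims):
    """claims: iterable of (paper, subject, relation, object, polarity)
    with polarity True = asserted, False = negated.
    Returns {conflicting triple: [(paper, polarity), ...]}."""
    polarities = defaultdict(set)
    sources = defaultdict(list)
    for paper, s, r, o, polarity in claims:
        polarities[(s, r, o)].add(polarity)
        sources[(s, r, o)].append((paper, polarity))
    # A triple with both polarities present is an epistemic conflict.
    return {t: sources[t] for t, pols in polarities.items() if len(pols) == 2}

claims = [
    ("paper_A", "DrugX", "inhibits", "TP53", True),
    ("paper_B", "DrugX", "inhibits", "TP53", False),
    ("paper_C", "DrugX", "binds", "EGFR", True),
]
conflicts = find_conflicts(claims)  # one conflict: ("DrugX", "inhibits", "TP53")
```

Each flagged triple, with its list of source papers, is what the Judge Agent would receive for methodology analysis and confidence scoring.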

Publications coming soon.