Formalizing Materials R&D · modular reference
Materials reference · Part 7 of 9

R&D as a system

The formulation-development loop as control/active-learning; DoE and mixture designs; tacit/failure knowledge; informatics, HTE, self-driving labs and surrogate models.

~16 min read · P10 · P12 · RD4

Ordinary materials R&D is already a control problem: an agent perturbs a plant it cannot fully observe, reads noisy sensors, updates a model, and chooses the next action. Once you see the formulation lab this way, an “AI-driven autonomous paint laboratory” stops being exotic and becomes a familiar object — a closed loop estimating a hidden plant, the process→structure→property map. This module assembles that loop piece by piece, then states the two bets layered on top: capitalize undocumented know-how as a structured, verifiable object (P10), and quantify the return on the next experiment (P12).

The formulation-development loop

The repeating cycle of development is: set a goal, form a hypothesis, formulate (choose a recipe), process (mix, apply, cure), characterize (measure properties), decide, iterate. Stripped of vocabulary, this is one turn of a feedback controller — or of a reinforcement-learning loop — wrapped around a plant you can only poke and probe.

Intuition. Act = mix and apply a paint (an action in formulation space). Observe = measure properties (a noisy, partial sensor reading). Update = revise your model of input→output. Decide = pick the next experiment. The “plant” is film-formation chemistry and physics — high-dimensional, nonlinear, only partially observable; you never see the molecular state, only proxies (gloss, viscosity, a spectrum). The loop estimates a hidden state from indirect measurements.

That hidden state is not a metaphor. The thing standing between what you control and what you measure is microstructure, and the loop’s whole job is to infer and steer it — the materials face of mediation through a hidden intermediate (P3) and of many-recipes-to-one-outcome degeneracy (P4). The figure below names the four moves and the plant they circle.

[Figure: loop diagram. Act (formulate + process) → Observe (measure; noisy, partial) → Update (fit / update model) → Decide (choose next experiment), circling the hidden plant (the PSP map).]
R&D as a closed loop. Act → observe → update → decide, around a hidden plant (the PSP map) — control / active learning.
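Sketch. The four moves can be written as a toy loop. This is a minimal illustration under stated assumptions, not the draft's system: `hidden_plant`, `TRUE_OPTIMUM`, the noise level, and the shrinking-radius decision rule are all hypothetical stand-ins chosen to keep the example short.

```python
import random

TRUE_OPTIMUM = 0.62  # hypothetical; the loop never reads this directly


def hidden_plant(x, rng):
    """Stand-in for the PSP map: returns only a noisy scalar reading."""
    return -(x - TRUE_OPTIMUM) ** 2 + rng.gauss(0.0, 0.001)


def run_closed_loop(n_turns=20, seed=0):
    rng = random.Random(seed)
    history = []            # the "model": every (action, observation) so far
    x, radius = 0.5, 0.25   # first action and initial search radius
    for _ in range(n_turns):
        y = hidden_plant(x, rng)                    # act + observe
        history.append((x, y))                      # update
        best_x, _ = max(history, key=lambda pair: pair[1])
        # decide: probe near the best point seen so far, inside [0, 1]
        x = min(1.0, max(0.0, best_x + rng.uniform(-radius, radius)))
        radius *= 0.85                              # narrow the search each turn
    best_x, best_y = max(history, key=lambda pair: pair[1])
    return best_x, best_y, history


best_x, best_y, history = run_closed_loop()
print(f"best action after {len(history)} turns: {best_x:.3f}")
```

Each turn uses everything observed so far to place the next experiment, which is the property an open-loop batch gives up.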

Three facts make this loop unusual, and they set everything downstream. Latency per turn is large — hours to days, dominated by drying and curing. The action space is enormous. Each observation is expensive in materials and instrument time. So the value-bearing decision is which experiment to run next, not the running itself; nearly all cost and risk concentrate in the number of loop iterations. Automation and a better decision policy both attack the same quantity — iterations-to-target.

Pitfall. Treating the loop as open — run one big batch, analyze once — throws away the information early results give about where to sample next. Open-loop is acceptable only when feedback is impossible; here it is merely slow.

In the synthesis. The draft’s proposed system is exactly this loop with two substitutions: act + observe are handed to an automated platform, and update + decide are shared between a human and a “tacit-knowledge LLM.” Nothing about the control structure is new; what is new is who runs each box.

Designing the questions: small vs. large loop, and DoE

Not all iteration is the same kind. Optimizing inside a fixed problem is one activity; revising the problem definition itself is another, and the second dominates the first.

The small loop is gradient descent or Bayesian optimization within a fixed objective and search space: “given these five raw materials and this target, what blend is best?” The large loop edits the loss function, the constraints, or the feature set: “is this the right resin chemistry? is gloss what the customer wants, or perceived depth? should I measure a spectrum instead of a single number?” In RL terms the small loop improves the policy; the large loop edits the reward and the state representation. Most breakthroughs — and most expensive mistakes — happen in the large loop, because it sets the frame the small loop then exploits.

Key. A perfectly tuned small loop converges confidently to a useless answer if the problem was mis-posed. Knowing when to re-pose the question is the tacit expertise that no inner-loop optimizer supplies.

Design of Experiments (DoE) is the discipline of choosing which experiments extract the most information per run — the statistics analogue of choosing inputs to make a system identifiable. A/B testing is one-factor DoE; factorial designs vary several factors at once and so reveal interactions (one factor’s effect depending on another’s level). Fractional factorials sample a clever subset; response-surface methodology (central-composite, Box–Behnken) fits a smooth, often quadratic model to locate an optimum. The one-variable-at-a-time habit misses interactions and gets stuck off-optimum.
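Sketch. A two-factor, two-level factorial shows what an interaction is and why one-variable-at-a-time misses it. The `response` function and its coefficients are invented for illustration; factors are in coded units (-1 = low, +1 = high), and an "effect" here is the average response at the high level minus the average at the low level.

```python
from itertools import product


def response(a, b):
    """Hypothetical noiseless response in which a and b reinforce each other."""
    return 10.0 + 2.0 * a + 1.0 * b + 3.0 * a * b


# Full 2x2 factorial: every combination of low (-1) and high (+1).
runs = [(a, b, response(a, b)) for a, b in product((-1, 1), repeat=2)]


def effect(runs, contrast):
    """Mean response where the contrast is +1 minus mean where it is -1."""
    high = [y for a, b, y in runs if contrast(a, b) > 0]
    low = [y for a, b, y in runs if contrast(a, b) < 0]
    return sum(high) / len(high) - sum(low) / len(low)


main_a = effect(runs, lambda a, b: a)
main_b = effect(runs, lambda a, b: b)
interaction_ab = effect(runs, lambda a, b: a * b)

# One-variable-at-a-time from the (low, low) baseline sees only two edges.
ofat_a = response(1, -1) - response(-1, -1)
ofat_b = response(-1, 1) - response(-1, -1)

print(main_a, main_b, interaction_ab)  # factorial estimates
print(ofat_a, ofat_b)                  # OFAT steps from the baseline
```

In this toy case both OFAT steps lower the response, so the habit would stop at the baseline, while the factorial shows that raising both factors together gives the best of the four runs.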

In the synthesis. DoE is the static counterpart of the loop’s dynamic sampling — it designs an informative initial batch probing the input space before the closed loop takes over. The apparatus plus its acquisition policy runs the small loop; the human-and-LLM conversation that re-sets goal, input space, property space, and target is the large loop.

Mixture designs: the geometry of recipes

Formulations carry a constraint ordinary DoE ignores: the variables are fractions of a whole that sum to one. With \(k\) components the feasible region is the simplex \[ \Delta^{k-1} = \Big\{\, x \in \mathbb{R}^k : x_i \ge 0,\ \textstyle\sum_{i=1}^{k} x_i = 1 \,\Big\}, \] a triangle for three components, a tetrahedron for four — exactly the constraint on a probability or softmax vector. You cannot move one component without moving another, so the inputs are not independent and the factorial cube simply does not fit.
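Sketch. Design points that respect the constraint can be generated directly on the simplex. The helper below enumerates a simplex-lattice: all blends of \(k\) components whose fractions are multiples of \(1/m\). The function name and the choice of exact `Fraction` arithmetic are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def simplex_lattice(k, m):
    """All k-component blends with fractions in {0, 1/m, ..., 1} that sum to 1."""
    points = []
    for counts in product(range(m + 1), repeat=k):
        if sum(counts) == m:
            points.append(tuple(Fraction(c, m) for c in counts))
    return points


design = simplex_lattice(3, 2)
for point in design:
    print([str(f) for f in point])

# The number of lattice points is C(k + m - 1, m).
print(len(design), comb(3 + 2 - 1, 2))
```

For three components and \(m = 2\) this yields the three vertices and the three edge midpoints of the triangle; every point sums to exactly one by construction, so no renormalization is needed afterward.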

Example (Scheffé). Because of the sum-to-one constraint, an ordinary polynomial with an intercept is over-parameterized: the constant column is the sum of the component columns, so the terms are collinear and the coefficients are not uniquely determined. Scheffé reformulated the response model to live natively on the simplex: linear terms carry the pure-component responses, cross-terms capture blending, and there is no standalone constant. Layouts such as simplex-lattice and simplex-centroid place points on \(\Delta^{k-1}\); a real constraint like “pigment between 5% and 15%” carves out a smaller polytope. When process knobs (shear, temperature) accompany the fractions, you get a combined mixture-process design.
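Worked form. For the quadratic case the reduction can be written out explicitly. The coefficient symbols \(\beta\) and \(\beta^{*}\) are notation introduced here for illustration; the only fact used is \(\sum_i x_i = 1\).

```latex
\[
\begin{aligned}
y &= \beta_0 + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=1}^{k} \beta_{ii} x_i^2
     + \sum_{i<j} \beta_{ij} x_i x_j
  && \text{(ordinary quadratic)} \\[4pt]
\beta_0 &= \beta_0 \sum_{i=1}^{k} x_i,
\qquad
x_i^2 = x_i \Big( 1 - \sum_{j \ne i} x_j \Big)
  && \text{(consequences of } \textstyle\sum_i x_i = 1\text{)} \\[4pt]
y &= \sum_{i=1}^{k} \beta_i^{*} x_i + \sum_{i<j} \beta_{ij}^{*} x_i x_j,
\qquad
\beta_i^{*} = \beta_0 + \beta_i + \beta_{ii},
\quad
\beta_{ij}^{*} = \beta_{ij} - \beta_{ii} - \beta_{jj}
  && \text{(Scheffé quadratic)}
\end{aligned}
\]
```

The constant and the pure squares are absorbed into the linear and cross terms, which is why the model on the simplex needs no intercept: \(\beta_i^{*}\) is the response of pure component \(i\), and \(\beta_{ij}^{*}\) measures departure from linear blending along the edge between \(i\) and \(j\).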

Pitfall. Optimizing in unconstrained coordinates and renormalizing afterward distorts the objective and misattributes blending effects. The simplex constraint must live inside the model and inside the search — any surrogate or acquisition function that ignores it will recommend impossible paints. This is the first place domain geometry must be baked into the ML, and it recurs at every later stage.

This native geometry is the materials instance of the symmetric-algebra mixture models and the \(\binom{n}{k}\) interaction budget developed in rings & modules (RD5): the polynomial functions on a mixture space are precisely those compatible with the simplex.

Capturing what the recipe leaves out

The recipe — the list of ingredients and amounts — is the easy, explicit part. The policy that makes a recipe actually succeed is the hard, implicit part, and it is rarely written anywhere.

Intuition. Tacit knowledge is an expert policy that was never written down — a trained controller whose weights live only in an operator’s intuition, the undocumented production tricks that are in nobody’s README. Two labs with identical formulas and divergent results is the standard proof that the decisive knowledge lives in unwritten process and equipment policy, not in the ingredient list.

It comes in three unrecorded flavors, and naming them tells you what a knowledge system must capture.

  • Process / order-of-operations: why add the dispersant before the pigment; why this shear for this long. Outcomes depend on history and kinetics: a non-commutative, path-dependent process (P1), so the sequence carries information the ingredient list cannot.
  • Equipment / transfer: why the same recipe behaves differently on different equipment (different shear, thermal profile, calibration). This is a domain-shift / sim-to-real gap, and it is P7: knowledge that does not survive a change of machine, scale, or site is knowledge you do not really have.
  • Failure / negative: why to avoid this condition, that is, the regions that crack, blush, gel, or settle. Almost never logged, yet these are exactly the constraints that bound the safe operating region.

This is the highest-value, least-captured layer: learned by apprenticeship, lost on retirement, invisible to any system trained only on explicit recipes and final measurements. It is precisely the gap the draft targets (P10) — and whether tacit knowledge can be faithfully externalized at all is the load-bearing assumption of the entire program.

Knowledge as a typed, verifiable object

If undocumented policy is the asset, you need a type for it — a format that turns an answer into a checkable, bounded, reusable unit. The draft uses two nested formats, both instances of “knowledge as a structured, composable, verifiable object” (P10).

The Abstract Card is Nippon Paint’s standardized experiment-report schema (in use since 1974; roughly 10,000 produced per year, over 250,000 cumulative). Read it as a typed, schema-enforced experiment log — the difference between free-text notes and emitting one well-formed structured record per run. Because every card shares fields, the corpus is parseable: queryable, aggregatable, machine-readable — a dataset, not a diary. Its six elements encode one complete hypothesis-test: Q objective; P hypothesis/plan (the reason for the chosen method); A study method; D judgment criteria and method (how success is decided — fixed in advance); O result; Q–O consideration (did the hypothesis hold, and why). Field D is the quiet hero: pinning the metric before the result is what makes results across cards comparable.
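Sketch. The "typed, schema-enforced log" reading can be made concrete with a record type. This is a hypothetical rendering of the six elements, not the actual Abstract Card schema: the class name, field names, and example values are all assumptions.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AbstractCard:
    """Illustrative record with one field per element named in the text."""
    objective: str           # Q
    hypothesis_plan: str     # P: the reason for the chosen method
    study_method: str        # A
    judgment_criteria: str   # D: fixed before the result is known
    result: str              # O
    consideration: str       # Q-O: did the hypothesis hold, and why


def missing_fields(card):
    """Names of fields that are blank; a structural check only."""
    return [f.name for f in fields(card) if not getattr(card, f.name).strip()]


# Placeholder content, invented for the example.
card = AbstractCard(
    objective="Raise gloss of a waterborne topcoat",
    hypothesis_plan="Finer pigment dispersion should reduce surface scatter",
    study_method="Vary dispersion time at fixed formulation",
    judgment_criteria="Gloss measured by the agreed method exceeds the target",
    result="Target met at the longest dispersion time",
    consideration="Hypothesis held within the tested range",
)

draft = replace(card, judgment_criteria="   ")
print(missing_fields(card))   # complete card
print(missing_fields(draft))  # card with the metric left blank
```

The check only confirms that every field is filled in. It says nothing about whether the hypothesis is sharp or the metric well chosen.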

Pitfall. Schema compliance is not semantic quality. A card can be perfectly typed yet record a vague hypothesis or an ill-chosen metric. Structure enables verification; it does not guarantee correctness.

The QRA+PSE format is the unit distilled out of many cards — a typed knowledge record with citations and a validity contract.

Definition (QRA+PSE unit). A knowledge unit carries Q Question; R Rationale (grounds); A Answer/Action (the recommended formulation, process, or next experiment); P Physics (laws, governing factors); S Scope (materials systems, equipment, viscosity/temperature ranges); E Evidence (source reports, experiment IDs, past failures); C Counter-example (conditions under which it did not hold). Q/R/A are the answer; P/S/E/C are what make it trustworthy and reusable.

Intuition. A QRA+PSE unit is a well-documented function: not just the return value, but its rationale, its preconditions and types (Scope), its test evidence (Evidence), and its known-failure cases (Counter-example). Physics gives a mechanism you can check and sensibly extrapolate; Scope states the validity region — the explicit encoding of where this transfers (P7); Counter-example records the normally-lost failure knowledge, bounding the safe region from outside.

Pitfall. The hard fields are exactly the tacit ones, S and C. A system can quietly emit confident Q/R/A with empty or wrong S/C — and that is more dangerous than no answer, because it looks bounded. An answer without scope and counter-examples is a liability that appears general and is not.
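Sketch. The "well-documented function" reading suggests an audit that refuses to treat an unscoped answer as usable. The container below is hypothetical: the class, the field names, the representation of Scope as numeric ranges, and the example unit are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeUnit:
    """Illustrative container for one QRA+PSE unit."""
    question: str
    rationale: str
    answer: str
    physics: str = ""
    scope: dict = field(default_factory=dict)             # variable -> (low, high)
    evidence: list = field(default_factory=list)          # report / experiment IDs
    counter_examples: list = field(default_factory=list)  # conditions where it failed


def empty_trust_fields(unit):
    """Names of empty P/S/E/C fields. An empty list means auditable, not correct."""
    checks = {
        "physics": unit.physics.strip(),
        "scope": unit.scope,
        "evidence": unit.evidence,
        "counter_examples": unit.counter_examples,
    }
    return [name for name, value in checks.items() if not value]


def in_scope(unit, conditions):
    """True only if a scope is declared and every scoped variable is in range."""
    if not unit.scope:
        return False  # an unscoped claim is treated as applicable nowhere
    for name, (low, high) in unit.scope.items():
        if name not in conditions or not (low <= conditions[name] <= high):
            return False
    return True


# Placeholder unit, invented for the example.
unit = KnowledgeUnit(
    question="Which addition order avoids flocculation?",
    rationale="Placeholder rationale",
    answer="Add the dispersant before the pigment",
    scope={"temperature_C": (15.0, 30.0)},
)

print(empty_trust_fields(unit))
print(in_scope(unit, {"temperature_C": 22.0}), in_scope(unit, {"temperature_C": 40.0}))
```

Returning `False` for an empty scope is a deliberate design choice in this sketch: it makes the dangerous case, a confident answer with no declared validity region, fail closed rather than appear general.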

Bridge. “Scope” is not just a text field — it is a region of validity, and whether two scoped claims cohere where their regions overlap is a gluing question. The math reference makes this exact: claims-with-scope are sheaf sections and the obstruction to fitting them together is measured by \(H^1\) (see sheaves and the gluing obstruction, P6/RD6). The whole olog-and-migration apparatus for making knowledge a verifiable category lives there too: knowledge as a verifiable object.

In the synthesis. The chain is a pipeline of types. The 250,000+ Abstract Cards are the grounding corpus; QRA+PSE is the output type the LLM emits and humans audit (its E field points back to the cards); together they are the explicit form of the previously-tacit policy — and the mechanism by which the draft capitalizes knowledge, especially the failure knowledge in C (P10).

A QRA+PSE unit is also a knowledge object you can transport between labs and product families. The principled way to move populated, structured knowledge across schemas — rather than ad-hoc copy-paste — is functorial data migration over ologs: knowledge represented as a category, transferred by a functor with adjoint guarantees, so the move is structure-preserving by construction. That is transfer-as-functoriality (P7/RD4), and its full account is in the math reference.

The informatics stack: PSP, databases, HTE, and the self-driving lab

Materials informatics is the ML playbook — featurize, fit, predict, optimize — organized by the PSP linkage. Read the whole field as one pipeline: \[ \text{process} \;\longrightarrow\; \underbrace{\text{structure}}_{\text{latent}} \;\longrightarrow\; \text{property}. \] Forward = prediction (process to property); inverse (desired property → required structure → required process) = design. PSP is the materials statement of the hidden-state idea: structure is the latent variable between what you control and what you observe (P3/P4). The Materials Genome Initiative (MGI, 2011) pushed integrating computation, experiment, and data infrastructure; ICME links models across length and time scales along the same chain. The common thread: treat data as infrastructure and the PSP linkage as the model to learn.

That makes the two halves of the chain into two labeled training sets — a structure–property database for structure → property, a process–structure database for process → structure. Compose them for the full process → property map; invert for design. These linkages are learnable and transferable only if each row carries enough metadata (composition, conditions, measurement method) — meaning a good database row already contains the Scope field of a QRA+PSE unit, and consistent rows require the Abstract Card’s pre-declared metric D. The draft’s claim is that the extracted Abstract-Card corpus is exactly such a database.

Example (high-throughput experimentation). HTE applies throughput and parallelism to data generation — the lab analogue of scaling out a training-data pipeline. Its payoff is twofold: more rows (P12) and cleaner rows, because a robot executes the same protocol identically, shrinking measurement noise and removing operator-to-operator variance. Lower variance raises the ceiling on achievable model accuracy, not just the row count. The PIPETTE MASTER-type platform reaches roughly 1,000 samples per day — the throughput figure behind the data-ROI argument.

An autonomous (self-driving) laboratory closes the entire loop automatically, and it is precisely robotics plus active learning — a perception→action→learning loop. Its pieces map one-to-one onto the four boxes: actuation (a liquid-handling or synthesis robot) and sensing (spectroscopy) run act + observe; a surrogate model with uncertainty and a decision policy run update + decide. The hard parts are robotics’ hard parts: reliable manipulation of messy media, sensor calibration and drift, error handling when a sample fails, and keeping the model honest as conditions change.

Pitfall. Autonomy amplifies whatever policy and constraints you give it. An acquisition function that ignores the simplex constraint, or a target that encodes the wrong objective (a large-loop error), now generates wrong experiments at 1,000 per day. Speed without correct framing scales mistakes: a fast small loop pays off only when the large loop was set right.

The decision layer: surrogates, active learning, and the economics of data

The intelligence of a self-driving lab is concentrated in two coupled components: a fast learned stand-in for the experiment, and a policy that decides where to spend the next one.

A surrogate is a learned forward model plus its inverse — the forward/inverse-dynamics pair from model-based control. The forward surrogate maps parameters → spectrum/property: a supervised regressor orders of magnitude cheaper than the real experiment, so the decision layer can evaluate millions of virtual candidates. The inverse problem — specify the output, recover the recipe — is inverse design, and it is genuinely harder: it is typically ill-posed (many recipes give the same property; not every requested property is achievable). Practical inverse design therefore uses the forward surrogate inside an optimizer, or trains a constrained/generative inverse model — and it must respect the simplex so its proposals are realizable.
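Sketch. "Forward surrogate inside an optimizer" can be illustrated with an exhaustive search over a simplex grid. Everything specific here is assumed: `forward_surrogate` is an invented closed-form stand-in for a trained model, and the grid resolution and tolerance are arbitrary.

```python
from itertools import product


def forward_surrogate(x):
    """Hypothetical forward model: (resin, pigment, solvent) fractions -> one property."""
    resin, pigment, _solvent = x
    return 80.0 * resin + 20.0 * pigment + 100.0 * resin * pigment


def candidates(m=20):
    """Three-component recipes on a 1/m grid; each sums to one by construction."""
    for i, j in product(range(m + 1), repeat=2):
        if i + j <= m:
            yield (i / m, j / m, (m - i - j) / m)


def inverse_design(target, tol=1.0):
    """All grid recipes whose predicted property is within tol of the target."""
    return [x for x in candidates() if abs(forward_surrogate(x) - target) <= tol]


matches = inverse_design(40.0)
print(len(matches), "recipes predicted to hit 40")
print(inverse_design(200.0))  # a target outside what this surrogate can reach
```

Both faces of ill-posedness show up in the toy case: several distinct recipes map to the same target, and a target above the surrogate's reachable range returns nothing. Because candidates are generated on the simplex, every proposal is a realizable recipe, though it would still need a confirming experiment.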

Intuition. Active learning / Bayesian optimization is acquisition-function-driven sampling under uncertainty. A surrogate, often a Gaussian process, returns a prediction and an uncertainty estimate at every candidate; an acquisition function (expected improvement, UCB) scores each candidate by trading predicted value against that uncertainty; you run the top one, add the result, and refit. Bayesian optimization is active learning aimed at an optimum, purpose-built for the expensive-black-box, few-evaluations regime of experimental science.
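Sketch. The two acquisition functions named above have short closed forms once the surrogate supplies a Gaussian prediction (mean and standard deviation). The candidate names and numbers below are invented; in practice they would come from a fitted surrogate, which this example does not include.

```python
from math import erf, exp, pi, sqrt


def expected_improvement(mu, sigma, best, xi=0.0):
    """EI for maximization when the prediction is N(mu, sigma^2)."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)  # no uncertainty: improvement is known exactly
    z = gain / sigma
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return gain * cdf + sigma * pdf


def upper_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic score: predicted mean plus kappa standard deviations."""
    return mu + kappa * sigma


# Hypothetical surrogate outputs: name -> (predicted mean, predicted std).
predictions = {
    "recipe_A": (0.80, 0.01),  # slightly better than the incumbent, well known
    "recipe_B": (0.78, 0.10),  # slightly worse on average, poorly known
    "recipe_C": (0.60, 0.05),  # clearly worse
}
best_so_far = 0.79

ei = {k: expected_improvement(m, s, best_so_far) for k, (m, s) in predictions.items()}
ucb = {k: upper_confidence_bound(m, s) for k, (m, s) in predictions.items()}
print(max(ei, key=ei.get), max(ucb, key=ucb.get))
```

With these numbers both criteria prefer `recipe_B`: its mean is below the incumbent, but its uncertainty leaves more room for a large gain than the well-characterized `recipe_A`. That preference is only as sound as the standard deviations fed in.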

Real targets are multi-objective — maximize gloss and hardness and dry speed while minimizing cost — so the policy uses Pareto-aware acquisition (e.g. expected hypervolume improvement) to trace the Pareto front of non-dominated formulations (P5, the interfering-constraint problem). Batch variants pick several diverse experiments at once to feed a parallel HTE platform. This decision layer is what makes a self-driving lab smart rather than merely fast.
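Sketch. The Pareto front itself is a simple filter: keep every formulation that no other formulation beats on all objectives at once. The formulation names and scores below are invented, and cost is negated so that every objective is maximized.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective, strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Subset of points not dominated by any other point (all objectives maximized)."""
    return {
        name: value
        for name, value in points.items()
        if not any(
            dominates(other, value)
            for other_name, other in points.items()
            if other_name != name
        )
    }


# Hypothetical scores: (gloss, hardness, -cost).
formulations = {
    "F1": (90.0, 60.0, -3.0),
    "F2": (85.0, 70.0, -2.5),
    "F3": (80.0, 55.0, -3.5),
    "F4": (90.0, 60.0, -2.0),
}

front = pareto_front(formulations)
print(sorted(front))
```

Here `F4` matches `F1` on gloss and hardness at lower cost, so `F1` drops out; `F2` and `F4` trade hardness against gloss and both survive. Hypervolume-based acquisition builds on this same dominance test but is not shown.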

Pitfall. Acquisition is only as good as the surrogate’s uncertainty. A confidently-wrong — miscalibrated or extrapolating — model steers the loop into barren regions and proposes recipes that look optimal and are physically impossible, especially near the simplex boundary or outside the data. Calibrated uncertainty, not raw accuracy, is what makes the policy work, and an inverse recipe is never trustworthy without a confirming real experiment.

All of this rests on a quantitative question: how much, and how clean, must the data be — and is the next experiment worth its cost? Empirically, more and cleaner data yields better models, often predictably — a scaling law (P12). But unlike scraping text, every materials data point has a real, often large marginal cost: materials, instrument time, days of latency. Two coupled levers set accuracy — quantity (reduces variance, extends coverage) and consistency (lower noise raises the ceiling) — and automation improves both. Crucially, structured data has compounding value: once captured in scoped form (Abstract Card → QRA+PSE), it keeps paying off across future projects instead of decaying after one use.

In the synthesis. The draft’s two distinctive bets live here, and they are the two emphasized problems. P12 — quantify data ROI (cost per experiment vs. expected reduction in iterations-to-target), so running the robot at ~1,000 samples/day is justified by modeled return, not assumed. P10 — capitalize tacit and failure knowledge so each experiment yields a reusable, verifiable asset whose value does not decay. Active learning supplies the informative data; the structured knowledge formats supply the reusable form; quantity without direction (active learning) or structure (P10) merely burns budget.
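Sketch. The P12 comparison (cost per experiment against expected reduction in iterations-to-target) reduces to simple arithmetic once both sides are in the same units. The function below is one possible accounting, not the draft's model, and every number in the usage lines is a placeholder chosen only to make the arithmetic visible.

```python
def data_roi(cost_per_experiment, n_experiments, iterations_saved, cost_per_iteration):
    """(value of avoided iterations - spend on experiments) / spend on experiments."""
    spend = cost_per_experiment * n_experiments
    if spend <= 0:
        raise ValueError("spend on experiments must be positive")
    value = iterations_saved * cost_per_iteration
    return (value - spend) / spend


# Placeholder figures in arbitrary currency units.
roi = data_roi(
    cost_per_experiment=5.0,    # marginal cost of one automated sample
    n_experiments=1000,         # one day at the stated platform throughput
    iterations_saved=4,         # assumed reduction in development iterations
    cost_per_iteration=2000.0,  # assumed cost of one manual iteration
)
print(f"modeled return: {roi:.0%}")
```

The hard part in practice is estimating `iterations_saved`, which depends on how informative the experiments are, not on how many are run.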

Recap

  • Materials R&D is a closed-loop control / active-learning system: act → observe → update → decide around a hidden plant whose latent state is microstructure (P3/P4). Cost concentrates in iterations-to-target, so which experiment next is the value-bearing decision.
  • The small loop optimizes within a fixed problem; the large loop re-poses the problem — where breakthroughs and expensive mistakes live. DoE and mixture designs (the simplex \(\Delta^{k-1}\), Scheffé) shape the questions; the simplex constraint must live inside both the model and the search (RD5).
  • The recipe is the easy part; the tacit, process, equipment-transfer, and failure policy is the high-value, least-captured part (P7) — exactly the gap the draft targets.
  • Abstract Cards (a typed log) and QRA+PSE (a knowledge unit with a validity contract: Physics, Scope, Evidence, Counter-example) make knowledge a structured, verifiable, composable object (P10); their hard fields S and C are the tacit ones, and scoped claims are sheaf sections transported by functorial migration (P7/RD4).
  • The informatics stack — MGI/ICME, PSP databases, HTE, the self-driving lab — learns the forward PSP map and inverts it for design; the decision layer (forward/inverse surrogates + Bayesian optimization, Pareto-aware for multi-objective targets, P5) makes the loop smart, but only with calibrated uncertainty.
  • The economics of data (P12) ties it together: data has a real marginal cost but compounding value when captured in scoped form; ROI comes from informative data captured in reusable form, never from undirected volume.

Part of a four-document set: the ARiSE draft (problem + AI solution), this modular Materials-science reference, the companion math reference, and the synthesis. Generated from modular Markdown with a custom static-site builder.
