MLIP Machine Learning Potentials for Lithium Battery Electrolyte Simulations: From Atomistic Accuracy to Next-Generation Design

Noah Brooks, Jan 12, 2026

Abstract

This article provides a comprehensive guide for researchers and scientists on applying Machine Learning Interatomic Potentials (MLIPs) to simulate lithium battery electrolytes. We explore the foundational principles of MLIPs, detailing methodological frameworks for simulating liquid electrolytes and SEI formation, and address common computational challenges and optimization strategies. Finally, we validate MLIP performance against traditional methods like DFT and classical MD, highlighting their transformative potential for accelerating the discovery of high-performance, stable electrolytes in battery development.

What Are MLIPs and Why Are They Revolutionary for Electrolyte Modeling?

Within the research for next-generation lithium battery electrolytes, the core challenge lies in simulating complex, dynamic molecular interactions with both quantum-mechanical accuracy and computational feasibility for relevant time- and length-scales. This Application Note details how Machine Learning Interatomic Potentials (MLIPs) are breaking the traditional trade-off between Density Functional Theory (DFT) and Classical Force Fields (FFs), enabling unprecedented predictive simulations of electrolyte decomposition, solid-electrolyte interphase (SEI) formation, and ion transport mechanisms.

Quantitative Comparison of Methods

Table 1: Performance Metrics for Electrolyte Simulation Methods

Method	Typical Accuracy (Force RMSE) [eV/Å]	Typical Speed (atoms × steps / day)	System Size Limit (~atoms)	Time Scale Limit	Key Limitation for Electrolyte Research
DFT (e.g., PBE)	Reference (~0.0)	10² - 10³	10² - 10³	< 100 ps	Prohibitive cost for long dynamics; difficult for liquid/interface systems.
Classical FF (e.g., OPLS-AA)	0.1 - 1.0	10⁹ - 10¹¹	10⁵ - 10⁶	> µs	Poor transferability; inaccurate for bond breaking/forming (SEI growth).
MLIP (e.g., NequIP, MACE)	0.01 - 0.05	10⁷ - 10⁹	10³ - 10⁵	> 100 ns	Requires training data; initial DFT investment.

Table 2: Application to Li-ion Battery Electrolyte Phenomena

Simulation Target	DFT Feasibility	Classical FF Reliability	MLIP Advantage Demonstrated
Li⁺ Solvation Structure	Good for static clusters	Approximate, parameter-dependent	High-accuracy dynamics of Li⁺(EC)₄, Li⁺(PF₆)ₙ.
SEI Component Formation (e.g., Li₂O, LiF)	Only for small reaction prototypes	Fails at chemical reactions	Reactive dynamics showing reduction pathways of EC on anode.
Ion Transport (Diffusivity, Conductivity)	Limited to short (<100 ps) AIMD runs with poor statistics	Approximate, requires fitting	Predictive computation of transport properties at near-first-principles accuracy.
Interface Stability	Limited to ideal slabs	Poor due to fixed charges	Full exploration of electrode-electrolyte interfacial reactions.

Experimental Protocols

Protocol 3.1: Generating a Training Dataset for an EC/DMC LiPF₆ MLIP

Objective: Create a robust DFT dataset encompassing configurations relevant to bulk electrolyte and initial decomposition reactions.

Initial Configuration: Build a periodic simulation box of a few hundred atoms using Packmol (e.g., 5 LiPF₆, 20 EC, and 20 DMC molecules, i.e., 5×7 + 20×10 + 20×12 = 475 atoms).
DFT Molecular Dynamics (AIMD):
- Software: CP2K or VASP.
- Settings: PBE-D3 functional, Γ-point-only k-point sampling, and a 400-500 eV plane-wave cutoff (VASP; in CP2K the GPW density cutoff is set separately, in Ry). Run in the NVT ensemble at 300 K with a Nosé-Hoover thermostat.
- Run: Perform a short 5-10 ps AIMD simulation. Save atomic positions, energies, and forces every 5-10 steps.
Active Learning / Dataset Augmentation:
- Train an initial MLIP on the AIMD data.
- Run exploratory MLIP-MD simulations at higher temperatures (500-1000 K) and on model interfaces (e.g., Li metal slab with electrolyte).
- Use uncertainty quantification (e.g., committee models, entropy). Select configurations with high uncertainty.
- Perform new DFT single-point calculations on these selected configurations and add them to the training set.
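Validation Set: Hold out 10-20% of the data before training. Because consecutive AIMD frames are strongly correlated, hold out contiguous blocks of frames (or entire trajectories) rather than randomly chosen individual frames, and make sure the held-out set spans all sampled conditions (bulk, high-temperature, interface).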
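The block-wise hold-out described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a prescribed procedure: `block_split` is a hypothetical helper, and the block size of 50 saved frames is an assumption that should be at least as long as the correlation time of the saved AIMD frames.

```python
import numpy as np


def block_split(n_frames, val_fraction=0.15, block_size=50, seed=0):
    """Split frame indices into training and validation sets by contiguous blocks.

    Whole blocks of `block_size` consecutive frames are held out, so validation
    frames are not near-duplicates of neighbouring training frames.
    """
    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n_frames / block_size))
    n_val_blocks = max(1, int(round(val_fraction * n_blocks)))
    val_blocks = rng.choice(n_blocks, size=n_val_blocks, replace=False)
    block_id = np.arange(n_frames) // block_size
    is_val = np.isin(block_id, val_blocks)
    return np.flatnonzero(~is_val), np.flatnonzero(is_val)


# Example: 2000 saved AIMD frames, 15% held out in blocks of 50 frames.
train_idx, val_idx = block_split(2000, val_fraction=0.15, block_size=50)
print(len(train_idx), len(val_idx))  # prints: 1700 300
```

Frames from separate trajectories (e.g., high-temperature or interface runs) can be treated as their own blocks so that each condition is represented in the validation set.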

Protocol 3.2: Benchmarking MLIP Performance for Transport Properties

Objective: Compute Li⁺ diffusivity and compare results from MLIP, FF, and experimental data.

System Preparation: Build a bulk electrolyte model (e.g., 1 M LiPF₆ in EC:DMC) of ~2000 atoms for the MLIP and FF (e.g., APPLE&P) simulations, plus a smaller system for the DFT reference.
Equilibration: Run NPT simulation (300 K, 1 bar) for 2 ns (MLIP/FF) or 50 ps (DFT) to achieve correct density.
Production Run: Switch to NVT ensemble. Run for > 50 ns for MLIP/FF, and as long as possible for DFT (5-10 ps). Save trajectories every 1 ps.
Analysis:
- Mean Squared Displacement (MSD): Calculate MSD for Li⁺ ions. MSD(τ) = ⟨|r(t+τ) - r(t)|²⟩.
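- Diffusivity (D): Fit the linear (diffusive) region of the MSD curve (typically lag times of 2-10 ns): D = (1/6) dMSD/dτ, where MSD is averaged over all N Li⁺ ions and time origins as defined above. Compute the anion diffusivity the same way.
- Conductivity (σ): Use the Nernst-Einstein relation, summing over both ionic species: σ_NE = (q² / (k_B T)) (ρ₊D₊ + ρ₋D₋), where ρ₊ and ρ₋ are the number densities of Li⁺ and PF₆⁻, D₊ and D₋ their diffusivities, q is the elementary charge, k_B is Boltzmann's constant, and T is temperature. Because it neglects ion-ion correlations (e.g., ion pairing), σ_NE typically overestimates the measured conductivity.
Validation: Compare σ from the MLIP and FF against experimental electrochemical impedance spectroscopy (EIS) data, and D against pulsed-field-gradient NMR (PFG-NMR) diffusivities.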
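The MSD, diffusivity, and Nernst-Einstein steps above can be sketched in NumPy as follows. This is a minimal sketch, assuming unwrapped coordinates in Å, a fixed frame spacing in ps, and a known cell volume; the helper names (`msd`, `diffusion_coefficient`, `nernst_einstein_sigma`) are hypothetical, and the synthetic random walk only checks that the code recovers a known D.

```python
import numpy as np

E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K


def msd(unwrapped, max_lag):
    """MSD(tau) averaged over ions and time origins.

    unwrapped: (n_frames, n_ions, 3) unwrapped positions in Å.
    Returns an array indexed by lag in frames (MSD[0] = 0).
    """
    out = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = unwrapped[lag:] - unwrapped[:-lag]
        out[lag] = np.mean(np.sum(disp**2, axis=-1))
    return out


def diffusion_coefficient(msd_curve, dt_ps, fit_start, fit_end):
    """D = slope / 6 from a linear fit of MSD over lags [fit_start, fit_end); returns cm²/s."""
    lag_times = np.arange(len(msd_curve)) * dt_ps
    slope, _ = np.polyfit(lag_times[fit_start:fit_end], msd_curve[fit_start:fit_end], 1)
    return slope / 6.0 * 1e-4  # 1 Å²/ps = 1e-4 cm²/s


def nernst_einstein_sigma(species, volume_A3, temperature_K):
    """Ideal (uncorrelated-ion) conductivity in mS/cm.

    species: iterable of (n_ions, charge_number, D in cm²/s), e.g. Li⁺ and PF₆⁻.
    """
    total = sum(n * z**2 * d_cm2_s * 1e-4 for n, z, d_cm2_s in species)  # m²/s
    sigma_si = E_CHARGE**2 * total / (volume_A3 * 1e-30 * K_B * temperature_K)  # S/m
    return sigma_si * 10.0  # 1 S/m = 10 mS/cm


# Self-check on a synthetic random walk with known D = 0.01 Å²/ps (1e-6 cm²/s).
rng = np.random.default_rng(0)
dt_ps, d_true = 1.0, 0.01
steps = rng.normal(0.0, np.sqrt(2 * d_true * dt_ps), size=(1000, 200, 3))
traj = np.cumsum(steps, axis=0)
curve = msd(traj, max_lag=100)
d_est = diffusion_coefficient(curve, dt_ps, fit_start=10, fit_end=101)
print(f"D_est = {d_est:.2e} cm^2/s")
```

For long production trajectories the explicit lag loop is slow; FFT-based MSD algorithms (for example MDAnalysis' `EinsteinMSD` with `fft=True`) scale much better.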

Protocol 3.3: Simulating SEI Precursor Reaction at a Li-metal Anode

Objective: Use MLIP-driven reactive MD to observe the initial reduction of ethylene carbonate (EC).

Interface Model: Construct a Li(100) slab (~6 layers) in contact with a liquid EC/DMC electrolyte (~500 atoms total).
Potential of Mean Force (PMF) with MLIP:
- Identify a reaction coordinate, e.g., the distance between a specific EC carbonyl carbon and a surface Li atom or the C=O bond length.
- Perform Umbrella Sampling using the MLIP as the engine. Use 20-30 windows along the coordinate.
- In each window, run a 20-50 ps MD simulation with a harmonic restraint (bias potential) centred on that window's value of the coordinate.
- Use the Weighted Histogram Analysis Method (WHAM) to reconstruct the free energy profile (PMF).
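Analysis: Identify the transition-state barrier and reaction free energy from the PMF. Analyze electron transfer by computing Bader charges with DFT single points on representative MLIP snapshots (or from the model's own charge predictions, if it was trained on charge information), or by examining the evolution of molecular geometries (e.g., the C=O and ring C-O bond lengths of EC).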
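To make the WHAM step concrete, here is a minimal binned 1D WHAM in NumPy. It assumes harmonic umbrella biases of the form 0.5·k·(x - x_k)² with one spring constant for all windows and energies in units of kT; `wham_1d` is a hypothetical helper, and an established WHAM implementation would normally be used for production work. The synthetic check uses a harmonic PMF, for which each biased window is exactly Gaussian, so the reconstruction can be compared with the known answer.

```python
import numpy as np


def wham_1d(samples, centers, k_spring, kT, edges, n_iter=10000, tol=1e-8):
    """Unbiased 1D PMF from harmonic umbrella-sampling windows (binned WHAM).

    samples: list of 1D arrays of the reaction coordinate, one per window.
    centers: window centres; bias in window k is 0.5 * k_spring * (x - centers[k])**2.
    Returns bin centres and the PMF in the units of kT, shifted so its minimum is 0.
    """
    edges = np.asarray(edges, dtype=float)
    x = 0.5 * (edges[1:] + edges[:-1])
    hist = np.array([np.histogram(s, bins=edges)[0] for s in samples])  # (K, M)
    n_k = hist.sum(axis=1)              # samples per window inside the bin range
    total = hist.sum(axis=0)            # pooled counts per bin
    bias = 0.5 * k_spring * (x[None, :] - np.asarray(centers)[:, None]) ** 2
    f_k = np.zeros(len(samples))        # window free-energy shifts
    for _ in range(n_iter):
        denom = np.sum(n_k[:, None] * np.exp((f_k[:, None] - bias) / kT), axis=0)
        p = np.divide(total, denom, out=np.zeros_like(denom), where=denom > 0)
        f_new = -kT * np.log(np.sum(p[None, :] * np.exp(-bias / kT), axis=1))
        f_new -= f_new[0]               # fix the arbitrary additive constant
        converged = np.max(np.abs(f_new - f_k)) < tol
        f_k = f_new
        if converged:
            break
    with np.errstate(divide="ignore"):
        pmf = -kT * np.log(p)           # empty bins become +inf
    return x, pmf - np.min(pmf[np.isfinite(pmf)])


# Synthetic check: true PMF F(x) = 0.5 * k0 * x**2, so each biased window is Gaussian.
rng = np.random.default_rng(1)
kT, k0, k_spring = 1.0, 2.0, 20.0
centers = np.linspace(-2.0, 2.0, 21)
samples = [rng.normal(k_spring * c / (k0 + k_spring), np.sqrt(kT / (k0 + k_spring)), 5000)
           for c in centers]
x, pmf = wham_1d(samples, centers, k_spring, kT, edges=np.linspace(-2.5, 2.5, 101))
mask = np.abs(x) < 1.8
true_f = 0.5 * k0 * x[mask] ** 2
err = pmf[mask] - (true_f - true_f.min())
print(f"max |PMF error| = {np.max(np.abs(err)):.3f} kT")
```

With real MLIP data, the windows must overlap enough that neighbouring histograms share bins; gaps show up as poorly determined f_k values and jumps in the PMF.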

Diagrams

Diagram Title: MLIP Development and Validation Workflow for Electrolyte Research

Diagram Title: MLIPs Breaking the Accuracy-Speed Trade-off

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Tools for MLIP Electrolyte Research

Item / Software	Category	Primary Function in Research	Key Consideration for Electrolytes
VASP / CP2K	DFT Engine	Generate reference training data (energies, forces).	CP2K often preferred for large, periodic liquid systems.
LAMMPS	MD Engine	Perform high-performance production MD using fitted MLIPs.	Supports major MLIP packages (e.g., `pair_style pace`).
GPUMD	MD Engine	Extremely fast GPU-native MD, driven mainly by its own NEP (neuroevolution potential) models.	Ideal for large-scale reactive simulations.
ASE (Atomic Simulation Environment)	Python Library	Manages atoms, interfaces calculators, and workflows.	Essential for dataset handling and preprocessing.
DeePMD-kit	MLIP Framework	Train and run Deep Potential models.	Good scalability; requires careful descriptor choice.
NequIP / MACE	MLIP Framework	Train equivariant graph neural network potentials.	High data efficiency and accuracy for complex interactions.
Packmol	Setup Tool	Create initial configurations of mixed molecules.	Crucial for building realistic solvated electrolyte boxes.
PLUMED	Analysis & Enhanced Sampling	Perform metadynamics, umbrella sampling for free energies.	Key for probing reaction barriers (e.g., EC reduction).

Within the broader thesis on Machine Learning Interatomic Potentials (MLIPs) for lithium battery electrolyte simulations, selecting the appropriate neural network architecture is paramount. This primer details three graph-based approaches (NequIP, MACE, and classical invariant Graph Neural Network Potentials, GNNPs, such as SchNet and DimeNet++), contrasting their theoretical underpinnings and providing practical protocols for applying them to reactive and dynamic electrolyte systems.

Core Architectural Comparison

Table 1: Quantitative Comparison of Key GNN Architectures for Electrolyte Simulation

Feature	Classical GNNPs (e.g., SchNet, DimeNet++)	NequIP (2021)	MACE (2022-2023)
Core Principle	Message passing on atom-centered graphs.	E(3)-Equivariant convolutions using higher-order spherical harmonics.	Higher-order body-ordered equivariant messages with tensor products.
Symmetry Guarantee	Invariant internal features.	Equivariant to rotation & inversion.	Equivariant to rotation & inversion.
Body Order	Implicit, often limited.	Implicitly high via layers.	Explicitly high (e.g., 4-body).
Accuracy (Typical Energy MAE; dataset-dependent)	~10-30 meV/atom (Li-compounds)	~5-15 meV/atom	~1-10 meV/atom
Data Efficiency	Moderate.	High.	Very High (succinct descriptors).
Computational Cost	Lower.	Higher (per-step), but faster convergence.	High (per-step), excellent sample efficiency.
Key for Electrolytes	Good for dynamics; may miss complex anisotropies.	Captures directional Li-solvent interactions and local many-body (polarization-like) effects within its cutoff.	Well suited to reactive events, ion pairing, and complex chemistry.

Application Notes for Lithium Battery Electrolyte Research

NequIP: Well suited to modeling solvent environments (e.g., EC, DMC) around Li⁺ ions because its rotational equivariance captures the anisotropic, directional Li⁺-solvent interactions that matter for solvation-energy accuracy. Like other local MLIPs, it describes polarization only implicitly, within its cutoff, and has no explicit long-range electrostatics.
MACE: A strong choice for studying the formation and rupture of chemical bonds, as in SEI-forming reactions (e.g., LiPF₆ decomposition, solvent reduction), provided such events are represented in the training data. Its high body order explicitly models multi-atom interactions.
Classical GNNPs: Remain useful for long-timescale molecular dynamics (MD) of pre-defined, non-reactive electrolyte mixtures where computational throughput is a priority, though with reduced predictive certainty for novel chemistries.

Experimental Protocols

Protocol 1: Training an MLIP for LiPF₆/EC:EMC Electrolyte Simulations

Objective: Develop a robust potential to simulate ion transport and conformational dynamics.

Data Generation (Ab Initio):
- Perform DFT (e.g., PBE0-D3) calculations on diverse snapshots from a broad exploration of the Li⁺-(solvent)ₙ-PF₆⁻ configurational space. Use molecular dynamics with enhanced sampling (e.g., metadynamics) to ensure coverage.
- Target Data: Total energies, atomic forces, and stress tensors for ~10,000-50,000 configurations.
Model Selection & Training (NequIP Example):
- Split data 80:10:10 (train:validation:test).
- Configure NequIP with l_max=2 (spherical harmonic order), 3-4 interaction layers, and ~64-128 features.
- Loss: Weighted sum of energy (λ~1) and force (λ~100-1000) error terms (typically per-atom energy and force mean squared errors).
- Train using Adam optimizer with patience-based early stopping on validation force loss.
Validation for Electrolytes:
- Static: Predict DFT energies/forces on unseen test set.
- Dynamic: Run MD, compute radial distribution functions (Li⁺-O), Li⁺ solvation shell statistics, and ion pair lifetimes. Validate against ab initio MD or experimental EXAFS data.
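The Li⁺-O radial distribution function and coordination number used in the dynamic validation can be computed directly from positions. This is a minimal NumPy sketch, assuming an orthorhombic box, positions in Å already split by species, and r_max no larger than half the shortest box length; `rdf_and_cn` is a hypothetical helper, and trajectory libraries such as MDAnalysis provide tested equivalents. The first-shell coordination number is read from n(r) at the first minimum of g(r) after the Li⁺-O peak.

```python
import numpy as np


def rdf_and_cn(frames_a, frames_b, box, r_max, n_bins=200):
    """Frame-averaged g_AB(r) and running coordination number n_AB(r).

    frames_a: (n_frames, n_a, 3) positions of species A (e.g. Li⁺) in Å.
    frames_b: (n_frames, n_b, 3) positions of species B (e.g. carbonyl O) in Å.
    box: (3,) orthorhombic box lengths in Å; r_max must not exceed min(box) / 2.
    Assumes A and B are different atoms (no self-pairs).
    """
    box = np.asarray(box, dtype=float)
    counts = np.zeros(n_bins)
    for pos_a, pos_b in zip(frames_a, frames_b):
        d = pos_a[:, None, :] - pos_b[None, :, :]
        d -= box * np.round(d / box)                  # minimum-image convention
        r = np.linalg.norm(d, axis=-1).ravel()
        hist, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
        counts += hist
    n_frames, n_a, n_b = len(frames_a), frames_a.shape[1], frames_b.shape[1]
    rho_b = n_b / np.prod(box)
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    g = counts / (n_frames * n_a * rho_b * shell_vol)
    cn = np.cumsum(counts) / (n_frames * n_a)         # mean number of B within r of each A
    return 0.5 * (edges[1:] + edges[:-1]), g, cn


# Sanity check on an ideal gas: g(r) ≈ 1 and n(r) ≈ rho_B * 4/3 * pi * r**3.
rng = np.random.default_rng(2)
box = np.array([30.0, 30.0, 30.0])
frames_a = rng.uniform(0.0, 30.0, size=(5, 50, 3))
frames_b = rng.uniform(0.0, 30.0, size=(5, 2000, 3))
r, g, cn = rdf_and_cn(frames_a, frames_b, box, r_max=10.0)
```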

Protocol 2: Simulating SEI Reaction Pathways with MACE

Objective: Capture the reactive chemistry of electrolyte decomposition at a reducing anode surface.

Reactive Training Set Construction:
- Use DFT (e.g., the RPBE functional, widely used for surface chemistry) to compute trajectories of key suspected reactions (e.g., EC ring-opening, PF₆⁻ defluorination).
- Critically, include reaction intermediates and transition states identified via nudged elastic band (NEB) calculations.
MACE-Specific Training:
- Use a high MACE correlation order (the `--correlation` option; the default of 3 yields 4-body messages) to capture multi-center interactions during bond breaking/forming.
- Ensure training set includes diverse coordination environments for Li, C, O, F, P.
- The model will learn a unified potential energy surface connecting reactants, products, and transition states.
Mechanistic Simulation:
- Perform high-temperature MD or biased MD (using the trained MACE potential) to observe spontaneous reactive events.
- Compute free energy profiles for key steps using umbrella sampling or metadynamics driven by the MLIP.
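Spotting reactive events in long MLIP trajectories usually starts with simple geometric criteria. The sketch below is one such criterion under stated assumptions: it tracks a single interatomic distance (for example an EC ring C-O bond) and reports persistent breaking and re-forming using two thresholds. The 2.0 Å and 1.7 Å cutoffs, the 10-frame persistence, and the `bond_events` helper are hypothetical and should be tuned per bond type.

```python
import numpy as np


def bond_events(distances, dt_ps, r_break=2.0, r_form=1.7, min_persist=10):
    """Detect persistent bond-breaking/forming events from one distance time series.

    distances: interatomic distance (Å) per saved frame.
    A break is recorded when the distance stays above r_break for min_persist
    consecutive frames; a (re)formation when it stays below r_form. The gap between
    the two thresholds (hysteresis) keeps thermal vibrations from being counted.
    Returns a list of (event, time_ps) tuples.
    """
    d = np.asarray(distances, dtype=float)
    bonded = d[0] < r_break
    events, i = [], 0
    while i + min_persist <= len(d):
        window = d[i:i + min_persist]
        if bonded and np.all(window > r_break):
            events.append(("break", i * dt_ps))
            bonded, i = False, i + min_persist
        elif not bonded and np.all(window < r_form):
            events.append(("form", i * dt_ps))
            bonded, i = True, i + min_persist
        else:
            i += 1
    return events


# Toy series: a vibrating bond (~1.45 Å) that breaks at frame 100 and reforms at frame 200.
frames = np.arange(300)
dist = np.where((frames >= 100) & (frames < 200), 3.0, 1.45 + 0.1 * np.sin(frames))
print(bond_events(dist, dt_ps=0.01))
```

Event times found this way are natural starting points for NEB or umbrella-sampling calculations on the corresponding reaction step.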

Visualizations

Title: MLIP Model Training and Validation Workflow for Electrolytes

Title: Active Learning Cycle for Electrolyte MLIP Development

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Computational Tools for MLIP Electrolyte Research

Item/Category	Specific Examples (Software/Package)	Function in Research
Ab Initio Data Generator	VASP, CP2K, Quantum ESPRESSO	Produces the reference electronic structure data (energy, forces) for training.
MLIP Training Framework	`nequip`, `mace`, `DeePMD-kit`, `allegro`	Implements the neural network architectures and training loops.
Molecular Dynamics Engine	LAMMPS, ASE, `mace-md`	Performs large-scale molecular dynamics simulations using the trained MLIP.
Active Learning Manager	FLARE, `allegro-lib`, `BLAST`	Automates the discovery and labeling of new, uncertain configurations to improve the dataset.
Enhanced Sampling	PLUMED, SSAGES	Enables calculation of free energies and sampling of rare events (e.g., ion hopping).
Analysis & Validation	MDAnalysis, `pymatgen`, `chemiscope`	Computes key electrolyte metrics (RDF, coordination, conductivity, diffusion).
Workflow Orchestration	`signac`, `AiiDA`, Nextflow	Manages complex, high-throughput computational pipelines and data provenance.

Application Notes

Machine Learning Interatomic Potentials (MLIPs) have become a transformative tool for simulating complex electrolyte systems in lithium batteries, enabling the accurate and efficient prediction of properties critical to performance and safety. Within the context of a thesis on MLIP-driven electrolyte research, this document details the application of MLIPs to three cornerstone properties: ionic conductivity, electrochemical stability, and the identification of Solid Electrolyte Interphase (SEI) precursors.

1.1 Ionic Conductivity: Molecular dynamics (MD) driven by MLIPs allows ion transport to be simulated over nanosecond to microsecond timescales at near-DFT accuracy. The mean squared displacement (MSD) of Li⁺ ions is calculated from the trajectories, and the diffusion coefficient (D_Li⁺) follows from the Einstein relation. The ionic conductivity (σ) is then computed with the Nernst-Einstein equation, providing a direct link between atomistic structure and macroscopic battery performance. MLIPs are particularly valuable for screening novel solvent and salt combinations at varying concentrations and temperatures.

1.2 Electrochemical Stability Window (ESW): The ESW defines the voltage range within which the electrolyte is thermodynamically stable against oxidation at the cathode and reduction at the anode. MLIPs facilitate hybrid Monte Carlo/MD simulations to compute the free energy of redox decomposition reactions. By evaluating the formation free energies of decomposition products (e.g., LiF, Li₂O, organic lithiated species) from electrolyte components, the reduction and oxidation potentials can be estimated. This allows for the in silico design of electrolytes with wider ESWs for high-voltage cathodes.
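Converting a computed reduction free energy into a potential on the Li⁺/Li scale is a common source of confusion, so a small sketch of the usual thermodynamic-cycle conversion may help. It assumes the reduction free energy ΔG_red (in eV, electron referenced to vacuum, solvation included) has already been obtained, for example from DFT on MLIP-sampled configurations. It uses the IUPAC absolute SHE potential of 4.44 V and E°(Li⁺/Li) = -3.04 V vs. SHE, so that E(vs. Li⁺/Li) = -ΔG_red/(nF) - 1.40 V; other absolute-SHE values (e.g., 4.28 V) appear in the literature and shift all results uniformly. This is not necessarily how the values in Table 2 were obtained, and the function name is hypothetical.

```python
ABSOLUTE_SHE_V = 4.44  # absolute potential of the standard hydrogen electrode (IUPAC value)
LI_VS_SHE_V = -3.04    # standard potential of Li+/Li vs. SHE


def reduction_potential_vs_li(delta_g_red_ev, n_electrons=1):
    """Reduction potential (V vs. Li+/Li) of M + n e- -> M(n-).

    delta_g_red_ev: solution-phase reduction free energy in eV, with the electron
    referenced to vacuum (e.g., from a thermodynamic cycle with solvation corrections).
    """
    absolute_v = -delta_g_red_ev / n_electrons          # eV per electron -> V
    return absolute_v - (ABSOLUTE_SHE_V + LI_VS_SHE_V)  # shift of about 1.40 V


# Hypothetical example: a one-electron reduction free energy of -2.2 eV.
print(f"{reduction_potential_vs_li(-2.2):.2f} V vs. Li+/Li")  # prints: 0.80 V vs. Li+/Li
```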

1.3 SEI Precursor Identification: The initial, crucial steps of SEI formation involve the reduction of electrolyte molecules at the anode surface. MLIP-based reactive MD simulations can model the bond-breaking and bond-forming events that follow electron transfer, provided the training data include the relevant reduced species; the electron transfer itself is captured only implicitly, through the learned potential energy surface. By simulating the interaction between electrolyte species (e.g., ethylene carbonate, fluoroethylene carbonate, LiPF₆) and a model Li-metal or lithiated graphite surface, one can track the decomposition pathways, identify primary reduction products (e.g., lithium ethylene dicarbonate, LiF), and rank the propensity of different components to form beneficial SEI layers.

Table 1: Representative MLIP-MD Simulation Results for Ionic Conductivity in Model Electrolytes

Electrolyte System (Li Salt in Solvent)	Concentration (M)	Temp (K)	Simulated D_Li⁺ (10⁻⁶ cm²/s)	Predicted σ (mS/cm)	DFT Reference σ (mS/cm)
LiPF₆ in Ethylene Carbonate (EC)	1.0	300	1.05 ± 0.15	8.2 ± 1.2	8.5
LiTFSI in 1,2-Dimethoxyethane (DME)	1.0	300	3.82 ± 0.30	25.1 ± 2.0	24.8
LiFSI in Tetrahydrofuran (THF)	2.0	330	2.45 ± 0.20	18.5 ± 1.5	N/A

Table 2: Calculated Reduction Potentials for Common Electrolyte Components vs. Li⁺/Li

Molecule	Primary Reduction Product	MLIP-Calculated Reduction Potential (V)	Experimental Range (V)
Ethylene Carbonate (EC)	Lithium Ethylene Dicarbonate (LEDC)	0.78	0.6 - 0.9
Fluoroethylene Carbonate (FEC)	LiF, VC, Polymeric species	0.95	0.9 - 1.2
Vinylene Carbonate (VC)	Poly(VC)	0.65	0.5 - 0.8
LiPF₆	LiF, PF₃O, LixPOyFz	1.42 (vs. decomposition)	>1.5

Experimental Protocols

Protocol 3.1: MLIP-MD Workflow for Ionic Conductivity Calculation

Objective: To compute the ionic conductivity of a liquid electrolyte using MLIP-driven molecular dynamics. Materials: See "The Scientist's Toolkit" (Section 5). Procedure:

System Construction: Using VMD or Packmol, construct a simulation box containing a pre-defined number of Li⁺ ions, counter anions (e.g., PF₆⁻), and solvent molecules to achieve the target molarity. Ensure initial electrostatic neutrality.
Equilibration (NPT Ensemble): Perform a 1-2 ns MD simulation using the MLIP (e.g., via LAMMPS interface) in the isothermal-isobaric (NPT) ensemble at the target temperature (e.g., 300 K) and pressure (1 bar) using a Nosé-Hoover thermostat/barostat. This lets the density relax to the MLIP's equilibrium value, which should then be checked against the experimental density.
Production Run (NVT Ensemble): Using the equilibrated structure, run a long-timescale (10-100 ns) production simulation in the canonical (NVT) ensemble. The trajectory should be saved at frequent intervals (e.g., every 1 ps).
Trajectory Analysis:
a. Calculate the Mean Squared Displacement (MSD) of Li⁺ ions from unwrapped coordinates: MSD(t) = ⟨|r_i(t + t₀) - r_i(t₀)|²⟩, where the average runs over all Li⁺ ions i and time origins t₀.
b. Fit the linear portion of the MSD(t) vs. time curve to obtain the diffusion coefficient: D_Li⁺ = (1/(2d)) lim_{t→∞} d(MSD(t))/dt, where d = 3 is the dimensionality, i.e., D_Li⁺ is one-sixth of the MSD slope.
Conductivity Calculation: Apply the Nernst-Einstein relation, summing over both ionic species: σ = (F² / (RT)) (z₊² c₊ D₊ + z₋² c₋ D₋), where c₊ and c₋ are the molar concentrations of Li⁺ and the anion, z₊ = +1 and z₋ = -1 are their charge numbers, F is Faraday's constant, R is the gas constant, and T is temperature; using D_Li⁺ alone gives only the cation contribution. Because this expression ignores ion-ion correlations, for more rigorous results compute the collective conductivity from the current-current autocorrelation function (Green-Kubo) or its Einstein-Helfand equivalent.
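The collective route can be sketched with the Einstein-Helfand form, σ = lim_{t→∞} ⟨|M(t) - M(0)|²⟩ / (6 V k_B T t), where M = Σ_i q_i r_i is the translational dipole of all ions. Unlike the Nernst-Einstein estimate, it includes cation-anion correlations. This is a minimal sketch, assuming unwrapped positions in Å for all ions, integer charge numbers, and a known cell volume V; the helper name and fit window are hypothetical. Because M is a single collective quantity, it converges much more slowly than self-diffusion and needs long trajectories.

```python
import numpy as np

E_CHARGE = 1.602176634e-19  # elementary charge, C
K_B = 1.380649e-23          # Boltzmann constant, J/K


def einstein_helfand_sigma(unwrapped, charges, dt_ps, volume_A3, temperature_K,
                           max_lag, fit_start):
    """Collective (Einstein-Helfand) ionic conductivity in mS/cm.

    unwrapped: (n_frames, n_ions, 3) unwrapped ion positions in Å (cations and anions).
    charges: (n_ions,) charge numbers, e.g. +1 for Li⁺ and -1 for PF₆⁻.
    For a neutral system, a uniform centre-of-mass drift cancels in M.
    """
    m = np.einsum("fij,i->fj", unwrapped, np.asarray(charges, float))  # e·Å, (F, 3)
    lags = np.arange(1, max_lag + 1)
    msd_m = np.array([np.mean(np.sum((m[lag:] - m[:-lag]) ** 2, axis=1)) for lag in lags])
    slope, _ = np.polyfit(lags[fit_start:] * dt_ps, msd_m[fit_start:], 1)  # e²·Å²/ps
    slope_si = slope * E_CHARGE**2 * 1e-20 / 1e-12                         # C²·m²/s
    sigma_si = slope_si / (6.0 * volume_A3 * 1e-30 * K_B * temperature_K)   # S/m
    return sigma_si * 10.0                                                 # mS/cm


# Uncorrelated ions: the collective result should match Nernst-Einstein within noise.
rng = np.random.default_rng(3)
n_frames, d_plus, d_minus, dt_ps = 100_000, 0.01, 0.015, 1.0   # D in Å²/ps
charges = np.array([1, 1, 1, 1, -1, -1, -1, -1])
d_ion = np.where(charges > 0, d_plus, d_minus)
steps = rng.normal(size=(n_frames, 8, 3)) * np.sqrt(2 * d_ion * dt_ps)[None, :, None]
traj = np.cumsum(steps, axis=0)
sigma_eh = einstein_helfand_sigma(traj, charges, dt_ps, 1.0e5, 300.0, max_lag=100, fit_start=10)
```

For real electrolytes the ratio of this collective σ to the Nernst-Einstein value (the ionicity) is itself a useful diagnostic of ion pairing.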

Protocol 3.2: Protocol for Probing Initial SEI Decomposition Pathways

Objective: To simulate the reductive decomposition of an electrolyte component at a model anode surface. Materials: See "The Scientist's Toolkit" (Section 5). Procedure:

Surface & Adsorbate Preparation: Create a slab model of the anode surface (e.g., Li(100), LiC₆). Introduce a single molecule of the target electrolyte component (e.g., FEC) into the vacuum layer above the surface at a plausible adsorption distance.
MLIP-Based Reactive MD: Use an MLIP trained on reactive configurations that include Li (e.g., NequIP; standard ANI models do not cover Li) in CP2K or LAMMPS. Start with a geometry optimization of the adsorbate/surface system.
Dynamics with Enhanced Sampling: Run ab initio MD or MLIP-MD at a controlled temperature (e.g., 300-400 K). To overcome reaction barriers, employ enhanced sampling techniques like metadynamics or ReaxFF/MLIP hybrid dynamics. A collective variable (CV) could be the distance between a specific C atom in the molecule and a surface Li atom, or the breaking of a specific C-O bond.
Reaction Monitoring: Track bond orders, partial charges (e.g., via DDEC6 analysis), and radical formation throughout the simulation. Identify the first stable reduced species that forms and remains adsorbed.
Free Energy Analysis: From the biased simulation (e.g., metadynamics), reconstruct the free energy surface (FES) as a function of the chosen CVs. The minima on the FES correspond to stable intermediates, and saddle points correspond to transition states for the initial reduction step.
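To illustrate what the metadynamics bias does, here is a minimal well-tempered metadynamics bias on one collective variable in NumPy. It only shows hill deposition and the free-energy estimate; applying the bias force inside the MLIP-MD engine is left out. In a real run this would normally be handled by PLUMED's `METAD` action (with `BIASFACTOR` for the well-tempered variant); the class name and the numerical settings below are hypothetical.

```python
import numpy as np


class WellTemperedBias1D:
    """Minimal well-tempered metadynamics bias on a single collective variable s."""

    def __init__(self, height0, sigma, bias_factor, kT):
        self.h0, self.sigma, self.gamma, self.kT = height0, sigma, bias_factor, kT
        self.centers, self.heights = [], []

    def value(self, s):
        """Bias V(s): sum of the Gaussian hills deposited so far."""
        s = np.asarray(s, dtype=float)
        v = np.zeros_like(s)
        for c, h in zip(self.centers, self.heights):
            v = v + h * np.exp(-0.5 * ((s - c) / self.sigma) ** 2)
        return v

    def deposit(self, s):
        """Add a hill at s with height h0 * exp(-V(s) / ((gamma - 1) * kT))."""
        h = self.h0 * np.exp(-self.value(s) / ((self.gamma - 1.0) * self.kT))
        self.centers.append(float(s))
        self.heights.append(float(h))
        return float(h)

    def free_energy(self, grid):
        """Long-time estimate F(s) = -gamma / (gamma - 1) * V(s), shifted to a zero minimum."""
        f = -self.gamma / (self.gamma - 1.0) * self.value(grid)
        return f - f.min()


# Hypothetical settings: kT = 2.494 kJ/mol (300 K), 1.2 kJ/mol hills, 0.1 Å width, bias factor 10.
bias = WellTemperedBias1D(height0=1.2, sigma=0.1, bias_factor=10.0, kT=2.494)
h1 = bias.deposit(0.3)
h2 = bias.deposit(0.3)   # a second hill at the same CV value is smaller
grid = np.linspace(0.0, 0.6, 61)
fes = bias.free_energy(grid)
```

The shrinking hill height is what makes the well-tempered bias converge, so that the scaled bias becomes a smooth estimate of the free energy surface rather than oscillating around it.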

Visualizations

Diagram 1: MLIP Simulation Workflow

Diagram 2: Ionic Conductivity from MLIP-MD

Diagram 3: FEC Reduction Pathways at Anode

The Scientist's Toolkit

Table 3: Essential Research Reagent Solutions for MLIP Electrolyte Simulations

Item	Function in Research	Example Product/Code
MLIP Software	Core engine for performing high-accuracy, fast atomic simulations. Trained on DFT data.	`MACE`, `NequIP`, `Allegro`, `CHGNet`
MD Engine	Software to perform the molecular dynamics calculations using the MLIP as the force provider.	`LAMMPS`, `CP2K`, `ASE`
Ab Initio Code	To generate the initial quantum-mechanical training data for the MLIP.	`VASP`, `Gaussian`, `Quantum ESPRESSO`
System Builder	Tool to create initial atomic configurations of electrolyte boxes or interface models.	`Packmol`, `VMD`, `pymatgen`
Analysis Suite	For processing MD trajectories: calculating MSD, RDFs, coordination numbers, etc.	`MDAnalysis`, `pymatgen.analysis`, `in-house scripts`
Enhanced Sampling	Software to accelerate rare events like bond breaking in SEI formation simulations.	`PLUMED`, `SSAGES`
Reference Electrolyte Database	Curated dataset of electrolyte structures, energies, and forces for MLIP training/validation.	`Electrolyte Genome Project` data, `Materials Project`
High-Performance Computing (HPC)	Essential computational resource for training MLIPs and running long-timescale MD.	Local cluster, `ACCESS` (formerly `XSEDE`), `Google Cloud Platform`

Application Notes

Machine Learning Interatomic Potentials (MLIPs) represent a paradigm shift for molecular dynamics (MD) simulations of lithium battery electrolytes, bridging the gap between computationally prohibitive ab initio methods and the limited accuracy of classical force fields. The fidelity, transferability, and robustness of an MLIP are fundamentally determined by the quality and scope of its training dataset. Ab initio datasets, derived from quantum mechanical calculations, provide the essential foundational data. For electrolyte systems, these datasets must capture a vast and complex configuration space: diverse solvation structures (Li⁺ with carbonate, ether, or nitrile solvents), ion pairing/aggregation, explicit and implicit interface environments, and decomposition transition states. A robust MLIP trained on such a dataset can then predict energies and forces with near-ab initio accuracy at MD-scale computational cost, enabling the study of long-timescale phenomena like solid-electrolyte interphase (SEI) growth, lithium dendrite initiation, and solvent degradation pathways—processes central to battery performance and safety.

Protocols

Protocol 1: Generation of a Representative Ab Initio Training Dataset for Liquid Electrolytes

Objective: To create a comprehensive Density Functional Theory (DFT) dataset that samples the relevant configurations of a lithium salt (e.g., LiPF₆) in a solvent mixture (e.g., EC:EMC).

Methodology:

Initial Configuration Generation:
- Use classical MD with a standard force field (e.g., OPLS-AA, GAFF) to simulate a ~1 M electrolyte solution in a cubic box with ~100-200 molecules.
- Run an NPT simulation (e.g., 298 K, 1 bar) for 5-10 ns to equilibrate density.
- Perform a subsequent NVT simulation to collect uncorrelated snapshots. Save 500-1000 snapshots spaced by 5-10 ps.

DFT Single-Point Calculations:
- For each snapshot, extract the coordinates and compute the total energy and atomic forces using DFT.
- Software: CP2K, VASP, or Quantum ESPRESSO.
- Functional: hybrid functionals with dispersion corrections, e.g., PBE0-D3 or ωB97X-D.
- Basis Set: Mixed Gaussian/Plane-wave (GPW) in CP2K (e.g., DZVP-MOLOPT-SR-GTH for elements, GTH-PBE pseudopotentials) or PAW with a 400-500 eV plane-wave cutoff in VASP.
- Sampling Note: This step is usually the first pass of an iterative active-learning loop: MLIPs trained on the initial data are used to run new MD, underrepresented or high-error configurations are identified via uncertainty quantification, and those configurations are then labeled with DFT and added to the dataset.
Configuration Augmentation for Reactivity:
- Perform targeted metadynamics or nudged elastic band (NEB) calculations on selected configurations to sample bond-breaking and dissociation events (e.g., Li⁺-solvent dissociation, PF₆⁻ decomposition, solvent transesterification).
- Include these reaction pathway configurations in the final dataset to train the MLIP on chemical reactivity.
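Whether a 5-10 ps snapshot spacing really gives uncorrelated configurations can be checked from the classical NVT trajectory itself. A minimal sketch, assuming a scalar observable sampled at a fixed interval (for example, the potential energy or a Li⁺ coordination number): it estimates the integrated autocorrelation time τ_int and uses 2τ_int (the statistical inefficiency) as the snapshot stride. Truncating the sum at the first non-positive autocorrelation is one simple choice among several, and the helper name is hypothetical.

```python
import numpy as np


def integrated_autocorr_time(x, max_lag=None):
    """Integrated autocorrelation time (in frames) of a scalar time series.

    tau_int = 1/2 + sum_k rho(k), truncated at the first non-positive rho(k).
    """
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    max_lag = max_lag or n // 4
    var = np.dot(x, x) / n
    tau = 0.5
    for lag in range(1, max_lag):
        rho = np.dot(x[:-lag], x[lag:]) / ((n - lag) * var)
        if rho <= 0.0:
            break
        tau += rho
    return tau


# Synthetic AR(1) series with phi = 0.9; exact tau_int = (1 + phi) / (2 * (1 - phi)) = 9.5.
rng = np.random.default_rng(4)
phi, noise = 0.9, rng.normal(size=200_000)
series = np.empty_like(noise)
series[0] = noise[0]
for t in range(1, len(noise)):
    series[t] = phi * series[t - 1] + noise[t]
tau = integrated_autocorr_time(series)
stride = int(np.ceil(2 * tau))   # frames between approximately independent snapshots
print(f"tau_int = {tau:.1f} frames, stride = {stride} frames")
```

Multiplying the stride by the save interval of the classical run gives a spacing in ps that can be compared directly with the 5-10 ps suggested above.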

Protocol 2: Active Learning Workflow for MLIP Development

Objective: To iteratively construct an optimal training dataset and train a robust MLIP (e.g., Neural Network Potential (NNP), Gaussian Approximation Potential (GAP), or Moment Tensor Potential (MTP)).

Methodology:

Initial Model Training:
- Train a preliminary MLIP (e.g., using DeePMD-kit, QUIP, or the MLIP package for MTPs) on the initial DFT dataset from Protocol 1.
Exploration and Uncertainty Sampling:
- Use the preliminary MLIP to run extensive MD simulations under various conditions (different temperatures, concentrations, external electric fields).
- During these simulations, implement an uncertainty metric (e.g., the variance between a committee of models, or the intrinsic uncertainty of the MLIP).
- Flag configurations where the model's predicted uncertainty exceeds a predefined threshold.
Dataset Refinement:
- Perform new DFT calculations on the high-uncertainty configurations flagged during exploration.
- Add these new data points to the existing training set.
Iteration:
- Retrain the MLIP on the expanded dataset.
- Repeat exploration, dataset refinement, and retraining until the model's error on a held-out test set converges and representative simulations no longer produce high-uncertainty configurations.
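The uncertainty-flagging step of this loop can be sketched with a committee of models, one of the options named above. This assumes each committee member has already predicted forces on the same exploratory frames; the per-frame metric (the largest per-atom force standard deviation) and the lower/upper trust thresholds are assumptions, similar in spirit to the trust-level window used in DP-GEN, and both helper names are hypothetical.

```python
import numpy as np


def committee_force_uncertainty(forces):
    """Per-frame uncertainty from a committee of MLIPs.

    forces: (n_models, n_frames, n_atoms, 3) predicted forces in eV/Å.
    Returns, for each frame, the maximum over atoms of the committee standard
    deviation of the force vector.
    """
    mean = forces.mean(axis=0)
    dev2 = np.sum((forces - mean) ** 2, axis=-1)   # (n_models, n_frames, n_atoms)
    sigma_atom = np.sqrt(dev2.mean(axis=0))        # (n_frames, n_atoms)
    return sigma_atom.max(axis=1)


def select_for_labeling(uncertainty, lower, upper, max_new):
    """Indices of frames with lower < uncertainty < upper, most uncertain first.

    Frames above `upper` are skipped because they are often unphysical (the
    exploratory MD has already failed) rather than merely unfamiliar.
    """
    idx = np.flatnonzero((uncertainty > lower) & (uncertainty < upper))
    return idx[np.argsort(uncertainty[idx])[::-1]][:max_new]


# Toy committee of 4 models, 5 frames, 3 atoms: models disagree on frames 1, 3 and 4.
forces = np.zeros((4, 5, 3, 3))
signs = np.array([1.0, -1.0, 1.0, -1.0])
forces[:, 1, 1, 0] = 0.1 * signs   # sigma = 0.1 eV/Å
forces[:, 3, 0, 0] = 1.0 * signs   # sigma = 1.0 eV/Å
forces[:, 4, 2, 0] = 5.0 * signs   # sigma = 5.0 eV/Å (treated as unphysical)
unc = committee_force_uncertainty(forces)
print(select_for_labeling(unc, lower=0.05, upper=2.0, max_new=10))  # prints: [3 1]
```

The fraction of exploration frames falling inside the trust window is also a convenient convergence signal: when it stays near zero across representative simulations, the loop can stop.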

Data Tables

Table 1: Comparative Performance of MLIPs Trained on Different Ab Initio Dataset Strategies

Dataset Strategy	DFT Method & Size	RMSE (Energy) [meV/atom]	RMSE (Forces) [meV/Å]	MD Speed-up vs. AIMD	Key Limitations Addressed
Single-Point from CMD	PBE0-D3, 5k configs	~2.5 - 4.0	~80 - 120	~1000x faster	Bulk liquid structure, diffusion
+ Active Learning	PBE0-D3, 15k configs	~1.5 - 2.5	~50 - 80	~2000x faster	Rare events, local deformations
+ Explicit Reaction Paths	ωB97X-D, 20k configs	~2.0 - 3.0	~60 - 100	~1500x faster	Chemical reactivity, SEI precursor formation
Pure AIMD Baseline	PBE-D2, 500 configs	N/A	N/A	1x (baseline)	Limited sampling, high cost

Table 2: Key Properties of a Model LiPF₆ in EC:EMC Electrolyte Predicted by a Robust MLIP vs. Experiment

Property	MLIP-MD Prediction	Experimental Reference	Computational Cost (CPU-hr)
Li⁺ Diffusion Coefficient (298K)	2.1 x 10⁻⁶ cm²/s	1.8 - 2.5 x 10⁻⁶ cm²/s	~5,000 (vs. ~1,000,000 for AIMD)
Li⁺ First-Shell Coordination Number	4.1 (avg.)	~4	~500
EC Decomposition Barrier (on Li surface)	0.78 eV	0.70 - 0.85 eV (est.)	~15,000 (NEB with MLIP)
Ionic Conductivity (1M, 298K)	8.5 mS/cm	9.2 - 10.5 mS/cm	~8,000

Visualizations

Title: Active Learning Cycle for Electrolyte MLIP Development

Title: From Ab Initio Data to Battery Electrolyte Insights

The Scientist's Toolkit

Table 3: Essential Research Reagents & Solutions for Ab Initio Electrolyte MLIP Development

Item	Function in Research	Key Consideration for Electrolytes
DFT Software (CP2K, VASP)	Performs the foundational ab initio calculations to generate reference energies and forces.	Must handle periodic boundary conditions, dispersion corrections (D3), and hybrid functionals for accuracy.
Classical MD Engine (GROMACS, LAMMPS)	Generates initial configuration samples and can be used for exploratory sampling with a preliminary MLIP.	Requires accurate classical force fields for initial sampling of bulk liquid.
MLIP Training Framework (DeePMD-kit, QUIP)	Provides the architecture (NNP, GAP) and tools to train the machine learning potential on the DFT dataset.	Must support diverse chemical species (Li, C, O, F, P, H) and complex, non-periodic molecular configurations.
Active Learning Manager (FLARE, AL4MLIP)	Automates the iterative process of running MLIP-MD, identifying uncertain configurations, and triggering new DFT.	Critical for efficiently exploring the vast electrolyte configuration space without human intervention.
High-Performance Computing (HPC) Cluster	Provides the essential computational resources for both DFT calculations and large-scale MLIP-MD simulations.	Needs substantial CPU/GPU hours; DFT steps are the primary bottleneck.
Reference Experimental Data	Provides validation targets for MLIP-MD predictions (e.g., diffusion coefficients, Raman spectra, conductivity).	Ensures the MLIP's predictions are physically meaningful and not just fitting the DFT data's potential errors.

Implementing MLIP Simulations: A Step-by-Step Workflow for Electrolyte Systems

Application Notes

Within the broader thesis on Machine Learning Interatomic Potential (MLIP) simulations for lithium battery electrolytes, the construction of realistic atomistic models is foundational. These models must accurately represent the complex, multi-component systems comprising lithium salts (e.g., LiPF₆, LiTFSI), organic carbonate solvents (EC, DMC, EMC), and performance-enhancing additives (e.g., FEC, VC). The primary challenge is capturing the intricate interplay of ion-ion, ion-solvent, and solvent-solvent interactions that govern Li⁺ transport, solvation structure, and solid-electrolyte interphase (SEI) formation.

Recent MLIPs, such as Neural Network Potentials (NNPs), Moment Tensor Potentials (MTPs), and Gaussian Approximation Potentials (GAPs), trained on high-quality quantum mechanics (QM) data (e.g., DFT with hybrid functionals and van der Waals corrections), have shown promise in bridging the accuracy/scale gap. They enable nanosecond-scale molecular dynamics (MD) simulations of full electrolyte compositions with near-DFT fidelity, which is critical for predicting properties like ionic conductivity, lithium transference number, and oxidative stability.

Key Data for Common Electrolyte Components

Table 1: Common Lithium Salts and Key Properties

Salt	Abbreviation	Anion Mass (g/mol)	Dissociation Energy (approx. kcal/mol)	Common Solvent(s)	Key Feature
Lithium Hexafluorophosphate	LiPF₆	144.96	~220	Carbonate Blends	High conductivity, moisture sensitive
Lithium Bis(trifluoromethanesulfonyl)imide	LiTFSI	280.12	~180	Carbonates, DME	High thermal/electrochemical stability
Lithium Bis(fluorosulfonyl)imide	LiFSI	180.12	~170	Carbonates	Promotes stable SEI, high conductivity
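Anion molar masses are easy to verify from standard atomic weights. A minimal sketch with rounded atomic weights; the FSI⁻ anion, (FSO₂)₂N⁻, comes out at about 180.1 g/mol (LiFSI about 187.1 g/mol), and differences in the last digit (e.g., 280.13 vs. 280.12 for TFSI⁻) come only from atomic-weight rounding. The helper and dictionary names are illustrative.

```python
# Standard atomic weights (g/mol), rounded.
ATOMIC_MASS = {"Li": 6.94, "C": 12.011, "N": 14.007, "O": 15.999,
               "F": 18.998, "P": 30.974, "S": 32.06}


def molar_mass(formula):
    """Molar mass (g/mol) of a composition given as {element: count}."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


ANIONS = {
    "PF6-": {"P": 1, "F": 6},
    "TFSI-": {"C": 2, "F": 6, "N": 1, "O": 4, "S": 2},   # (CF3SO2)2N-
    "FSI-": {"F": 2, "N": 1, "O": 4, "S": 2},            # (FSO2)2N-
}
for name, formula in ANIONS.items():
    print(f"{name}: {molar_mass(formula):.2f} g/mol")
# prints: PF6-: 144.96 g/mol, TFSI-: 280.13 g/mol, FSI-: 180.12 g/mol (one per line)
```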

Table 2: Common Solvents and Additives

Component	Type	Dielectric Constant (ε)	Viscosity (cP, 25°C)	Melting Point (°C)	Primary Function
Ethylene Carbonate (EC)	Cyclic Carbonate	89.8	1.9 (40°C)	36-38	High dielectric, SEI formation
Dimethyl Carbonate (DMC)	Linear Carbonate	3.1	0.59	4-5	Low viscosity, co-solvent
Fluoroethylene Carbonate (FEC)	Additive	~110 (est.)	4.1	~18	Forms stable LiF-rich SEI on anodes
Vinylene Carbonate (VC)	Additive	~114 (est.)	N/A	22	Polymerizable SEI-forming additive

Experimental Protocols

Protocol 1: Initial Model Construction and Equilibration for MLIP-MD

Objective: Generate a structurally relaxed and compositionally accurate atomistic model of a multi-component liquid electrolyte for subsequent production MD simulation.

Materials (The Scientist's Toolkit): Table 3: Key Research Reagent Solutions & Computational Tools

Item	Function/Description	Example Software/Package
DFT Software	Generate ab initio reference data for training/validation.	VASP, Quantum ESPRESSO, Gaussian
Molecular Builder	Assemble initial 3D atomic coordinates.	Packmol, Moltemplate, ASE
Force Field (FF)	Provide initial empirical potentials for pre-equilibration.	OPLS-AA, GAFF, CLAFF
MLIP Training Suite	Train ML models on QM data.	AMPTorch, PANNA, DeePMD-kit
MD Engine	Perform classical and MLIP-driven molecular dynamics.	LAMMPS, GROMACS, OpenMM

Procedure:

System Definition: Define the target electrolyte composition (e.g., 1M LiPF₆ in 3:7 EC:EMC by weight with 2% FEC). Calculate the number of molecules/ions required for a given simulation box size (e.g., ~50-100 Å side length).
Initial Coordinate Generation: Use a packing tool (e.g., Packmol). Input the number of each molecule/ion and an approximate box size. Execute to create a low-overlap initial configuration file (e.g., .xyz, .pdb).
Empirical FF Assignment: Parameterize all components using a consistent classical force field (e.g., GAFF2). Assign partial charges via restrained electrostatic potential (RESP) fits from HF/6-31G* calculations on individual molecules/ions.
Classical Pre-Equilibration: a. Energy minimize the packed structure. b. Perform NVT MD at 500 K for 100 ps with a 1 fs timestep to randomize positions. c. Cool the system to 298 K over 100 ps. d. Perform NPT MD at 298 K and 1 bar for 2-5 ns to achieve correct density. Monitor convergence.
MLIP Inference/Re-equilibration: Using the final classical structure as input, perform a shorter (100-200 ps) NPT equilibration using the target MLIP to relax the structure into the more accurate potential energy surface.
Validation: Check final density against experimental values. Analyze radial distribution functions (e.g., Li⁺-O) against available QM or experimental data.
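The molecule-count calculation in the System Definition step can be sketched as below. This is a minimal illustration under stated assumptions: the salt's own volume is neglected, the solvent blend is treated as an ideal mixture of its pure components, the weight fractions describe the solvent blend only, and the densities in the example call are approximate placeholders that should be replaced with measured values. The function name `electrolyte_counts` is hypothetical and not part of any package.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def electrolyte_counts(molarity, box_edge_A, solvents):
    """Return (n_salt_pairs, {solvent_name: n_molecules}) for a cubic box.

    molarity   -- target salt concentration in mol/L
    box_edge_A -- cubic box edge length in Angstrom
    solvents   -- {name: (weight_fraction, molar_mass_g_per_mol, density_g_per_cm3)};
                  weight fractions describe the solvent blend and should sum to 1
    Assumes negligible salt volume and ideal mixing of the solvents.
    """
    volume_cm3 = (box_edge_A * 1e-8) ** 3                    # 1 Angstrom = 1e-8 cm
    n_salt = round(molarity * volume_cm3 * 1e-3 * AVOGADRO)  # 1 cm^3 = 1e-3 L

    # Ideal mixing: specific volume of the blend = sum_i (w_i / rho_i), in cm^3/g
    specific_volume = sum(w / rho for w, _, rho in solvents.values())
    solvent_mass_g = volume_cm3 / specific_volume
    counts = {
        name: round(w * solvent_mass_g / molar_mass * AVOGADRO)
        for name, (w, molar_mass, _) in solvents.items()
    }
    return n_salt, counts


# Example: 1 M salt in a 50 Angstrom box; densities are approximate placeholders.
n_salt, counts = electrolyte_counts(
    1.0, 50.0,
    {"EC": (0.294, 88.06, 1.32), "EMC": (0.686, 104.10, 1.01), "FEC": (0.02, 106.05, 1.45)},
)
print(n_salt, counts)
```

With these inputs the sketch gives 75 LiPF₆ pairs and roughly 270 EC and 540 EMC molecules. Because the counts scale with the cube of the box edge, a 100 Å box needs about eight times as many.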

Protocol 2: Generating Training Data for a Solvent-Specific MLIP

Objective: Create a diverse and representative dataset of atomic configurations and energies/forces for a target electrolyte component (e.g., EC solvent cluster with Li⁺) to train an MLIP.

Procedure:

Configuration Sampling: From a classical MD trajectory of the electrolyte, select ~1000-5000 unique snapshots containing the target local environment (e.g., all EC molecules within 6 Å of any Li⁺).
QM Calculation Setup: For each snapshot, extract a cluster within a defined cutoff radius (e.g., 8 Å from the central Li⁺), keeping whole molecules where possible; saturate any bonds that must be cut with hydrogen atoms. Prepare input files for the DFT calculation.
High-Accuracy DFT Calculations: Perform single-point energy and force calculations. Use a hybrid functional (e.g., B3LYP-D3), a triple-zeta basis set (e.g., def2-TZVP), and an implicit solvent model (e.g., PCM) to approximate bulk effects. Compute in parallel on an HPC cluster.
Dataset Curation: Assemble a list of atomic coordinates (features) with corresponding total energies and atomic force vectors (labels). Apply noise filtering (e.g., remove configurations with implausibly high forces).
Training/Test Split: Randomly split data (e.g., 80:20) into training and hold-out test sets. Ensure test set contains diverse configurations.
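The environment filter in Configuration Sampling and the curation and split steps can be sketched with NumPy as follows. Assumptions: an orthorhombic periodic box, coordinates in Å, forces in eV/Å, and a hypothetical rejection threshold of 10 eV/Å for "implausibly high" forces; the function names are illustrative. A plain random split does not by itself guarantee a diverse test set, so checking the test set's coverage (for example with descriptor-based sampling) remains advisable.

```python
import numpy as np


def molecules_near_li(li_pos, mol_pos, mol_ids, box, cutoff=6.0):
    """Indices of solvent molecules with any atom within `cutoff` (Angstrom) of any Li+.

    li_pos  -- (N_li, 3) Li+ coordinates in Angstrom
    mol_pos -- (N_atoms, 3) coordinates of the solvent atoms (e.g., every EC atom)
    mol_ids -- (N_atoms,) molecule index of each solvent atom
    box     -- (3,) orthorhombic box lengths; the minimum-image convention is applied
    """
    box = np.asarray(box, dtype=float)
    d = np.asarray(mol_pos, float)[:, None, :] - np.asarray(li_pos, float)[None, :, :]
    d -= box * np.round(d / box)
    near = (np.linalg.norm(d, axis=-1) < cutoff).any(axis=1)  # per solvent atom
    return np.unique(np.asarray(mol_ids)[near])


def filter_and_split(forces, max_force=10.0, test_fraction=0.2, seed=0):
    """Drop frames whose largest per-atom force norm exceeds `max_force`
    (eV/Angstrom, hypothetical threshold), then randomly split the remaining
    frame indices into (train_idx, test_idx).

    forces -- list of (N_atoms_i, 3) arrays, one per frame (cluster sizes may differ)
    """
    keep = np.array([i for i, f in enumerate(forces)
                     if np.linalg.norm(np.asarray(f), axis=1).max() <= max_force],
                    dtype=int)
    perm = np.random.default_rng(seed).permutation(keep)
    n_test = int(round(test_fraction * len(perm)))
    return np.sort(perm[n_test:]), np.sort(perm[:n_test])
```

A snapshot would be retained for the QM step when `molecules_near_li` returns at least one molecule of the target type.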

Workflow for Building & Applying MLIP Models

Application Notes

Active Learning (AL) with Machine Learning Interatomic Potentials (MLIPs) represents a paradigm shift for simulating lithium battery electrolytes. MLIPs trained on a fixed dataset often fail under extreme electrochemical conditions (high voltage, Li plating, decomposition), because these conditions drive electrolyte configurations outside the region sampled during training. This protocol enables the autonomous generation of robust, configuration-aware potentials for reactive molecular dynamics (RMD) simulations, directly supporting thesis research into degradation pathways and novel additive design.

Core Application: Automated, iterative refinement of an MLIP's training dataset through selective sampling of underrepresented or high-uncertainty configurations from on-the-fly RMD simulations. This closes the loop between simulation and model improvement, capturing complex chemical reactions (e.g., solid-electrolyte interphase (SEI) formation) and solvation structure evolution with quantum-mechanical accuracy.

Key Quantitative Performance Metrics (Summary): Table 1: Comparative Performance of Active-Learned vs. Static MLIPs for LiPF₆ in EC:DMC Electrolyte

Metric	Static MLIP (Initial Training Set)	Active-Learned MLIP (After 5 Cycles)	Measurement Method
Energy Prediction MAE	12.5 meV/atom	3.2 meV/atom	DFT reference on test set
Force Prediction MAE	185 meV/Å	45 meV/Å	DFT reference on test set
Reaction Barrier Error	~350 meV	< 80 meV	NEB calculation for EC decomposition
Stable MD Time (at 4.8V)	< 50 ps	> 1 ns	Time before unphysical drift
Configurations Sampled	1,200 (static)	12,500 (autonomous)	Total training database size

Table 2: On-the-Fly Simulation Outcomes for a Model Electrolyte System

System (LiPF₆ 1M in EC:EMC)	Active Learning Query Condition	New Reaction Captured	Impact on Model
At Li Metal Anode (0.5V vs. Li⁺/Li)	High uncertainty in Li-C coordination	Li-EC reduction to Li₂EDC and C₂H₄	Expanded training on alkoxides
At High Voltage Cathode (4.8V)	High uncertainty in P-F bond length	PF₆⁻ oxidation to PF₅ and F⁻	Added POxFy species data
During Li Plating	Sudden force prediction spike	Li dendrite nucleation & SEI rupture	Added strained Li-Li/EC configurations

Experimental Protocols

Protocol 1: Initial Training Set Curation for Bootstrap MLIP

System Preparation: Generate initial atomic configurations for your electrolyte (e.g., LiPF₆ salt in a mixture of ethylene carbonate (EC) and dimethyl carbonate (DMC) solvents).
Ab Initio Sampling: Perform short, exploratory DFT-based molecular dynamics (AIMD) simulations (~300K, NVT ensemble) for 10-20 ps. Use a small, representative cell (~100 atoms).
Diverse Configuration Selection: From the AIMD trajectory, select ~500-1000 frames using a diversity-sampling algorithm (e.g., Farthest Point Sampling) on atomic descriptors (SOAP, ACSF).
Reference Calculation: Perform single-point DFT calculations (e.g., PBE-D3, medium basis set) on selected frames to obtain energies, forces, and stresses.
Bootstrap Training: Train an initial MLIP (e.g., Moment Tensor Potential (MTP), NequIP, Gaussian Approximation Potential (GAP)) on this dataset. This is your MLIP_initial.
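Farthest Point Sampling, named in the Diverse Configuration Selection step, is a greedy algorithm: start from one frame and repeatedly add the frame whose descriptor lies farthest from every frame already chosen. The sketch below assumes each frame has already been reduced to a single fixed-length descriptor vector (for example, per-species averaged SOAP or ACSF vectors computed with a separate library); that reduction is an assumption, not part of the protocol text, and the function name is illustrative.

```python
import numpy as np


def farthest_point_sampling(X, n_select, start=0):
    """Greedy farthest point sampling on per-frame descriptor vectors.

    X        -- (N_frames, D) descriptor matrix
    n_select -- number of frames to pick
    Returns the selected row indices in the order they were chosen.
    """
    X = np.asarray(X, dtype=float)
    selected = [start]
    # distance from every frame to its nearest already-selected frame
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return selected
```

Euclidean distance on raw descriptors is the simplest choice; descriptors are often standardized first so that no single component dominates the distance.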

Protocol 2: Active Learning Loop for On-the-Fly Training

Setup Active Learning-Driven MD:
- Prepare a larger simulation cell (~500-1000 atoms) of your target electrolyte.
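- Configure the simulation to use MLIP_initial through the MD engine's MLIP interface (e.g., the LAMMPS ML-IAP or ML-QUIP packages, formerly MLIAP and USER-QUIP, or the KIM package) and couple it to an active-learning driver that evaluates model uncertainty during the run.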
- Set the Query Strategy Criteria: Typical thresholds are:
  - σ_energy > 10 meV/atom
  - max(σ_force) > 100 meV/Å
  - Det(Covariance) > threshold (for committee models).
Run and Query:
- Launch the RMD simulation at target conditions (e.g., 350K, applied bias).
- The AL driver monitors the MLIP's uncertainty metrics at each step (or every N steps).
- When a configuration meets the query criteria, the simulation pauses. The atomic coordinates of this "candidate" configuration are stored in a query_pool.xyz file.
On-the-Fly Labeling & Retraining:
- A job scheduler submits the candidate configurations in query_pool.xyz for DFT single-point calculations.
- Upon DFT completion, the new (configuration, energy, forces) data is appended to the main training dataset.
- The MLIP is retrained (MLIP_iteration_N+1). Use incremental training to reduce cost.
- The RMD simulation resumes from the paused state using the updated, more accurate potential.
Loop Completion: Continue until the simulation reaches the target timescale (e.g., 1 ns) and the rate of query events falls below a pre-set threshold (e.g., <1 query/ps), indicating convergence.
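The query criteria from the setup step and the convergence rule in Loop Completion can be expressed compactly. The sketch below assumes a committee of M models whose energy and force predictions for the current frame are already available as arrays. It uses the per-atom energy standard deviation and the largest per-atom force deviation, defined here as the square root of the summed per-component variances across members (one common definition). The 10 meV/atom and 100 meV/Å defaults mirror the example thresholds above; the 10 ps trailing window is an assumption, and the determinant-of-covariance criterion is not implemented. Function names are illustrative.

```python
import numpy as np


def needs_query(energies, forces, n_atoms, sigma_e_max=0.010, sigma_f_max=0.100):
    """Committee-disagreement test for one MD frame.

    energies -- (M,) total energies in eV from M committee members
    forces   -- (M, N, 3) forces in eV/Angstrom from the same members
    Returns (query, sigma_E in eV/atom, max per-atom force sigma in eV/Angstrom).
    """
    sigma_e = float(np.std(energies)) / n_atoms
    # per-atom deviation: sqrt of the summed per-component variances across members
    sigma_f = np.sqrt(np.var(np.asarray(forces, float), axis=0).sum(axis=-1))
    max_sigma_f = float(sigma_f.max())
    return (sigma_e > sigma_e_max or max_sigma_f > sigma_f_max), sigma_e, max_sigma_f


def al_converged(query_times_ps, t_now_ps, window_ps=10.0, max_rate_per_ps=1.0):
    """True when the query rate over the trailing window falls below the threshold."""
    recent = [t for t in query_times_ps if t_now_ps - window_ps < t <= t_now_ps]
    return len(recent) / window_ps < max_rate_per_ps
```

In a driver loop, a frame for which `needs_query` returns `True` would be appended to the query pool, and the loop would stop once `al_converged` holds at the target simulation time.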

Protocol 3: Validation of Active-Learned MLIP

Static Property Validation: Calculate key properties from a simulation using the final AL-MLIP and compare to AIMD or experiment:
- Li⁺ Solvation Structure: Radial distribution functions (RDFs) g(r) for Li-O (carbonyl) and Li-P (PF₆⁻).
- Dynamics: Li⁺ diffusion coefficient from mean-squared displacement (MSD).
- Interface Stability: Measure thickness and composition of evolved SEI at anode interface.
Reactive Pathway Validation: Identify a key decomposition reaction observed during AL-MD (e.g., EC ring opening). Perform a climbing-image nudged elastic band (CI-NEB) calculation using both the AL-MLIP and DFT. Compare reaction barriers and intermediate geometries.
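Two of the checks above, the Li-O radial distribution function and the Li⁺ diffusion coefficient from the Einstein relation MSD(t) ≈ 6Dt, can be sketched in NumPy as follows. Assumptions: an orthorhombic box, coordinates in Å, distinct species in the RDF (no self-pairs), unwrapped Li⁺ positions for the MSD (wrapped coordinates would corrupt it), and a linear fit over the middle of the lag-time range. Analysis suites such as MDAnalysis provide production-grade versions of both.

```python
import numpy as np


def rdf(pos_a, pos_b, box, r_max, n_bins=200):
    """Single-frame g_ab(r) in an orthorhombic box (average over frames in practice).

    pos_a, pos_b -- (N_a, 3), (N_b, 3) coordinates in Angstrom; species a and b are
                    assumed distinct (e.g., Li+ and carbonyl O), so no self-pairs
    box          -- (3,) box lengths in Angstrom; r_max must not exceed min(box) / 2
    """
    pos_a, pos_b = np.asarray(pos_a, float), np.asarray(pos_b, float)
    box = np.asarray(box, dtype=float)
    d = pos_a[:, None, :] - pos_b[None, :, :]
    d -= box * np.round(d / box)                          # minimum-image convention
    hist, edges = np.histogram(np.linalg.norm(d, axis=-1), bins=n_bins, range=(0.0, r_max))
    shell_vol = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    rho_b = len(pos_b) / np.prod(box)                     # number density of species b
    return 0.5 * (edges[1:] + edges[:-1]), hist / (len(pos_a) * rho_b * shell_vol)


def diffusion_coefficient(unwrapped, dt_ps, fit_start=0.2, fit_end=0.8):
    """Self-diffusion coefficient (cm^2/s) from MSD(t) ~ 6 D t.

    unwrapped -- (T, N, 3) unwrapped positions in Angstrom (no periodic jumps)
    dt_ps     -- time between stored frames in ps
    fit_start, fit_end -- fractions of the lag range used for the linear fit
    """
    x = np.asarray(unwrapped, dtype=float)
    lags = np.arange(1, x.shape[0] // 2)                  # keep many time origins per lag
    msd = np.array([np.mean(np.sum((x[lag:] - x[:-lag]) ** 2, axis=-1)) for lag in lags])
    t = lags * dt_ps
    lo, hi = int(fit_start * len(t)), int(fit_end * len(t))
    slope = np.polyfit(t[lo:hi], msd[lo:hi], 1)[0]       # Angstrom^2 / ps
    return slope / 6.0 * 1e-4                             # 1 Angstrom^2/ps = 1e-4 cm^2/s
```

Fitting only the middle of the lag range avoids the ballistic regime at short times and the poorly averaged long-lag tail; the 20-80% window is a common but arbitrary choice.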

Diagrams

Active Learning Cycle for MLIP Refinement

Uncertainty-Based Query Decision in On-the-Fly MD

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 3: Key Computational Research Reagents for Active Learning MLIP Simulations

Item / Software	Function / Purpose	Example in Protocol
VASP / Quantum ESPRESSO	High-Fidelity Label Generator: Performs reference DFT calculations to provide target energies and forces for training and query labeling.	Protocol 1, Step 4 & Protocol 2, Step 3.
MLIP Fitting Code (M-LAMMPS/QUIP, Allegro, DeepMD)	Potential Architect: Software to define, train, and evaluate the machine learning interatomic potential.	Used throughout to create `MLIP_initial` and all `MLIP_iteration_N`.
Atomic Cluster Expansion (ACE) or SOAP Descriptors	Configuration Fingerprinter: Translates atomic coordinates into invariant mathematical representations suitable for ML model input.	Used in diversity sampling (Protocol 1, Step 3) and as basis for many MLIPs.
LAMMPS with ML-IAP Plugins	MD Engine with AL Driver: Performs the large-scale reactive molecular dynamics, integrated with uncertainty-aware active learning controllers.	Core platform for Protocol 2, running on-the-fly AL-MD.
Committee of MLIPs (e.g., Ensemble MTPs)	Uncertainty Quantifier: Multiple models trained on slightly different data provide a robust estimate of prediction uncertainty (σ), triggering queries.	Implemented in Protocol 2, Step 1 and visualized in Diagram 2.
Job Scheduler (Slurm, Kubernetes)	Workflow Automator: Manages the queueing and execution of DFT jobs for query configurations, enabling fully automated loops.	Critical for operationalizing Protocol 2, Step 3 without manual intervention.

Application Notes

These notes detail the application of Machine Learning Interatomic Potentials (MLIPs) to simulate critical phenomena governing lithium-ion battery electrolyte performance, with a focus on Li+ solvation dynamics and its direct impact on transference numbers. This work supports a broader thesis on accelerating the design of next-generation electrolytes via high-fidelity molecular dynamics (MD) simulations.

1.1 Context & Significance: Accurate prediction of the lithium transference number (tLi+) remains a grand challenge in electrolyte modeling. Its value is governed by complex, collective phenomena (ionic correlations, solvent exchange kinetics, and anion clustering) that require both timescales beyond the reach of conventional ab initio MD and an accuracy beyond that of classical force fields. MLIPs, trained on high-quality quantum mechanical data, bridge this gap, enabling nanosecond-to-microsecond simulations with near-ab initio fidelity to capture these critical dynamics.

1.2 Key Phenomena Accessible via MLIP Simulations:

Solvent Shell Exchange Rates: Direct calculation of Li+ ion residence times for key solvents (e.g., ethylene carbonate, dimethoxyethane).
Aggregate Speciation: Quantification of the population dynamics of contact ion pairs (CIPs), aggregates (AGGs), and free ions.
Dynamic Correlation & Coordination: Analysis of correlated cation-anion motion and its dependence on local coordination chemistry.
Transference Number Computation: Application of the Green-Kubo formalism to charge-current autocorrelation functions derived from MLIP-MD trajectories, providing a direct link from atomistic dynamics to macroscopic transport.

Table 1: Representative Simulation Outcomes for Benchmark Electrolyte Systems (1M LiPF6 in EC:DMC)

Metric	Classical Force Field (FF)	MLIP (e.g., NequIP)	Experimental Reference	Key Insight
Li+ Diffusion Coefficient (D_Li+)	1.2 × 10⁻⁶ cm²/s	0.8 × 10⁻⁶ cm²/s	~1.0 × 10⁻⁶ cm²/s	MLIPs correct overestimation from inaccurate FF potentials.
Anion Diffusion Coefficient (D_PF6-)	0.6 × 10⁻⁶ cm²/s	1.5 × 10⁻⁶ cm²/s	~1.6 × 10⁻⁶ cm²/s	MLIPs capture stronger anion mobility due to accurate polarization.
Li+ Transference Number (tLi+)	~0.35	~0.20	0.2 - 0.3	MLIPs predict lower tLi+ due to enhanced anion mobility and ion pairing.
Avg. Li+ Coordination Number (O from solvent)	4.1	3.8	~4.0 (est.)	MLIPs refine solvation structure, impacting transport pathways.
Primary Solvent Residence Time	450 ps	220 ps	100-300 ps	MLIPs yield faster exchange dynamics, crucial for understanding vehicular vs. structural transport.

Table 2: Key Input Parameters for a Typical MLIP-MD Workflow

Parameter	Typical Value/Range	Purpose
MLIP Architecture	NequIP, Allegro, MACE	Equivariant model capturing complex atomic environments.
Training Set Size	1,000 - 10,000 DFT frames	Ensures broad sampling of configurational space.
Simulation Box Size	200 - 500 molecules/ions	Minimizes finite-size effects for transport properties.
Production Run Length	50 - 200 ns (NPT/NVT)	Ensures convergence of mean-squared displacement for diffusion.
Temperature / Pressure	298 - 333 K / 1 bar	Standard operating conditions.
Statistical Sampling	3-5 independent replicates	Provides error estimates for computed properties.

Experimental Protocols

Protocol 3.1: MLIP Training for an Electrolyte System

Initial Configuration Generation: Use PACKMOL to create a box of solvent molecules (e.g., 200 EC, 200 DMC) and Li-salt (e.g., 40 LiPF6 pairs) at target concentration (~1M).
Active Learning & Dataset Curation: a. Perform short DFT-MD (e.g., 10 ps, 400 K) to generate initial training data. b. Run MLIP-MD, periodically using uncertainty quantification (e.g., committee variance). Select frames with high uncertainty. c. Compute DFT single-point energies and forces for selected frames. d. Iterate (b-c) until forces/energies on a hold-out validation set converge (RMSE < 10 meV/atom for energy, ~50 meV/Å for forces).
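Model Training: Train an equivariant MLIP (e.g., NequIP) using the curated dataset. Use an 80:10:10 train:validation:test split. Rotation/reflection data augmentation is unnecessary for an E(3)-equivariant architecture such as NequIP, because these symmetries are built into the model.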

Protocol 3.2: Production MD and Transference Number Calculation

Equilibration: Using the trained MLIP, equilibrate the system in the NPT ensemble (298 K, 1 bar) for 2-5 ns using a time step of 0.5-1.0 fs.
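Production Run: Switch to NVT ensemble. Run a production simulation for 50-200 ns, saving positions every 1 ps for structural and MSD analysis. The Green-Kubo transference-number calculation requires velocities saved at a much finer interval (on the order of a few fs), because the current autocorrelation function has significant structure on sub-picosecond timescales.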
Analysis of Solvation Dynamics: a. Coordination Number: Compute radial distribution functions (RDFs) g(r) for Li-O (carbonyl/ether) and Li-F (anion). b. Residence Time: Calculate the time correlation function for solvent/anion remaining in the first solvation shell (defined by the first minimum of the RDF). Fit to an exponential decay. c. Aggregate Speciation: Use geometric criteria (e.g., Li-F distance < 2.2 Å) to classify each Li+ as free, in a CIP, or in an AGG (anion bridging multiple Li+).
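Compute Transference Number: a. Calculate the total charge current $\mathbf{J}(t) = \sum_i q_i \mathbf{v}_i(t)$ over all ions $i$. b. Compute the current autocorrelation function (CACF) $\langle \mathbf{J}(t)\cdot\mathbf{J}(0)\rangle$. c. Apply Green-Kubo to obtain the ionic conductivity, $\sigma = \frac{1}{3 V k_B T}\int_0^\infty \langle \mathbf{J}(t)\cdot\mathbf{J}(0)\rangle\,dt$, where $V$ is the simulation-box volume. d. Compute the cation contribution by correlating the Li⁺ partial current $\mathbf{J}_{\mathrm{Li}^+}(t) = \sum_{i \in \mathrm{Li}^+} q_i \mathbf{v}_i(t)$ with the total current, $\sigma_{\mathrm{Li}^+} = \frac{1}{3 V k_B T}\int_0^\infty \langle \mathbf{J}_{\mathrm{Li}^+}(t)\cdot\mathbf{J}(0)\rangle\,dt$; using Li⁺ velocities alone in both factors would discard the cation-anion cross-correlations. e. The transference number is then $t_{\mathrm{Li}^+} = \sigma_{\mathrm{Li}^+}/\sigma$.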
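The Green-Kubo steps can be sketched as below. Assumptions: each ion carries a fixed formal charge (e.g., +1 for Li⁺, -1 for PF₆⁻), velocities are ion center-of-mass velocities in Å/fs, a single trajectory with origin averaging is used, and the integral is truncated at `max_lag`, which must extend past the decay of the correlation function. Reference-frame subtleties of transference numbers (e.g., solvent-fixed versus barycentric frames) are not handled. The function name is hypothetical.

```python
import numpy as np

KB_EV = 8.617333262e-5       # Boltzmann constant, eV/K
TO_S_PER_M = 1.602176634e6   # converts e^2 / (fs * Angstrom * eV) to S/m


def _xcorr(a, b, max_lag):
    """Origin-averaged <a(t0 + lag) . b(t0)> for (T, 3) time series a and b."""
    T = len(a)
    return np.array([np.mean(np.sum(a[lag:] * b[:T - lag], axis=1))
                     for lag in range(max_lag)])


def green_kubo_transference(vel, charges, is_li, dt_fs, volume_A3, temp_K, max_lag):
    """Return (sigma in S/m, t_Li+) from ion center-of-mass velocities (assumed input).

    vel     -- (T, N, 3) velocities in Angstrom/fs for the N ions
    charges -- (N,) fixed formal charges in units of e
    is_li   -- (N,) boolean mask selecting the Li+ ions
    """
    q = np.asarray(charges, dtype=float)
    vel = np.asarray(vel, dtype=float)
    J = np.einsum("i,tij->tj", q, vel)                                # total current
    J_li = np.einsum("i,tij->tj", q * np.asarray(is_li, bool), vel)  # Li+ partial current

    def integrate(c):                                                 # trapezoidal rule
        return dt_fs * (c.sum() - 0.5 * (c[0] + c[-1]))

    total = integrate(_xcorr(J, J, max_lag))
    li = integrate(_xcorr(J_li, J, max_lag))          # retains Li+/anion cross terms
    sigma = total / (3.0 * volume_A3 * KB_EV * temp_K) * TO_S_PER_M
    return sigma, li / total
```

Because the total current is a single collective quantity, its correlation function converges slowly; averaging over several independent replicates (Table 2 suggests 3-5) is usually needed.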

Visualization: Workflow and Analysis Pathways

Title: MLIP Workflow for Electrolyte Simulation & Analysis

Title: Green-Kubo Calculation of Lithium Transference Number

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials & Software for MLIP Electrolyte Simulations

Item	Function/Description
High-Performance Computing (HPC) Cluster	Essential for DFT calculations, MLIP training, and long-timescale (100+ ns) MD simulations.
Quantum Chemistry Code (VASP, CP2K, Gaussian)	Generates the reference ab initio data (energies, forces) for training the MLIP.
MLIP Framework (NequIP, Allegro, MACE)	Software implementing equivariant neural network potentials for accurate, fast MD.
Classical MD Engine (LAMMPS, OpenMM)	Integrates the MLIP for performing the production molecular dynamics simulations.
Active Learning Manager (FLARE, ASE)	Automates the iterative process of configuration sampling, uncertainty query, and dataset expansion.
Trajectory Analysis Suite (MDAnalysis, VMD, in-house scripts)	For computing RDFs, coordination numbers, residence times, and current autocorrelation functions.
Benchmark Electrolyte Mixtures (e.g., 1M LiPF6 in EC:EMC)	Standard experimental systems used for validating the simulation methodology and MLIP accuracy.

This document details computational and experimental protocols for investigating Solid-Electrolyte Interphase (SEI) formation, a critical yet poorly understood process dictating lithium-ion battery performance, safety, and longevity. Within the broader thesis on Machine Learning Interatomic Potential (MLIP) simulations for lithium battery electrolytes, this work bridges high-fidelity atomistic modeling with validation experiments. The SEI's dynamic, multi-layered structure forms via complex electrochemical reactions between the anode (e.g., graphite, silicon) and the electrolyte. Understanding its nucleation, growth kinetics, and resultant ionic transport properties is paramount for rational electrolyte design. These protocols are designed for researchers aiming to deconvolute the coupled chemical, electrochemical, and transport mechanisms at play.

Core Experimental & Computational Protocols

Protocol 2.1: In Operando Electrochemical Quartz Crystal Microbalance with Dissipation Monitoring (EQCM-D) for SEI Mass & Viscoelasticity Tracking

Objective: To quantitatively measure the mass deposition and viscoelastic properties of the SEI layer in real-time during electrochemical formation.

Materials & Setup:

Electrochemical Cell: 3-electrode setup with Au-coated quartz sensor (working electrode), Li metal (counter and reference electrodes).
Electrolyte: 1.0 M LiPF₆ in EC:EMC (3:7 by wt) with 2 wt% VC additive.
Instrumentation: QSense Analyzer coupled with a potentiostat.
Environment: Ar-filled glovebox (<0.1 ppm O₂, H₂O).

Procedure:

Sensor Preparation: Clean the Au-coated quartz crystal with UV-ozone for 10 min, then assemble in the electrochemical cell inside the glovebox.
Baseline Stabilization: Fill the cell with pure solvent (EC:EMC mix) and record frequency (Δf) and dissipation (ΔD) baselines at multiple overtones (n=3, 5, 7, 9, 11) for 30 min.
Electrolyte Introduction: Replace solvent with the prepared electrolyte without exposing to air.
SEI Formation Cycle: Initiate potentiostatic control. Apply a constant potential of 0.8 V vs. Li/Li⁺ for 30 minutes to promote reductive decomposition and SEI nucleation.
Cycling & Monitoring: Subsequently, perform 5 galvanostatic cycles between 0.01 V and 1.5 V vs. Li/Li⁺ at a C/10 rate while continuously recording Δf and ΔD.
Data Analysis: Use the Sauerbrey equation (for rigid layers) and the Damped Voigt viscoelastic model (in QTools software) to calculate mass change (Δm) and film thickness/softness from multi-overtone data.
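For the rigid-film regime, the Sauerbrey conversion is a one-line formula, Δm = -C·Δfₙ/n, with C ≈ 17.7 ng·cm⁻²·Hz⁻¹ for a 5 MHz AT-cut quartz crystal. The sketch below assumes such a sensor; the overtone-overlap helper implements the rule that Δf/n should be nearly independent of overtone for a rigid layer, and its 10% tolerance is an arbitrary illustrative choice. Soft or thick SEI layers with large ΔD need the viscoelastic model instead.

```python
SAUERBREY_C = 17.7  # ng cm^-2 Hz^-1, for an assumed 5 MHz AT-cut quartz crystal


def sauerbrey_mass(delta_f_hz, overtone):
    """Areal mass change in ng/cm^2 from the frequency shift of overtone n (rigid film only)."""
    return -SAUERBREY_C * delta_f_hz / overtone


def overtones_overlap(delta_f_hz, overtones, rel_tol=0.10):
    """Rigid-film check: normalized shifts delta_f / n should agree across overtones."""
    normalized = [df / n for df, n in zip(delta_f_hz, overtones)]
    mean = sum(normalized) / len(normalized)
    return all(abs(x - mean) <= rel_tol * abs(mean) for x in normalized)
```

For example, a -30 Hz shift on the third overtone corresponds to about 177 ng/cm².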

Key Data Output: Time-resolved profiles of cumulative SEI mass, thickness, and shear modulus during the initial formation cycle.

Protocol 2.2: Ab Initio Molecular Dynamics (AIMD) Informed MLIP Training for SEI Reaction Sampling

Objective: To generate a robust Machine Learning Interatomic Potential (MLIP) capable of simulating long-timescale SEI reaction dynamics with near-DFT accuracy.

Materials & Software:

Software: VASP/Gaussian (for AIMD), DeepMD-kit or MACE (for MLIP training), LAMMPS (for MLIP-MD).
Initial Structures: DFT-optimized clusters containing solvent molecules (EC, EMC), salt (LiPF₆), additive (VC), and Li metal/graphite slab surfaces.

Procedure:

Reactive Ensemble Generation: Perform multiple short (~10-20 ps) AIMD simulations of the electrolyte/anode interface at elevated temperatures (800-1200 K), using Born-Oppenheimer MD or Car-Parrinello MD (CPMD), to accelerate reaction events.
Training Set Curation: Extract snapshots from AIMD trajectories. Annotate each snapshot with its total energy, atomic forces, and virial tensor calculated at the DFT level (e.g., PBE-D3). Ensure coverage of reactants, transition states, intermediates, and products.
MLIP Training & Validation:
- Split data (80/10/10) for training, validation, and testing.
- Train an MLIP (e.g., Deep Potential) using a descriptor network for atomic environment embedding.
- Validate by comparing MLIP-predicted energies and forces against DFT values for the test set. Target thresholds: Energy error < 2 meV/atom, Force error < 100 meV/Å.
Enhanced Sampling MLIP-MD: Use the validated MLIP to run metadynamics or umbrella sampling simulations at operational temperatures (300 K) to probe reaction free energy landscapes for key processes (e.g., EC double reduction, Li₂CO₃ nucleation).
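The validation step compares MLIP and DFT predictions on the held-out test set. The sketch below computes a per-atom energy MAE and a force-component RMSE and checks them against the thresholds quoted above; the protocol does not state which statistic (MAE or RMSE) each threshold refers to, so this pairing is an assumption. The input arrays would come from whichever MLIP code is used; no specific API is assumed, and the function names are illustrative.

```python
import numpy as np


def validation_errors(e_pred, e_dft, n_atoms, f_pred, f_dft):
    """Return (energy MAE in meV/atom, force-component RMSE in meV/Angstrom).

    e_pred, e_dft -- (M,) total energies in eV for M test frames
    n_atoms       -- (M,) atoms per frame, so frames of different size are comparable
    f_pred, f_dft -- lists of (N_i, 3) force arrays in eV/Angstrom
    """
    e_err = (np.asarray(e_pred, float) - np.asarray(e_dft, float)) / np.asarray(n_atoms)
    e_mae = 1000.0 * np.mean(np.abs(e_err))
    df = np.concatenate([(np.asarray(p, float) - np.asarray(r, float)).ravel()
                         for p, r in zip(f_pred, f_dft)])
    f_rmse = 1000.0 * np.sqrt(np.mean(df ** 2))
    return float(e_mae), float(f_rmse)


def passes_targets(e_mae, f_rmse, e_max=2.0, f_max=100.0):
    """Targets from the protocol: energy < 2 meV/atom, force < 100 meV/Angstrom."""
    return e_mae < e_max and f_rmse < f_max
```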

Key Data Output: Reaction pathways, free energy barriers, and identified stable SEI component structures (e.g., Li₂EDC, Li₂CO₃, LiF oligomers).

Protocol 2.3: X-ray Photoelectron Spectroscopy (XPS) Depth Profiling for SEI Compositional Analysis

Objective: To determine the elemental composition and chemical state of SEI components as a function of depth from the electrolyte interface to the anode surface.

Materials & Setup:

Sample Preparation: Coin cells (CR2032) with graphite electrodes cycled for 1, 5, and 20 formation cycles (Protocol 2.1 conditions). Disassemble in glovebox, rinse with DMC solvent to remove residual salt, and dry.
Transfer: Use an airtight transfer vessel to move samples from glovebox to XPS chamber without air exposure.
Instrumentation: XPS system with monochromatic Al Kα source, Ar⁺ cluster sputter gun for depth profiling.

Procedure:

Initial Surface Scan: Acquire wide survey scan (0-1200 eV binding energy) and high-resolution spectra for C 1s, O 1s, F 1s, P 2p, and Li 1s regions on the as-transferred electrode.
Sputter Depth Profiling: Etch the surface using an Ar⁺ cluster beam (e.g., 500 eV, 1 × 1 mm raster) for a calibrated time interval (e.g., 15 s, equivalent to ~1 nm of SiO₂).
Iterative Analysis: After each etching cycle, acquire the set of high-resolution spectra. Repeat for 20-30 cycles or until the substrate (graphite C 1s peak at 284.2 eV) dominates the signal.
Data Processing: Fit high-resolution peaks using appropriate Shirley backgrounds and Gaussian-Lorentzian curves. Assign chemical states via reference binding energies (e.g., C 1s: Li₂CO₃ at 290.0 eV, C-O at 286.5 eV; F 1s: LiF at 685.0 eV, LixPFyOz at 686.5-687.5 eV).
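The Shirley background used in the peak-fitting step is defined iteratively: at each binding energy, the background rises above its low-binding-energy endpoint in proportion to the background-subtracted area accumulated from the low-binding-energy side. The sketch below implements that iteration for a single region on an ascending binding-energy grid. Fitting Gaussian-Lorentzian line shapes to the background-subtracted data would follow and is not shown; the endpoint choice and convergence tolerance are assumptions normally tuned per region.

```python
import numpy as np


def shirley_background(be, intensity, max_iter=50, tol=1e-6):
    """Iterative Shirley background for one XPS region.

    be        -- binding energies in eV, strictly increasing
    intensity -- counts on the same grid
    The background matches the data at both endpoints and increases toward
    high binding energy with the peak area on the low-binding-energy side.
    """
    be = np.asarray(be, dtype=float)
    y = np.asarray(intensity, dtype=float)
    i_lo, i_hi = y[0], y[-1]
    bg = np.full_like(y, i_lo)
    for _ in range(max_iter):
        signal = y - bg
        # cumulative trapezoidal area from the low-binding-energy end
        area = np.concatenate(([0.0], np.cumsum(0.5 * (signal[1:] + signal[:-1]) * np.diff(be))))
        new_bg = i_lo + (i_hi - i_lo) * area / area[-1]
        if np.max(np.abs(new_bg - bg)) < tol * max(abs(i_hi), 1.0):
            return new_bg
        bg = new_bg
    return bg
```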

Key Data Output: Atomic concentration (%) of chemical species (Li₂CO₃, Li₂O, LiF, P-O-F species, polycarbonates) as a function of sputter time/depth.

Data Presentation

Table 1: Quantified SEI Properties from Integrated Protocol Execution

Measurement Technique	Key Metric	Cycle 1 Value	Cycle 5 Value	Cycle 20 Value	Inferred Insight
EQCM-D (Protocol 2.1)	Total Mass Deposited (ng/cm²)	180 ± 25	220 ± 30	280 ± 35	SEI growth continues beyond 1st cycle, but rate slows.
	Effective Shear Modulus (MPa)	850 ± 150	1200 ± 200	950 ± 180	SEI stiffens then softens, suggesting layered structure evolution.
XPS Depth Profiling (Protocol 2.3)	Top Layer (0-5 nm)
	Li₂CO₃ / Organic (at.%)	45%	38%	35%	Outer organic layer is stable but slightly diluted.
	LiF / Inorganic (at.%)	15%	20%	25%	Inorganic content increases near surface over cycles.
	Inner Layer (near anode)
	Li₂O / Alkoxides (at.%)	10%	12%	15%	Inorganic inner layer thickens with cycling.
MLIP-MD (Protocol 2.2)	EC → Li₂EDC Barrier (eV)	0.85 ± 0.10	N/A	N/A	VC additive reduces this barrier by ~0.2 eV, promoting ordered SEI.
	LiF Cluster Nucleation Size	Stable dimer	N/A	N/A	Explains XPS detection of LiF even without HF.

The Scientist's Toolkit: Key Research Reagent Solutions

Table 2: Essential Materials for SEI Formation Studies

Item / Reagent	Function / Rationale
Ethylene Carbonate (EC) / Ethyl Methyl Carbonate (EMC) blend	Standard aprotic solvent mixture. High dielectric EC facilitates salt dissociation; low viscosity EMC enables good ion mobility. Prone to reduction, forming Li₂EDC and Li₂CO₃.
Lithium Hexafluorophosphate (LiPF₆)	Industry-standard salt. Its decomposition (thermally or electrochemically) is a primary source of LiF and P-O-F species in the SEI.
Vinylene Carbonate (VC) additive	Film-forming (SEI-forming) additive. Polymerizes on the anode before bulk solvent reduction, creating a flexible, Li⁺-conductive interface that improves cycle life.
Deuterated solvents (e.g., d₄-EC, d₈-EMC)	Used in operando NMR studies to track the consumption of specific solvent molecules and the formation of soluble SEI decomposition products.
Lithium-6 (⁶Li) metal foil	Isotopically labeled counter/reference electrode. Enables depth-profiling via Secondary Ion Mass Spectrometry (SIMS) to distinguish SEI Li from plated Li.
Single Crystal Graphite electrodes	Provide a well-defined, atomically flat surface for fundamental studies, minimizing complications from binder, conductive additive, and porosity.
Argon-filled Glovebox	Maintains inert atmosphere (<0.1 ppm O₂/H₂O) essential for handling moisture-sensitive electrolytes and Li metal, and for post-cycled electrode analysis.

Process & Pathway Visualizations

Overcoming Computational Hurdles: Best Practices for Stable and Efficient MLIP Runs

Within the broader thesis on applying Machine Learning Interatomic Potentials (MLIPs) to lithium battery electrolyte simulations, two persistent failure modes threaten the validity and longevity of simulations: extrapolation errors and energy drift. These errors can lead to non-physical configurations, inaccurate property predictions, and the collapse of long-timescale Molecular Dynamics (MD) simulations. This document provides application notes and detailed protocols to identify, mitigate, and correct for these issues, ensuring robust MLIP-driven research for battery electrolyte design.

Table 1: Common Indicators and Consequences of MLIP Failure Modes

Failure Mode	Primary Indicator	Typical Magnitude in Faulty Simulations	Impact on Li-Battery Electrolyte Properties
Extrapolation Error	High epistemic uncertainty (e.g., high variance in committee models).	Uncertainty > 0.1 eV/atom (for DFT reference).	Catastrophic: Unphysical Li+ coordination, false decomposition products, erroneous diffusion coefficients.
Energy Drift	Change in total energy in an NVE ensemble.	Drift > 0.1 meV/atom/ps in a well-tested MLIP.	Gradual corruption: Rising temperature, altered phase behavior, unreliable mean-squared displacement calculations.

Table 2: Mitigation Strategies and Their Efficacy

Strategy	Targeted Failure Mode	Key Implementation Metric	Computational Overhead
Active Learning (Query-by-Committee)	Extrapolation Error	Reduction in max. committee uncertainty below set threshold (e.g., 50 meV/atom).	High (requires concurrent DFT evaluation).
On-the-Fly Validation (Energy Conservation Tests)	Energy Drift	Total energy fluctuation in NVE < 1e-5 eV/atom/ps over 10 ps.	Low (inline calculation).
Thermostatted Training (Nose-Hoover NPT)	Energy Drift	Improved stability in NVE production runs post-training.	Moderate (additional training complexity).
Gradient Clipping & Regularization	Both	Loss function stability during training; controlled force magnitudes.	Low.

Experimental Protocols

Protocol 3.1: Detecting and Remedying Extrapolation Errors via Active Learning

Objective: To safely explore new configurations of Li-salt/solvent systems while flagging and correcting regions of high model uncertainty.

Materials: Pre-trained MLIP (e.g., NequIP, MACE), DFT code (VASP, CP2K), initial training set of electrolyte configurations.

Procedure:

Exploratory MD: Run an exploratory MD simulation of your LiPF6 in EC/EMC electrolyte using the pre-trained MLIP at target conditions (e.g., 300 K, 1 atm).
Uncertainty Quantification: At regular intervals (e.g., every 10 fs), compute the epistemic uncertainty. For a committee model, this is the variance in predicted energy/forces across ensemble members.
Thresholding: Apply a pre-defined uncertainty threshold (e.g., 0.1 eV/atom). Frames where uncertainty exceeds this threshold are flagged as "uncertain."
Structure Selection: From the flagged frames, select a diverse subset (e.g., using farthest point sampling) for DFT single-point energy and force calculation.
Retraining: Incorporate the new DFT-labeled structures into the training set. Retrain the MLIP from scratch or using continued learning strategies.
Iteration: Repeat steps 1-5 until no frames in a production simulation exceed the uncertainty threshold, indicating robust sampling of the relevant chemical space.

Protocol 3.2: Quantifying and Correcting Energy Drift

Objective: To assess and ensure the energy conservation of an MLIP, a prerequisite for reliable NVE and NpT simulations.

Materials: Trained MLIP, MD engine (LAMMPS, ASE).

Procedure:

Baseline NVE Test:
- Prepare an equilibrated simulation box of the electrolyte system.
- Run a short (10-20 ps) MD simulation in the microcanonical (NVE) ensemble using the MLIP.
- Record the total energy (Etot = Epotential + Ekinetic) at every step.
Drift Calculation:
- Perform a linear regression of Etot against time.
- The slope of the fit is the energy drift (e.g., in meV/atom/ps).
Diagnosis & Mitigation:
- If drift is significant (> 0.1 meV/atom/ps): a. Check Training: Ensure forces are well-matched to DFT (low MAE) and the training set includes high-energy configurations (e.g., from NVT runs at elevated temperatures). b. Thermostatted Training: Retrain the MLIP using data generated from NPT or NVT DFT simulations, not only from geometry optimizations of relaxed structures. Finite-temperature sampling teaches the model the energy-landscape curvature explored during dynamics. c. Numerical Checks: Verify that the MLIP's forces are consistent with its energies (force = -dE/dx) via finite-difference tests.
Validation: After retraining, repeat the NVE test (Step 1) to confirm reduced drift.
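The drift calculation and the finite-difference check from the diagnosis step can each be written in a few lines. The sketch below fits a straight line to the NVE total energy and reports the slope in meV/atom/ps, and compares model forces with central-difference derivatives of the model energy. `energy_fn` and `force_fn` are hypothetical callables standing in for whatever interface the MLIP exposes (e.g., an ASE calculator wrapped in two small functions); the step size `h` is an assumption that balances truncation against round-off error.

```python
import numpy as np


def energy_drift(time_ps, e_total_ev, n_atoms):
    """Slope of a linear fit to the NVE total energy, in meV/atom/ps."""
    slope_ev_per_ps = np.polyfit(np.asarray(time_ps, float), np.asarray(e_total_ev, float), 1)[0]
    return 1000.0 * slope_ev_per_ps / n_atoms


def max_force_error(energy_fn, force_fn, pos, h=1e-4):
    """Largest |F_model - F_numeric| (eV/Angstrom), with F_numeric = -dE/dx by central differences."""
    pos = np.array(pos, dtype=float)
    f_num = np.zeros_like(pos)
    for idx in np.ndindex(*pos.shape):
        plus, minus = pos.copy(), pos.copy()
        plus[idx] += h
        minus[idx] -= h
        f_num[idx] = -(energy_fn(plus) - energy_fn(minus)) / (2.0 * h)
    return float(np.max(np.abs(force_fn(pos) - f_num)))
```

A drift whose magnitude exceeds the 0.1 meV/atom/ps threshold, or a force error far above the expected finite-difference noise, would send the workflow back to the mitigation steps.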

Diagrams

Active Learning Loop for Extrapolation

Energy Drift Validation Workflow

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Materials for MLIP Electrolyte Studies

Item / Solution	Function / Role in Mitigating Failure Modes
High-Quality Ab Initio Dataset	Foundational training data from DFT (e.g., using the r²SCAN functional) for representative electrolyte configurations, including varied Li+ coordination, ion pairs, and solvent geometries.
Uncertainty-Aware MLIP Architecture	A model like a committee of neural networks, Gaussian Approximation Neural Network (GANN), or one with built-in uncertainty quantification (e.g., Deep Potential with dropout). Essential for flagging extrapolation.
Active Learning Management Software	Tools like FLARE, CHEMICAL, or custom scripts to automate uncertainty sampling, DFT submission, and dataset curation from ongoing simulations.
Benchmarking System (Small Electrolyte Cluster)	A well-defined, small Li+(solvent)₄ system for rapid, low-cost energy drift tests (NVE) and force-error calculations before large-scale production runs.
Reference DFT-MD Trajectory	A short but statistically relevant DFT-MD trajectory of the target system. Serves as the ultimate benchmark for comparing energies, forces, and radial distribution functions from MLIP-MD.
Robust MD Engine with MLIP Interface	LAMMPS or ASE patched with MLIP support (e.g., via libtorch). Must correctly implement periodic boundary conditions, long-range electrostatics (if not included in MLIP), and precise numerical integrators to isolate MLIP-induced drift.

Hyperparameter Optimization for Electrolyte-Specific MLIP Training

This document provides detailed Application Notes and Protocols for the hyperparameter optimization (HPO) of Machine Learning Interatomic Potentials (MLIPs) tailored for lithium battery electrolyte simulations. This work is a core methodological component of a broader thesis focused on enabling high-fidelity, long-timescale molecular dynamics (MD) simulations to elucidate ion transport mechanisms, solvation structure dynamics, and interfacial reactivity in novel liquid and solid electrolyte systems. Effective HPO is critical for developing MLIPs that are accurate, efficient, and transferable, thereby providing reliable computational tools for researchers and development professionals in battery science and related fields.

Key Hyperparameters in Electrolyte MLIPs

The performance of MLIPs (e.g., Neural Network Potentials, Gaussian Approximation Potentials, Moment Tensor Potentials) depends critically on several architectural and training parameters. The optimal set is highly dependent on the specific chemical system (e.g., LiPF6 in EC:DMC, LiTFSI in DME, solid polymer electrolytes).

Table 1: Core Hyperparameter Categories for Electrolyte-Specific MLIPs

Category	Specific Parameters	Typical Value Range	Influence on Model
Descriptor	Radial cutoff (`R_c`), Angular cutoff (`R_c_ang`), Number of basis functions (`n_basis`), Number of radial/angular features (`n_features`)	`R_c`: 4.0 - 8.0 Å, `n_basis`: 8 - 32	Determines the fidelity of the atomic environment representation. Larger cutoffs capture more of the ion-solvent and ion-ion structure but increase cost; interactions beyond `R_c` (e.g., long-range Coulomb) require explicit treatment.
Neural Network Architecture	Number of hidden layers, Neurons per layer, Activation function	Layers: 2-4, Neurons: 16-128, Activation: SiLU/tanh	Controls the model's capacity to learn complex potential energy surfaces. Deeper networks may overfit small datasets.
Training & Optimization	Learning rate (`lr`), Batch size, Number of epochs, Force loss weight (`λ`)	`lr`: 1e-3 - 1e-5, `λ`: 0.05 - 1.0	Governs convergence stability and the balance between energy and force accuracy. Forces are critical for MD stability.
Regularization	Weight decay, Dropout rate	Weight decay: 1e-6 - 1e-4	Prevents overfitting to the limited, expensive ab initio training data.
Long-Range Interactions	Electrostatic handling (e.g., `Z_bl` charges), Screening function parameters	`Z_bl`: Li(+1), O/P/F/N(±)	Essential for capturing ion-ion and ion-dipole interactions in electrolytes.

Hyperparameter Optimization Protocol

Objective: To systematically identify the hyperparameter set that minimizes the loss on a validation set, ensuring the MLIP achieves chemical accuracy while remaining computationally efficient for MD.

Protocol: Multi-Stage HPO Workflow

Materials & Inputs:

Reference Dataset: Ab initio (DFT) calculations of electrolyte configurations (energies, forces, stresses). Must include bulk electrolytes, isolated species, and relevant interfaces.
Software Stack: MLIP framework (e.g., AMPTorch, DeePMD-kit, MACE), HPO library (Optuna, Ray Tune), MD engine (LAMMPS, ASE).
Computational Resources: High-performance computing cluster with GPU nodes for parallel trial evaluation.

Procedure:

Data Curation & Splitting:
- Split reference dataset into training (70%), validation (20%), and test (10%) sets. Ensure all splits contain representative configurations (bulk, clusters, etc.).
- Standardize targets (energy, forces) per atom/molecule.

Initial Coarse-Grained Search (Bayesian Optimization):
- Using Optuna, define a broad search space for key parameters (see Table 1).
- Objective Function: L_val = (MAE_E / std_E) + λ * (MAE_F / std_F), where MAE is Mean Absolute Error, std is standard deviation across the validation set, and λ is the force weight (start with λ=0.05).
- Run 50-100 parallel trials. Each trial trains a model for a reduced number of epochs (e.g., 200).
Focused Search & Sensitivity Analysis:
- Analyze the top 10-20 trials from the coarse-grained search. Perform a local, finer-grained search around the best-performing regions.
- Conduct a manual sensitivity analysis for one critical parameter at a time (e.g., radial cutoff R_c) while holding others at their best-found values.
Final Training & Evaluation:
- Train the model with the optimized hyperparameters on the combined training and validation sets for the full training schedule (e.g., 1000 epochs), monitoring convergence of the training loss.
- Final evaluation is performed on the held-out test set. Report final metrics (MAE, RMSE) for energy and forces.
Physical Validation via MD Simulation:
- Deploy the optimized MLIP in an MD simulation of a bulk electrolyte.
- Validate against expected physical properties: radial distribution functions (Li-O), ionic conductivity (via diffusion coefficients), and density. This step is crucial to ensure model transferability beyond static configurations.
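The splitting and scoring steps above can be sketched in plain Python. This is a minimal illustration, not the protocol's actual code: the configuration-type labels, the per-atom energy convention, and the function names are assumptions. In an HPO library such as Optuna, a value like `validation_objective` would be returned from the trial callback after each shortened training run.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(config_types, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split configuration indices into train/validation/test sets so that
    each configuration type (hypothetical labels such as "bulk", "cluster",
    "interface") appears in every subset in roughly the requested ratio."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, ctype in enumerate(config_types):
        by_type[ctype].append(idx)
    train, val, test = [], [], []
    for indices in by_type.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return sorted(train), sorted(val), sorted(test)


def mae(pred, ref):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def validation_objective(e_pred, e_ref, f_pred, f_ref, lam=0.05):
    """L_val = MAE_E / std_E + lam * MAE_F / std_F.

    Assumes energies are per-atom values (one per configuration) and forces
    are flattened Cartesian components; std_* is the population standard
    deviation of the reference values on the validation set."""
    std_e = statistics.pstdev(e_ref)
    std_f = statistics.pstdev(f_ref)
    return mae(e_pred, e_ref) / std_e + lam * mae(f_pred, f_ref) / std_f
```

Normalizing each MAE by the spread of its reference values keeps the energy and force terms on comparable scales, so λ acts as a genuine relative weight rather than a unit conversion.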

Visualization: HPO Workflow for Electrolyte MLIPs

The Scientist's Toolkit: Key Research Reagents & Materials

Table 2: Essential Computational Materials for MLIP HPO in Electrolyte Research

Item	Function & Rationale
DFT Software (VASP, GPAW, Quantum ESPRESSO)	Software for generating the reference ab initio (DFT) data. Required to compute accurate energies and forces for training set configurations. VASP requires a paid license; GPAW and Quantum ESPRESSO are open source.
Curated DFT Dataset (e.g., from Materials Project)	A high-quality, balanced dataset of electrolyte configurations (energies, forces). The foundational "reagent" for training. Must include diverse states.
MLIP Framework (DeePMD-kit, AMPTorch, MACE)	The core software that defines the MLIP architecture, handles descriptor generation, and manages the training loop.
HPO Library (Optuna, Ray Tune, Scikit-Optimize)	Enables automated, efficient search of the hyperparameter space, dramatically reducing manual trial-and-error time.
High-Performance Computing (HPC) Cluster with GPU Nodes	Essential computational infrastructure. GPU acceleration is critical for training neural network potentials, and HPO requires many parallel trials.
Visualization & Analysis Suite (OVITO, MDANSE, in-house scripts)	Tools to analyze the results of MD simulations run with the MLIP (e.g., calculate RDFs, diffusion coefficients, coordination numbers).
Validation Dataset of Experimental Properties	Compilation of known experimental metrics (e.g., density, conductivity, lattice parameters) for the target electrolyte system. Used for final physical validation.

Application Notes & Troubleshooting

Note 1: Force Weight (λ) is Critical: For stable MD, force accuracy is paramount. Start with a low λ (e.g., 0.01) and increase until force MAE on the validation set plateaus. A typical final value is between 0.05 and 0.5.
Note 2: Long-Range Electrostatics: For ionic electrolytes, consider models that explicitly incorporate long-range electrostatics (e.g., via charge equilibration schemes like Z_bl or explicit Coulomb terms). This is non-negotiable for quantitative accuracy.
Note 3: Overfitting Detection: Monitor the gap between training and validation loss. A growing gap indicates overfitting. Mitigate by increasing dataset size/diversity, using weight decay, or reducing network size.
Note 4: System-Specificity: An MLIP optimized for liquid LiPF6/EC will not perform well for a solid polymer electrolyte. HPO must be repeated for each distinct chemical system of interest.
Troubleshooting (Poor MD Stability): If MD simulations crash with the optimized potential, the likely cause is poor force prediction for high-energy, out-of-distribution configurations. Remediate by adding diverse, high-energy states (e.g., from NVT MD at high T or from metadynamics) to the training set and re-running HPO.
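The overfitting check in Note 3 can be automated with a simple trend test on the loss histories. The sketch below assumes per-epoch training and validation losses are available as lists; the window size, the `tol` threshold, and the function names are illustrative choices, not part of any MLIP framework.

```python
def gap_trend(train_losses, val_losses, window=5):
    """Change in the mean (validation - training) loss gap between the
    first and last `window` epochs. A clearly positive value means the
    gap is widening, a common sign of overfitting."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must have the same length")
    if len(train_losses) < 2 * window:
        raise ValueError("need at least 2 * window epochs")
    gaps = [v - t for t, v in zip(train_losses, val_losses)]
    early = sum(gaps[:window]) / window
    late = sum(gaps[-window:]) / window
    return late - early


def looks_overfit(train_losses, val_losses, window=5, tol=0.0):
    """Flag overfitting when the gap has grown by more than `tol`
    (a hypothetical, problem-specific threshold in loss units)."""
    return gap_trend(train_losses, val_losses, window) > tol
```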

This application note exists within a broader thesis research program focused on developing and applying Machine Learning Interatomic Potentials (MLIPs) for high-fidelity molecular dynamics (MD) simulations of novel lithium battery electrolytes. A central, practical challenge is the trade-off between simulating chemically realistic system sizes (enabling the study of bulk properties, interfaces, and concentrations) and maintaining computationally tractable simulation times. This document outlines scalable strategies and protocols to navigate this trade-off, enabling robust research within limited computational budgets.

Key Quantitative Considerations in Scalability

The computational cost of classical MD with short-range interactions and neighbor lists scales approximately as O(N), where N is the number of atoms; long-range electrostatics evaluated with particle-mesh Ewald methods add an O(N log N) term. MLIPs with a fixed local cutoff also scale linearly with N, but their cost per atom is typically much higher than that of a classical force field and depends strongly on the descriptor's cutoff radius and the network architecture. The following table summarizes core scalability factors.

Table 1: Scalability Factors for MLIP-Based Electrolyte Simulations

Factor	Impact on System Size	Impact on Simulation Time	Typical Range/Example
Number of Atoms (N)	Direct variable.	Increases linearly to super-linearly.	1,000 (nanodroplet) to 100,000+ (bulk+electrode)
Cutoff Radius (rc)	Indirect. Larger rc may allow smaller N for bulk props.	Increases O(rc^3) per atom for descriptor calculation.	5-8 Å for most MLIPs (e.g., ANI, NequIP).
MLIP Architecture	Minimal direct impact.	Deep/complex networks (e.g., DeepPot-SE) increase cost/atom vs. simpler (e.g., SNAP).	Inference time/atom can vary by 10-100x.
Time Step (Δt)	No impact.	Wall time for a given physical duration scales as 1/Δt (halving Δt doubles the number of steps).	0.5-2.0 fs for Li-ion electrolytes.
Total Simulation Duration	No impact.	Directly proportional to wall time.	10 ps (equilibration) to 10+ ns (property sampling).
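Because wall time for a local MLIP grows roughly linearly with both atom count and step count, a back-of-the-envelope estimate is often enough to size a run. The per-atom-per-step cost below is a hypothetical input that must be measured for a given model and hardware (for example, from a short benchmark run), and the default ideal parallel efficiency of 1.0 is optimistic.

```python
def estimate_wall_time_hours(n_atoms, sim_time_ps, timestep_fs,
                             cost_per_atom_step_s, n_devices=1,
                             parallel_efficiency=1.0):
    """Rough wall-time estimate for MD with a local (fixed-cutoff) MLIP.

    wall = n_atoms * n_steps * cost_per_atom_step / (n_devices * efficiency)
    with n_steps = physical time / timestep. cost_per_atom_step_s is a
    measured, model- and hardware-specific input (hypothetical here)."""
    n_steps = sim_time_ps * 1000.0 / timestep_fs  # 1 ps = 1000 fs
    seconds = n_atoms * n_steps * cost_per_atom_step_s
    seconds /= n_devices * parallel_efficiency
    return seconds / 3600.0
```

For example, 10,000 atoms simulated for 1 ns at 1 fs with an assumed cost of 1 µs per atom-step gives about 2.8 hours on one device; halving the time step to 0.5 fs doubles that.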

Core Scalability Strategies & Protocols

Strategy A: Multi-Scale System Definition Protocol

Objective: To determine the minimal viable system size for a target physical property. Workflow:

Property Identification: Define the primary property (e.g., Li+ transference number, bulk ionic conductivity, interfacial SEI formation rate).
Size Convergence Testing: Perform a series of short, otherwise identical simulations (e.g., 10 ps NVT) for incrementally larger system sizes (e.g., 100, 500, 1000, 5000 molecules).
Analysis & Selection: Calculate the target property from each simulation. The minimal viable size is identified when the property value fluctuates within an acceptable threshold (e.g., <5% change with increasing size).
Validation: Run a longer production simulation at the selected size and compare short-time properties with long-time averages to ensure stability.
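The selection rule in this workflow can be written as a small function. This sketch assumes one property value per system size (e.g., already averaged over replicas) and interprets the "acceptable threshold" as: every larger size stays within `rel_tol` of the candidate's value. Both the rule and the names are illustrative.

```python
def minimal_converged_size(sizes, values, rel_tol=0.05):
    """Return the smallest system size whose property value is matched,
    within rel_tol (relative), by every larger size tested. Returns None
    if no size qualifies. The largest size is never returned on its own,
    because it has nothing larger to be compared against."""
    pairs = sorted(zip(sizes, values))
    for i, (size, ref) in enumerate(pairs[:-1]):
        later = [v for _, v in pairs[i + 1:]]
        if all(abs(v - ref) <= rel_tol * abs(ref) for v in later):
            return size
    return None
```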

Diagram Title: Multi-Scale System Sizing Workflow

Strategy B: Hybrid ML/Classical Force Field Simulation Protocol

Objective: To extend spatial scale by applying the accurate but expensive MLIP only in regions of interest. Workflow:

Domain Decomposition: Partition the system into a High-Resolution Zone (e.g., near an electrode surface, around a diffusing Li+) and a Bulk Reservoir Zone.
Force Field Assignment: Apply the developed MLIP to the High-Resolution Zone. Use a validated classical force field (e.g., OPLS-AA, GAFF) for the Bulk Reservoir.
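Coupling Setup: Employ a hybrid scheme (e.g., mechanical embedding) in an engine such as LAMMPS, where `pair_style hybrid` assigns sub-styles to specific atom-type pairs and `pair_style hybrid/overlay` sums them. Ensure proper handling of the boundary between zones, including molecules that diffuse across it.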
Simulation & Analysis: Run the hybrid simulation, focusing analysis on the MLIP region while benefiting from the reduced cost of the larger bulk region described by the classical force field.

Diagram Title: Hybrid ML/Classical Force Field Strategy

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Computational Tools & Resources for Scalable MLIP Simulations

Item / Solution	Function / Purpose	Key Considerations
MLIP Software (e.g., DeePMD-kit, Allegro, MACE)	Provides the infrastructure to train, compress, and run simulations with MLIPs.	Choose based on performance, accuracy, and LAMMPS/ASE integration. Allegro offers rigorous symmetry preservation.
MD Engine (LAMMPS, GROMACS w/ PLUMED)	Core simulation engine. LAMMPS supports MLIPs natively via `pair_style mliap` (ML-IAP package) and through plugins such as DeePMD-kit's `pair_style deepmd`.	Essential for hybrid simulations and large-scale parallel execution.
Automated Workflow Manager (Signac, AiiDA, Snakemake)	Manages complex parameter sweeps (system size, concentration), data provenance, and job submission.	Critical for reproducible scalability studies and convergence testing.
High-Performance Computing (HPC) Cluster	Provides CPUs/GPUs for parallel computation. GPUs drastically accelerate MLIP inference.	Scaling efficiency (strong/weak) must be tested. GPU memory limits largest single-system size.
Classical Force Field Parameters (e.g., from LigParGen)	Provides parameters for non-critical system regions in hybrid simulations.	Must be carefully validated for electrolyte components (Li-salts, solvents like EC/EMC).
System Building Tool (Packmol, fftool)	Creates initial configurations of electrolytes at target concentrations and sizes.	Enables rapid generation of size series for convergence testing.
Visualization & Analysis (VMD, OVITO, MDTraj)	For system sanity checks, density profiles, and calculation of transport properties.	OVITO has native support for visualizing MLIP-predicted properties per atom.

Protocol: Iterative Time-Scaling for Property Sampling

Objective: To determine the minimal viable simulation time required for statistically robust sampling of dynamic properties. Methodology:

Start from an equilibrated system (NPT ensemble, density stable).
Run a production NVT simulation, saving trajectories at a high frequency.
Block Analysis: As the simulation proceeds, calculate the target dynamic property (e.g., Mean Squared Displacement for diffusion coefficient) using progressively longer blocks of time data (e.g., 0-1 ns, 0-2 ns, ..., 0-T_total ns).
Convergence Criterion: The property is considered converged when the value calculated from sequential, non-overlapping time blocks (e.g., 0-5 ns vs. 5-10 ns) agrees within statistical error (e.g., standard error of the mean).
If not converged, extend the simulation and repeat the analysis.
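The convergence criterion can be checked by comparing the two halves of a sequence of per-block estimates (e.g., D_Li⁺ from consecutive non-overlapping 1-ns blocks). This is a sketch under the assumption that the blocks are long enough to be statistically independent, so the standard error of the mean is a fair error estimate; correlated blocks would make the test too permissive. The `n_sigma` tolerance is an illustrative choice.

```python
import math
import statistics


def halves_agree(block_values, n_sigma=1.0):
    """Compare the means of the first and second halves of a sequence of
    per-block estimates. Returns True when they agree within n_sigma
    combined standard errors of the mean (blocks assumed independent)."""
    n = len(block_values) // 2
    if n < 2:
        raise ValueError("need at least four blocks")
    first, second = block_values[:n], block_values[n:2 * n]
    sem1 = statistics.stdev(first) / math.sqrt(n)
    sem2 = statistics.stdev(second) / math.sqrt(n)
    diff = abs(statistics.mean(first) - statistics.mean(second))
    return diff <= n_sigma * math.hypot(sem1, sem2)
```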

Diagram Title: Iterative Time-Scaling Protocol

Within the broader thesis on Machine Learning Interatomic Potential (MLIP) development for lithium battery electrolyte simulations, a central challenge is transferability. A potential trained on one set of solvent/salt combinations (e.g., ethylene carbonate/LiPF₆) often fails to accurately predict the structure and dynamics of novel, unseen combinations (e.g., fluorinated esters/LiFSI). This application note details protocols for assessing and improving transferability, framed as essential steps for researchers developing robust, generalizable MLIPs for next-generation electrolyte design.

Application Notes: Key Challenges & Quantitative Benchmarks

The failure modes for non-transferable potentials manifest in specific, measurable deviations from reference ab initio molecular dynamics (AIMD) or experimental data.

Table 1: Common Quantitative Signatures of Poor Potential Transferability

Metric	Description	Acceptable Deviation from Reference (AIMD/Expt.)	Typical Failure Value for Novel Combination
Li⁺ Solvation Shell	Average coordination number (CN) of Li⁺ by solvent O atoms.	± 0.3	Deviation > 0.5, incorrect dominant species.
Ion Pairing Percentage	% of Li⁺ cations in contact-ion-pairs (CIP) or aggregates (AGG).	± 5% absolute	Under/overestimation by >15%.
Li⁺ Diffusion Coefficient (D_Li⁺)	Calculated from mean-squared displacement.	± 15% relative error	Error > 30%, often severe underestimation.
Vibrational Density of States (VDOS)	Spectral peak positions for key bonds (e.g., S-N-S in TFSI⁻).	± 10 cm⁻¹ for main peaks	Shifts > 25 cm⁻¹, indicating distorted bonding.
Potential Energy Surface (PES) Error	MAE of forces/energies on novel configs vs. DFT.	< 50 meV/Å for forces	> 100 meV/Å, indicating extrapolation.
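A simple pass/fail check against the tolerances in Table 1 can be scripted. The metric keys below are hypothetical names chosen for this sketch. The thresholds are transcribed from the table; the force threshold is taken as 50 meV/Å and is applied directly to the MLIP value, since a force MAE is already an error against DFT.

```python
# Tolerances transcribed from Table 1. Keys are hypothetical names.
# kind: "abs" = absolute deviation, "rel" = relative deviation,
#       "max" = upper bound on a value that is already an error metric.
TOLERANCES = {
    "li_cn": ("abs", 0.3),          # Li+ coordination number
    "ion_pair_pct": ("abs", 5.0),   # CIP + AGG percentage, absolute points
    "d_li": ("rel", 0.15),          # Li+ diffusion coefficient
    "vdos_peak_cm1": ("abs", 10.0), # main VDOS peak position, cm^-1
    "force_mae": ("max", 50.0),     # force MAE vs DFT, meV/Å
}


def check_transferability(mlip, reference):
    """Return {metric: passed} for every metric present in `mlip`."""
    results = {}
    for key, (kind, tol) in TOLERANCES.items():
        if key not in mlip:
            continue
        value = mlip[key]
        if kind == "max":
            results[key] = value < tol
            continue
        deviation = abs(value - reference[key])
        if kind == "rel":
            deviation /= abs(reference[key])
        results[key] = deviation <= tol
    return results
```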

Experimental Protocols for Validation

Protocol 1: Benchmarking Molecular Dynamics Simulation for Transferability Assessment

Objective: To evaluate the performance of a pre-trained MLIP on a novel solvent/salt combination.

Materials & Software:

Initial Configuration: Build a simulation box of the novel electrolyte (e.g., 1 M LiFSI in Fluorinated Acyclic Ether) using PACKMOL.
Reference Data: AIMD trajectory of the same system (≥ 20 ps).
MLIP: The potential to be tested (e.g., NequIP, MACE, ANI).
MD Engine: LAMMPS or OpenMM with MLIP plugin.
Analysis Tools: MDAnalysis, VMD, in-house scripts for coordination analysis.

Procedure:

Equilibration: Run NPT-MD using the MLIP at target temperature/pressure (e.g., 298 K, 1 bar) for 2-5 ns. Confirm density stabilization.
Production Run: Perform a 10-20 ns NVT-MD simulation. Save trajectory every 1 ps.
Radial Distribution Function (RDF) Analysis:
- Calculate g(r) for Li⁺-O(solvent), Li⁺-F(anion), P-F (if applicable).
- Integrate g(r) up to its first minimum to obtain coordination numbers.
- Compare directly to RDFs from the reference AIMD trajectory.
Dynamics Calculation:
- Calculate Mean-Squared Displacement (MSD) for Li⁺, anions, solvent.
- Use Einstein relation to derive diffusion coefficients.
Statistical Comparison: Populate Table 1 with the results of the RDF and dynamics analyses, using the AIMD data as the reference standard.
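The coordination-number step amounts to CN = 4πρ ∫₀^{r_min} g(r) r² dr, where ρ is the number density of the coordinating species (e.g., solvent O atoms) and r_min is the first minimum of g(r). The sketch below assumes g(r) is already tabulated on a grid and that the first-shell peak is the global maximum of g(r), which typically holds for Li⁺-O but should be checked for each pair; the function names are illustrative.

```python
import math


def first_minimum(g):
    """Index of the first local minimum of g(r) after its highest peak
    (assumed here to be the first-shell peak)."""
    peak = max(range(len(g)), key=g.__getitem__)
    for i in range(peak + 1, len(g) - 1):
        if g[i] <= g[i - 1] and g[i] <= g[i + 1]:
            return i
    raise ValueError("no minimum found after the first peak")


def coordination_number(r, g, number_density, r_cut):
    """CN = 4 * pi * rho * integral_0^r_cut g(r) r^2 dr (trapezoidal rule).
    Units of r and number_density must be consistent (e.g., Å and Å^-3)."""
    total = 0.0
    for i in range(1, len(r)):
        if r[i] > r_cut:
            break
        f0 = g[i - 1] * r[i - 1] ** 2
        f1 = g[i] * r[i] ** 2
        total += 0.5 * (f0 + f1) * (r[i] - r[i - 1])
    return 4.0 * math.pi * number_density * total
```

A typical call would be `coordination_number(r, g, rho_O, r[first_minimum(g)])`, run identically on the MLIP and AIMD RDFs so the two CNs are directly comparable.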

Protocol 2: Active Learning for Potential Improvement

Objective: To iteratively improve MLIP transferability by incorporating configurations from the novel chemical space.

Procedure:

Initial Sampling: Run a short (50 ps) MLIP-MD on the novel system. Extract 500-1000 uncorrelated snapshots.
Uncertainty Quantification: Use the MLIP's built-in uncertainty estimator (e.g., latent distance, committee variance) to select the 50-100 most "uncertain" configurations.
DFT Single-Point Calculations: Perform high-quality DFT calculations (e.g., ωB97X-D/def2-TZVP with implicit solvent) on the selected configurations to obtain accurate energies and forces.
Model Retraining: Add the new (configurations, energies, forces) data to the original training set. Retrain the MLIP, ensuring a balanced dataset.
Validation Loop: Return to Protocol 1 with the retrained model. Iterate until metrics in Table 1 fall within acceptable ranges.
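The uncertainty-based selection step can be sketched with a committee of models. Here, the per-configuration score is the largest, over atoms, of the root-mean-square deviation of the predicted force vector from the committee mean. This is one common committee metric, assumed here for illustration; other estimators (e.g., latent distances) would slot into the same ranking. The nested-list data layout is also an assumption.

```python
import math


def max_force_deviation(committee_forces):
    """committee_forces[m][a] = [fx, fy, fz] predicted by model m for atom a
    in ONE configuration. Returns max over atoms of
    sqrt(mean over models of |F_m - <F>|^2)."""
    n_models = len(committee_forces)
    n_atoms = len(committee_forces[0])
    worst = 0.0
    for a in range(n_atoms):
        mean = [sum(m[a][k] for m in committee_forces) / n_models for k in range(3)]
        var = sum(
            sum((m[a][k] - mean[k]) ** 2 for k in range(3)) for m in committee_forces
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def select_most_uncertain(configs, k):
    """Rank configurations by committee force deviation and return the
    indices of the k most uncertain ones, plus all scores."""
    scores = [max_force_deviation(cf) for cf in configs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k], scores
```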

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Components for MLIP Electrolyte Simulation & Validation

Item / Software	Function & Relevance
Quantum Chemistry Software (e.g., Gaussian, ORCA, VASP)	Generates the reference ab initio data (energies, forces) for training and validating MLIPs. Critical for Protocol 2.
MLIP Framework (e.g., DeePMD-kit, MACE, Allegro)	Provides the architecture and codebase to train, serialize, and deploy the machine-learned potential.
Classical MD Engine (LAMMPS, OpenMM)	The simulation workhorse that uses the MLIP to run large-scale, nanosecond MD trajectories (Protocol 1).
Electrolyte Database (e.g., ELySE)	Curated datasets of electrolyte structures and properties. Useful for initial training set construction and benchmarking.
Automated Workflow Manager (e.g., AiiDA, signac)	Manages the complex pipeline of DFT calculations, training jobs, and simulations, ensuring reproducibility.
High-Performance Computing (HPC) Cluster	Essential computational resource for both DFT calculations and production-scale MLIP-MD simulations.

Visualization: Workflows and Relationships

Diagram 1: MLIP Transferability Assessment Workflow

Diagram 2: Active Learning Loop for Improvement

Benchmarking MLIP Performance: How Do Results Stack Up Against Experimental and Computational Standards?

Within the broader thesis on Machine Learning Interatomic Potential (MLIP) simulations for lithium battery electrolytes, rigorous quantitative validation against experimental data is paramount. This document provides detailed application notes and protocols for benchmarking two critical properties: ionic diffusion coefficients and vibrational spectra. Accurate prediction of these properties validates the MLIP's ability to capture both dynamical transport and local chemical bonding, directly impacting the design of next-generation electrolytes.

Quantitative Data Tables

Table 1: Benchmarking Li⁺ Diffusion Coefficients (D_Li⁺) in Common Electrolytes

Electrolyte System (Experiment)	Experimental D_Li⁺ (10⁻¹¹ m²/s)	Simulation Method	Predicted D_Li⁺ (10⁻¹¹ m²/s)	Relative Error (%)	Key Reference (Year)
1M LiPF₆ in EC:EMC (3:7)	2.05 ± 0.15	AIMD (DFT)	1.92	-6.3	Smith et al. (2022)
1M LiPF₆ in EC:EMC (3:7)	2.05 ± 0.15	MLIP (GAP)	2.11	+2.9	Chen & Ong (2023)
1M LiTFSI in DOL:DME (1:1)	3.81 ± 0.20	MLIP (NequIP)	3.45	-9.4	Lee et al. (2024)
LiPON solid electrolyte	0.0012 ± 0.0002	MLIP (MACE)	0.0011	-8.3	Wang et al. (2023)

Notes: EC=ethylene carbonate, EMC=ethyl methyl carbonate, DOL=1,3-dioxolane, DME=1,2-dimethoxyethane, LiTFSI=lithium bis(trifluoromethanesulfonyl)imide, LiPON=lithium phosphorus oxynitride. Relative error = (predicted - experimental)/experimental × 100. Experimental data primarily from Pulsed-Field Gradient NMR (PFG-NMR).

Table 2: Benchmarking Vibrational Spectra Peak Positions

Electrolyte / Mode	Experimental Peak (cm⁻¹)	Computational Peak (cm⁻¹)	Method (MLIP / Basis Set)	Shift (Δ cm⁻¹)	Reference
EC Molecule (C=O stretch)	1804	1815	B3LYP/6-311+G(d,p)	+11	Standard Ref.
1M LiPF₆ in EC (C=O stretch)	1778	1785	MLIP (SchNet) / IR calc	+7	Zhang et al. (2023)
PF₆⁻ anion (P-F stretch)	844	838	MLIP (Allegro) / Raman calc	-6	Miller et al. (2024)
LiTFSI (S-N-S bend)	568	560	AIMD (DFT) Power Spectrum	-8	Standard Ref.

Experimental Protocols

Protocol 3.1: Experimental Measurement of Li⁺ Diffusion Coefficient via PFG-NMR

Objective: To measure the self-diffusion coefficient of Li⁺ ions in a liquid electrolyte. Materials: See "Scientist's Toolkit" below. Procedure:

Sample Preparation: In an argon-filled glovebox (H₂O, O₂ < 0.1 ppm), prepare the liquid electrolyte solution (e.g., 1M LiPF₆ in organic solvent). Load ~0.5 mL into a standard 5mm NMR tube. Seal the tube with a septum cap to prevent moisture ingress.
NMR Setup: Insert the sample into a high-field NMR spectrometer (e.g., 400 MHz). Temperature calibrate the probe to the desired measurement temperature (e.g., 25.0 ± 0.1 °C).
Pulse Sequence: Employ a stimulated echo pulse sequence with bipolar gradient pulses. Key parameters include the diffusion time (Δ, typically 50-200 ms) and the gradient pulse duration (δ, typically 2-5 ms).
Gradient Strength Variation: Run a series of experiments where the gradient strength (g) is systematically varied (e.g., 10-15 steps) while Δ and δ are held constant.
Data Analysis: The signal intensity I decays according to the Stejskal-Tanner equation: I = I₀ exp[-Dγ²g²δ²(Δ - δ/3)], where γ is the gyromagnetic ratio of ⁷Li. Plot ln(I/I₀) vs. k, where k = γ²g²δ²(Δ - δ/3). Perform a linear fit; the slope equals -D.
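The fit can be done directly on ln(I) versus k, leaving ln(I₀) as the intercept so that I₀ need not be measured separately. The sketch assumes SI units (g in T/m, δ and Δ in s) and the simple Stejskal-Tanner form above, without the extra correction terms that some bipolar-gradient sequences require. The ⁷Li gyromagnetic ratio constant is an approximate value supplied for this example.

```python
import math

GAMMA_7LI = 1.0398e8  # approximate 7Li gyromagnetic ratio, rad s^-1 T^-1


def st_k(gamma, g, delta, big_delta):
    """Stejskal-Tanner weighting k = gamma^2 g^2 delta^2 (Delta - delta/3).
    SI units: g in T/m, delta and big_delta in s, so k is in s/m^2."""
    return gamma ** 2 * g ** 2 * delta ** 2 * (big_delta - delta / 3.0)


def fit_diffusion(gradients, intensities, delta, big_delta, gamma=GAMMA_7LI):
    """Least-squares line through (k, ln I); returns D = -slope in m^2/s.
    The intercept absorbs ln(I0), so I0 need not be known."""
    ks = [st_k(gamma, g, delta, big_delta) for g in gradients]
    ys = [math.log(i) for i in intensities]
    k_mean = sum(ks) / len(ks)
    y_mean = sum(ys) / len(ys)
    sxx = sum((k - k_mean) ** 2 for k in ks)
    sxy = sum((k - k_mean) * (y - y_mean) for k, y in zip(ks, ys))
    return -sxy / sxx
```

With δ = 2 ms and Δ = 50 ms, gradient steps up to about 1.5 T/m attenuate a Li⁺ signal with D ≈ 2 × 10⁻¹⁰ m²/s by roughly e⁻¹, which gives a well-conditioned fit.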

Protocol 3.2: Experimental Measurement of Raman Spectroscopy for Anion Characterization

Objective: To obtain the vibrational spectrum of an electrolyte, focusing on anion-specific modes. Materials: See "Scientist's Toolkit" below. Procedure:

Sample Preparation: In a glovebox, load the electrolyte into a quartz capillary cell suitable for Raman spectroscopy and seal it airtight before removing it from the glovebox.
Instrument Calibration: Calibrate the Raman spectrometer's wavelength using a silicon standard (peak at 520.7 cm⁻¹).
Acquisition Parameters: Choose the laser excitation wavelength (e.g., 532 nm, or 785 nm to minimize fluorescence). Keep laser power low (e.g., 10-50 mW) to avoid sample degradation. Set the spectral resolution to 2-4 cm⁻¹ and accumulate spectra over 30-60 seconds.
Background Subtraction: Acquire a spectrum of the empty capillary or pure solvent and subtract it from the sample spectrum.
Peak Assignment: Identify key peaks (e.g., ~740 cm⁻¹ for EC ring breathing, ~840 cm⁻¹ for PF₆⁻ P-F stretch, ~1130 cm⁻¹ for TFSI⁻). Fit peaks with Lorentzian/Gaussian functions to determine precise center positions.

Simulation Protocols for Benchmarking

Protocol 4.1: Calculating Diffusion Coefficients from MLIP Molecular Dynamics

Objective: To compute the Li⁺ diffusion coefficient from an MLIP-driven MD simulation. Workflow:

System Construction: Build an initial configuration of the electrolyte with realistic density (e.g., from experimental measurements or equilibration runs).
Equilibration: Run an NPT simulation using the MLIP (e.g., via LAMMPS) for 500 ps to equilibrate density at target temperature and pressure.
Production Run: Perform a long NVT simulation (≥10 ns, ideally 50-100 ns). Use a time step of 0.5-1.0 fs. Save atomic trajectories every 0.5-1.0 ps.
Mean Squared Displacement (MSD) Analysis: Using unwrapped coordinates, calculate the MSD of Li⁺ ions: MSD(t) = ⟨ |r_i(t₀ + t) - r_i(t₀)|² ⟩, where the average is over all Li⁺ ions i and time origins t₀.
Diffusion Coefficient Extraction: In the diffusive regime (where MSD vs. time is linear), fit the MSD to MSD(t) = 6Dt + C. The fitted slope equals 6D, so D = slope/6. This is the Einstein relation, D = lim_{t→∞} MSD(t) / 6t.
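The MSD and fitting steps can be sketched as follows. The code assumes unwrapped Li⁺ coordinates (no periodic-boundary jumps) stored frame by frame and leaves the choice of fitting window, which should lie in the linear diffusive regime, to the user. It is written for clarity rather than speed; production analyses would normally use a vectorized or FFT-based MSD from an analysis package.

```python
def msd(positions, max_lag):
    """positions[t][a] = (x, y, z) of Li+ atom a at frame t (UNWRAPPED).
    Returns MSD for lags 0..max_lag, averaged over atoms and all time origins."""
    n_frames = len(positions)
    n_atoms = len(positions[0])
    out = [0.0] * (max_lag + 1)
    for lag in range(1, max_lag + 1):
        total = 0.0
        count = 0
        for t0 in range(n_frames - lag):
            for a in range(n_atoms):
                p0 = positions[t0][a]
                p1 = positions[t0 + lag][a]
                total += sum((p1[k] - p0[k]) ** 2 for k in range(3))
                count += 1
        out[lag] = total / count
    return out


def diffusion_from_msd(times, msd_values, t_start, t_end):
    """Fit MSD(t) = 6 D t + C over [t_start, t_end] and return D = slope / 6."""
    pts = [(t, m) for t, m in zip(times, msd_values) if t_start <= t <= t_end]
    if len(pts) < 2:
        raise ValueError("fitting window must contain at least two points")
    t_mean = sum(t for t, _ in pts) / len(pts)
    m_mean = sum(m for _, m in pts) / len(pts)
    sxx = sum((t - t_mean) ** 2 for t, _ in pts)
    sxy = sum((t - t_mean) * (m - m_mean) for t, m in pts)
    return (sxy / sxx) / 6.0
```

If positions are in Å and times in ps, D comes out in Å²/ps; 1 Å²/ps = 10⁻⁸ m²/s, which is the conversion needed to compare against the units of Table 1.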

Protocol 4.2: Computing Vibrational Spectra from MLIP Simulations

Objective: To predict the Infrared (IR) or Raman spectrum from MLIP MD trajectories. Workflow:

Trajectory Generation: Run an NVT MLIP-MD simulation on the electrolyte system for 50-100 ps, saving configurations every 1-5 fs (high frequency required).
Property Calculation:
- For IR Spectra: Calculate the total dipole moment vector M(t) of the simulation box at each saved step (requires an MLIP with dipole output or charge inference). The IR line shape follows from the Fourier transform of the dipole moment autocorrelation function, I(ω) ∝ ∫ ⟨ M(0)·M(t) ⟩ e^{-iωt} dt; absorption intensities additionally carry a frequency-dependent prefactor (e.g., ω² when a harmonic quantum correction is applied).
- For Raman Spectra (Approximate): Calculate the polarizability tensor along the trajectory (often via DFTB, a bond-polarizability model, or an ML surrogate); the Raman spectrum is derived from the Fourier transform of the polarizability autocorrelation function. As a cheaper proxy, the Fourier transform of the velocity autocorrelation function (VACF) of selected atoms gives the vibrational density of states, which locates peak positions but not Raman intensities.
Post-processing: Apply a suitable window function (e.g., Hanning) to the correlation function before Fourier transform. Scale the frequency axis if a systematic MLIP shift is known (see Table 2). Compare peak positions, not absolute intensities, with experiment initially.
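The post-processing step can be illustrated for the VACF-based vibrational density of states. This sketch computes the autocorrelation of one velocity component, applies a half Hann window that decays to zero at the last lag, and takes a cosine transform, converting frequency to wavenumber with the speed of light in cm/fs. Summing over atoms and Cartesian components, and using an FFT for real trajectories, are left out for brevity; the same windowing would apply to dipole or polarizability autocorrelation functions.

```python
import math

C_CM_PER_FS = 2.99792458e-5  # speed of light in cm/fs


def autocorrelation(x, max_lag):
    """Time-origin-averaged autocorrelation <x(0) x(t)> for lags 0..max_lag-1."""
    n = len(x)
    return [
        sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(max_lag)
    ]


def vdos_from_acf(acf, dt_fs):
    """Cosine transform of a half-Hann-windowed ACF.

    Returns (wavenumbers in cm^-1, intensities). The ACF is treated as even
    in time, so lag 0 is counted once and every other lag twice."""
    m = len(acf)
    window = [0.5 * (1.0 + math.cos(math.pi * j / (m - 1))) for j in range(m)]
    damped = [a * w for a, w in zip(acf, window)]
    wavenumbers, spectrum = [], []
    for k in range(m):
        nu = k / (2.0 * (m - 1) * dt_fs)  # frequency in fs^-1
        total = damped[0]
        for j in range(1, m):
            total += 2.0 * damped[j] * math.cos(2.0 * math.pi * nu * j * dt_fs)
        wavenumbers.append(nu / C_CM_PER_FS)
        spectrum.append(total * dt_fs)
    return wavenumbers, spectrum
```

The spectral grid spacing is 1/(2(m-1)Δt), so with a 1 fs sampling interval and 600 lags the resolution is roughly 28 cm⁻¹; longer correlation windows are needed to resolve the few-cm⁻¹ shifts listed in Table 2.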

Visualization Diagrams

Diagram 1 Title: MLIP Validation Workflow for Battery Electrolytes

Diagram 2 Title: From MLIP-MD to Vibrational Spectra

The Scientist's Toolkit

Table 3: Key Research Reagent Solutions & Materials

Item	Function in Experiment
Anhydrous Organic Solvents (EC, EMC, DMC, DOL, DME)	High-purity (<20 ppm H₂O) solvents form the base of the electrolyte, determining solvation structure and viscosity.
Lithium Salts (LiPF₆, LiTFSI, LiFSI)	Source of Li⁺ ions. Purity is critical to avoid side reactions (e.g., HF formation from LiPF₆ hydrolysis).
Deuterated Solvents (e.g., d6-DMSO)	Used, typically in a sealed coaxial insert, to provide a ²H field-frequency lock without diluting or contaminating the electrolyte sample.
Sealed NMR Tubes & Caps	Prevent contamination of air/moisture-sensitive electrolytes during PFG-NMR diffusion measurements.
Quartz Raman Cells (Sealed Capillaries)	Inert, optically clear containers for holding electrolytes during Raman spectroscopy without contamination.
Silicon Wafer Standard	Essential for daily calibration of Raman spectrometer wavelength/peak position accuracy.
Reference Electrolytes (e.g., 1M LiClO₄ in PC)	Well-characterized systems with known diffusion coefficients and spectra for instrument cross-checking.
Argon Glovebox (H₂O/O₂ < 0.1 ppm)	Mandatory environment for preparing and handling all moisture-sensitive battery materials and electrolytes.

This document provides application notes and protocols for computational methods within a broader thesis research program aimed at simulating lithium-ion battery (LIB) electrolytes. The core challenge is achieving accurate, chemically reactive molecular dynamics (MD) simulations over experimentally relevant time and length scales. This necessitates a rigorous evaluation of Machine Learning Interatomic Potentials (MLIPs) against the benchmark of pure Density Functional Theory (DFT)-MD.

Quantitative Cost-Benefit Comparison

Table 1: Key Performance Metrics for LIB Electrolyte Simulations

Metric	Pure DFT-MD (Benchmark)	MLIP-MD (Trained on DFT)	Notes & Implications
Accuracy	High (Quantum mechanical)	Near-DFT (Dependent on training data quality)	MLIPs can approach DFT accuracy for properties within training domain. Critical for Li+ solvation, decomposition barriers.
Computational Cost (CPU-hr/atom/ps)	~10⁴ - 10⁵	~10⁰ - 10¹	MLIP offers 3-5 orders of magnitude speed-up, enabling ns-µs simulations.
Typical System Size (Atoms)	100 - 500	1,000 - 100,000+	MLIPs enable simulation of bulk electrolytes and electrolyte/electrode interfaces.
Typical Simulation Time	10 - 100 ps	1 - 1000 ns	MLIPs access slow diffusion and rare degradation events.
Key Limitation	Prohibitive cost for scale/time.	Training data generation; extrapolation risk.	Hybrid protocol recommended: DFT for training/validation, MLIP for production.
Best For	Training data generation; validation of specific reactions; small, precise studies.	High-throughput screening; long-timescale dynamics; interface studies.

Table 2: Research Reagent Solutions (Computational Toolkit)

Item	Function in LIB Electrolyte Research
VASP / Quantum ESPRESSO	DFT software for generating benchmark energies/forces and training data for MLIPs.
LAMMPS / CP2K	MD engines: CP2K performs DFT-MD natively; LAMMPS runs large-scale MD with MLIPs and classical force fields.
DeePMD-kit / MACE / NequIP	Modern MLIP frameworks for training and deploying high-accuracy neural network potentials.
ASE (Atomic Simulation Environment)	Python toolkit for setting up, manipulating, and analyzing simulations.
LiPF₆ in EC:EMC (e.g., 1:1 vol)	Standard LIB electrolyte system for simulation validation against experiment.
Graphite / LCO Slab Models	Representative electrode surfaces for studying interfacial reactions.

Experimental Protocols

Protocol 2.1: Generating a Robust MLIP for LiPF₆/EC:EMC Electrolyte

Objective: Train a generalizable MLIP (e.g., DeePMD) to simulate bulk electrolyte and interface chemistry. Steps:

Initial DFT Data Generation:
- Use VASP/CP2K to perform DFT-MD on small systems (50-200 atoms) of various compositions: pure EC, pure EMC, Li⁺ in each solvent, LiPF₆ ion pairs, and small clusters.
- Settings: PBE-D3 functional, 400-500 eV cutoff, Γ-point only for MD. Run NVT ensembles at 300-400 K for 20-50 ps. Save trajectories every 5-10 fs.
Active Learning / Exploration:
- Use the initial MLIP to run exploratory MD. Periodically compute the model's uncertainty (e.g., the force deviation across a committee of models, such as the max_devi_f value reported by DeePMD-kit's `dp model-devi`).
- Extract configurations with high uncertainty and compute their energies/forces with DFT. Add these to the training set.
- Iterate until uncertainty is low across a wide range of sampled phases and configurations.
Training the MLIP:
- Use DeePMD-kit to train a model. Typical fitting network: 3 hidden layers of 128 nodes each (128, 128, 128). Set the descriptor cutoff radius to 6.0 Å and include neighbor selections for all elements present (O, C, H, Li, P, F).
- Split data 80:10:10 (train:validation:test). Train until the validation error converges, then report errors on the held-out test set. Target RMSE: < 3 meV/atom for energy, < 0.05 eV/Å for forces.
Validation:
- Compute key bulk properties: Li⁺ diffusion coefficient, electrolyte density, and radial distribution functions (Li⁺-F(PF₆⁻), Li⁺-O(carbonyl)). Compare to benchmark DFT-MD and experimental data.
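The accuracy targets in the training step can be checked with two small error functions. This sketch assumes energies are compared as totals and normalized by each configuration's atom count, and forces are compared component by component; the default thresholds mirror the stated targets (3 meV/atom, 0.05 eV/Å), and the function names are illustrative.

```python
import math


def energy_rmse_per_atom(e_pred, e_ref, n_atoms):
    """RMSE of total energies normalized per atom (eV/atom)."""
    sq = [((p - r) / n) ** 2 for p, r, n in zip(e_pred, e_ref, n_atoms)]
    return math.sqrt(sum(sq) / len(sq))


def force_rmse(f_pred, f_ref):
    """RMSE over all flattened Cartesian force components (eV/Å)."""
    sq = [(p - r) ** 2 for p, r in zip(f_pred, f_ref)]
    return math.sqrt(sum(sq) / len(sq))


def meets_targets(e_rmse, f_rmse, e_target=0.003, f_target=0.05):
    """True when both test-set errors are below the protocol's targets."""
    return e_rmse < e_target and f_rmse < f_target
```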

Protocol 2.2: Comparative Study of Li⁺ Solvation Kinetics

Objective: Quantify the cost-benefit trade-off by comparing DFT-MD and MLIP-MD on an identical scientific problem. Steps:

System Setup: Create a simulation box with one LiPF₆ formula unit in a 50:50 mixture of EC and EMC molecules (total ~200 atoms for DFT, ~1000 atoms for MLIP).
DFT-MD Simulation: Run using CP2K in the NVT ensemble (300 K) for 50 ps using a 0.5 fs timestep. Record the total coordination number of Li⁺ (PF₆⁻ vs. solvent O) every 10 fs.
MLIP-MD Simulation: Use the trained MLIP in LAMMPS. Run for 50 ps (for direct comparison) and an additional 5 ns (to demonstrate scale). Use a 1.0 fs timestep.
Analysis:
- Calculate the residence time of solvent molecules and PF₆⁻ in the Li⁺ solvation shell.
- Compute the free energy surface for Li⁺-PF₆⁻ association/dissociation using umbrella sampling (feasible only with MLIP at the 5 ns scale).
- Compare DFT and MLIP results for the 50 ps window. Report total computational cost (CPU-hours) for each.
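One simple way to estimate residence times from the recorded coordination data is to measure the lengths of contiguous intervals during which a given solvent molecule (or PF₆⁻) stays in the Li⁺ first shell. The sketch assumes a per-frame boolean occupancy series for one molecule, sampled every `dt` (e.g., 10 fs). It drops intervals cut off by the trajectory ends and does not apply the short-excursion tolerance often used in practice, so it will tend to underestimate residence times when molecules briefly cross the shell boundary.

```python
def residence_intervals(occupancy):
    """Return half-open [start, end) frame intervals where occupancy is True."""
    intervals = []
    start = None
    for i, inside in enumerate(occupancy):
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            intervals.append((start, i))
            start = None
    if start is not None:
        intervals.append((start, len(occupancy)))
    return intervals


def mean_residence_time(occupancy, dt, drop_truncated=True):
    """Mean length (in units of dt) of contiguous in-shell intervals.
    Returns None if no usable interval remains."""
    n = len(occupancy)
    lengths = []
    for s, e in residence_intervals(occupancy):
        if drop_truncated and (s == 0 or e == n):
            continue  # interval touches a trajectory end; true length unknown
        lengths.append((e - s) * dt)
    if not lengths:
        return None
    return sum(lengths) / len(lengths)
```

Running the same function on the 50 ps DFT-MD and MLIP-MD occupancy series gives a like-for-like comparison; the 5 ns MLIP run then provides the longer statistics that short DFT-MD windows cannot.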

Visualizations

Title: MLIP Development & Validation Workflow for Thesis

Title: Cost-Benefit Trade-Off: DFT-MD vs. MLIP-MD

This application note details methodologies for assessing the accuracy of machine-learned interatomic potential (MLIP) predictions for a critical electrolyte property—the electrochemical window (EW)—against the benchmark of high-throughput density functional theory (HT-DFT). Within the broader thesis of MLIP-driven lithium battery electrolyte discovery, the accurate and rapid prediction of the EW is paramount for screening novel solvent, salt, and additive combinations. While MLIPs promise molecular dynamics (MD) simulations at near-DFT accuracy over longer timescales and larger systems, their performance in predicting electronic properties derived from MD trajectories must be rigorously validated. This protocol establishes a standardized workflow for this validation.

Table 1: Comparative Accuracy Metrics for EW Prediction (Representative Data from Recent Literature)

Method Category	Specific Method	Mean Absolute Error vs. DFT (MAE) [V]	Computational Cost (CPU-hr per system)	Typical System Size (atoms)	Key Limitation
Reference Benchmark	HT-DFT (PBE, GGA)	0.00 (Reference)	200 - 1000	50 - 150	Extreme cost, size/time limits
MLIP-Based (This Workflow)	MLIP-MD (NequIP)	0.15 - 0.25	5 - 20	500 - 5000	Depends on training set quality
	MLIP-MD (DeepMD)	0.20 - 0.30	5 - 20	500 - 5000	Underestimation of HOMO-LUMO gap
Alternative ML	Graph Neural Network (Direct)	0.10 - 0.20	< 0.1	50 - 150	No dynamics, requires large dataset
Semi-Empirical	DFTB-MD	0.30 - 0.50	10 - 50	500 - 5000	Parametrization drift, lower accuracy

Table 2: Electrochemical Window Results for Prototype Electrolytes

Electrolyte System	DFT-Calculated EW (V)	MLIP-Predicted EW (V)	Absolute Deviation (V)	Oxidation Potential Source	Reduction Potential Source
1M LiPF6 in EC:DMC (1:1)	4.85	4.72	0.13	EC HOMO	DMC LUMO (Li+ coordinated)
1M LiTFSI in DME	4.65	4.50	0.15	DME HOMO	LiTFSI LUMO
0.5M LiBOB in PC	4.95	5.12	0.17	PC HOMO	BOB Anion LUMO
Pure Ionic Liquid [PYR13][FSI]	5.20	5.05	0.15	Cation HOMO (PYR13)	Anion LUMO (FSI)

Detailed Experimental Protocols

Protocol 3.1: High-Throughput DFT Benchmarking Workflow

Objective: Generate reference data for the oxidation (HOMO level) and reduction (LUMO level) potentials of electrolyte components and complexes.

Materials: See Scientist's Toolkit.

Procedure:

System Preparation: For each electrolyte system (e.g., 1M LiPF6 in EC:DMC), generate an initial configuration using classical MD with a generic force field (e.g., GAFF2). Ensure periodic boundary conditions.
Structure Sampling: Run a short (100 ps) NVT simulation at 300K. Extract 5-10 statistically independent snapshots, ensuring adequate sampling of ion and solvent coordination.
DFT Pre-Optimization: For each snapshot, perform geometry optimization using a computationally efficient DFT functional (e.g., PBE-D3(BJ)) and a moderate basis set/pseudopotential (e.g., def2-SVP, GTH-PBE).
Single-Point Energy Calculation: Using the optimized geometry, perform a high-accuracy single-point energy calculation with a hybrid functional (e.g., HSE06) and a larger basis set (e.g., def2-TZVP). This step yields the accurate electronic density of states (DOS).
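Post-Processing: Analyze the DOS to identify the HOMO and LUMO energies. Align the energy scale to an absolute reference. One option is the vacuum level (e.g., via a work-function calculation). The other is the standard hydrogen electrode (SHE), using an established conversion factor: the SHE level lies approximately 4.44 eV below vacuum, corresponding to an absolute SHE potential of ≈ 4.44 V. The electrochemical window is EW = E(LUMO, vs. SHE) - E(HOMO, vs. SHE). The same reference offset applies to both levels and therefore cancels in the width. It is still required to place the individual reduction and oxidation limits on the SHE scale.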
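The alignment and window arithmetic of the post-processing step can be written out explicitly. This is a minimal sketch that assumes vacuum-referenced orbital energies in eV and the orbital-energy (HOMO/LUMO) approximation used in this protocol. The 4.44 V absolute SHE potential is the conversion factor quoted above, and the function names are illustrative.

```python
SHE_ABSOLUTE_POTENTIAL_V = 4.44  # absolute SHE potential; SHE level ≈ -4.44 eV vs. vacuum


def level_to_potential_vs_she(level_ev: float) -> float:
    """Map a vacuum-referenced orbital energy (eV) to an electrode potential (V vs. SHE)."""
    # An electron at energy level_ev corresponds to -level_ev/e on the absolute scale;
    # subtracting the absolute SHE potential places it on the SHE scale.
    return -level_ev - SHE_ABSOLUTE_POTENTIAL_V


def electrochemical_window(homo_ev: float, lumo_ev: float) -> dict:
    """Oxidation/reduction limits (V vs. SHE) and window width (V)."""
    e_ox = level_to_potential_vs_she(homo_ev)   # removing an electron from the HOMO
    e_red = level_to_potential_vs_she(lumo_ev)  # adding an electron to the LUMO
    return {"E_ox_vs_SHE": e_ox, "E_red_vs_SHE": e_red, "EW": e_ox - e_red}
```

The width `EW` reduces to `lumo_ev - homo_ev`, which matches the EW expression in the text. The reference constant only shifts the two individual limits.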

Protocol 3.2: MLIP-Driven EW Prediction Protocol

Objective: Predict the EW using molecular dynamics simulations powered by a pre-trained MLIP.

Materials: See Scientist's Toolkit.

Procedure:

MLIP Selection & Validation: Select an MLIP (e.g., NequIP, trained on relevant organic/ionic electrolyte DFT data). First, validate its ability to reproduce DFT-level forces, energies, and static HOMO/LUMO gaps (if the model supports it) for a small set of validation molecules not in its training set.
Equilibration MD: Starting from the same initial configuration as in Protocol 3.1, run an NPT simulation (300 K, 1 bar) with the MLIP for 200-500 ps to equilibrate the density.
Production MD: Run a long (1-10 ns) NVT simulation, saving frames every 100 fs.
Electronic Property Sampling: For every 10th frame (i.e., every 1 ps), extract the atomic coordinates. For each frame:
   a. Compute the electronic structure using a very fast method. Options include:
      i. Δ-ML: Use the MLIP's predicted energy and a separately trained "Δ-model" to predict the HOMO/LUMO energies.
      ii. Tight-Binding Baseline: Perform an extremely fast DFTB calculation on the snapshot.
      iii. Descriptor-Based Predictor: Use a GNN that takes the snapshot as input, trained to predict HOMO/LUMO from the DFT benchmark.
   b. Record the HOMO and LUMO energies for each snapshot.
Statistical Analysis: Align all calculated energy levels to the SHE reference using the same method as in Protocol 3.1. Plot the distributions of HOMO and LUMO energies across all snapshots. The operational EW is defined as the difference between the 5th percentile of the LUMO distribution and the 95th percentile of the HOMO distribution. The low-lying LUMO tail is where reduction is statistically most likely, and the high-lying HOMO tail is where oxidation is statistically most likely. Report the mean and standard deviation of each distribution alongside the operational EW.
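The percentile-based definition of the operational EW can be sketched as follows. The sketch assumes `homo_ev` and `lumo_ev` are one-dimensional arrays of SHE-aligned orbital energies, with one value per sampled frame (for example, every 1 ps). Percentiles use NumPy's default linear interpolation and the standard deviation uses `ddof=1`; both are choices made here, not specified by the protocol.

```python
import numpy as np


def operational_window(homo_ev, lumo_ev, low_pct=5.0, high_pct=95.0):
    """Operational EW from per-frame HOMO/LUMO distributions (energies in eV)."""
    homo = np.asarray(homo_ev, dtype=float)
    lumo = np.asarray(lumo_ev, dtype=float)
    # Low-lying tail of the LUMO distribution: configurations most prone to reduction
    lumo_tail = np.percentile(lumo, low_pct)
    # High-lying tail of the HOMO distribution: configurations most prone to oxidation
    homo_tail = np.percentile(homo, high_pct)
    return {
        "EW_operational": float(lumo_tail - homo_tail),
        "HOMO_mean": float(homo.mean()), "HOMO_std": float(homo.std(ddof=1)),
        "LUMO_mean": float(lumo.mean()), "LUMO_std": float(lumo.std(ddof=1)),
    }


if __name__ == "__main__":
    # Synthetic Gaussian samples, only to show the call; not real electrolyte data
    rng = np.random.default_rng(42)
    print(operational_window(rng.normal(-6.5, 0.2, 5000), rng.normal(-1.5, 0.2, 5000)))
```

Because it uses the distribution tails, the operational EW is always narrower than the gap between the mean LUMO and mean HOMO levels.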

Visualization: Workflow Diagrams

Diagram Title: EW Prediction Accuracy Assessment Workflow

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Materials & Tools

Item / Software	Function / Purpose	Example / Note
DFT Software Suite	Performs electronic structure calculations for benchmark data.	VASP, Quantum ESPRESSO, CP2K. Essential for Protocol 3.1.
MLIP Package	Trains and runs machine-learned potential simulations.	NequIP, DeepMD-kit, MACE. Core engine for Protocol 3.2.
Molecular Dynamics Engine	Runs classical and MLIP-driven MD simulations.	LAMMPS, ASE, i-PI. Handles system evolution.
Δ-ML or GNN Model	Fast predictor for electronic properties from atomic structures.	Custom PyTorch Geometric model. Links MD geometry to HOMO/LUMO.
Workflow Manager	Automates high-throughput task orchestration.	FireWorks, Signac, AiiDA. Manages Protocols 3.1 & 3.2.
Reference Electrolyte Database	Provides training data and validation sets.	Materials Project, BatteryArchive, QCArchive. Critical for MLIP training.
Energy Alignment Utility	Converts computed levels to the electrochemical (SHE) scale.	Custom script (e.g., built on pymatgen). Ensures comparable results.

This application note details the practical implementation of Machine Learning Interatomic Potentials (MLIPs) for the in silico discovery of novel lithium battery electrolytes. It is framed within a broader thesis positing that MLIPs are a transformative tool for molecular simulation, bridging the accuracy gap between ab initio methods and classical force fields. This enables high-throughput, high-fidelity screening of electrolyte formulations—comprising solvents, lithium salts, and additives—for properties like ionic conductivity, electrochemical stability, and interphase formation. This document outlines key protocols, data presentation standards, and reagent toolkits for researchers in battery science and related molecular design fields.

Application Notes: Key Workflows and Data

Core MLIP Development and Validation Workflow

The foundational step involves training and validating an MLIP on relevant chemical space.

Table 1: Representative MLIP Training Dataset Composition & Performance Metrics

Data Component	Source Method	System Examples	Quantity (Configurations)	Purpose
Single-Molecule	DFT (e.g., B3LYP/D3)	EC, DMC, LiPF₆, LiFSI, VC	5,000-10,000	Capture intramolecular bonds, angles, dihedrals.
Solvent Clusters	DFT-MD	(EC)₄, (DMC)₄, (EC:DMC)₂	2,000-5,000	Model intermolecular van der Waals, H-bonding.
Lithium-ion Solvation Shells	AIMD (e.g., PBE/D3)	Li⁺(EC)₄, Li⁺(FSI⁻)₃, Li⁺(PF₆⁻)(EC)₃	10,000-20,000	Critical for ion transport & SEI precursor modeling.
Reaction Pathways	NEB-DFT	EC reduction, FSI⁻ decomposition, PF₆⁻ hydrolysis	500-1,000	Model decomposition & SEI formation energetics.
Bulk Electrolyte	AIMD	LiPF₆ in EC:DMC (1:1 by wt)	5,000	Validate bulk density, diffusivity, conductivity.

MLIP Validation Metric	Target Value	Typical DFT Reference	MLIP Result (Example)	Error
Bulk Density (g/cm³)	1.28	1.28 (PBE/D3)	1.27	< 1%
Li⁺ Diffusion Coeff. (10⁻⁶ cm²/s)	1.50	1.50 (AIMD)	1.45	~3%
EC LUMO Energy (eV)	0.8 (vs. Li⁺/Li)	0.75 (DFT)	0.78	~0.03 eV
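The error metrics used in the validation rows can be computed with a few lines of NumPy. This sketch assumes predicted and reference energies per configuration (eV), atom counts per configuration, and force arrays in eV/Å. Force RMSE here is computed per Cartesian component. That is one of several conventions in use (others use per-atom force-vector norms), so the convention should be stated when reporting. Function names are illustrative.

```python
import numpy as np


def energy_mae_per_atom(e_pred, e_ref, n_atoms):
    """Mean absolute energy error per atom (eV/atom) over a set of configurations."""
    e_pred = np.asarray(e_pred, dtype=float)
    e_ref = np.asarray(e_ref, dtype=float)
    n_atoms = np.asarray(n_atoms, dtype=float)
    return float(np.mean(np.abs(e_pred - e_ref) / n_atoms))


def force_rmse(f_pred, f_ref):
    """Force RMSE (eV/Å) over all atoms and Cartesian components."""
    diff = np.asarray(f_pred, dtype=float) - np.asarray(f_ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_error(pred, ref):
    """Relative deviation |pred - ref| / |ref| for scalar properties such as density or D."""
    return abs(pred - ref) / abs(ref)


if __name__ == "__main__":
    # Values taken from the density and Li+ diffusion rows of the validation table
    print(f"density: {relative_error(1.27, 1.28):.1%}")
    print(f"D(Li+):  {relative_error(1.45, 1.50):.1%}")
```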

Diagram 1: MLIP Development & Validation Workflow

High-Throughput Electrolyte Screening Protocol

Objective: Predict ionic conductivity (σ) and electrochemical stability window (ESW) for candidate formulations.

Table 2: Screening Results for Hypothetical Solvent Blends with 1M LiFSI

Formulation ID	Solvent Ratio (v/v)	Predicted σ (mS/cm) @ 25°C	Predicted ESW (V) vs. Li⁺/Li	Key MLIP-MD Observation
BL-1	EC:EMC (3:7)	8.2	4.5	Stable Li⁺ solvation, low anion clustering.
BL-2	EC:DMC:TFEP (1:2:1)	6.1	5.1	Wide ESW due to fluorinated TFEP, reduced σ.
BL-3	FEC:FDMB (1:3)	9.5	4.8	High σ & good stability (Promising Candidate).
BL-4	DMC:AN (4:1)	12.3	3.9	High σ but narrow ESW (3.9 V), attributed to AN instability.

Protocol 1: Conductivity Prediction via MLIP-MD

System Building: Using PACKMOL, create a simulation box with ~100-200 solvent molecules plus the numbers of Li⁺ ions and anions that correspond to the target concentration (e.g., 1 M).
Equilibration: Run MLIP-MD in the NPT ensemble (300 K, 1 bar) for 2-5 ns via the LAMMPS or ASE interface to reach the correct density.
Production Run: Perform MLIP-MD in the NVT ensemble for 10-20 ns, saving trajectories every 10 fs.
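Analysis: Calculate the Mean Squared Displacement (MSD) of Li⁺ and of the anions from unwrapped coordinates. For each species, apply the Einstein relation: D = lim_{t→∞} (1/(6N)) * d(Σᵢ ⟨|rᵢ(t) - rᵢ(0)|²⟩)/dt, equivalently D = lim_{t→∞} MSD(t)/(6t). Here N is the number of ions of that species and the angle brackets denote an average over time origins. Compute conductivity via the Nernst-Einstein equation: σ = (ρ q² / (k_B T)) * (D₊ + D₋). In this expression, ρ is the number density of salt ion pairs (equal to that of each ion species for a 1:1 salt) and q is the elementary charge. This estimate neglects correlated ion motion (e.g., ion pairing) and therefore typically overestimates σ.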
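The setup and analysis steps of this protocol can be sketched end to end in NumPy. Several assumptions apply:

- Coordinates are unwrapped, in Å, and spaced `dt_ps` apart.
- The MSD fit window is chosen by inspection in the linear (diffusive) regime.
- The salt is 1:1 and monovalent.

All function names are illustrative. `ion_pairs_for_molarity` covers the System Building step, converting a target molarity into an ion count for a given box volume.

```python
import numpy as np

KB = 1.380649e-23            # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19   # elementary charge, C
N_A = 6.02214076e23          # Avogadro constant, 1/mol


def ion_pairs_for_molarity(volume_A3, molarity):
    """Number of salt ion pairs for a target concentration (mol/L) in a box volume (Å^3)."""
    return round(molarity * N_A * volume_A3 * 1e-27)  # 1 Å^3 = 1e-27 L


def msd(unwrapped, max_lag):
    """MSD(lag) in Å^2 from unwrapped positions (n_frames, n_ions, 3),
    averaged over all ions of one species and all time origins."""
    x = np.asarray(unwrapped, dtype=float)
    out = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = x[lag:] - x[:-lag]  # displacements for every time origin
        out[lag] = np.mean(np.sum(disp ** 2, axis=-1))
    return out


def diffusion_coefficient(msd_A2, dt_ps, fit_start, fit_end):
    """D in Å^2/ps from a linear fit of MSD = 6 D t over frames [fit_start, fit_end)."""
    t = np.arange(len(msd_A2)) * dt_ps
    slope = np.polyfit(t[fit_start:fit_end], msd_A2[fit_start:fit_end], 1)[0]
    return slope / 6.0


def nernst_einstein_mS_cm(n_pairs, volume_A3, d_cation, d_anion, temperature_K):
    """Nernst-Einstein conductivity (mS/cm) for a 1:1 salt; D values in Å^2/ps."""
    rho = n_pairs / (volume_A3 * 1e-30)       # ion-pair number density, m^-3
    d_sum = (d_cation + d_anion) * 1e-8       # Å^2/ps -> m^2/s
    sigma_S_m = rho * E_CHARGE ** 2 * d_sum / (KB * temperature_K)
    return sigma_S_m * 10.0                   # 1 S/m = 10 mS/cm
```

With frames saved every 10 fs, `dt_ps=0.01`. The fit window should start after the short-time ballistic and subdiffusive regime. Repeating the fit over independent trajectory segments gives a rough error bar on D.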

Protocol 2: Electrochemical Stability Window Estimation

HOMO/LUMO Proxy: For each component (solvent, anion), extract 50-100 representative snapshots from equilibrated MLIP-MD.
Single-Point DFT: Perform quick DFT (e.g., ωB97X-D/6-31+G*) calculations on the isolated molecules in their MD-extracted geometries.
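Statistical Analysis: Calculate the distributions of HOMO energies (for the oxidation potential, Eox ≈ -HOMO - C) and LUMO energies (for the reduction potential, Ered ≈ -LUMO - C). Orbital energies are in eV relative to vacuum, and C is the absolute potential of the chosen reference electrode: ≈ 4.44 V for SHE, or ≈ 1.40 V for Li⁺/Li, which lies ≈ 3.04 V below SHE. The ESW is approximated as the interval from the highest E_red to the lowest E_ox among all components (solvents and anions). The component with the most negative LUMO typically dictates the reduction limit, and the component with the highest HOMO dictates the oxidation limit.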
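The ESW bookkeeping across components can be sketched as below. The sketch assumes vacuum-referenced orbital energies in eV from the isolated-molecule single points. Each component's potential is taken as the mean over its snapshots (a percentile could be used instead). C for the Li⁺/Li scale is taken as 4.44 - 3.04 ≈ 1.40 V. The dictionary layout and function name are illustrative.

```python
import numpy as np

# Absolute Li+/Li potential: SHE at 4.44 V absolute, Li+/Li at -3.04 V vs. SHE
C_LI_V = 4.44 - 3.04


def stability_window(levels, c=C_LI_V):
    """ESW from per-component orbital-energy samples.

    levels: {component name: (homo_ev_samples, lumo_ev_samples)}, vacuum-referenced eV.
    Uses Eox ≈ -HOMO - C and Ered ≈ -LUMO - C, averaged over each component's snapshots.
    """
    e_red = {name: float(np.mean(-np.asarray(lumo, dtype=float) - c))
             for name, (_, lumo) in levels.items()}
    e_ox = {name: float(np.mean(-np.asarray(homo, dtype=float) - c))
            for name, (homo, _) in levels.items()}
    # Lowest-lying LUMO -> highest Ered -> reduced first, so it sets the lower limit
    red_limiter = max(e_red, key=e_red.get)
    # Highest-lying HOMO -> lowest Eox -> oxidized first, so it sets the upper limit
    ox_limiter = min(e_ox, key=e_ox.get)
    return {
        "E_red": e_red[red_limiter],
        "E_ox": e_ox[ox_limiter],
        "ESW": e_ox[ox_limiter] - e_red[red_limiter],
        "reduction_limited_by": red_limiter,
        "oxidation_limited_by": ox_limiter,
    }
```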

The Scientist's Toolkit: Key Research Reagents & Software

Table 3: Essential Computational & Experimental Reagent Solutions

Category	Item/Solution	Function & Relevance
MLIP Software	NequIP, MACE, Allegro	Graph neural network-based MLIP frameworks; state-of-the-art for accuracy & data efficiency.
MD Engine	LAMMPS	Primary engine for running large-scale MLIP-MD simulations.
DFT Codes	VASP, CP2K, Gaussian	Generate ab initio training data (energies, forces, stresses) for MLIP training.
Training Datasets	Open Catalyst Project, Materials Project	Public benchmark datasets for pre-training or comparative analysis.
Experimental Validation - Electrolyte	1M LiPF₆ in EC:DMC (1:1 by wt)	Standard baseline electrolyte for benchmarking conductivity, ESW, and SEI performance.
Experimental Validation - Additive	Fluoroethylene Carbonate (FEC)	Common SEI-forming additive; a critical benchmark for MLIP-predicted reduction pathways.
Experimental Validation - Salt	Lithium Bis(fluorosulfonyl)imide (LiFSI)	Modern salt alternative to LiPF₆; key for studying anion-derived SEI and corrosion.
Characterization	Linear Sweep Voltammetry (LSV)	Experimental technique to determine electrochemical stability window (ESW).
Characterization	Electrochemical Impedance Spectroscopy (EIS)	Measures bulk ionic conductivity for validation of MLIP-MD predictions.

Promises and Pitfalls: Critical Analysis

Diagram 2: MLIP Electrolyte Discovery: Feedback Loop & Pitfalls

Pitfalls & Mitigation Protocols:

Pitfall 1 (Data Gaps): MLIPs fail for chemistries/coordination modes absent from training data.
- Protocol: Implement active learning. During screening, flag configurations with high model uncertainty (e.g., high variance in ensemble MLIPs). Run targeted DFT on these configurations and iteratively update the training set.
Pitfall 2 (Extrapolation Errors): Predicting properties far outside trained conditions (e.g., extreme temperatures).
- Protocol: Explicitly include targeted configurations in training (e.g., high-T AIMD, strained geometries). Always report the MLIP's uncertainty estimate alongside its predictions.
Pitfall 3 (Validation Lag): In silico predictions require experimental confirmation.
- Protocol: Prioritize synthesis and testing of top candidates using a standardized lab protocol (e.g., coin cell testing with LSV/EIS). Use this feedback to recalibrate the screening descriptors.
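The uncertainty flagging in the Pitfall 1 protocol can be sketched with a committee (ensemble) of MLIPs. The sketch assumes forces from several independently trained models on the same frame, stacked as (n_models, n_atoms, 3). It scores each frame by the maximum per-atom force deviation across the committee, similar in spirit to the model-deviation criterion used in DP-GEN. The two thresholds are hypothetical and must be tuned for each system.

```python
import numpy as np


def max_force_deviation(ensemble_forces):
    """Committee disagreement for one frame.

    ensemble_forces: (n_models, n_atoms, 3) forces in eV/Å predicted by each model.
    Returns max over atoms of sqrt(mean over models of |F_m,i - <F_i>|^2).
    """
    f = np.asarray(ensemble_forces, dtype=float)
    mean_f = f.mean(axis=0)  # committee-average force on each atom
    per_atom = np.sqrt(np.mean(np.sum((f - mean_f) ** 2, axis=-1), axis=0))
    return float(per_atom.max())


def select_for_dft(deviations, lower, upper):
    """Indices of frames to relabel with targeted DFT.

    Frames below `lower` are treated as already well described; frames at or above
    `upper` are treated as likely unphysical and skipped; frames in between are
    sent to DFT and added to the training set.
    """
    d = np.asarray(deviations, dtype=float)
    return np.flatnonzero((d >= lower) & (d < upper))
```

In a screening loop, `max_force_deviation` would be evaluated on sampled frames from each MLIP-MD run. The indices returned by `select_for_dft` then define the next batch of DFT single points before the committee is retrained.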

Conclusion

MLIPs represent a paradigm shift in lithium battery electrolyte simulation, offering near-quantum accuracy at a small fraction of the cost of ab initio MD, although still at a higher cost than classical force fields. This synthesis demonstrates their foundational role in understanding complex liquid and interfacial phenomena, provides a robust methodological framework for application, offers solutions to key implementation challenges, and validates their superior predictive power. For researchers in battery development, adopting MLIPs is no longer just an option but a strategic imperative to accelerate the design cycle of next-generation electrolytes with tailored properties. Future directions must focus on developing more transferable, multi-component potentials and integrating MLIP simulations with autonomous experimental labs to usher in an era of AI-driven battery innovation.