NMR-ML: NMR Markup Language Creator and Extractor

This toolkit consists of two main components:

nmrml_creator.py - creates nmrML files from various input formats, including SDF/MOL files, peak lists, chemical shift CSV files, and Bruker 2D zip archives.
nmrml_extract.py - extracts data from nmrML files and converts it to other formats.

Download:  

nmr_ml.tar.gz

Installation:

Prerequisites

The toolkit requires Python 3.x and several dependencies:

# Required Python packages
pip install numpy
pip install pandas
pip install nmrglue
pip install matplotlib
pip install rdkit
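
To confirm that the packages above are importable before running the scripts, a small check like the following may help. It is only a sketch: it assumes each package's import name matches its pip name, and `missing_packages` is a hypothetical helper that is not part of the toolkit.

```python
import importlib.util

# Import names for the packages listed above (assumed to match the pip names).
REQUIRED = ["numpy", "pandas", "nmrglue", "matplotlib", "rdkit"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```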

Usage:

nmrml_creator.py

This script creates nmrML files from various input data sources.

Basic Usage
python nmrml_creator.py [options]
Common Options
Example Command
python nmrml_creator.py -solvent D2O -standard DSS -freq 700MHz -spec_type 1D-1H -sdf molecule.mol -pl peaklist.txt -output_path output.nmrML

nmrml_extract.py

This script extracts data from nmrML files into various formats.

Basic Usage
python nmrml_extract.py --input <input_file> [--outfolder <output_folder>] [--outbase <output_basename>]
Options
Example Command
python nmrml_extract.py --input spectrum.nmrML --outfolder ./output --outbase my_spectrum

Input File Formats:

SDF/MOL Files

Standard chemical structure files containing molecular information.
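
As a rough illustration of what such a file holds, the sketch below reads the atom and bond blocks of a V2000 MOL block using its fixed-width columns. It is not the toolkit's parser (which may well rely on RDKit instead); `parse_molblock` and the two-atom example are hypothetical.

```python
def parse_molblock(text):
    """Parse atoms and bonds from a V2000 MOL block (fixed-width columns)."""
    lines = text.splitlines()
    counts = lines[3]  # the 4th line is the counts line
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])

    atoms = []
    for line in lines[4:4 + n_atoms]:
        atoms.append({
            "element": line[31:34].strip(),
            "x": float(line[0:10]),
            "y": float(line[10:20]),
            "z": float(line[20:30]),
        })

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # (first atom index, second atom index, bond order), 1-based indices
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


# Illustrative input only: the heavy atoms of methanol with one single bond.
example_mol = "\n".join([
    "methanol (heavy atoms only)",
    "  hypothetical-example",
    "",
    "  2  1  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.4300    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "M  END",
])

atoms, bonds = parse_molblock(example_mol)
```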

Peak List Files

Text files containing peak information for NMR spectra. Format:

shift multiplicity intensity j_coupling
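
A minimal sketch of reading this layout is shown below. It assumes whitespace-separated fields with one peak per line, tolerates an optional header row, and treats the J-coupling field as optional text because its exact encoding is not specified here. `Peak`, `parse_peak_list`, and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional


class Peak(NamedTuple):
    shift: float
    multiplicity: str
    intensity: float
    j_coupling: Optional[str]  # kept as text; the encoding is an open assumption


def parse_peak_list(text):
    """Parse 'shift multiplicity intensity j_coupling' lines into Peak tuples."""
    peaks = []
    for raw in text.splitlines():
        fields = raw.split()
        if len(fields) < 3 or raw.lstrip().startswith("#"):
            continue  # blank, comment, or incomplete line
        try:
            shift = float(fields[0])
            intensity = float(fields[2])
        except ValueError:
            continue  # e.g. a header row naming the columns
        j_coupling = fields[3] if len(fields) > 3 else None
        peaks.append(Peak(shift, fields[1], intensity, j_coupling))
    return peaks


# Illustrative values only, not taken from a real spectrum.
sample_peaks = """shift multiplicity intensity j_coupling
1.47 d 1.00 7.2
3.77 q 0.33 7.2
4.79 s 0.50
"""

peaks = parse_peak_list(sample_peaks)
```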

Chemical Shift Files (CSV)

CSV files containing chemical shift assignments with the following columns:

Atom Type,Atom No,Predicted Shift(ppm),Actual Shift(ppm),Predicted Multiplet Type,Predicted J coupling(Hz),Actual J coupling(Hz)
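
To check that a file carries these columns before passing it to the creator script, something like the following could be used. It is a sketch built on the standard `csv` module; `read_shift_table`, `to_float`, and the sample row are hypothetical, and the toolkit itself may read the file differently (for example with pandas).

```python
import csv
import io

EXPECTED_COLUMNS = [
    "Atom Type",
    "Atom No",
    "Predicted Shift(ppm)",
    "Actual Shift(ppm)",
    "Predicted Multiplet Type",
    "Predicted J coupling(Hz)",
    "Actual J coupling(Hz)",
]


def to_float(cell):
    """Convert a cell to float, returning None for empty cells."""
    cell = cell.strip()
    return float(cell) if cell else None


def read_shift_table(handle):
    """Read the CSV and fail early if an expected column is absent."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in EXPECTED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Illustrative row only; the second row leaves the measured values empty.
sample_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "H,1,1.45,1.47,d,7.1,7.2\n"
    "H,2,3.80,,q,7.1,\n"
)

rows = read_shift_table(sample_csv)
actual_shifts = [to_float(row["Actual Shift(ppm)"]) for row in rows]
```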

Bruker 2D Zip Files

Compressed Bruker format files containing 2D NMR data.
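
The directory layout the toolkit expects inside the archive is not documented here. As a sketch, the snippet below only lists archive members whose names match files commonly found in Bruker 2D datasets (`ser` for raw data, `2rr` for processed real data, plus acquisition and processing parameter files). `find_bruker_files` and the in-memory example archive are hypothetical.

```python
import io
import zipfile
from pathlib import PurePosixPath

# File names commonly present in a Bruker 2D dataset.
BRUKER_2D_FILES = {"ser", "2rr", "acqus", "acqu2s", "procs", "proc2s"}


def find_bruker_files(zip_source):
    """Map each recognised Bruker file name to its paths inside the archive."""
    found = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            base = PurePosixPath(member).name
            if base in BRUKER_2D_FILES:
                found.setdefault(base, []).append(member)
    return found


# Build a tiny in-memory archive so the sketch runs without real data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("sample/1/ser", b"")
    archive.writestr("sample/1/acqus", b"")
    archive.writestr("sample/1/pdata/1/2rr", b"")
    archive.writestr("sample/1/notes.txt", b"")

bruker_members = find_bruker_files(buffer)
```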

Output Files:

nmrML Files

XML-based files following the nmrML standard.

Extracted Files from nmrml_extract.py

Algorithms:

nmrml_creator.py

  1. Molecular Structure Processing: Parses SDF/MOL files to extract atom coordinates and bond information, then converts them to an nmrML-compatible XML format.
  2. Spectrum Processing: Processes raw spectral data (FID or processed spectra), applies optional processing such as water signal removal, and encodes the spectral data in base64 (with optional compression).
  3. Peak and Assignment Processing: Parses peak lists, identifies chemical shifts and multiplet structures, and maps peaks to the molecular structure if assignments are available.
  4. 2D Spectrum Processing: Processes 2D NMR data (e.g., HSQC), creates contour plots and JSON representations, and handles assignments.
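
The encoding part of step 2 can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: it assumes little-endian 64-bit floats and zlib compression, and `encode_values` is a hypothetical name.

```python
import base64
import struct
import zlib


def encode_values(values, compress=True):
    """Pack floats as little-endian float64, optionally zlib-compress, then base64."""
    raw = struct.pack(f"<{len(values)}d", *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


# Illustrative intensities only.
encoded = encode_values([0.0, 1.5, -2.25, 1.5])
```

Whether compression was applied has to be recorded alongside the text, since the reader needs that flag to reverse the steps.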

nmrml_extract.py

  1. XML Parsing: Parses the nmrML XML structure and extracts metadata, molecular structure, and spectral data.
  2. Molecular Structure Extraction: Converts the XML representation back to MOL format and reconstructs 2D coordinates.
  3. Assignment Extraction: Extracts peak assignments, converts them to CSV, and maps them to the molecular structure.
  4. Spectral Data Decoding: Decodes base64 spectral data, decompresses it, and converts it to numeric arrays.
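
Step 4 can be sketched as the reverse of a base64-plus-compression encoding. The example assumes little-endian 64-bit floats and zlib compression, which may differ from what a given nmrML file declares; `decode_values` is a hypothetical name, and the payload is built in place so the snippet runs on its own.

```python
import base64
import struct
import zlib


def decode_values(text, compressed=True):
    """Reverse base64, optionally zlib-decompress, and unpack float64 values."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    count = len(raw) // 8  # 8 bytes per float64
    return list(struct.unpack(f"<{count}d", raw[:count * 8]))


# Build an illustrative payload rather than reading a real nmrML file.
example_payload = base64.b64encode(
    zlib.compress(struct.pack("<3d", 0.5, 1.0, -3.0))
).decode("ascii")

decoded = decode_values(example_payload)
```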

Examples:

The examples directory contains sample input files.

Contributing:

Contributions to improve the toolkit are welcome. Please follow these steps:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License:

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments:

This toolkit was developed as part of the Natural Products Magnetic Resonance Database (NP-MRD) project at the Wishart Lab.