If you have in-house standards that were acquired with MS2 spectral data, you can construct an in-house MS2 spectral database using the metID package.

Data preparation

First, convert your raw standard MS data (positive and negative modes) to mzXML format using ProteoWizard. The parameter settings are shown in the figure below:

Data organization

Second, organize your standard information as a table and save it in csv or xlsx format. For the expected format, refer to the demo data in the demoData package.

Columns 1 to 11 are “Lab.ID”, “Compound.name”, “mz”, “RT”, “CAS.ID”, “HMDB.ID”, “KEGG.ID”, “Formula”, “mz.pos”, “mz.neg” and “Submitter”, respectively. You may include additional columns for the standards; the demo data, for example, also contain “Family”, “Sub.pathway” and “Note”.

  • mz: Accurate mass of compound.

  • RT: Retention time, in seconds.

  • mz.pos: Mass-to-charge ratio of the compound in positive mode, for example, M+H.

  • mz.neg: Mass-to-charge ratio of the compound in negative mode, for example, M-H.

  • Submitter: The name of the submitting person or organization.
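As a rough illustration of the table layout described above, the following base R sketch builds a one-row standard table and writes it to a csv file. The Lab.ID, retention time, submitter and output file name are invented for illustration only, and the M+H and M-H values are derived from the accurate mass using the proton mass (1.007276 Da). Your own table should of course contain your measured values.

```r
# Hypothetical example: one standard (D-glucose) with the 11 required columns.
proton <- 1.007276          # mass of a proton in Da
accurate.mass <- 180.06339  # monoisotopic mass of C6H12O6

standard.info <- data.frame(
  Lab.ID = "STD_0001",              # invented in-house ID
  Compound.name = "D-Glucose",
  mz = accurate.mass,               # accurate mass of the compound
  RT = 95.4,                        # invented retention time, in seconds
  CAS.ID = "50-99-7",
  HMDB.ID = "HMDB0000122",
  KEGG.ID = "C00031",
  Formula = "C6H12O6",
  mz.pos = accurate.mass + proton,  # M+H
  mz.neg = accurate.mass - proton,  # M-H
  Submitter = "Example lab",        # invented submitter
  stringsAsFactors = FALSE
)

# Write the table to a temporary csv file (the file name is an assumption).
out.file <- file.path(tempdir(), "metabolite.info_example.csv")
write.csv(standard.info, out.file, row.names = FALSE)
```

Additional columns such as “Family” or “Note” could be appended after “Submitter” in the same way.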

Then create a folder and place your mzXML files (positive mode in a ‘POS’ subfolder and negative mode in a ‘NEG’ subfolder) and the compound information table in it. Each mzXML file name must include the collision energy, in the form xxx_NCE25.mzXML, for example, test_NCE25.mzXML.
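Before building the database, it may help to check the file names. The base R sketch below flags names that lack a collision energy suffix and extracts the label from those that have one. It assumes the collision energy is always written as “NCE” followed by digits, as in the example above; the file names are invented, and the pattern is not taken from the metID source code.

```r
# Invented file names; only the first two follow the required pattern.
mzxml.files <- c("test_NCE25.mzXML", "test_NCE50.mzXML", "test.mzXML")

# Assumed rule: a valid name ends with "_NCE<digits>.mzXML".
pattern <- "_(NCE[0-9]+)\\.mzXML$"
has.ce <- grepl(pattern, mzxml.files)

# Files that would need renaming.
bad.files <- mzxml.files[!has.ce]

# Collision energy labels (e.g. "NCE25") extracted from the valid names.
ce <- sub(paste0("^.*", pattern), "\\1", mzxml.files[has.ce])
```

In practice, the vector of names could come from list.files() applied to the ‘POS’ and ‘NEG’ subfolders instead of being typed by hand.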

Run construct_database() function

library(demoData)
library(metID)
path <- system.file("database_construction", package = "demoData")
file.copy(
  from = path,
  to = ".",
  overwrite = TRUE,
  recursive = TRUE
)
#> [1] TRUE
new.path <- file.path("./database_construction")

test.database <- construct_database(
  path = new.path,
  version = "0.0.1",
  metabolite.info.name = "metabolite.info_RPLC.csv",
  source = "Michael Snyder lab",
  link = "http://snyderlab.stanford.edu/",
  creater = "Xiaotao Shen",
  email = "shenxt1990@163.com",
  rt = TRUE,
  mz.tol = 15,
  rt.tol = 30,
  threads = 3
)
#> Reading metabolite information...
#> Reading positive MS2 data...
#> Reading MS2 data...
#> Processing...
#> 
  |======================================================================| 100%
#> 
#> OK
#> Reading negative MS2 data...
#> Reading MS2 data...
#> Processing...
#> 
  |======================================================================| 100%
#> 
#> OK
#> Matching metabolites with MS2 spectra (positive)...
#> OK
#> Matching metabolites with MS2 spectra (negative)...
#> OK
#> All done!

The arguments of construct_database() are described in its help page (?construct_database).

test.database is a databaseClass object; print it to see its information.

test.database
#> -----------Base information------------
#> Version: 0.0.1 
#> Source: Michael Snyder lab 
#> Link: http://snyderlab.stanford.edu/ 
#> Creater: Xiaotao Shen ( shenxt1990@163.com )
#> With RT information
#> -----------Spectral information------------
#> There are 14 items of metabolites in database:
#> Lab.ID; Compound.name; mz; RT; CAS.ID; HMDB.ID; KEGG.ID; Formula; mz.pos; mz.neg; Submitter; Family; Sub.pathway; Note 
#> There are 170 metabolites in total
#> There are 108 metabolites in positive mode
#> There are 104 metabolites in negative mode
#> Collision energy in positive mode:
#> NCE25; NCE50 
#> Collision energy in negative mode:
#> NCE25; NCE50

Note: test.database is only a demo database (databaseClass object). We will not use it for the next metabolite identification step.

MS1 database

If you do not have MS2 data, use the construct_database() function to construct an MS1 database.

Retention time correction


The retention time (RT) of a metabolite may shift between batches. Therefore, if you spike internal standards into both your standards and your biological samples, you can correct the RTs in the database using the rtCor4database() function.
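To illustrate the idea only, the base R sketch below fits a simple linear model between the internal standard RTs measured in the database batch and in the experiment batch, then uses it to map database RTs onto the experiment's time scale. This is not the rtCor4database() implementation, whose model may well differ; the linear fit is an assumption chosen for brevity, and all numbers are invented.

```r
# Invented internal standard RTs (seconds) measured in the two batches.
is.rt <- data.frame(
  database = c(60, 120, 300, 600),
  experiment = c(66.2, 127.4, 311, 617)
)

# Model the experiment RT as a linear function of the database RT.
fit <- lm(experiment ~ database, data = is.rt)

# Invented database RTs of three standards, mapped onto the experiment scale.
standard.rt <- c(95.4, 200, 450)
corrected.rt <- predict(fit, newdata = data.frame(database = standard.rt))
corrected.rt
```

A straight line is the simplest possible choice; a real RT shift is often nonlinear across the gradient, which is why more internal standards spread over the whole run give a more reliable correction.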

Data preparation

First, prepare two internal standard (IS) tables: one for the database and one for the biological samples. The format of the IS table is shown in the figure below:

The IS table for the database must be named “database.is.table.xlsx” and the IS table for the experiment must be named “experiment.is.table.xlsx”.

Run rtCor4database() function

test.database2 <- rtCor4database(experiment.is.table = "experiment.is.table.xlsx",
                                 database.is.table = "database.is.table.xlsx",
                                 database = test.database,
                                 path = new.path)

The database argument should be the database (databaseClass object) whose RTs you want to correct.

Note: test.database2 is only a demo database (databaseClass object). We will not use it for the next metabolite identification step.