If you have in-house standards that were acquired with MS2 spectra, you can construct an in-house MS2 spectral database from them, as in the construct_database() example below.
First, convert your raw standard MS data (positive and negative modes) to mzXML format using ProteoWizard. The parameter settings are shown in the figure below:
Second, organize your standard information as a table and save it in csv or xlsx format. For the format of the standard information, refer to our demo data in the demoData package, which is used in the example below.
From column 1 to 11, the columns are “Lab.ID”, “Compound.name”, “mz”, “RT”, “CAS.ID”, “HMDB.ID”, “KEGG.ID”, “Formula”, “mz.pos”, “mz.neg”, and “Submitter”, respectively. It is fine to include other information for the standards. As the demo data show, there are additional columns, namely “Family”, “Sub.pathway” and “Note”.
mz: Accurate mass of compound.
RT: Retention time, in seconds.
mz.pos: Mass-to-charge ratio of the compound in positive mode, for example, M+H.
mz.neg: Mass-to-charge ratio of the compound in negative mode, for example, M-H.
Submitter: The name of person or organization.
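As a minimal sketch of the table described above, the following base R code builds a single-row table with the 11 required columns and writes it to a csv file. The Lab.ID, RT, and Submitter values, as well as the output file name, are hypothetical placeholders. The adduct masses assume M+H and M-H, computed by adding or subtracting the mass of a proton.

```r
# Illustrative only: one row of a standard information table.
# Lab.ID, RT, Submitter and the file name are hypothetical placeholders.
proton <- 1.007276  # mass of a proton in Da, used for M+H and M-H

metabolite.info <- data.frame(
  Lab.ID        = "LAB_0001",      # hypothetical in-house identifier
  Compound.name = "L-Alanine",
  mz            = 89.04768,        # accurate (monoisotopic) mass of C3H7NO2
  RT            = 60,              # hypothetical retention time, in seconds
  CAS.ID        = "56-41-7",
  HMDB.ID       = "HMDB0000161",
  KEGG.ID       = "C00041",
  Formula       = "C3H7NO2",
  stringsAsFactors = FALSE
)

# Add the remaining required columns in the expected order
metabolite.info$mz.pos    <- metabolite.info$mz + proton  # M+H
metabolite.info$mz.neg    <- metabolite.info$mz - proton  # M-H
metabolite.info$Submitter <- "Example lab"

# Write the table to a temporary csv file
out.file <- file.path(tempdir(), "metabolite.info_example.csv")
write.csv(metabolite.info, out.file, row.names = FALSE)
```

Optional columns such as “Family”, “Sub.pathway” and “Note” can be appended after “Submitter” in the same way.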
Then create a folder and put your mzXML files (positive mode in a ‘POS’ subfolder and negative mode in a ‘NEG’ subfolder) and the compound information table in it. The name of each mzXML file should contain the collision energy. For example,
The names of the mzXML files should be like this:
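The folder layout can be sketched in base R as follows. The folder name and file names here are hypothetical placeholders, not taken from the demo data, and the "NCE" followed by a number tag is an assumption based on the collision energy labels (NCE25, NCE50) reported for the demo database. The check only illustrates how to spot files without a collision energy tag before building the database; it is not part of metID.

```r
# Hypothetical layout; folder and file names are placeholders.
db.path <- file.path(tempdir(), "database_construction_example")
dir.create(file.path(db.path, "POS"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(db.path, "NEG"), recursive = TRUE, showWarnings = FALSE)

# Empty placeholder files standing in for converted mzXML data
example.files <- c(
  file.path("POS", "standards_pos_NCE25.mzXML"),
  file.path("POS", "standards_pos_NCE50.mzXML"),
  file.path("NEG", "standards_neg_NCE25.mzXML"),
  file.path("NEG", "standards_neg.mzXML")  # deliberately lacks the energy tag
)
invisible(file.create(file.path(db.path, example.files)))

# List all mzXML files and flag those whose names lack an "NCE<number>" tag
mzxml.files <- list.files(db.path, pattern = "\\.mzXML$", recursive = TRUE)
missing.ce  <- mzxml.files[!grepl("NCE[0-9]+", basename(mzxml.files))]
missing.ce
```

Any file reported in missing.ce would need to be renamed so that its collision energy is part of the file name.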
library(demoData)
library(metID)
path <- system.file("database_construction", package = "demoData")
file.copy(
  from = path,
  to = ".",
  overwrite = TRUE,
  recursive = TRUE
)
#> [1] TRUE
new.path <- file.path("./database_construction")
test.database <- construct_database(
  path = new.path,
  version = "0.0.1",
  metabolite.info.name = "metabolite.info_RPLC.csv",
  source = "Michael Snyder lab",
  link = "http://snyderlab.stanford.edu/",
  creater = "Xiaotao Shen",
  email = "email@example.com",
  rt = TRUE,
  mz.tol = 15,
  rt.tol = 30,
  threads = 3
)
#> Reading metabolite information...
#> Reading positive MS2 data...
#> Reading MS2 data...
#> Processing...
#> |                                                                      |   0%
#> |=======================                                               |  33%
#> |===============================================                       |  67%
#> |======================================================================| 100%
#>
#> OK
#> Reading negative MS2 data...
#> Reading MS2 data...
#> Processing...
#> |                                                                      |   0%
#> |=======================                                               |  33%
#> |===============================================                       |  67%
#> |======================================================================| 100%
#>
#> OK
#> Matching metabolites with MS2 spectra (positive)...
#> OK
#> Matching metabolites with MS2 spectra (negative)...
#> OK
#> All done!
test.database is a databaseClass object; you can print it to see its information.
test.database
#> -----------Base information------------
#> Version: 0.0.1
#> Source: Michael Snyder lab
#> Link: http://snyderlab.stanford.edu/
#> Creater: Xiaotao Shen ( firstname.lastname@example.org )
#> With RT information
#> -----------Spectral information------------
#> There are 14 items of metabolites in database:
#> Lab.ID; Compound.name; mz; RT; CAS.ID; HMDB.ID; KEGG.ID; Formula; mz.pos; mz.neg; Submitter; Family; Sub.pathway; Note
#> There are 170 metabolites in total
#> There are 108 metabolites in positive mode
#> There are 104 metabolites in negative mode
#> Collision energy in positive mode:
#> NCE25; NCE50
#> Collision energy in negative mode:
#> NCE25; NCE50
test.database is only a demo database (databaseClass object). We will not use it for the subsequent metabolite identification.
If you do not have MS2 data, please use the construct_database() function to construct an MS1 database.
The metabolite retention time (RT) may shift between batches. Therefore, if you spike internal standards into both your standards and your biological samples, you can correct the RTs in the database, as in the rtCor4database() call shown below.
First, prepare two internal standard (IS) tables, one for the database and one for the biological samples. The format of the IS table is shown in the figure below:
The IS table for the database should be named “database.is.table.xlsx” and the IS table for the experiment should be named “experiment.is.table.xlsx”.
test.database2 <- rtCor4database(
  experiment.is.table = "experiment.is.table.xlsx",
  database.is.table = "database.is.table.xlsx",
  database = test.database,
  path = new.path
)
database should be the database (databaseClass object) for which you want to correct RTs.
test.database2 is only a demo database (databaseClass object). We will not use it for the next metabolite identification step.
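To illustrate the general idea behind RT correction with internal standards, here is a minimal base R sketch. It is not the implementation used by rtCor4database(). It assumes a simple linear relationship between the IS retention times in the database and in the experiment, and all RT values and column names (rt.db, rt.exp) are hypothetical.

```r
# Hypothetical internal standard RTs in seconds; not taken from the demo data.
is.table <- data.frame(
  name   = c("IS1", "IS2", "IS3", "IS4"),
  rt.db  = c(60, 180, 420, 600),  # RT when the database was acquired
  rt.exp = c(66, 186, 426, 606)   # RT of the same IS in the current experiment
)

# Model the experiment RT as a linear function of the database RT
fit <- lm(rt.exp ~ rt.db, data = is.table)

# Map database RTs of some standards onto the experiment's time scale
database.rt  <- c(100, 300, 500)
corrected.rt <- predict(fit, newdata = data.frame(rt.db = database.rt))
round(corrected.rt, 1)
```

A linear fit is the simplest choice. When the RT shift is not uniform across the gradient, a more flexible model (for example, a local regression) fitted on the same IS pairs may be more appropriate.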