vignettes/metabolite_annotation_using_MS2.Rmd
The peak table must contain “name” (peak name), “mz” (mass-to-charge ratio) and “rt” (retention time, in seconds). It can come from any data processing software (XCMS, MS-DIAL and so on).
The raw MS2 data from DDA or DIA should be converted to msp, mgf or mzXML format files using ProteoWizard software.
The database must be generated using the constructDatabase() function. You can also use the public databases we provided here.
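As a quick sanity check on the first requirement, the sketch below verifies that a peak table has the three required columns before it is passed on. The data frame contents are hypothetical placeholders; in practice the table would be read from your own CSV file.

```r
# Hypothetical peak table for illustration only; replace with read.csv() on your file.
peak_table <- data.frame(
  name = c("peak_1", "peak_2"),
  mz = c(150.05, 300.10),   # mass-to-charge ratio
  rt = c(60, 120),          # retention time in seconds
  stringsAsFactors = FALSE
)

# Columns the peak table is required to contain.
required_cols <- c("name", "mz", "rt")
missing_cols <- setdiff(required_cols, colnames(peak_table))

if (length(missing_cols) > 0) {
  stop("Peak table is missing column(s): ", paste(missing_cols, collapse = ", "))
}

# mz and rt should be numeric.
stopifnot(is.numeric(peak_table$mz), is.numeric(peak_table$rt))
```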
Place the MS1 peak table, the MS2 data and the database you want to use in one folder, as the figure below shows:
identify_metabolites() function
We use the demo data in the metID package to show how to use metID to identify metabolites with MS2 spectra. Here, we use the in-house database from the Michael Snyder lab (msDatabase_rplc0.0.2).
In-house database: The in-house databases from our lab are provided with RT information for RPLC and HILIC modes. They were acquired using a Thermo Fisher QE-plus. However, your LC system may differ from ours, so if you want to use our in-house database for metabolite identification, please set rt.match.tol to 100000000 (no limitation). The in-house database can be downloaded from my GitHub.
First, we get the MS1 peak table, MS2 data and database from the metID package and put them in an example folder.
## create a folder named example
path <- file.path(".", "example")
dir.create(path = path, showWarnings = FALSE)
##get MS1 peak table from metID
ms1_peak <- system.file("ms1_peak", package = "metID")
file.copy(from = file.path(ms1_peak, "ms1.peak.table.csv"),
to = path, overwrite = TRUE, recursive = TRUE)
#> [1] TRUE
##get MS2 data from metID
ms2_data <- system.file("ms2_data", package = "metID")
file.copy(from = file.path(ms2_data, "QC1_MSMS_NCE25.mgf"),
to = path, overwrite = TRUE, recursive = TRUE)
#> [1] TRUE
##get database from metID
database <- system.file("ms2_database", package = "metID")
file.copy(from = file.path(database, "msDatabase_rplc0.0.2"),
to = path, overwrite = TRUE, recursive = TRUE)
#> [1] TRUE
Now your ./example folder contains three files: ms1.peak.table.csv, QC1_MSMS_NCE25.mgf and msDatabase_rplc0.0.2.
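If you want to confirm the folder contents before running the identification, a minimal check along these lines may help. It only assumes the ./example folder and the three file names used in this vignette.

```r
# Folder and file names as used in this vignette.
path <- file.path(".", "example")
expected_files <- c("ms1.peak.table.csv",
                    "QC1_MSMS_NCE25.mgf",
                    "msDatabase_rplc0.0.2")

# list.files() returns character(0) if the folder does not exist,
# in which case all three files are reported as missing.
missing_files <- setdiff(expected_files, list.files(path))

if (length(missing_files) > 0) {
  message("Missing in ", path, ": ", paste(missing_files, collapse = ", "))
} else {
  message("All input files are in place.")
}
```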
annotate_result3 <-
  identify_metabolites(ms1.data = "ms1.peak.table.csv",
                       ms2.data = c("QC1_MSMS_NCE25.mgf"),
                       ms2.match.tol = 0.5,
                       ce = "all",
                       ms1.match.ppm = 15,
                       rt.match.tol = 30,
                       polarity = "positive",
                       column = "rp",
                       path = path,
                       candidate.num = 3,
                       database = "msDatabase_rplc0.0.2",
                       threads = 3)
#> Use old data
#> Matching peak table with MS2 spectrum...
#> 23 out of 100 peaks have MS2 spectra.
#> Selecting the most intense MS2 spectrum for each peak...OK
#> Use all CE values.
#>
#> Identifing metabolites with MS/MS database...
#>
#>
#> All done.
Note: You can also provide more than one MS2 data file. Just pass them to ms2.data as a vector.
Most of the parameters are the same as in Annotate metabolites according to MS1 database using metID package.
Some parameters for MS2 matching:
ms2.data: The MS2 data.
ce: The collision energy of the spectra used for matching. Set it to "all" to use all spectra.
ms2.match.tol: The MS2 similarity tolerance for matching a peak to a database metabolite. The MS2 similarity follows the algorithm from MS-DIAL; for more information, please read this publication.
\[MS2\;Similarity\;Score\;(SS) = Fragment\;fraction \times Weight_{fraction} + Dot\;product(forward) \times Weight_{dp.forward} + Dot\;product(reverse) \times Weight_{dp.reverse}\]
fraction.weight: The weight for the fragment match fraction.
\[Fragment\;match\;fraction = \dfrac{Matched\;fragment\;number}{All\;fragment\;number}\]
dp.forward.weight: The weight for dot product (forward).
dp.reverse.weight: The weight for dot product (reverse).
\[Dot\;product = \dfrac{\sum(wA_{act.}wA_{lib})^2}{\sum(wA_{act.})^2\sum(wA_{lib})^2}\;with\;w = 1/(1+\dfrac{A}{\sum(A-0.5)})\]
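To make the weighted sum concrete, here is a small numeric sketch of how the three components combine into one similarity score. All numbers are illustrative assumptions (they are not taken from the demo data, and the weights are not claimed to be the package defaults), and the comparison against the cutoff assumes a match is kept when the score reaches ms2.match.tol.

```r
# Hypothetical component scores for one peak / library spectrum pair (each in [0, 1]).
fragment_fraction <- 0.8   # matched fragments / all fragments
dp_forward <- 0.9          # dot product (forward)
dp_reverse <- 0.7          # dot product (reverse)

# Illustrative weights (assumed values; check the function documentation for defaults).
fraction_weight <- 0.3
dp_forward_weight <- 0.6
dp_reverse_weight <- 0.1

# Weighted sum of the three components.
ss <- fragment_fraction * fraction_weight +
  dp_forward * dp_forward_weight +
  dp_reverse * dp_reverse_weight
ss

# Assumption: a candidate passes when its score reaches the cutoff (0.5 in this vignette).
ms2_match_tol <- 0.5
ss >= ms2_match_tol
```

With these numbers the three terms are 0.24, 0.54 and 0.07, so the score is 0.85 and the candidate would pass a cutoff of 0.5.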
The returned result, annotate_result3, is a metIdentifyClass object. You can get brief information directly by printing it in the console:
annotate_result3
#> --------------metID version-----------
#> 1.0.0
#> -----------Identifications------------
#> (Use get_identification_table() to get identification table)
#> There are 100 peaks
#> 23 peaks have MS2 spectra
#> There are 30 metabolites are identified
#> There are 21 peaks with identification
#> -----------Parameters------------
#> (Use get_parameters() to get all the parameters of this processing)
#> Polarity: positive
#> Collision energy: all
#> database: msDatabase_rplc0.0.2
#> Total score cutoff: 0.5
#> Column: rp
#> Adduct table:
#> (M+H)+;(M+H-H2O)+;(M+H-2H2O)+;(M+NH4)+;(M+Na)+;(M-H+2Na)+;(M-2H+3Na)+;(M+K)+;(M-H+2K)+;(M-2H+3K)+;(M+CH3CN+H)+;(M+CH3CN+Na)+;(2M+H)+;(2M+NH4)+;(2M+Na)+;(2M+K)+;(M+HCOO+2H)+
Most of the detailed annotation information is the same as in Annotate metabolites according to MS1 database using metID package.
You can also use the ms2plot() function to output the MS2 spectra match plot for one, multiple or all peaks.
##which peaks have identifications
which_has_identification(annotate_result3) %>%
head()
#> MS1.peak.name MS2.spectra.name
#> 1 pRPLC_603 mz162.112442157672rt37.9743312
#> 2 pRPLC_722 mz181.072050304971rt226.14144
#> 3 pRPLC_1046 mz181.072050673093rt196.800648
#> 4 pRPLC_1112 mz209.092155077047rt58.3735608
#> 5 pRPLC_1307 mz314.232707486156rt400.268664
#> 6 pRPLC_1860 mz249.185015539689rt579.6807
Because we need information from the database, we need to load the database first.
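A minimal sketch of the loading step, under the assumption that the database file copied into ./example is a saved R object that load() restores under the name msDatabase_rplc0.0.2 (the name the plotting calls use):

```r
# Assumption: the database file is a saved R object whose name matches the
# file name, so load() makes msDatabase_rplc0.0.2 available in the session.
path <- file.path(".", "example")
db_file <- file.path(path, "msDatabase_rplc0.0.2")

if (file.exists(db_file)) {
  load(db_file)
} else {
  message("Database file not found: ", db_file)
}
```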
ms2.plot1 <- ms2plot(object = annotate_result3,
database = msDatabase_rplc0.0.2,
which.peak = "pRPLC_603")
ms2.plot1
You can also output an interactive MS2 spectra match plot by setting interaction.plot to TRUE.
## interactive MS2 spectra match plot
ms2.plot2 <- ms2plot(object = annotate_result3,
database = msDatabase_rplc0.0.2,
which.peak = "pRPLC_603",
interaction.plot = TRUE)
ms2.plot2
Sometimes you may want a dark theme. Because the plot from ms2plot is a ggplot2 object, you can simply set the theme to a dark theme.
ms2.plot1 <- ms2plot(object = annotate_result3,
database = msDatabase_rplc0.0.2,
which.peak = "pRPLC_603",
col.exp = "white")
ms2.plot1_2 <-
ms2.plot1 +
ggdark::dark_theme_bw()
ms2.plot1_2
Then use plotly to convert it to an interactive plot.
ms2.plot1_2 %>%
plotly::ggplotly()
You can set which.peak to a vector of peak names to output MS2 match plots for multiple peaks, or set it to "all" to output all MS2 spectra match plots.
For example, if we want to output all the MS2 spectra match plots:
ms2plot(
object = annotate_result3,
database = msDatabase_rplc0.0.2,
which.peak = "all",
path = file.path(path, "inhouse"),
threads = 3
)
All the MS2 spectra match plots are then written to the “inhouse” folder.