Basic usage - Search
Spectra are compared by searching a query spectrum against the index of the reference library. Depending on your needs, you can perform an identity search, open search, neutral loss search, or hybrid search.
Search with internal clean function
Suppose you have built a library locally under path_of_your_library using the method described earlier.
You can then search it with a query spectrum in the correct format, for example:
import numpy as np
# For each query spectrum, 'precursor_mz' and 'peaks' are necessary.
# 'precursor_mz' should be a float, and 'peaks' should be a 2D np.ndarray built like np.array([[m/z, intensity], [m/z, intensity], [m/z, intensity]...], dtype=np.float32).
query_spectrum = {"precursor_mz": 150.0,
"peaks": np.array([[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]], dtype=np.float32)}
If you have several query spectra, put them in a list:
import numpy as np
# query_spectra_list is a list of dictionaries, one per query MS2 spectrum.
# For each query spectrum, 'precursor_mz' and 'peaks' are necessary.
# 'precursor_mz' should be a float, and 'peaks' should be a 2D np.ndarray built like np.array([[m/z, intensity], [m/z, intensity], [m/z, intensity]...], dtype=np.float32).
query_spectra_list = [{
    "precursor_mz": 150.0,
    "peaks": np.array([[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]], dtype=np.float32)
}, {
    "precursor_mz": 250.0,
    "peaks": np.array([[108.0, 1.0], [113.0, 1.0], [157.0, 1.0]], dtype=np.float32)
}, {
    "precursor_mz": 299.0,
    "peaks": np.array([[119.0, 1.0], [145.0, 1.0], [157.0, 1.0]], dtype=np.float32)
},
]
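Before searching, it can help to confirm that each query dictionary follows the format described in the comments above. The helper below is a minimal sketch: `check_query_spectrum` is a hypothetical name and is not part of ms_entropy, and it only mirrors the format stated here (a float `precursor_mz` and a 2D float32 `peaks` array). It does not test what the library itself would accept.

```python
import numpy as np


def check_query_spectrum(spectrum):
    """Return a list of format problems; an empty list means the format looks right."""
    problems = []

    # 'precursor_mz' must be present and be a float.
    if "precursor_mz" not in spectrum:
        problems.append("missing 'precursor_mz'")
    elif not isinstance(spectrum["precursor_mz"], (float, np.floating)):
        problems.append("'precursor_mz' is not a float")

    # 'peaks' must be a 2D float32 array with columns (m/z, intensity).
    peaks = spectrum.get("peaks")
    if not isinstance(peaks, np.ndarray):
        problems.append("'peaks' is not a numpy array")
    elif peaks.ndim != 2 or peaks.shape[1] != 2:
        problems.append("'peaks' does not have shape (n, 2)")
    elif peaks.dtype != np.float32:
        problems.append("'peaks' is not float32")

    return problems


good = {"precursor_mz": 150.0,
        "peaks": np.array([[100.0, 1.0], [101.0, 1.0]], dtype=np.float32)}
bad = {"precursor_mz": 150, "peaks": [[100.0, 1.0]]}

good_problems = check_query_spectrum(good)
bad_problems = check_query_spectrum(bad)
```

Running the check over a list of query spectra before a long batch search makes format mistakes visible early, without touching the library.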
You can create a DynamicEntropySearch object with the corresponding path_data and search the library like this:
from ms_entropy import DynamicEntropySearch
# Select the path for your library.
entropy_search = DynamicEntropySearch(path_data=path_of_your_library)
# Search the library. With need_metadata=True, the highest-scoring matches are returned together with their metadata.
result = entropy_search.search_topn_matches(
    precursor_mz=query_spectrum['precursor_mz'],
    peaks=query_spectrum['peaks'],
    ms1_tolerance_in_da=0.01,  # You can change ms1_tolerance_in_da as needed.
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
    method='open',  # Or 'neutral_loss' or 'hybrid' or 'identity'.
    precursor_ions_removal_da=1.6,  # Peaks with m/z greater than ``precursor_mz - precursor_ions_removal_da`` are removed during cleaning.
    noise_threshold=0.01,  # Relative intensity threshold for noise filtering during cleaning. Peaks with intensity ``< noise_threshold * max(intensity)`` are removed.
    min_ms2_difference_in_da=0.05,  # Minimum spacing allowed between MS/MS peaks during cleaning.
    max_peak_num=None,  # Maximum number of peaks to keep after cleaning.
    clean=True,  # If you don't want to use the internal clean process in this function, set it to False.
    topn=3,  # You can change topn as needed.
    need_metadata=True,  # Set it to True if you need the metadata.
)
# After that, you can print the result like this:
print(result)
Note
Cleaning the query spectrum is necessary. You can use the internal clean function of search_topn_matches() or separate the clean and search steps, as described in the following part.
search_topn_matches() is suitable for identification that requires metadata when need_metadata is True. If it is set to False, the locations of the matched spectra in the library are returned instead.
The search_topn_matches() function accepts the following parameters:
peaks: The peaks of the query spectrum, which is a numpy array in the format [[mz, intensity], [mz, intensity], ...].
precursor_mz: The precursor m/z of the query spectrum.
ms1_tolerance_in_da: The mass tolerance to apply to the precursor m/z in Da, used only for identity search. Default is 0.01.
ms2_tolerance_in_da: The mass tolerance to apply to the fragment ions in Da. Default is 0.02.
method: The search method to employ, given as a string. Available methods are identity, open, neutral_loss, and hybrid. Default is open.
clean: Whether to clean the query spectrum before searching. Default is True.
precursor_ions_removal_da: The mass tolerance for removing the precursor ions in Da. Fragment ions with m/z larger than precursor_mz - precursor_ions_removal_da will be removed. Based on our tests, removing precursor ions can enhance search performance. Default is 1.6.
noise_threshold: The intensity threshold for removing noise peaks. Peaks with intensity smaller than noise_threshold * max(fragment ion's intensity) will be removed. Default is 0.01.
min_ms2_difference_in_da: Minimum spacing allowed between MS/MS peaks during cleaning, in Da. Default is 0.05.
max_peak_num: Maximum number of peaks to keep after cleaning. None keeps all peaks. Default is None.
topn: Number of top-matching spectra to return. If None, all spectra are returned. Default is 3.
need_metadata: If True, return the metadata dictionary for each matched spectrum. If False, return (global_index, similarity) tuples instead. Default is True.
An example result:
[{
'id': 'Demo spectrum 2',
'precursor_mz': 200.0,
'peaks': array([[100. , 0.33333334], [101. , 0.33333334], [102. , 0.33333334]], dtype=float32),
'metadata': 'ABC',
'open_search_entropy_similarity': np.float32(0.99999994)
}, {
'id': 'Demo spectrum 1',
'precursor_mz': 150.0,
'peaks': array([[100. , 0.33333334], [101. , 0.33333334], [103. , 0.33333334]], dtype=float32),
'open_search_entropy_similarity': np.float32(0.6666666)
}, {
'precursor_mz': 350.0,
'peaks': array([[100. , 0.33333334], [101. , 0.33333334], [302. , 0.33333334]], dtype=float32),
'open_search_entropy_similarity': np.float32(0.6666666)
}]
In this result:
The result was generated by searching the query spectrum against an existing library. Select the appropriate method for the search you want to perform.
The top 3 matched spectra are listed in descending order of similarity. The number is set by topn in search_topn_matches(). If fewer than topn spectra have a similarity greater than 0, only the spectra that actually match are returned.
Metadata of the spectra is included if need_metadata in search_topn_matches() is set to True. Users can add information to spectra when constructing the library. Such additional fields beyond 'precursor_mz' and 'peaks', for example 'id', help with compound identification.
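The result list can be reduced to a compact table of ranks, identifiers, and scores. The sketch below works on a hand-copied version of the example result above rather than on a live search. `summarize_matches` is a hypothetical helper, and the rule that the score key ends in `_entropy_similarity` is an assumption inferred from the `open_search_entropy_similarity` key shown for the open search. Key names for other methods are not confirmed here.

```python
import numpy as np


def summarize_matches(result):
    """Turn a list of match dictionaries into rows of rank, id, precursor m/z and score."""
    rows = []
    for rank, match in enumerate(result, start=1):
        # Assumption: the similarity is stored under a key ending in '_entropy_similarity'.
        score_key = next((k for k in match if k.endswith("_entropy_similarity")), None)
        rows.append({
            "rank": rank,
            "id": match.get("id"),  # Optional metadata; None when the library entry has no 'id'.
            "precursor_mz": match["precursor_mz"],
            "similarity": float(match[score_key]) if score_key else None,
        })
    return rows


# Hand-copied from the example result above (peaks omitted for brevity).
example_result = [
    {"id": "Demo spectrum 2", "precursor_mz": 200.0, "metadata": "ABC",
     "open_search_entropy_similarity": np.float32(0.99999994)},
    {"id": "Demo spectrum 1", "precursor_mz": 150.0,
     "open_search_entropy_similarity": np.float32(0.6666666)},
    {"precursor_mz": 350.0,
     "open_search_entropy_similarity": np.float32(0.6666666)},
]

rows = summarize_matches(example_result)
for row in rows:
    print(row)
```

Using `match.get("id")` rather than `match["id"]` matters because metadata fields are optional: the third match in the example has no 'id'.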
If the query spectra are in a list, iterate over it and search each spectrum in turn.
from ms_entropy import DynamicEntropySearch
# Assign the path for your library.
entropy_search = DynamicEntropySearch(path_data=path_of_your_library)
# Iterate over query_spectra_list and search each element.
for spec in query_spectra_list:
    result = entropy_search.search_topn_matches(
        precursor_mz=spec['precursor_mz'],
        peaks=spec['peaks'],
        ms1_tolerance_in_da=0.01,  # You can change ms1_tolerance_in_da as needed.
        ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
        method='open',  # Or 'neutral_loss' or 'hybrid' or 'identity'.
        clean=True,  # If you don't want to use the internal clean process in this function, set it to False.
        topn=3,  # You can change topn as needed.
        need_metadata=True,  # Set it to True if you need the metadata.
    )
    # After that, you can print the result like this:
    print(result)
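The loop above overwrites result on every iteration. If the matches for every query are needed afterwards, they can be collected in a list instead. The following is a minimal sketch: `search_all` is a hypothetical helper, and `StubSearch` is a stand-in that only echoes the query so the example runs without a library on disk. In real use, pass a `DynamicEntropySearch` object instead.

```python
import numpy as np


def search_all(entropy_search, query_spectra_list, **search_kwargs):
    """Search every spectrum and return one result list per query, in query order."""
    all_results = []
    for spec in query_spectra_list:
        matches = entropy_search.search_topn_matches(
            precursor_mz=spec["precursor_mz"],
            peaks=spec["peaks"],
            **search_kwargs,
        )
        all_results.append(matches)
    return all_results


class StubSearch:
    """Hypothetical stand-in for DynamicEntropySearch; it does not search anything."""

    def search_topn_matches(self, precursor_mz, peaks, **kwargs):
        # Echo the query back so the collection logic can be demonstrated.
        return [{"precursor_mz": precursor_mz, "n_peaks": len(peaks)}]


queries = [
    {"precursor_mz": 150.0,
     "peaks": np.array([[100.0, 1.0], [101.0, 1.0]], dtype=np.float32)},
    {"precursor_mz": 250.0,
     "peaks": np.array([[108.0, 1.0]], dtype=np.float32)},
]

all_results = search_all(StubSearch(), queries, method="open", topn=3)
```

Because the results keep the order of the queries, `all_results[i]` always belongs to `queries[i]`.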
Search with external clean function
If you want to separate the clean and search steps, set clean in search_topn_matches() to False and clean the query spectrum beforehand with the external clean_spectrum() function in ms_entropy.
The cleaned spectrum can then be passed to search_topn_matches() or to the individual search functions.
import numpy as np
from ms_entropy import clean_spectrum

query_spectrum = {"precursor_mz": 150.0,
                  "peaks": np.array([[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]], dtype=np.float32)}
precursor_ions_removal_da = 1.6
# Remove peaks above precursor_mz - precursor_ions_removal_da while cleaning.
query_spectrum['peaks'] = clean_spectrum(
    peaks=query_spectrum['peaks'],
    max_mz=query_spectrum['precursor_mz'] - precursor_ions_removal_da
)
Now the query_spectrum is cleaned and ready for search. Pass it to search_topn_matches() with clean set to False.
result = entropy_search.search_topn_matches(
    precursor_mz=query_spectrum['precursor_mz'],
    peaks=query_spectrum['peaks'],
    ms1_tolerance_in_da=0.01,  # You can change ms1_tolerance_in_da as needed.
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
    method='open',  # Or 'neutral_loss' or 'hybrid' or 'identity'.
    precursor_ions_removal_da=1.6,  # Peaks with m/z greater than ``precursor_mz - precursor_ions_removal_da`` are removed during cleaning.
    noise_threshold=0.01,  # Relative intensity threshold for noise filtering during cleaning. Peaks with intensity ``< noise_threshold * max(intensity)`` are removed.
    min_ms2_difference_in_da=0.05,  # Minimum spacing allowed between MS/MS peaks during cleaning.
    max_peak_num=None,  # Maximum number of peaks to keep after cleaning.
    clean=False,  # The query spectrum was already cleaned above, so the internal clean process is switched off.
    topn=3,  # You can change topn as needed.
    need_metadata=True,  # Set it to True if you need the metadata.
)
You can also pass the query spectrum into the search functions mentioned in Useful Functions like this:
# Identity search
entropy_similarity = entropy_search.identity_search(
precursor_mz = query_spectrum['precursor_mz'],
peaks = query_spectrum['peaks'],
ms1_tolerance_in_da = 0.01,
ms2_tolerance_in_da = 0.02
)
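If an individual search function returns one similarity score per library spectrum, the best matches can be picked with NumPy. That return shape is an assumption here and is not confirmed by this page. The sketch uses a made-up score array in place of a real search result, and `top_n_from_scores` is a hypothetical helper.

```python
import numpy as np


def top_n_from_scores(scores, topn=3):
    """Return (index, score) pairs for the topn highest scores, skipping zero scores."""
    scores = np.asarray(scores)
    # Sort indices by descending score; a stable sort keeps ties in library order.
    order = np.argsort(-scores, kind="stable")[:topn]
    return [(int(i), float(scores[i])) for i in order if scores[i] > 0]


# Made-up scores standing in for the output of a search function (assumed 1D array).
example_scores = np.array([0.0, 0.67, 1.0, 0.0, 0.2], dtype=np.float32)
top_hits = top_n_from_scores(example_scores, topn=3)
print(top_hits)
```

Dropping scores of 0 matches the behaviour described for search_topn_matches(), where fewer than topn results are returned if not enough spectra match.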
Tools: external clean function
Both the search_topn_matches() function and the search() function clean the query spectrum internally before performing the search.
If you want to separate these two steps, set clean to False in both functions and use an external clean function. See the example.
You can use the clean_spectrum() function to clean the query spectrum and then use the individual search functions to search the library.
Clean spectrum
Before performing a spectral search, the query spectrum should be pre-processed using the clean_spectrum() function in ms_entropy. This function accomplishes the following:
Remove empty peaks (m/z <= 0 or intensity <= 0).
Remove peaks with m/z values greater than precursor_mz - precursor_ions_removal_da (removes precursor ions to improve the quality of spectral comparison).
Centroid the spectrum by merging peaks within +/- min_ms2_difference_in_da and sort the resulting spectrum by m/z.
Remove peaks with intensity less than noise_threshold * maximum intensity.
Retain only the top max_peak_num peaks and remove all others.
Normalize the intensity to sum to 1.
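The steps above can be sketched in plain NumPy to show what each one does to a spectrum. This is an illustrative re-implementation under simplifying assumptions, not the code of clean_spectrum() in ms_entropy. `clean_spectrum_sketch` is a hypothetical name, and its centroiding is a simple greedy merge around the most intense peak, with an intensity-weighted m/z, which may differ from the library's behaviour.

```python
import numpy as np


def clean_spectrum_sketch(peaks, precursor_mz, precursor_ions_removal_da=1.6,
                          noise_threshold=0.01, min_ms2_difference_in_da=0.05,
                          max_peak_num=None):
    """Illustrative version of the six cleaning steps (not the library implementation)."""
    peaks = np.asarray(peaks, dtype=np.float32).reshape(-1, 2)

    # 1. Remove empty peaks (m/z <= 0 or intensity <= 0).
    peaks = peaks[(peaks[:, 0] > 0) & (peaks[:, 1] > 0)]

    # 2. Remove peaks with m/z greater than precursor_mz - precursor_ions_removal_da.
    peaks = peaks[peaks[:, 0] <= precursor_mz - precursor_ions_removal_da]

    # 3. Simplified centroiding: starting from the most intense peak, merge every
    #    not-yet-merged peak within +/- min_ms2_difference_in_da, then sort by m/z.
    order = np.argsort(-peaks[:, 1], kind="stable")
    used = np.zeros(len(peaks), dtype=bool)
    merged = []
    for i in order:
        if used[i]:
            continue
        near = (~used) & (np.abs(peaks[:, 0] - peaks[i, 0]) <= min_ms2_difference_in_da)
        used |= near
        total = peaks[near, 1].sum()
        merged.append([(peaks[near, 0] * peaks[near, 1]).sum() / total, total])
    peaks = np.array(merged, dtype=np.float32).reshape(-1, 2)
    peaks = peaks[np.argsort(peaks[:, 0])]

    # 4. Remove peaks with intensity below noise_threshold * maximum intensity.
    if len(peaks) > 0:
        peaks = peaks[peaks[:, 1] >= noise_threshold * peaks[:, 1].max()]

    # 5. Keep only the max_peak_num most intense peaks, preserving m/z order.
    if max_peak_num is not None and len(peaks) > max_peak_num:
        keep = np.sort(np.argsort(-peaks[:, 1], kind="stable")[:max_peak_num])
        peaks = peaks[keep]

    # 6. Normalize the intensities so that they sum to 1.
    if len(peaks) > 0:
        peaks[:, 1] /= peaks[:, 1].sum()
    return peaks


raw_peaks = [[100.0, 1.0], [100.02, 1.0], [101.0, 2.0],
             [149.5, 5.0], [0.0, 3.0], [120.0, 0.001]]
cleaned = clean_spectrum_sketch(raw_peaks, precursor_mz=150.0)
print(cleaned)
```

In this example the peak at m/z 0 is dropped as empty, the peak at 149.5 is removed as a precursor ion, the peaks at 100.0 and 100.02 are merged, and the peak at 120.0 falls below the noise threshold, leaving two peaks of equal normalized intensity.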