Useful functions
Here we introduce functions that you may find useful when using Dynamic Entropy Search.
About index construction
Manually convert the index to a Flash Entropy Search-compatible format
Building on Flash Entropy Search, the index of Dynamic Entropy Search can be converted to a compact structure that Flash Entropy Search can use, which gives faster searches and uses less storage.
from ms_entropy import DynamicEntropySearch
from pathlib import Path
# Choose an existing index and assign the path
entropy_search = DynamicEntropySearch(path_data=path_of_your_library)
# Manually sort the blocks in index
entropy_search.convert_to_fast_search()
# Manually convert the index to a compact structure, one group at a time
entropy_search.read()
group_num = len(entropy_search.group_start)
for i in range(group_num):
    group_path = Path(path_of_your_library) / f"{i}"
    entropy_search.entropy_search.path_data = group_path
    entropy_search.entropy_search.read()
    entropy_search.convert_current_index_to_flash()
This operation first sorts the blocks in the index and then removes the reserved space in the index.
After conversion, the index has a structure compatible with Flash Entropy Search, and the search functions in DynamicEntropySearch run faster on it.
Note
When add_new_spectra() is used to update the index, the index structure of a group is converted automatically once the size of that group reaches the limit.
To disable this feature, set convert_to_flash to False in add_new_spectra() and build_index().
from ms_entropy import DynamicEntropySearch
entropy_search = DynamicEntropySearch(path_data=path_of_your_library)
entropy_search.add_new_spectra(spectra_list=spectra_1_for_library, convert_to_flash=False)
entropy_search.add_new_spectra(spectra_list=spectra_2_for_library, convert_to_flash=False)
entropy_search.add_new_spectra(spectra_list=spectra_3_for_library, convert_to_flash=False)
......
entropy_search.build_index(convert_to_flash=False)
entropy_search.write()
Note that if the index of a group is already in the compact structure, both converting it again and adding spectra to it afterwards will result in an error.
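The repeated add_new_spectra() calls above can be wrapped in a loop so that the same convert_to_flash value reaches every call. The following is a minimal sketch of that idea, not part of ms_entropy: build_library_in_batches and CallRecorder are hypothetical names, and the helper relies only on the add_new_spectra(), build_index() and write() methods and the convert_to_flash parameter shown above. A stand-in recorder object is used so the snippet runs without a library on disk.

```python
def build_library_in_batches(entropy_search, spectra_batches, convert_to_flash=False):
    """Add each batch, then build and write the index, always with the same flag.

    Hypothetical helper: `entropy_search` is any object exposing the
    add_new_spectra(), build_index() and write() methods used in this section.
    """
    batch_count = 0
    for spectra_list in spectra_batches:
        entropy_search.add_new_spectra(
            spectra_list=spectra_list, convert_to_flash=convert_to_flash
        )
        batch_count += 1
    entropy_search.build_index(convert_to_flash=convert_to_flash)
    entropy_search.write()
    return batch_count


class CallRecorder:
    """Stand-in for DynamicEntropySearch that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def add_new_spectra(self, spectra_list, convert_to_flash):
        self.calls.append(("add_new_spectra", len(spectra_list), convert_to_flash))

    def build_index(self, convert_to_flash):
        self.calls.append(("build_index", convert_to_flash))

    def write(self):
        self.calls.append(("write",))


recorder = CallRecorder()
added = build_library_in_batches(recorder, [["s1", "s2"], ["s3"]], convert_to_flash=False)
print(added, recorder.calls)
```

With a real library, the DynamicEntropySearch instance would be passed in place of the recorder.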
Construct an index only for open search
If you only need an index for open search, there is no need to process neutral loss data.
Setting index_for_neutral_loss to False in add_new_spectra() and build_index() constructs the index for open search more efficiently.
Here's an example:
from ms_entropy import DynamicEntropySearch
entropy_search = DynamicEntropySearch(path_data=path_of_your_library)
entropy_search.add_new_spectra(spectra_list=spectra_1_for_library, index_for_neutral_loss=False)
entropy_search.add_new_spectra(spectra_list=spectra_2_for_library, index_for_neutral_loss=False)
entropy_search.add_new_spectra(spectra_list=spectra_3_for_library, index_for_neutral_loss=False)
......
entropy_search.build_index(index_for_neutral_loss=False)
entropy_search.write()
Warning
Once index_for_neutral_loss has been set to False, a neutral loss index can no longer be constructed for this library. Keep this parameter set to False in every later call to avoid errors.
In addition, only open search can be performed on such a library, so check the value of method when using the search functions. Performing identity search, neutral loss search or hybrid search can cause an error.
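One way to avoid these errors is to check the requested method before calling a search function. The sketch below is an illustration under stated assumptions: is_method_supported is a hypothetical helper, not an ms_entropy function, and treating 'all' as unsupported is an assumption made here because 'all' runs every search method.

```python
# Method names as used by the `method` argument in this document.
KNOWN_METHODS = {"open", "identity", "neutral_loss", "hybrid", "all"}


def is_method_supported(method, index_for_neutral_loss):
    """Return True if `method` can be used with the index as it was built.

    Hypothetical helper. Assumption: without a neutral loss index only
    'open' is safe; 'all' is rejected because it includes the other methods.
    """
    if method not in KNOWN_METHODS:
        raise ValueError(f"Unknown search method: {method!r}")
    if index_for_neutral_loss:
        return True
    return method == "open"


for method in ("open", "identity", "neutral_loss", "hybrid", "all"):
    print(method, is_method_supported(method, index_for_neutral_loss=False))
```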
About spectra search
Different search functions serve different objectives.
Make sure all query spectra are cleaned before searching.
General search
Search top matches with metadata
from ms_entropy import DynamicEntropySearch
# Assign the path for your library
entropy_search=DynamicEntropySearch(path_data=path_of_your_library)
# Iterate over query_spectra_list and search with each element.
for spec in query_spectra_list:
    result = entropy_search.search_topn_matches(
        precursor_mz=spec['precursor_mz'],
        peaks=spec['peaks'],
        ms1_tolerance_in_da=0.01,  # You can change ms1_tolerance_in_da as needed.
        ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
        method='open',  # or 'neutral_loss' or 'hybrid' or 'identity'.
        clean=True,  # If you don't want to use the internal clean process in this function, set it to False.
        topn=3,  # You can change topn as needed.
        need_metadata=True,  # Set it to True if you need metadata.
    )
    # After that, you can print the result like this:
    print(result)
An example result:
[{
    'id': 'Demo spectrum 2',
    'precursor_mz': 200.0,
    'peaks': array([[100. , 0.33333334], [101. , 0.33333334], [102. , 0.33333334]], dtype=float32),
    'metadata': 'ABC',
    'open_search_entropy_similarity': np.float32(0.99999994)
}, {
    'id': 'Demo spectrum 1',
    'precursor_mz': 150.0,
    'peaks': array([[100. , 0.33333334], [101. , 0.33333334], [103. , 0.33333334]], dtype=float32),
    'open_search_entropy_similarity': np.float32(0.6666666)
}, {
    'precursor_mz': 350.0,
    'peaks': array([[100. , 0.33333334], [101. , 0.33333334], [302. , 0.33333334]], dtype=float32),
    'open_search_entropy_similarity': np.float32(0.6666666)
}]
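The similarity values in this example can be checked by hand. The sketch below computes unweighted spectral entropy similarity, 1 - (2*S_AB - S_A - S_B) / ln(4), for peaks matched on exactly equal m/z. It is a simplified illustration, not the library's implementation: ms_entropy matches peaks within ms2_tolerance_in_da, and any intensity weighting it applies is ignored here, which should not matter for spectra whose peaks all have equal intensity. The query spectrum (equal peaks at 100, 101 and 102) is an assumption inferred from the first hit scoring almost 1, and the function names are hypothetical.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (natural log) of intensities after normalizing them to sum to 1."""
    total = sum(intensities)
    return -sum((x / total) * math.log(x / total) for x in intensities if x > 0)


def unweighted_entropy_similarity(peaks_a, peaks_b):
    """Entropy similarity of two peak lists [(mz, intensity), ...] with exact m/z matching."""
    sum_a = sum(intensity for _, intensity in peaks_a)
    sum_b = sum(intensity for _, intensity in peaks_b)
    # Merge the two normalized spectra; peaks with the same m/z add up.
    merged = {}
    for mz, intensity in peaks_a:
        merged[mz] = merged.get(mz, 0.0) + intensity / sum_a
    for mz, intensity in peaks_b:
        merged[mz] = merged.get(mz, 0.0) + intensity / sum_b
    s_a = spectral_entropy([intensity for _, intensity in peaks_a])
    s_b = spectral_entropy([intensity for _, intensity in peaks_b])
    s_ab = spectral_entropy(list(merged.values()))
    return 1.0 - (2.0 * s_ab - s_a - s_b) / math.log(4)


# Assumed query spectrum; library peaks copied from the example result.
query = [(100.0, 1.0), (101.0, 1.0), (102.0, 1.0)]
demo_1 = [(100.0, 1.0), (101.0, 1.0), (103.0, 1.0)]
demo_2 = [(100.0, 1.0), (101.0, 1.0), (102.0, 1.0)]

print(round(unweighted_entropy_similarity(query, demo_1), 4))
print(round(unweighted_entropy_similarity(query, demo_2), 4))
```

Two of three equal peaks are shared with Demo spectrum 1, which gives a similarity of 2/3, consistent with the 0.6666666 reported above.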
Search that only requires similarity without metadata
from ms_entropy import DynamicEntropySearch
# Assign the path for your library
entropy_search=DynamicEntropySearch(path_data=path_of_your_library)
### Use `search()` and get an array with all entropy similarities ###
result = entropy_search.search(
    precursor_mz=query_spectrum['precursor_mz'],
    peaks=query_spectrum['peaks'],
    ms1_tolerance_in_da=0.01,  # You can change ms1_tolerance_in_da as needed.
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
    method='all',  # or 'neutral_loss' or 'hybrid' or 'identity' or 'open'.
    clean=True,  # If you don't want to use the internal clean process in this function, set it to False.
)
print(result)
Example result:
{
    'identity_search': array([0.6666666, 0. , 0. , 0. , 0. , 0. ], dtype=float32),
    'open_search': array([0.6666666 , 0.99999994, 0.3333333 , 0.6666666 , 0. , 0. ], dtype=float32),
    'neutral_loss_search': array([0.6666666, 0. , 0.6666666, 0.3333333, 0. , 0. ], dtype=float32),
    'hybrid_search': array([0.6666666 , 0.99999994, 0.99999994, 0.99999994, 0. , 0. ], dtype=float32)
}
This result includes the results of all search methods, because method is set to 'all', and contains only similarity arrays, ordered as the spectra are in the library.
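Because each array follows the order of the spectra in the library, the position of a score identifies the library spectrum it belongs to. The sketch below ranks one such array and keeps the best few non-zero scores; top_matches is a hypothetical helper written for illustration, and a plain list stands in for the NumPy array so the snippet has no dependencies.

```python
def top_matches(similarities, topn=3):
    """Return (library_index, similarity) pairs, best first, skipping zero scores."""
    # sorted() is stable, so equal scores keep their library order.
    ranked = sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)
    return [(i, float(similarities[i])) for i in ranked[:topn] if similarities[i] > 0]


# Values copied from the 'open_search' array in the example result above.
open_search = [0.6666666, 0.99999994, 0.3333333, 0.6666666, 0.0, 0.0]

for library_index, similarity in top_matches(open_search, topn=3):
    print(library_index, similarity)
```

The same function can be applied to each array in the dictionary returned by search().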
Specific search
Specific searches include identity search, open search, neutral loss search and hybrid search.
Note
The following specific search functions don't have an internal cleaning step, so the query spectrum must be cleaned externally before they are called. See external clean before search and the following examples.
Identity search
from ms_entropy import DynamicEntropySearch, clean_spectrum
# Assign the path for your library
entropy_search=DynamicEntropySearch(path_data=path_of_your_library)
# Clean the query spectrum externally, limiting peaks to precursor_mz - 1.6 Da
precursor_ions_removal_da = 1.6
query_spectrum['peaks'] = clean_spectrum(
    peaks = query_spectrum['peaks'],
    max_mz = query_spectrum['precursor_mz'] - precursor_ions_removal_da
)
### Use `identity_search()` and get an array with all entropy similarities based on identity search ###
result=entropy_search.identity_search(
    precursor_mz=query_spectrum['precursor_mz'],
    peaks=query_spectrum['peaks'],
    ms1_tolerance_in_da=0.01,  # You can change ms1_tolerance_in_da as needed.
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
)
print(result)
Example result:
[0. 0. 0.99999994 0. 0. 0. ]
This result contains only the identity search similarities, as an array in the order of the spectra in the library.
Open search
from ms_entropy import DynamicEntropySearch, clean_spectrum
# Assign the path for your library
entropy_search=DynamicEntropySearch(path_data=path_of_your_library)
# Clean the query spectrum externally before searching
query_spectrum['peaks'] = clean_spectrum(
    peaks = query_spectrum['peaks'],
)
### Use `open_search()` and get an array with all entropy similarities based on open search ###
result=entropy_search.open_search(
    peaks=query_spectrum['peaks'],
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
)
print(result)
Example result:
[0.3333333 0.3333333 0.99999994 0.3333333 0. 0. ]
This result contains only the open search similarities, as an array in the order of the spectra in the library.
Neutral loss search
from ms_entropy import DynamicEntropySearch, clean_spectrum
# Assign the path for your library
entropy_search=DynamicEntropySearch(path_data=path_of_your_library)
# Clean the query spectrum externally, limiting peaks to precursor_mz - 1.6 Da
precursor_ions_removal_da = 1.6
query_spectrum['peaks'] = clean_spectrum(
    peaks = query_spectrum['peaks'],
    max_mz = query_spectrum['precursor_mz'] - precursor_ions_removal_da
)
### Use `neutral_loss_search()` and get an array with all entropy similarities based on neutral loss search ###
result=entropy_search.neutral_loss_search(
    precursor_mz=query_spectrum['precursor_mz'],
    peaks=query_spectrum['peaks'],
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
)
print(result)
Example result:
[0.3333333 0. 0.99999994 0.3333333 0. 0. ]
This result contains only the neutral loss search similarities, as an array in the order of the spectra in the library.
Hybrid search
from ms_entropy import DynamicEntropySearch, clean_spectrum
# Assign the path for your library
entropy_search=DynamicEntropySearch(path_data=path_of_your_library)
# Clean the query spectrum externally, limiting peaks to precursor_mz - 1.6 Da
precursor_ions_removal_da = 1.6
query_spectrum['peaks'] = clean_spectrum(
    peaks = query_spectrum['peaks'],
    max_mz = query_spectrum['precursor_mz'] - precursor_ions_removal_da
)
### Use `hybrid_search()` and get an array with all entropy similarities based on hybrid search ###
result=entropy_search.hybrid_search(
    precursor_mz=query_spectrum['precursor_mz'],
    peaks=query_spectrum['peaks'],
    ms2_tolerance_in_da=0.02,  # You can change ms2_tolerance_in_da as needed.
)
print(result)
Example result:
[0.6666666 0.3333333 0.99999994 0.6666666 0. 0. ]
This result contains only the hybrid search similarities, as an array in the order of the spectra in the library.