Quick start

Overview

from ms_entropy import FlashEntropySearch

# Construct the FlashEntropySearch class
entropy_search = FlashEntropySearch()

# Step 1: Build the index from the library spectra
entropy_search.build_index(spectral_library)

# Step 2: Search the library
entropy_similarity = entropy_search.search(
    precursor_mz=query_spectrum_precursor_mz, peaks=query_spectrum_peaks)

In detail

If you have a spectral library, format it as a list of dictionaries like this:

import numpy as np
spectral_library = [{
    "id": "Demo spectrum 1-A",
    "precursor_mz": 150.0,
    "peaks": [[100.0, 1.0], [101.0, 1.0], [103.0, 1.0]]
}, {
    "id": "Demo spectrum 2-C",
    "precursor_mz": 250.0,
    "peaks": np.array([[200.0, 1.0], [101.0, 1.0], [202.0, 1.0]], dtype=np.float32),
    "XXX": "YYY",
}, {
    "id": "Demo spectrum 3-B",
    "precursor_mz": 200.0,
    "peaks": np.array([[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]], dtype=np.float32),
    "metadata": "ABC"
}, {
    "precursor_mz": 350.0,
    "peaks": [[100.0, 1.0], [101.0, 1.0], [302.0, 1.0]]}]

Note that the precursor_mz and peaks keys are required; the rest of the keys are optional.
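Before building the index, it can help to check that every entry carries the two required keys. The following is a minimal sketch in plain Python; find_invalid_spectra and REQUIRED_KEYS are hypothetical names introduced here for illustration and are not part of ms_entropy.

```python
# Hypothetical helper, not part of ms_entropy: checks only that the two
# required keys described above are present in each library entry.
REQUIRED_KEYS = ("precursor_mz", "peaks")


def find_invalid_spectra(spectral_library):
    """Return (index, missing_keys) for every spectrum lacking a required key."""
    problems = []
    for index, spectrum in enumerate(spectral_library):
        missing = [key for key in REQUIRED_KEYS if key not in spectrum]
        if missing:
            problems.append((index, missing))
    return problems


demo_library = [
    {"id": "Demo spectrum 1-A", "precursor_mz": 150.0,
     "peaks": [[100.0, 1.0], [101.0, 1.0], [103.0, 1.0]]},
    # No "id" key: still valid, because only precursor_mz and peaks are required.
    {"precursor_mz": 350.0,
     "peaks": [[100.0, 1.0], [101.0, 1.0], [302.0, 1.0]]},
    # Missing precursor_mz: reported by the check.
    {"id": "Broken entry", "peaks": [[100.0, 1.0]]},
]

print(find_invalid_spectra(demo_library))  # [(2, ['precursor_mz'])]
```

The check looks only at key presence; it does not validate the peak values themselves.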

Suppose your query spectrum looks like this:

query_spectrum = {"precursor_mz": 150.0,
                  "peaks": [[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]]}

You can call the FlashEntropySearch class to search the library like this:

from ms_entropy import FlashEntropySearch
entropy_search = FlashEntropySearch()
# Step 1: Build the index from the library spectra
spectral_library_new = entropy_search.build_index(spectral_library)
# Step 2: Search the library
entropy_similarity = entropy_search.search(
    precursor_mz=150.0, peaks=[[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]])

Warning

It is important to note that for efficient identity searching, all spectra in the spectral_library get re-sorted based on their precursor m/z values during the indexing process, and the search function returns the similarity scores in the same order as the re-sorted spectra. The build_index function returns a list of these re-sorted spectra, useful for mapping results back to the original spectra.

In this example, the original order of the spectra is “Demo spectrum 1-A”, “Demo spectrum 2-C”, “Demo spectrum 3-B”, … After indexing, the spectra are re-sorted by precursor m/z, so the order becomes “Demo spectrum 1-A”, “Demo spectrum 3-B”, “Demo spectrum 2-C”, … You can check the difference by printing the spectral_library_new variable: print(spectral_library_new). The variable entropy_similarity will have the same order as spectral_library_new.
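To make the re-ordering concrete, the sketch below emulates it with Python's built-in sorted on the demo precursor m/z values. It only mimics the behaviour described above and is not the library's own indexing code; in particular, how the library orders spectra with equal precursor m/z is not covered here. The peaks are omitted because they do not affect the order.

```python
# Emulation of the re-sorting described above, using only the standard library.
original_library = [
    {"id": "Demo spectrum 1-A", "precursor_mz": 150.0},
    {"id": "Demo spectrum 2-C", "precursor_mz": 250.0},
    {"id": "Demo spectrum 3-B", "precursor_mz": 200.0},
    {"precursor_mz": 350.0},  # the spectrum without an "id"
]

# order[k] is the original index of the spectrum at re-sorted position k.
order = sorted(range(len(original_library)),
               key=lambda i: original_library[i]["precursor_mz"])
resorted = [original_library[i] for i in order]

print(order)  # [0, 2, 1, 3]
print([spectrum.get("id", "(no id)") for spectrum in resorted])
# ['Demo spectrum 1-A', 'Demo spectrum 3-B', 'Demo spectrum 2-C', '(no id)']
```

Keeping an index list such as order is one way to translate a position in the score arrays back to a position in the original library.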

After that, you can print the results like this:

import pprint
pprint.pprint(entropy_similarity)

The result will look like this:

{'hybrid_search': array([0.6666666 , 0.99999994, 0.99999994, 0.99999994], dtype=float32),
 'identity_search': array([0.6666667, 0.       , 0.       , 0.       ], dtype=float32),
 'neutral_loss_search': array([0.6666666, 0.       , 0.6666666, 0.3333333], dtype=float32),
 'open_search': array([0.6666666 , 0.99999994, 0.3333333 , 0.6666666 ], dtype=float32)}

The values are the similarity scores for each spectrum in the spectral_library_new list, in that order. For example, the array [0.6666666 , 0.99999994, 0.3333333 , 0.6666666] under the open_search key means that the query spectrum scores 0.6666666 against the first spectrum in spectral_library_new (“Demo spectrum 1-A”, or entropy_search[0]), 0.99999994 against the second (“Demo spectrum 3-B”, or entropy_search[1]), 0.3333333 against the third (“Demo spectrum 2-C”, or entropy_search[2]), and so on.
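Because the scores and the re-sorted library share the same order, they can be paired position by position. The sketch below does this for the open_search scores shown above, with the values copied in as plain lists so that it runs without the library; rank_matches is a hypothetical helper name, not an ms_entropy function.

```python
# Hypothetical helper, not part of ms_entropy: pairs each re-sorted spectrum
# with its score and ranks the pairs from best to worst.
def rank_matches(resorted_spectra, scores):
    """Return (id, score) pairs sorted by descending score."""
    pairs = [(spectrum.get("id"), float(score))
             for spectrum, score in zip(resorted_spectra, scores)]
    # sorted() is stable, so tied scores keep their library order.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# The demo library in its re-sorted order (peaks omitted for brevity).
resorted_spectra = [
    {"id": "Demo spectrum 1-A", "precursor_mz": 150.0},
    {"id": "Demo spectrum 3-B", "precursor_mz": 200.0},
    {"id": "Demo spectrum 2-C", "precursor_mz": 250.0},
    {"precursor_mz": 350.0},
]
open_search_scores = [0.6666666, 0.99999994, 0.3333333, 0.6666666]

ranked = rank_matches(resorted_spectra, open_search_scores)
best_id, best_score = ranked[0]
print(best_id, best_score)  # Demo spectrum 3-B 0.99999994
```

The same function should also accept the NumPy arrays returned by the search, since it only iterates over the scores and converts each one to a float.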

Note

By default, the search function returns the similarity scores for all four search modes: identity_search, open_search, neutral_loss_search, and hybrid_search. To save time, you can restrict the search to specific modes with the method parameter. For example, method = {'identity', 'open'} returns only the similarity scores for identity_search and open_search.
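A small wrapper can make the mode selection explicit at the call site. This is a sketch only: search_selected_modes is a hypothetical helper, and it assumes nothing about the search function beyond the precursor_mz, peaks, and method parameters mentioned in this section.

```python
# Hypothetical wrapper, not part of ms_entropy. It forwards a query spectrum
# (a dict with "precursor_mz" and "peaks") to any object whose search()
# method accepts the precursor_mz, peaks, and method keyword arguments.
def search_selected_modes(searcher, query_spectrum, modes):
    """Run the search for the requested modes only and return its result."""
    return searcher.search(
        precursor_mz=query_spectrum["precursor_mz"],
        peaks=query_spectrum["peaks"],
        method=set(modes),  # e.g. {'identity', 'open'}, as in the note above
    )
```

With the objects from the earlier example, the call would be search_selected_modes(entropy_search, query_spectrum, {'identity', 'open'}), and the returned dictionary would then be expected to contain only the identity_search and open_search entries.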

Examples

You can find several examples of how to use the package in the examples directory. The example.py script is a good starting point for getting familiar with the package.

Want more?

Still have questions? Want more functions?

We also provide more tools to help you calculate spectral similarity. Please see the remaining sections for more information.