================
Useful functions
================

Reading Spectra from a File
===========================
For ease of use, a function named ``read_one_spectrum`` is provided in the ``ms_entropy`` package, allowing you to easily read spectra from a file. Here is an example of how you can use it:

.. code-block:: python

    from ms_entropy import read_one_spectrum
    for spectrum in read_one_spectrum('path/to/spectrum/file'):
        print(spectrum)

This function returns a dictionary, where each key-value pair corresponds to a specific metadata of the spectrum.

Currently, the ``read_one_spectrum`` function supports the following file formats: ``.mgf``, ``.msp``, ``.mzML``, and ``.lbm2`` from the MS-DIAL software.

----------------

Get the top-n results from the Flash entropy search results
===========================================================

Once you have conducted a search in your spectral library, you may want to focus only on the top-N results, or the results with a similarity score that is higher than a certain threshold. The ``get_topn_matches`` function has been designed specifically for this purpose.

The ``get_topn_matches`` function takes three parameters:

- ``similarity_array``: The similarity scores that the search function has returned.

- ``topn``: The number of top results you want to retrieve. If you set this to None, all results will be retrieved.

- ``min_similarity``: The minimum similarity score that results should have. If you set this to None, all results will be retrieved.

The function will return a list of dictionaries. Each dictionary corresponds to a spectrum in the library. The dictionary is similar to the one in the library spectra (the input of the ``build_index``), with the addition of an ``entropy_similarity`` key to store the similarity score of the spectrum.

Here's how you can use the ``get_topn_matches`` function:

.. code-block:: python

    topn_match = entropy_search.get_topn_matches(entropy_similarity, topn=3, min_similarity=0.01)


This example will return a list of the top 3 matches with a similarity score greater than 0.01.

----------------

Get the metadata of a specifical spectrum from the Flash entropy search object
==============================================================================

After you've conducted a search in your spectral library, you may want to retrieve the metadata of a specific spectrum. For this, you can use the ``__getitem__`` function.

For instance, let's say that after a search, you found that the third spectrum (index start from 0) in the library has the highest similarity score. You can call ``entropy_search[2]`` to retrieve the metadata of the third spectrum.

Here's an example of how you can use the ``__getitem__`` function:

.. code-block:: python

    from ms_entropy import FlashEntropySearch
    entropy_search = FlashEntropySearch()
    entropy_search.build_index(spectral_library)

    # Get the metadata of the third spectrum
    metadata = entropy_search[2]

The metadata was extracted and stored when you called the ``build_index`` function. The data will remain available even if you save and reload the index using either the pickle module or the read and write functions.

----------------

Get the matched peaks number of query spectrum to the library Spectra
=====================================================================

If you also want to know the number of matched peaks between the query spectrum and the library spectra, you can set the ``get_matched_peaks_number`` parameters to ``True``. Then, the returned results will be a list of two numpy arrays. The first array contains the similarity scores, and the second array contains the number of matched peaks.

At this moment, the ``get_matched_peaks_number`` parameter is only supported by the ``identity_search``, ``open_search``, and ``neutral_loss_search`` functions.

Here's an example of how you can use the ``get_matched_peaks_number`` parameter:

.. code-block:: python

    import numpy as np
    from ms_entropy import FlashEntropySearch

    spectral_library = [{
        "id": "Demo spectrum 1",
        "precursor_mz": 150.0,
        "peaks": [[100.0, 1.0], [101.0, 1.0], [103.0, 1.0]]
    }, {
        "id": "Demo spectrum 2",
        "precursor_mz": 200.0,
        "peaks": np.array([[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]], dtype=np.float32),
        "metadata": "ABC"
    }, {
        "id": "Demo spectrum 3",
        "precursor_mz": 250.0,
        "peaks": np.array([[200.0, 1.0], [101.0, 1.0], [202.0, 1.0]], dtype=np.float32),
        "XXX": "YYY",
    }, {
        "precursor_mz": 350.0,
        "peaks": [[100.0, 1.0], [101.0, 1.0], [302.0, 1.0]]}]
    query_spectrum = {"precursor_mz": 150.0,
                      "peaks": [[100.0, 1.0], [101.0, 1.0], [102.0, 1.0]]}

    entropy_search = FlashEntropySearch()
    # Step 1: Build the index from the library spectra
    spectral_library = entropy_search.build_index(spectral_library)
    # Step 2: Clean the query spectrum
    query_spectrum['peaks'] = entropy_search.clean_spectrum_for_search(
        precursor_mz = query_spectrum['precursor_mz'],
        peaks = query_spectrum['peaks']
    )
    # Step 3: Search the library
    # This parameter is supported by the identity_search, open_search, and neutral_loss_search functions
    entropy_similarity, matched_peaks_number = entropy_search.identity_search(
        precursor_mz = query_spectrum['precursor_mz'],
        peaks = query_spectrum['peaks'],
        ms1_tolerance_in_da = 0.01,
        ms2_tolerance_in_da = 0.02,
        output_matched_peak_number = True
    )
    print(entropy_similarity)
    print(matched_peaks_number)

----------------

Save and load index for the Flash entropy search object
=======================================================

After you have built the index, you have the option to save it to disk for later use.

Using pickle
------------

You can use Python's built-in ``pickle`` module to save and load the ``FlashEntropySearch`` object, as follows:

.. code-block:: python

    import pickle
    # Save the index
    with open('path/to/index', 'wb') as f:
        pickle.dump(entropy_search, f)
    # And load the index
    with open('path/to/index', 'rb') as f:
        entropy_search = pickle.load(f)

Using ``read`` and ``write`` functions
--------------------------------------

We also provide ``read`` and ``write`` functions to save and load the index.


To save a ``FlashEntropySearch`` object to disk:

.. code-block:: python

    entropy_search.write('path/to/index')


To load a ``FlashEntropySearch`` object from disk:

.. code-block:: python

    entropy_search = FlashEntropySearch()
    entropy_search.read('path/to/index')


If you're working with a very large spectral library, or your computer's memory is limited, you can use the ``low_memory`` parameter to partially load the library and reduce the memory usage. For example:

.. code-block:: python

    entropy_search = FlashEntropySearch(low_memory=True)
    entropy_search.read('path/to/index')

The index only needs to be built once. After that, you can use the read function to load the index. If you built the index using the ``low_memory=False`` mode, you can still load it using a ``FlashEntropySearch`` object with either the ``low_memory=False`` or ``low_memory=True`` mode.