Bioinformatics Alignment Pandarallel Nucleotides and Amino Acids Benchmark in Single and Multi-node Environments
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

This Jupyter benchmark performs both DNA and protein alignments in parallel using the
pandarallel `parallel_apply` call. It can be run with `dragon` to
compare performance on your machine. The DNA/nucleotide workload is run in a single-node environment.
The protein/amino acid workload is run in a multi-node environment.

The use case uses pairwise alignments from pyalign, a Jaccard distance calculation for the E value, and a Hamming distance calculation for the coverage percentage.
Timings are provided with both Dragon and base multiprocessing for `parallel_apply`, the multiprocessing version of pandas `apply`.
The application uses the nucleotide and amino acid workloads for feature selection.
The time to run each workload is calculated and displayed with pandas. K-means clustering is then used to group sequences by alignment and percentage coverage of alignments.
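
The two distance-based features can be sketched in plain Python as shown here. This is an illustrative formulation only, not the code from the notebook: the function names, the use of k-mer sets for the Jaccard distance, and the default `k=3` are assumptions made for the example.

.. code-block:: python

    def jaccard_distance(seq_a: str, seq_b: str, k: int = 3) -> float:
        """Jaccard distance between the k-mer sets of two sequences (assumed formulation)."""
        kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
        kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
        union = kmers_a | kmers_b
        if not union:
            # Both sequences are shorter than k: treat them as identical.
            return 0.0
        return 1.0 - len(kmers_a & kmers_b) / len(union)


    def hamming_coverage(seq_a: str, seq_b: str) -> float:
        """Percentage of matching positions, relative to the longer sequence."""
        longest = max(len(seq_a), len(seq_b))
        if longest == 0:
            return 100.0
        # zip stops at the shorter sequence, so unpaired positions count as mismatches.
        matches = sum(a == b for a, b in zip(seq_a, seq_b))
        return 100.0 * matches / longest


    print(jaccard_distance("ACGTAC", "ACGTTT"))
    print(hamming_coverage("ACGT", "ACGA"))

Row-wise functions of this kind are what pandas `apply` with `axis=1` runs serially and what pandarallel `parallel_apply` distributes across workers after `pandarallel.initialize()` has been called.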

The code demonstrates the following key concepts working with Dragon:

* How to write programs that can run with Dragon and base multiprocessing
* How to use pandarallel and pandas with Dragon for feature selection
* How pandarallel handles different dtypes
* How to utilize pandarallel in a multi-node environment
* How to utilize k-means clustering on features such as alignment, E value, and percentage coverage
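
The first concept, one program that runs under either Dragon or base multiprocessing, can be sketched as follows. The helper name `select_start_method` and its `use_dragon` flag are hypothetical and exist only for this example. The sketch relies on Dragon's documented pattern of importing `dragon` before selecting the `"dragon"` start method, and falls back to `"spawn"` when Dragon is not installed.

.. code-block:: python

    import multiprocessing as mp


    def select_start_method(use_dragon: bool) -> str:
        """Return "dragon" if requested and importable, otherwise "spawn"."""
        if use_dragon:
            try:
                # Dragon must be imported before its start method is selected.
                import dragon  # noqa: F401
                return "dragon"
            except ImportError:
                # Dragon is not available, so fall back to base multiprocessing.
                pass
        return "spawn"


    def configure_multiprocessing(use_dragon: bool) -> str:
        """Set the start method once, before any pools or processes are created."""
        method = select_start_method(use_dragon)
        mp.set_start_method(method, force=True)
        return method

A script would call `configure_multiprocessing(use_dragon=True)` once at startup when launched with the `dragon` command, and `configure_multiprocessing(use_dragon=False)` when launched with plain `python`.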

The following notebook was used for the single-node comparison:

.. literalinclude:: ../../examples/jupyter/doc_ref/bioinformatics_alignment_pandarallel_demo.py

For the single-node run, both base multiprocessing and Dragon are compared. The runs used a single node with two AMD EPYC 7742 64-core processors (128 cores in total).
Dragon employs a number of optimizations over base multiprocessing; in this benchmark, the Dragon start method outperforms the base multiprocessing spawn start method on the same hardware.

The timing for the base multiprocessing runtime is:

.. list-table:: Base Multiprocessing Timings for Nucleotide Alignments with Different Number of Bars
   :widths: 25 25 50
   :header-rows: 1

   * - Pandarallel Function
     - Number of Bars
     - Time
   * - PyAlign Alignment Score
     - 128
     - 61.000052
   * - E Score
     - 10
     - 22.919291
   * - Percentage Coverage
     - 1
     - 18.000021
   * - Total Time
     -
     - 101.919364


The timing for the single-node Dragon runtime is:

.. list-table:: Dragon Timings for Nucleotide Alignments with Different Number of Bars
   :widths: 25 25 50
   :header-rows: 1

   * - Pandarallel Function
     - Number of Bars
     - Time
   * - PyAlign Alignment Score
     - 128
     - 11.601343
   * - E Score
     - 10
     - 7.882140
   * - Percentage Coverage
     - 1
     - 7.930996
   * - Total Time
     -
     - 27.174203
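
To make the single-node comparison concrete, the ratio of the base multiprocessing time to the Dragon time can be computed for each row. The short script below uses only the values reported in the two tables above; the variable names are illustrative.

.. code-block:: python

    # Times as reported in the base multiprocessing and Dragon tables.
    base_times = {
        "PyAlign Alignment Score": 61.000052,
        "E Score": 22.919291,
        "Percentage Coverage": 18.000021,
        "Total Time": 101.919364,
    }
    dragon_times = {
        "PyAlign Alignment Score": 11.601343,
        "E Score": 7.882140,
        "Percentage Coverage": 7.930996,
        "Total Time": 27.174203,
    }

    # Speedup = base multiprocessing time / Dragon time for the same function.
    speedups = {name: base_times[name] / dragon_times[name] for name in base_times}

    for name, ratio in speedups.items():
        print(f"{name}: {ratio:.2f}x")

On the reported totals this gives a speedup of roughly 3.75x, with the largest gain on the PyAlign alignment score step.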

The multi-node Dragon run used 2 Apollo nodes. Each Apollo node has 1x AMD Rome CPU with 4x AMD MI100 GPUs and 128 cores.
The multi-node use case scales with the total number of CPUs reported by the allocation. Because more nodes, workers, and CPUs are available in the multi-node case, Dragon extends
multiprocessing's stock capabilities and further reduces the measured execution time.
Base multiprocessing does not support multi-node workloads.

The following notebook was used for the multi-node comparison:

.. literalinclude:: ../../examples/jupyter/doc_ref/bioinformatics_alignment_pandarallel_multinode_demo.py

The timing for the multi-node Dragon runtime is:

.. list-table:: Multi-node Timings for Amino Acids Alignments with Same Number of Bars
   :widths: 25 25 50
   :header-rows: 1

   * - Pandarallel Function
     - Number of Bars
     - Time
   * - PyAlign Alignment Score
     - 10
     - 7.031509
   * - E Score
     - 10
     - 6.835784
   * - Percentage Coverage
     - 10
     - 7.396781
   * - Total Time
     -
     - 21.264074