Error-free software is exceedingly rare. Something as simple as a missing quotation mark in the source code can prevent a program from executing correctly. Errors that occur early in the code often propagate downstream, causing further issues. Many software errors arise from inadequate testing prior to release, including insufficient verification and validation across different computing environments, parameters, and use cases. When selecting software, it's crucial to understand that popularity or widespread use is not necessarily an indicator of its quality.

As big data continues to reshape both medicine and research, diagnostic software products, ranging from personalized healthcare apps to genetic testing tools, are becoming indispensable in healthcare. Bioinformatics software in particular plays a key role in advancing precision medicine and directly influences healthcare decisions for patients. In clinical settings, errors in medical software, which is often classified as Software as a Medical Device (SaMD), are unacceptable. To increase the clinical adoption of genomic-based diagnostics, we must build more robust scientific tools and rigorously validate them to ensure their safe application in patient care.

Case Studies of Scientific Software Errors

A. Lack of Software Development Training

A 2021 benchmarking study of 48 scientific software tools revealed that 3 of the 5 highest-scoring tools were developed by researchers with formal computer science training. In contrast, the 6 lowest-scoring tools were developed by self-taught programmers with no formal training in software development (Zapletal et al., 2021).

B. Unexpected Software Behavior

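Python's glob module returns files in an order that depends on the operating system. A set of analysis scripts that relied on this order (the "Willoughby-Hoye" scripts for calculating NMR chemical shifts) could therefore produce different results on different systems, potentially leading to miscalculated data in over 150 published studies (Bhandari Neupane et al., 2019).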
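The pitfall can be illustrated with a minimal sketch. It assumes a hypothetical script that processes files in the order they are listed; the helper name and file names below are illustrative and are not taken from the cited scripts. Because `glob.glob` makes no guarantee about ordering, sorting the result makes the order explicit and reproducible across systems.

```python
import glob
import os
import tempfile


def list_files_deterministically(directory, pattern="*.out"):
    """Return matching paths in sorted order, independent of OS listing order."""
    # glob.glob gives no ordering guarantee; sorted() fixes the order explicitly.
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Demonstration with hypothetical file names created in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["conf_b.out", "conf_a.out", "conf_c.out"]:
        open(os.path.join(tmp, name), "w").close()
    ordered_names = [
        os.path.basename(path) for path in list_files_deterministically(tmp)
    ]
    print(ordered_names)
```

Any code that pairs files by position, for example matching the first file in one list with the first file in another, is a candidate for this kind of explicit sort.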

C. Improper Data Formatting

Microsoft Excel frequently converts gene symbols into dates or floating-point numbers, affecting 19.6% of publications that include Excel files with gene symbols. Journals most impacted by this issue include Nucleic Acids Research, Genome Biology, Nature Genetics, Genome Research, Genes and Development, and Nature (Lewis, 2021).
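One way to guard against this is to screen gene lists before they are opened in a spreadsheet. The sketch below is a heuristic only: the two patterns are assumptions that approximate month-like symbols (such as SEPT2 or MARCH1) and identifiers that resemble scientific notation (such as 2310009E13). They do not reproduce Excel's actual conversion rules, which are broader, and the function name is hypothetical.

```python
import re

# Heuristic patterns only; these are assumptions, not Excel's real rules.
MONTH_LIKE = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?\d{1,2}$",
    re.IGNORECASE,
)
SCIENTIFIC_LIKE = re.compile(r"^\d+E\d+$", re.IGNORECASE)


def at_risk_of_conversion(symbol):
    """Return True if a symbol looks like a date or a number in scientific notation."""
    cleaned = symbol.strip()
    return bool(MONTH_LIKE.match(cleaned) or SCIENTIFIC_LIKE.match(cleaned))


# Example gene list; flagged entries should be reviewed before export.
symbols = ["SEPT2", "MARCH1", "TP53", "2310009E13", "BRCA1"]
flagged = [s for s in symbols if at_risk_of_conversion(s)]
print(flagged)
```

Flagged symbols can then be checked by hand, or the column can be imported as text so that the spreadsheet does not reinterpret it.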

Open-Access Tools for Software Quality Evaluation

SoftWipe evaluates adherence to coding standards in scientific software written in C or C++. It assesses the software based on characteristics like the number of compilers used, coding format consistency, and the degree of code duplication. However, SoftWipe does not evaluate code correctness or validate the software against reference standards.

Benchtop compares software outputs to evaluate metrics such as concordance and accuracy against a reference dataset. Unlike SoftWipe, Benchtop doesn’t assess code quality but instead highlights discrepancies in outputs caused by code changes or variations in computing environments.
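The idea of comparing outputs against a reference can be sketched in a few lines. This is not Benchtop's interface; the function names, the dictionary layout, and the variant calls are all hypothetical, and concordance is assumed here to mean the fraction of reference entries that the observed output reproduces exactly.

```python
def concordance(observed, reference):
    """Fraction of reference entries whose observed value matches exactly."""
    if not reference:
        raise ValueError("reference must not be empty")
    matches = sum(
        1 for key, value in reference.items() if observed.get(key) == value
    )
    return matches / len(reference)


def discrepancies(observed, reference):
    """Map each mismatching key to an (observed, expected) pair."""
    return {
        key: (observed.get(key), value)
        for key, value in reference.items()
        if observed.get(key) != value
    }


# Hypothetical genotype calls keyed by genomic position.
reference_calls = {
    "chr1:1000": "A/G",
    "chr1:2000": "C/C",
    "chr2:500": "T/T",
    "chr3:42": "G/A",
}
observed_calls = {"chr1:1000": "A/G", "chr1:2000": "C/T", "chr2:500": "T/T"}

score = concordance(observed_calls, reference_calls)
diffs = discrepancies(observed_calls, reference_calls)
print(score, diffs)
```

Running the same comparison after a code change, or in a different computing environment, shows whether the outputs have drifted from the reference.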

In addition to these tools, open-source development platforms like GitHub play a vital role in improving software quality. They allow users to engage with the software by examining source code, reporting bugs, raising questions, or suggesting improvements. This collaborative effort helps reduce undetected errors, accelerates troubleshooting, and optimizes software performance.

References

1. Bhandari Neupane, J., Neupane, R. P., Luo, Y., Yoshida, W. Y., Sun, R., & Williams, P. G. (2019). Characterization of Leptazolines A-D, Polar Oxazolines from the Cyanobacterium Leptolyngbya sp., Reveals a Glitch with the "Willoughby-Hoye" Scripts for Calculating NMR Chemical Shifts. Organic Letters, 21(20), 8449–8453. https://doi.org/10.1021/acs.orglett.9b03216

2. Lewis, D. (2021). Autocorrect errors in Excel still creating genomics headache. Nature. Advance online publication. https://doi.org/10.1038/d41586-021-02211-4

3. Zapletal, A., Höhler, D., Sinz, C., & Stamatakis, A. (2021). The SoftWipe tool and benchmark for assessing coding standards adherence of scientific software. Scientific Reports, 11(1), 10015. https://doi.org/10.1038/s41598-021-89495-8