U. Sarkans, H. Parkinson, G. G. Lara, A. Oezcimen, A. Sharma, N. Abeygunawardena, S. Contrino, E. Holloway, P. Rocca-Serra, G. Mukherjee, M. Shojatalab, M. Kapushesky, S. A. Sansone, A. Farne, T. Rayner and A. Brazma. (2005) The ArrayExpress gene expression database: a software engineering and implementation perspective. Bioinformatics 21(8): 1495-1501.
- Listed: 22 May 2026 23 h 24 min
Description
**U. Sarkans, H. Parkinson, G. G. Lara, A. Oezcimen, A. Sharma, N. Abeygunawardena, S. Contrino, E. Holloway, P. Rocca‑Serra, G. Mukherjee, M. Shojatalab, M. Kapushesky, S. A. Sansone, A. Farne, T. Rayner and A. Brazma. (2005) The ArrayExpress gene expression database: a software engineering and implementation perspective. *Bioinformatics* 21(8): 1495‑1501.**
---
### Introduction: Why ArrayExpress Still Matters in 2024
When the 2005 *Bioinformatics* paper described the engineering behind **ArrayExpress**, it presented the database as more than another data repository: it was a statement about how **gene expression databases** and **software engineering** could work together in the life sciences. Nearly two decades later, researchers still turn to ArrayExpress for **microarray** and **RNA‑Seq** datasets, underscoring the paper’s lasting influence on **bioinformatics**, **functional genomics**, and **open science**. In this post we’ll unpack the key ideas described by Sarkans et al., explore how the platform has evolved, and highlight why its engineering principles remain a useful reference for modern **data sharing** solutions.
### A Software‑Centric Vision for Biological Data
The authors framed ArrayExpress as a **software engineering challenge** rather than a purely biological one. Their design goals—**scalability**, **interoperability**, and **robust metadata handling**—anticipated the explosion of high‑throughput experiments that would follow. By leveraging a **modular architecture**, the team enabled seamless integration with the **MIAME** (Minimum Information About a Microarray Experiment) standards, ensuring that every dataset carried the contextual information needed for reproducibility.
Key engineering takeaways include:
1. **Layered Architecture** – Separation of the presentation, business logic, and data storage layers made it easier to upgrade components without disrupting user access.
2. **Metadata‑Driven Indexing** – Rich, searchable annotations allowed scientists to locate relevant experiments quickly, a feature that today’s **FAIR** (Findable, Accessible, Interoperable, Reusable) initiatives still champion.
3. **Open‑Source Toolkit** – The release of the underlying codebase encouraged community contributions, fostering a collaborative ecosystem that later projects like **EBI’s Expression Atlas** could build upon.
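To make the metadata‑driven indexing idea concrete, here is a minimal sketch of an inverted index over experiment annotations. It is not the ArrayExpress implementation: the `Experiment` and `MetadataIndex` classes, the annotation keys, and the `EXP-…` accessions are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """A hypothetical experiment record; field names are illustrative only."""
    accession: str
    annotations: dict[str, str] = field(default_factory=dict)


class MetadataIndex:
    """Inverted index mapping (attribute, value) pairs to accessions."""

    def __init__(self) -> None:
        self._index: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, experiment: Experiment) -> None:
        # Index every annotation, lower-cased so lookups ignore case.
        for attribute, value in experiment.annotations.items():
            key = (attribute.lower(), value.lower())
            self._index[key].add(experiment.accession)

    def search(self, **criteria: str) -> set[str]:
        """Return accessions whose annotations match every criterion."""
        matches = None
        for attribute, value in criteria.items():
            hits = self._index.get((attribute.lower(), value.lower()), set())
            matches = hits if matches is None else matches & hits
        return set(matches) if matches else set()


index = MetadataIndex()
index.add(Experiment("EXP-1", {"organism": "Homo sapiens", "tissue": "liver"}))
index.add(Experiment("EXP-2", {"organism": "Homo sapiens", "tissue": "brain"}))
index.add(Experiment("EXP-3", {"organism": "Mus musculus", "tissue": "liver"}))

print(sorted(index.search(organism="homo sapiens", tissue="liver")))
```

Because each lookup is a set intersection over pre-built keys, adding a new annotation attribute needs no change to the index structure, which is the property that makes rich metadata cheap to search.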
### From Microarrays to Multi‑Omics: Evolution of the Platform
While the 2005 article focused primarily on **microarray** data, the underlying infrastructure proved flexible enough to accommodate emerging technologies. By 2010, ArrayExpress had begun accepting **RNA‑Seq** datasets, and more recently it supports **single‑cell transcriptomics**, **ChIP‑Seq**, and even **proteomics** experiments. This adaptability reflects the kind of **software engineering principles** highlighted by Sarkans et al.:
- **Extensible Data Model** – New assay types could be added by extending the existing schema rather than redesigning it from scratch.
- **API‑First Approach** – Robust RESTful services enable programmatic access, powering downstream tools such as **Bioconductor**, **Galaxy**, and custom **machine‑learning pipelines**.
- **Continuous Integration/Deployment (CI/CD)** – Automated testing pipelines ensure that updates do not compromise data integrity, a practice that aligns with modern **DevOps** standards.
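The extensible data model can be illustrated with a small sketch in which new assay types are registered alongside a shared base record, so existing types are never modified. This is an assumption-laden illustration rather than the actual ArrayExpress schema: `Assay`, `register`, `load`, and every field name below are hypothetical.

```python
from dataclasses import dataclass

# Registry of known assay types, keyed by a short name (hypothetical).
ASSAY_TYPES: dict[str, type] = {}


def register(kind: str):
    """Class decorator that adds an assay type to the registry."""
    def decorator(cls):
        ASSAY_TYPES[kind] = cls
        return cls
    return decorator


@dataclass
class Assay:
    """Fields shared by every assay type."""
    accession: str
    organism: str


@register("microarray")
@dataclass
class MicroarrayAssay(Assay):
    array_design: str


# Supporting a new technology means adding a class, not editing old ones.
@register("rna_seq")
@dataclass
class RnaSeqAssay(Assay):
    library_layout: str


def load(record: dict) -> Assay:
    """Build the right assay object from a plain record with a 'kind' key."""
    fields = dict(record)  # copy so the caller's dict is left untouched
    kind = fields.pop("kind")
    return ASSAY_TYPES[kind](**fields)


assay = load({
    "kind": "rna_seq",
    "accession": "EXP-42",
    "organism": "Homo sapiens",
    "library_layout": "paired",
})
print(type(assay).__name__)
```

An unknown `kind` raises `KeyError` here; a production system would more likely report a validation error, but the sketch keeps the extension mechanism in focus.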
### Impact on the Bioinformatics Community
The paper remains a frequently cited reference for anyone building **biological data repositories**. Researchers benefit from:
- **High‑Quality Curated Datasets** – Consistent metadata improves statistical power in meta‑analyses and cross‑study comparisons.
- **Reproducibility** – Transparent versioning and provenance tracking make it easier to replicate published findings, addressing a major concern in contemporary science.
- **Education & Training** – The open‑source code and detailed documentation serve as teaching material for bioinformatics curricula, illustrating best practices in **software engineering for life sciences**.
### Looking Forward: Lessons for Future Databases
What can new projects learn from the ArrayExpress blueprint?
1. **Prioritize Standards Early** – Aligning with community standards (e.g., **FAIR**, **MIAME**, **MINSEQE**) avoids costly retrofits later.
2. **Design for Extensibility** – A modular codebase accommodates novel assay types without breaking existing functionality.
3. **Invest in Community** – Open‑source licensing and clear contribution guidelines attract developers who can extend the platform’s capabilities.
By embracing these principles, the next generation of **gene expression databases** can achieve the same longevity and relevance that ArrayExpress enjoys today.
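As a rough illustration of prioritizing standards early, a submission pipeline could reject incomplete records before they are stored. The sketch below checks a submission against six sections loosely modelled on the MIAME checklist. The section names and the `missing_sections` helper are hypothetical, and a real validator would inspect content rather than mere presence.

```python
# Hypothetical section names, loosely modelled on the six MIAME elements.
REQUIRED_SECTIONS = (
    "raw_data",
    "processed_data",
    "sample_annotation",
    "experimental_design",
    "array_annotation",
    "protocols",
)


def missing_sections(submission: dict) -> list[str]:
    """Return the required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not submission.get(name)]


submission = {
    "raw_data": ["sample1.cel"],
    "processed_data": ["matrix.tsv"],
    "sample_annotation": {"sample1": {"tissue": "liver"}},
    "protocols": "",  # present but empty, so it is still reported
}

problems = missing_sections(submission)
if problems:
    print("Submission incomplete, missing:", ", ".join(problems))
```

Running the check at submission time is what avoids the costly retrofits mentioned above: gaps are reported while the submitter still has the information to hand.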
### Conclusion
The 2005 *Bioinformatics* article by Sarkans et al. remains a seminal work that married **software engineering rigor** with the needs of **functional genomics**. Its foresight in building a scalable, interoperable, and metadata‑rich system set the stage for today’s thriving **bioinformatics ecosystem**. Whether you’re a bench scientist searching for expression data, a bioinformatician developing analysis pipelines, or a software architect designing a new data repository, the lessons from ArrayExpress are as valuable now as they were when the paper was published.
*Keywords: ArrayExpress, gene expression database, bioinformatics, software engineering, microarray, RNA‑Seq, functional genomics, data sharing, open science, FAIR data, metadata, reproducibility, data repository, high‑throughput sequencing.*