Hao Fan. (2005) Investigating a Heterogeneous Data Integration Approach for Data Warehousing. PhD thesis, Birkbeck College, University of London.

  • Listed: 11 May 2026 15 h 36 min

Description


In modern business intelligence, the phrase *data warehousing* usually suggests clean, structured tables and well-defined schemas. Real-world data rarely arrives in a single, tidy format. Hao Fan's 2005 PhD thesis, "Investigating a Heterogeneous Data Integration Approach for Data Warehousing," addresses this problem directly. The work sets out a framework for integrating diverse data sources (structured, semi-structured, and unstructured) into a coherent analytical platform.

### Why Heterogeneous Data Integration Matters

In the early 2000s, organizations were dealing with rapidly growing volumes of data from spreadsheets, relational databases, XML documents, and emerging web services. Traditional Extract-Transform-Load (ETL) pipelines struggled with this variety, which led to incomplete analytics and costly manual intervention. Fan's research identified *heterogeneity*, meaning differences in format, semantics, and quality across sources, as a key bottleneck in building robust data warehouses.

### Key Contributions of the Thesis

1. **Unified Integration Model** – Fan proposed a flexible architecture that treats each source as a *data layer* rather than forcing a one‑size‑fits‑all schema. This layered model allows source‑specific transformations while preserving a global, coherent view for end users.

2. **Semantic Matching Techniques** – To align disparate data sets, Fan introduced algorithmic approaches that detect and reconcile synonyms, homonyms, and hierarchical relationships. This semantic matching is critical for ensuring that aggregated metrics are accurate and meaningful.

3. **Quality Assurance Framework** – Recognizing that mixed data can degrade trust, the thesis outlines a comprehensive data quality framework. It includes automated validation, consistency checks, and a feedback loop for continuous improvement.

4. **Prototype Implementation** – Using the Birkbeck College research platform, Fan built a prototype that demonstrated improved query performance and reduced ETL time compared to conventional methods.
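
The first two contributions above can be illustrated with a minimal Python sketch. It is not taken from the thesis: the `SourceLayer` class, the `SYNONYMS` table, and the field names are hypothetical, chosen only to show how source-specific transformations and a simple synonym mapping might feed one global view.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical synonym table: source-specific field names -> global names.
SYNONYMS: Dict[str, str] = {
    "cust_id": "customer_id",
    "customerNo": "customer_id",
    "amt": "amount",
    "total": "amount",
}


@dataclass
class SourceLayer:
    """One data source together with its own source-specific transformation."""
    name: str
    rows: List[Record]
    transform: Callable[[Record], Record]


def to_global(record: Record) -> Record:
    """Rename fields to the global vocabulary; unknown names pass through."""
    return {SYNONYMS.get(key, key): value for key, value in record.items()}


def global_view(layers: Iterable[SourceLayer]) -> List[Record]:
    """Apply each layer's transform, then map the result to the global schema."""
    view: List[Record] = []
    for layer in layers:
        for row in layer.rows:
            unified = to_global(layer.transform(row))
            unified["source"] = layer.name  # keep track of where the row came from
            view.append(unified)
    return view


# Two illustrative sources that describe the same facts with different names and types.
crm = SourceLayer("crm", [{"cust_id": 1, "amt": "19.90"}],
                  lambda r: {**r, "amt": float(r["amt"])})
erp = SourceLayer("erp", [{"customerNo": 1, "total": 5}],
                  lambda r: {**r, "total": float(r["total"])})

rows = global_view([crm, erp])
```

A flat dictionary only handles synonyms. Homonyms and hierarchical relationships would need per-source mappings or a richer vocabulary, which this sketch leaves out.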

### Impact on Today’s Data Ecosystem

Although the thesis was completed roughly two decades ago, its insights remain relevant. Pipelines that load modern cloud warehouses such as Amazon Redshift, Snowflake, and Google BigQuery often use integration patterns that echo its layered, semantic approach. Likewise, the *data lake* and *data fabric* concepts respond to the same heterogeneous integration challenges the thesis addressed.

### Takeaway for Practitioners

– **Adopt Flexible Schemas**: Resist the urge to flatten all data into rigid tables. A multi‑layered architecture can better accommodate evolving source systems.
– **Invest in Semantics**: Accurate semantic mapping reduces misinterpretation and aligns analytical outcomes with business realities.
– **Prioritize Data Quality**: Implement automated quality checks early in the pipeline to prevent costly downstream corrections.
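
As one way to picture the last takeaway, here is a minimal Python sketch of automated quality checks applied before records reach the warehouse. The rule names, the `CHECKS` list, and the record fields are assumptions made for illustration, not rules from the thesis.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


def has_customer_id(record: Record) -> bool:
    """The record must carry a non-null customer identifier."""
    return record.get("customer_id") is not None


def has_valid_amount(record: Record) -> bool:
    """The amount must be a real number (not a bool) and must not be negative."""
    amount = record.get("amount")
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return False
    return amount >= 0


# Hypothetical rule set: (rule name, predicate) pairs, evaluated in order.
CHECKS: List[Tuple[str, Callable[[Record], bool]]] = [
    ("customer_id present", has_customer_id),
    ("amount is a non-negative number", has_valid_amount),
]


def validate(records: List[Record]) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into accepted rows and rejected rows with their failed rules."""
    accepted: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for record in records:
        failures = [name for name, check in CHECKS if not check(record)]
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


accepted, rejected = validate([
    {"customer_id": 1, "amount": 19.9},
    {"customer_id": None, "amount": -5},
    {"customer_id": 2, "amount": "n/a"},
])
```

Keeping the rejected rows together with the names of the failed rules gives source owners something concrete to act on, which is one simple form of feedback loop.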

In essence, Hao Fan's thesis is more than a historical artifact: its principles continue to inform data integration practice. Whether you are architecting a new warehouse, migrating to a cloud platform, or simply aiming for cleaner analytics, the ideas it set out in 2005 still offer useful guidance for turning heterogeneous data into strategic insight.

