Explore the details of my software engineering internship where I developed web applications and data pipelines for geospatial data visualization.
During my internship at Levitree, I developed a customer-facing web portal for visualizing terrain data across 160,000+ square miles and created ETL data pipelines to process 700,000+ geospatial data points using Python, React, and Google Cloud Platform (GCP).
The project involved automating manual data-ingestion workflows for geological records and enabling real-time quality validation of terrain data with Kepler.gl, an open-source geospatial analysis tool originally developed by Uber. The web portal provides interactive maps and analytics tools that help users understand terrain characteristics, elevation patterns, and geographic features across vast regions.
One of the main challenges was automating the ingestion of geological records from diverse sources. To address this, I built OCR pipelines to read scanned geospatial reports and used LLM APIs to extract structured records from the recognized text. This automation replaced manual data-entry workflows and significantly improved data accuracy.
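The extraction step can be sketched as follows. This is a minimal, illustrative stand-in: in the actual pipeline an LLM API produced the structured output, and the field names here (`site_id`, `latitude`, `longitude`, `depth_ft`) are hypothetical, not Levitree's real schema. A regex is used only to show the OCR-text-in, structured-record-out contract.

```python
import re

# Hypothetical report layout; the real reports varied and were handled
# by an LLM rather than a fixed pattern.
RECORD_PATTERN = re.compile(
    r"Site:\s*(?P<site_id>\S+).*?"
    r"Lat:\s*(?P<latitude>-?\d+\.\d+).*?"
    r"Lon:\s*(?P<longitude>-?\d+\.\d+).*?"
    r"Depth:\s*(?P<depth_ft>\d+(?:\.\d+)?)\s*ft",
    re.DOTALL,
)

def extract_record(ocr_text: str):
    """Turn one OCR'd report into a structured record, or None if unreadable."""
    match = RECORD_PATTERN.search(ocr_text)
    if match is None:
        return None  # unparseable report: route to manual review
    rec = match.groupdict()
    return {
        "site_id": rec["site_id"],
        "latitude": float(rec["latitude"]),
        "longitude": float(rec["longitude"]),
        "depth_ft": float(rec["depth_ft"]),
    }

sample = "Site: B-101\nLat: 37.7749  Lon: -122.4194\nDepth: 42 ft"
record = extract_record(sample)
```

Returning `None` for unparseable reports keeps the automated path strict: anything the extractor cannot read falls back to human review rather than entering the dataset with guessed values.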
Another major challenge was handling the massive volume of geospatial data efficiently. I built APIs to query existing database sources and designed ETL pipelines that batch-process 700,000+ data points while preserving data integrity. The system runs real-time quality validations with Kepler.gl to confirm terrain data accuracy before visualization.
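The batching-plus-validation pattern can be sketched like this. It is a simplified outline under stated assumptions: the batch size, field names, and validity ranges are illustrative (the real limits depended on GCP quotas and the terrain schema), and the actual pipeline loaded each batch into GCP rather than yielding lists.

```python
BATCH_SIZE = 5000  # illustrative; the real size depended on GCP quotas

def is_valid(point: dict) -> bool:
    """Quality gate applied before visualization (ranges are illustrative)."""
    return (
        -90.0 <= point.get("latitude", 999.0) <= 90.0
        and -180.0 <= point.get("longitude", 999.0) <= 180.0
        and point.get("elevation_m") is not None
    )

def batches(points, size=BATCH_SIZE):
    """Stream validated points in fixed-size batches, dropping bad records."""
    batch = []
    for point in points:
        if is_valid(point):
            batch.append(point)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```

Streaming with a generator keeps memory flat regardless of input volume, which matters when the input is 700,000+ points rather than a handful.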
Creating an intuitive user interface that could display complex terrain data across 160,000+ square miles was also challenging. I implemented interactive visualization features using React and Kepler.gl, ensuring the interface remained responsive even when rendering large datasets.
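One way data reaches such an interface: Kepler.gl consumes GeoJSON directly, so validated points can be serialized as a FeatureCollection before being handed to the React layer. This sketch assumes hypothetical field names (`longitude`, `latitude`, `elevation_m`), not the portal's actual schema.

```python
import json

def to_geojson(points: list) -> str:
    """Serialize validated terrain points as a GeoJSON FeatureCollection,
    the format Kepler.gl can ingest directly."""
    features = [
        {
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON orders coordinates [longitude, latitude]
                "coordinates": [p["longitude"], p["latitude"]],
            },
            "properties": {"elevation_m": p["elevation_m"]},
        }
        for p in points
    ]
    return json.dumps({"type": "FeatureCollection", "features": features})
```

Keeping properties minimal (here just elevation) reduces payload size, which helps the front end stay responsive when rendering large datasets.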
The web portal processes and visualizes terrain data across 160,000+ square miles while remaining responsive at scale. The automated ingestion workflows eliminated manual data entry, and the OCR- and LLM-based extraction achieved high accuracy on geological records.
The ETL pipelines efficiently handle 700,000+ geospatial data points with real-time quality validations, enabling accurate queries and interactive exploration. The system has been deployed to production and is actively used by customers for terrain analysis and geographic planning.
Since the company data used in this project is proprietary, I have included a visualization demonstrating Kepler.gl's geospatial analysis capabilities using historical earthquake data from the United States Geological Survey (USGS). This example showcases the type of interactive geospatial visualization and analysis tools that were implemented in the Levitree web portal.
Data Source: USGS Earthquake Hazards Program
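The USGS serves its earthquake feeds as GeoJSON, where each feature carries magnitude and place in `properties` and `[longitude, latitude, depth_km]` in `geometry.coordinates`. A minimal reader for preparing that demo dataset might look like this (the `summarize_quakes` helper and its filtering are my own illustration, not part of the portal):

```python
import json

def summarize_quakes(feed_json: str, min_magnitude: float = 0.0) -> list:
    """Flatten a USGS GeoJSON earthquake feed into rows for visualization,
    keeping only events at or above min_magnitude."""
    feed = json.loads(feed_json)
    rows = []
    for feature in feed["features"]:
        props = feature["properties"]
        lon, lat, depth_km = feature["geometry"]["coordinates"]
        # Some feed entries have a null magnitude; skip those.
        if props["mag"] is not None and props["mag"] >= min_magnitude:
            rows.append({
                "place": props["place"],
                "mag": props["mag"],
                "lat": lat,
                "lon": lon,
                "depth_km": depth_km,
            })
    return rows
```

The resulting rows can be serialized back to GeoJSON or CSV and dropped into Kepler.gl to reproduce the kind of interactive map shown above.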