Hydrographic Data Processing on a Robust, Network-Coupled Parallel Cluster

Title: Hydrographic Data Processing on a Robust, Network-Coupled Parallel Cluster
Publication Type: Thesis
Year: 2015
Authors: Venugopal, R
Degree and Program: Master of Science
Degree: Computer Science
Number of Pages: 100
Date Published: December
University: University of New Hampshire
Location: Durham, NH

Recent advances in acoustic sensor technology and the widespread adoption of multibeam echo-sounders have enabled the efficient collection of large quantities of bathymetric data in every survey. However, timely dissemination of these data to the scientific community has been constrained by the relatively slow development of new data processing architectures. Current solutions for powerful, efficient, near-real-time data processing entail high capital investment and technical complexity, so the installed base for these systems has been very small. The work presented here proposes a new architecture for bathymetric data processing based on parallel computing paradigms. The solution distributes the processing workload across a cluster of network-attached compute nodes. These compute resources can be existing installations and other commercial off-the-shelf (COTS) components, such as blade servers, thereby reducing installation and maintenance expenditure. The success of parallel processing for bathymetric data depends on accurate measurement of the processing workload and its effective distribution across the compute nodes, thereby maximizing speedup and efficiency.
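As a brief illustration of the two metrics named in this paragraph, the sketch below uses the conventional definitions: speedup is the serial run time divided by the parallel run time, and efficiency is speedup divided by the number of nodes. These definitions and the function names are assumptions made for illustration; the abstract does not state the exact formulas used in the thesis.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Conventional speedup: serial run time divided by parallel run time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, nodes: int) -> float:
    """Conventional efficiency: speedup divided by the number of compute nodes."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return speedup(serial_time, parallel_time) / nodes
```

Under these definitions, a job that takes 100 s serially and 25 s on 8 nodes has a speedup of 4 and an efficiency of 0.5, which shows why an installation may have to choose which of the two to prioritize.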

For workload determination, an estimation algorithm was developed that uses stochastic sampling of the raw bathymetric data file. This produces a low-cost, high-accuracy estimate of the processing requirements for each line to be processed. This workload information, coupled with file and system metadata, is used as input to three load balancing algorithms: First Come First Served (FCFS), Longest Job First (LJF) and Contention-Reduction (CR). The performance of the FCFS and LJF algorithms is highly dependent on the characteristics of the input dataset, while CR scheduling aims to characterize the input and adjust load distribution for the best combination of speedup and efficiency. The choice among these algorithms depends on the requirements of the installation, i.e. whether speedup or efficiency is prioritized. To ensure robustness, watchdog mechanisms monitor the state of all components of the processing system and can react to system faults and failures through a combination of automated and manual techniques. Although not part of the current implementation, redundant critical components and live failover could be added, thereby reducing or eliminating system downtime.
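The workload estimation and the two simpler schedulers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: `estimate_workload` and `schedule` are hypothetical names, `record_costs` is a stand-in for per-record processing cost in a raw file, and jobs are assigned greedily to whichever node becomes free first. Contention-Reduction is omitted because the abstract does not describe its rules.

```python
import heapq
import random
from typing import Dict, List, Sequence, Tuple


def estimate_workload(record_costs: Sequence[float], sample_size: int,
                      seed: int = 0) -> float:
    """Estimate a file's total processing cost from a random sample of records.

    record_costs is a hypothetical stand-in for per-record cost; the mean of
    the sample is scaled up by the total number of records.
    """
    n = len(record_costs)
    if n == 0:
        return 0.0
    k = min(sample_size, n)
    rng = random.Random(seed)
    sample = rng.sample(list(record_costs), k)
    return (sum(sample) / k) * n


def schedule(workloads: Dict[str, float], nodes: int,
             longest_first: bool) -> Tuple[List[List[str]], float]:
    """Assign each job to the node that becomes free earliest.

    FCFS keeps the insertion order of workloads; LJF sorts jobs by decreasing
    estimated cost first. Returns per-node job lists and the makespan.
    """
    jobs = list(workloads.items())
    if longest_first:
        jobs.sort(key=lambda item: item[1], reverse=True)
    # Min-heap of (time at which the node is free, node index).
    free_at = [(0.0, i) for i in range(nodes)]
    heapq.heapify(free_at)
    assignment: List[List[str]] = [[] for _ in range(nodes)]
    for name, cost in jobs:
        busy_until, node = heapq.heappop(free_at)
        assignment[node].append(name)
        heapq.heappush(free_at, (busy_until + cost, node))
    makespan = max(t for t, _ in free_at)
    return assignment, makespan
```

For example, with estimated workloads `{"a": 2.0, "b": 3.0, "c": 4.0, "d": 7.0}` on two nodes, FCFS finishes at time 10 while LJF finishes at time 9, which illustrates how the outcome of either policy depends on the order and size distribution of the input lines.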

The methods for workload estimation and distribution are templates for extending this framework to include additional types of bathymetric data and to develop flexible, self-learning algorithms that deal with diverse datasets. This research lays the groundwork for the design of a ship-based system that would enable near-real-time data processing and result in a faster ping-to-chart solution.

https://scholars.unh.edu/thesis/1060