Hydrographic Data Processing on a Robust, Network-Coupled Parallel Cluster
CCOM/JHC
Advances in acoustic sensor technology and the widespread adoption of multibeam echo-sounders have made it possible to collect large quantities of bathymetric data efficiently in every survey. However, rapid dissemination of this data to the scientific community has been held back by relatively slow progress in data processing architectures. Current solutions for powerful, efficient, near-real-time data processing entail high capital investment and technical complexity, so their installed base has remained very small. The work presented here proposes a new architecture for bathymetric data processing based on parallel computing paradigms. The solution distributes the processing workload across a cluster of network-attached compute nodes. The system is designed to pool a variety of network-accessible compute resources into a single data processing cluster; these resources can be existing installations or commercial off-the-shelf (COTS) components, such as blade servers. The success of parallel processing for bathymetric data depends on accurately measuring the processing workload and distributing it effectively across the compute nodes, thereby maximizing speedup and efficiency.
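Speedup and efficiency are the two figures of merit named above. The following minimal Python sketch assumes the conventional definitions (speedup as serial time divided by parallel time, efficiency as speedup divided by the number of compute nodes); the abstract does not state its formulas, and the function names and timings here are hypothetical, chosen only for illustration.

```python
def speedup(serial_time: float, parallel_time: float) -> float:
    """Conventional speedup: time on one node divided by time on the cluster."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return serial_time / parallel_time


def efficiency(serial_time: float, parallel_time: float, nodes: int) -> float:
    """Conventional efficiency: speedup per compute node (1.0 is ideal)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup(serial_time, parallel_time) / nodes


if __name__ == "__main__":
    # Illustrative timings only; these are not measurements from the study.
    t_serial, t_parallel, n_nodes = 120.0, 30.0, 4
    print(speedup(t_serial, t_parallel))              # perfectly linear case
    print(efficiency(t_serial, t_parallel, n_nodes))
```

Under these definitions, near-linear speedup on p nodes corresponds to an efficiency close to 1.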
For this proof-of-concept implementation, the raw bathymetric data was limited to eXtended Triton Format (XTF) files obtained from two distinct surveys, and load balancing was limited to three algorithms. Given the packet-based organization of data in an XTF bathymetric file and the need to determine the workload as efficiently as possible, an estimation algorithm based on stochastic sampling was developed. This algorithm can estimate the computational workload required to transform an XTF file with an average percentage error of 0.08. The workload estimate and other file metadata are used as input to the load balancing algorithms: First Come First Served (FCFS), Longest Job First (LJF) and Dynamic Priority. Pairing each dataset with the load balancing algorithm best suited to it yields the most effective distribution of the processing workload. For instance, LJF achieved near-linear speedup and operated at a minimum efficiency of 0.95. Watchdog mechanisms monitor the state of every component of the processing system and react to faults and failures through a combination of automated and manual techniques. Outages in non-critical components, such as compute nodes, are handled by isolating the faulty module and operating at reduced capacity until the issue has been rectified. Maintenance and repair costs, and the technical expertise required, are low because the system uses COTS components. Although not part of the current implementation, there are provisions for adding redundant critical components and enabling live failover, thereby reducing or eliminating system downtime.
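The workload estimation and job ordering described above can be illustrated with a short Python sketch. It is not the implementation from this work: it assumes, hypothetically, that each packet has a known per-packet processing cost, that the estimator scales the mean cost of a uniform random sample by the packet count, and that jobs are dispatched in queue order to whichever node becomes free first. All names (`estimate_workload`, `makespan`, `fcfs`, `ljf`) are invented for the example, and no real XTF parsing is attempted.

```python
import heapq
import random
from typing import Sequence


def estimate_workload(packet_costs: Sequence[float], sample_size: int,
                      seed: int = 0) -> float:
    """Estimate a file's total cost from a uniform random sample of packets.

    Assumption: the sample mean, scaled by the packet count, approximates
    the total workload.
    """
    n = len(packet_costs)
    if n == 0 or sample_size <= 0:
        return 0.0
    k = min(sample_size, n)
    rng = random.Random(seed)
    sampled = rng.sample(range(n), k)  # k distinct packet indices
    mean_cost = sum(packet_costs[i] for i in sampled) / k
    return mean_cost * n


def makespan(workloads: Sequence[float], nodes: int) -> float:
    """Dispatch jobs in the given order to the node that frees up first."""
    free_at = [0.0] * nodes            # time at which each node becomes idle
    heapq.heapify(free_at)
    for w in workloads:
        start = heapq.heappop(free_at)  # earliest available node
        heapq.heappush(free_at, start + w)
    return max(free_at)


def fcfs(workloads: Sequence[float], nodes: int) -> float:
    """First Come First Served: keep arrival order."""
    return makespan(workloads, nodes)


def ljf(workloads: Sequence[float], nodes: int) -> float:
    """Longest Job First: largest estimated workload is dispatched first."""
    return makespan(sorted(workloads, reverse=True), nodes)


if __name__ == "__main__":
    jobs = [2.0, 3.0, 7.0, 4.0, 8.0]   # illustrative estimated workloads
    print("FCFS makespan:", fcfs(jobs, 2))
    print("LJF makespan:", ljf(jobs, 2))
```

In this toy case LJF finishes sooner than FCFS because the long jobs are placed first and the short ones fill the remaining gaps. Which ordering works best for a real dataset depends on its distribution of file sizes.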
The aim of this research was to prove the feasibility of a new parallel data processing system that is powerful, efficient, and cost-effective. The methods for workload estimation and distribution serve as templates for extending this framework to additional types of bathymetric data and for developing flexible, self-learning algorithms that can deal with diverse datasets. This research lays the groundwork for the design of a ship-based system that would enable near-real-time data processing and result in a faster ping-to-chart solution.