Clustering and data reduction algorithm applied to spectra measured with a multi-static HF sounding system in Peru
A network of HF radio beacons and receivers for ionospheric sounding has been operating in Peru since 2016. This multi-static radar consists of three transmitting stations and six receiving stations deployed along the central coast and in the Andean region of Peru. The beacons transmit on two frequencies (2.72 MHz and 3.64 MHz), modulated with pseudo-random codes, with a different code for each of the three transmitting stations. This configuration allows the group delay (pseudorange), Doppler shift, power, and other parameters to be measured for each radio link. These measurements are used to estimate the regional plasma density as a function of space and time, which in turn is used to forecast the occurrence of Spread-F.
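As an illustration of how per-station codes let a single receiver separate the links and measure group delay, here is a minimal sketch. It assumes hypothetical ±1 random codes of length 1023, one receiver, circular correlation, and no Doppler shift. The actual codes, code lengths, and sampling used by the network are not given here. The lag of each correlation peak gives that station's delay in chips. The non-zero sidelobes that the other stations' codes leave in each correlation are the kind of residual that appears as cross-talk.

```python
import numpy as np

def make_codes(n_stations: int, length: int, seed: int = 0) -> np.ndarray:
    """Hypothetical +/-1 pseudo-random codes, one row per transmitting station."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(n_stations, length))

def circular_xcorr(rx: np.ndarray, code: np.ndarray) -> np.ndarray:
    """corr[k] = sum_n rx[n + k] * code[n] (indices mod N), computed with the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(rx) * np.conj(np.fft.fft(code))))

def estimate_delays(rx: np.ndarray, codes: np.ndarray) -> list[int]:
    """Lag (in chips) of the strongest correlation peak for each station's code."""
    return [int(np.argmax(np.abs(circular_xcorr(rx, c)))) for c in codes]

# Synthetic received signal: three delayed, attenuated codes plus Gaussian noise.
codes = make_codes(3, 1023)
true_delays = [17, 250, 600]
amplitudes = [1.0, 0.7, 0.5]
rng = np.random.default_rng(1)
rx = sum(a * np.roll(c, d) for a, c, d in zip(amplitudes, codes, true_delays))
rx = rx + 0.3 * rng.standard_normal(rx.size)

print(estimate_delays(rx, codes))
```

Correlation peaks grow with code length, while cross-correlation sidelobes grow only roughly with its square root. Longer codes therefore suppress, but never fully remove, the contribution of the other stations.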
The pseudo-random codes allow us to discriminate the spectra of the signals coming from a given station. However, because the codes are not perfectly orthogonal, cross-talk signals were present in the measured spectra, causing distortion. To improve the quality of the spectral data, the radar transmission scheme was modified so that the transmission frequencies of the three stations are offset from one another by 3.3 Hz. In reception, this modification allows us to spectrally separate and identify the signals coming from each station. It displaces the cross-talk in frequency but does not eliminate it. To extract only the signals of interest from the measured spectra, discarding cross-talk and interference signals, we implemented an algorithm based on clustering techniques. The algorithm detects clusters of data in the spectra and classifies them as the coherent echoes of interest, while tagging the remaining regions as “noise” to be discarded. Thus, only the spectra of coherent echoes are preserved, and the rest is replaced by a fixed noise value specific to each spectrum. As a result, the storage size of each spectrum is significantly reduced. In this work, we present a detailed description of the stages of the algorithm, examples of its application to different cases, and a comparison between the original and filtered data and their storage sizes.
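The description above does not name the clustering method. The following minimal sketch of the filtering and data-reduction idea therefore swaps in a simple stand-in. Each spectrum is thresholded at `k` times a per-spectrum noise level, which is estimated here with the median as an assumption. The above-threshold bins are then grouped into 4-connected clusters. Clusters of at least `min_size` bins are kept as coherent echoes, and every other bin is replaced with the noise value. The parameters `k` and `min_size`, all function names, and the compact storage format (one noise value plus the indices and values of the echo bins) are hypothetical.

```python
from collections import deque
import numpy as np

def label_clusters(mask: np.ndarray) -> tuple[np.ndarray, int]:
    """4-connected component labelling of a 2-D boolean mask (breadth-first search)."""
    labels = np.zeros(mask.shape, dtype=int)
    n_rows, n_cols = mask.shape
    n = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue  # already assigned to an earlier cluster
        n += 1
        labels[r0, c0] = n
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if (0 <= rr < n_rows and 0 <= cc < n_cols
                        and mask[rr, cc] and not labels[rr, cc]):
                    labels[rr, cc] = n
                    queue.append((rr, cc))
    return labels, n

def filter_spectrum(spec: np.ndarray, k: float = 10.0, min_size: int = 5):
    """Keep clusters of >= min_size bins above k * noise; set all other bins to noise."""
    noise = float(np.median(spec))            # assumed per-spectrum noise estimate
    labels, n = label_clusters(spec > k * noise)
    sizes = np.bincount(labels.ravel(), minlength=n + 1)
    keep = sizes >= min_size
    keep[0] = False                           # label 0 is the background
    echo_mask = keep[labels]
    return np.where(echo_mask, spec, noise), echo_mask, noise

def compact(spec: np.ndarray, echo_mask: np.ndarray, noise: float) -> dict:
    """Hypothetical reduced format: shape, one noise value, and the echo bins only."""
    idx = np.flatnonzero(echo_mask).astype(np.int32)
    return {"shape": spec.shape, "noise": noise, "idx": idx,
            "val": spec.ravel()[idx].astype(np.float32)}

def expand(c: dict) -> np.ndarray:
    """Rebuild the filtered spectrum from the reduced format."""
    out = np.full(c["shape"], c["noise"], dtype=np.float32)
    np.put(out, c["idx"], c["val"])
    return out

# Synthetic delay x Doppler spectrum: exponential noise, one extended echo, one spike.
rng = np.random.default_rng(0)
spec = rng.exponential(1.0, size=(64, 128))
spec[20:24, 40:46] = 50.0   # coherent echo: 24 contiguous bins
spec[5, 100] = 50.0         # isolated spike (e.g. interference): too small to keep

filtered, echo_mask, noise = filter_spectrum(spec)
reduced = compact(spec, echo_mask, noise)
full_bytes = spec.astype(np.float32).nbytes
kept_bytes = reduced["idx"].nbytes + reduced["val"].nbytes
print(echo_mask.sum(), full_bytes, kept_bytes)
```

Every bin outside the kept clusters collapses to one noise value per spectrum. The reduced record therefore stores only the echo bins, so the saving grows with the fraction of the spectrum classified as noise. A density-based method such as DBSCAN could replace the connected-component step without changing the rest of the pipeline.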