Development of a Machine-Learning Algorithm for Classifying the Quality of Red-Line and Green-Line Airglow Images
Proper analysis of red-line and green-line airglow images requires both a large volume of data and images that are sufficiently free of obfuscation to contain measurable information. Given the binary nature of a clear-versus-unclear classification, and the time and storage space that automating this categorization could save, we developed a supervised machine-learning algorithm for this purpose using logistic regression. Three hundred green-line images from the Midlatitude Allsky-imaging Network for GeoSpace Observations (MANGO) archive, drawn from four observation sites, were manually labeled to train and test the model, with features derived from the 2D Fourier transform of each image.
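As a minimal sketch of the kind of pipeline described above (not the study's exact implementation), the following Python code derives radially binned log-power features from the 2D Fourier transform of each image and fits a logistic-regression classifier with scikit-learn. The function name fft_features, the synthetic placeholder arrays standing in for the 300 labeled green-line frames, and the specific feature definition are all assumptions for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fft_features(image, n_bins=16):
    # Radially binned log-power of the centered 2D spectrum (illustrative
    # feature choice; the actual features used for MANGO images may differ).
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    power = np.log1p(np.abs(spectrum) ** 2)
    cy, cx = np.array(power.shape) // 2
    yy, xx = np.indices(power.shape)
    radius = np.hypot(yy - cy, xx - cx)
    edges = np.linspace(0, radius.max() + 1, n_bins + 1)
    return np.array([power[(radius >= lo) & (radius < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

# Placeholder synthetic data standing in for the 300 manually labeled frames
# (labels: 1 = clear, 0 = unclear).
rng = np.random.default_rng(0)
images = rng.random((300, 128, 128))
labels = rng.integers(0, 2, 300)

X = np.stack([fft_features(img) for img in images])
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))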
After the features were chosen, the algorithm reached 95.6% accuracy, with the model weighted slightly toward classifying images as unclear to ensure that the resulting clear dataset is not polluted with incorrect categorizations. The algorithm was then tested on an independent dataset of 50 images, each from a site on which the model had not been trained. The resulting accuracy was 90%, showing potential for real application even at sites not included in the model's training data.
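The class weighting toward "unclear" and the independent-site evaluation could be expressed as the following continuation of the sketch above. The 2:1 class weight and the placeholder arrays for the held-out site (X_site, y_site) are illustrative assumptions, not values from the study.

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Weight the "unclear" class (0) more heavily so borderline images are rejected
# rather than risk polluting the clear dataset; the 2:1 weight is illustrative.
weighted_model = LogisticRegression(max_iter=1000, class_weight={0: 2.0, 1: 1.0})
weighted_model.fit(X_train, y_train)  # features and labels from the sketch above

# X_site, y_site: hypothetical feature matrix and labels for the 50 images
# from a site excluded from training.
X_site = np.stack([fft_features(img) for img in rng.random((50, 128, 128))])
y_site = rng.integers(0, 2, 50)

y_pred = weighted_model.predict(X_site)
print("independent-site accuracy:", accuracy_score(y_site, y_pred))
print(confusion_matrix(y_site, y_pred))  # rows: true class, columns: predicted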