ZCAL: CALIBRATING RADIO INTERFEROMETRIC DATA WITH MACHINE LEARNING

Abstract

Calibration is the most critical data-processing step needed for generating images of high dynamic range (CASA cookbook, 2009). With the ever-increasing data volumes produced by modern radio telescopes (Aniyan & Thorat, 2017), astronomers are overwhelmed by the amount of data that must be manually processed and analysed with limited computational resources (Yatawatta, 2020). Intelligent, automated systems are therefore required to overcome these challenges. Traditionally, astronomers use a package such as the Common Astronomy Software Applications (CASA) to compute gain solutions from regular observations of a known calibrator source (Thompson et al., 2017; Abebe, 2015; Grobler et al., 2016; CASA cookbook, 2009). This traditional approach to calibration is iterative and time-consuming (Jajarmizadeh et al., 2017), which motivates the use of machine learning techniques. Machine learning has created an opportunity to tackle complex problems currently encountered in radio astronomy data processing (Aniyan & Thorat, 2017). In this work, we apply supervised machine learning models to first-generation calibration (1GC), using the environmental and pointing sensor data recorded by the KAT-7 telescope during observations. Applying machine learning to 1GC, as opposed to calculating the gain solutions in CASA, has shown evidence of reducing computation, as well as accurately predicting the 1GC gain solutions and antenna behaviour. These methods are computationally less expensive; however, they have not yet fully learned to generalise in predicting accurate 1GC solutions from environmental and pointing sensors alone. We use ensemble multi-output regression models based on random forest, decision tree, extremely randomized trees and k-nearest-neighbour algorithms.
The average prediction error obtained when evaluating our models on testing data is 0.01 < RMSE < 0.09 for gain amplitude per antenna, and 0.2 rad < RMSE < 0.5 rad for gain phase. This shows that the instrumental parameters used to train our models correlate more strongly with gain amplitude effects than with phase effects.
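To illustrate the approach described above, the sketch below trains a multi-output random forest regressor to map sensor features to per-antenna gain amplitudes and reports one RMSE per antenna. This is not the authors' code: the data are synthetic placeholders, and the feature/output dimensions (6 sensors, 7 antennas for KAT-7) are assumptions made for the example.

```python
# Hedged sketch: multi-output regression of gain amplitudes from sensor data.
# All data here are synthetic stand-ins, not KAT-7 sensor values.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_obs, n_sensors, n_antennas = 500, 6, 7        # KAT-7 has 7 antennas

X = rng.normal(size=(n_obs, n_sensors))           # environmental/pointing sensors
Y = X @ rng.normal(size=(n_sensors, n_antennas))  # stand-in gain amplitudes
Y += 0.05 * rng.normal(size=Y.shape)              # measurement noise

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# RandomForestRegressor handles multi-output targets natively, so one model
# predicts the gain amplitude of every antenna at once.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_tr, Y_tr)

# One RMSE value per antenna, as reported in the abstract.
rmse = np.sqrt(mean_squared_error(Y_te, model.predict(X_te),
                                  multioutput="raw_values"))
print(rmse.shape)  # (7,)
```

The same pattern applies to the other estimators mentioned (ExtraTreesRegressor, DecisionTreeRegressor, KNeighborsRegressor), all of which support multi-output targets in scikit-learn.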

1. INTRODUCTION

Modern-day astronomy is at an unprecedented stage, with a deluge of data arriving from different telescopes. In contrast to conventional methods, today's astronomical discoveries are data-driven. The upcoming Square Kilometre Array (SKA) is expected to produce terabytes of data every hour (The SKA telescope). With this exponential growth of data, the challenges of data calibration, reduction, and analysis also grow (Aniyan & Thorat, 2017), making it difficult for astronomers to manually process and analyse the data (Yatawatta, 2020). Intelligent, automated systems are therefore required to overcome these challenges. One of the main issues in radio astronomy is determining the quality of observational data. Astronomical signals are very weak by the time they reach the Earth's surface, and they are easily corrupted by atmospheric interference, incorrect observational parameters (e.g. telescope locations or telescope pointing parameters), malfunctioning signal receivers, interference from terrestrial man-made radio sources, and tracking inaccuracies (Taylor et al., 1999). Proper corrections must therefore be applied to the observational data before further processing. Radio astronomers spend a considerable amount of time performing a series of preprocessing steps called calibration, which involves the determination of a set of parameters, both instrumental and astronomical, used to correct the received data. The general strategy for
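For context, the instrumental parameters solved for in 1GC are commonly written as antenna-based complex gains in the standard interferometric measurement model (a simplified form given here for the reader; it is not quoted from this paper):

```latex
V_{pq}^{\mathrm{obs}} = g_p \, g_q^{*} \, V_{pq}^{\mathrm{true}} + \epsilon_{pq},
\qquad g_p = a_p \, e^{i\phi_p},
```

where $V_{pq}^{\mathrm{obs}}$ and $V_{pq}^{\mathrm{true}}$ are the observed and true visibilities on the baseline formed by antennas $p$ and $q$, $g_p$ is the complex gain of antenna $p$ with amplitude $a_p$ and phase $\phi_p$, and $\epsilon_{pq}$ is noise. The per-antenna amplitude and phase errors reported in the abstract correspond to $a_p$ and $\phi_p$.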

