ZCAL: CALIBRATING RADIO INTERFEROMETRIC DATA WITH MACHINE LEARNING

Abstract

Calibration is the most critical data processing step for generating images of high dynamic range (CASA cookbook, 2009). With the ever-increasing data volumes produced by modern radio telescopes (Aniyan & Thorat, 2017), astronomers are overwhelmed by the amount of data that must be manually processed and analysed using limited computational resources (Yatawatta, 2020). Intelligent and automated systems are therefore required to overcome these challenges. Traditionally, astronomers use a package such as Common Astronomy Software Applications (CASA) to compute gain solutions from regular observations of a known calibrator source (Thompson et al., 2017; Abebe, 2015; Grobler et al., 2016; CASA cookbook, 2009). This traditional approach to calibration is iterative and time-consuming (Jajarmizadeh et al., 2017), which motivates the use of machine learning techniques. Machine learning has created an opportunity to deal with complex problems currently encountered in radio astronomy data processing (Aniyan & Thorat, 2017). In this work, we propose applying supervised machine learning models to first-generation calibration (1GC), using the KAT-7 telescope environmental and pointing sensor data recorded during observations. Applying machine learning to 1GC, as opposed to calculating the gain solutions in CASA, has shown evidence of reducing computation, as well as accurately predicting the 1GC gain solutions and antenna behaviour. Although these methods are computationally less expensive, they have not yet fully learned to generalise when predicting accurate 1GC solutions from environmental and pointing sensor data. We use ensemble multi-output regression models based on random forest, decision trees, extremely randomized trees and K-nearest neighbor algorithms. The average prediction error, measured as root-mean-square error (RMSE) on the test data, is approximately 0.01 < RMSE < 0.09 for gain amplitude per antenna and 0.2 rad < RMSE < 0.5 rad for gain phase. This shows that the instrumental parameters used to train our models correlate more strongly with gain amplitude effects than with gain phase effects.

1. INTRODUCTION

Modern-day astronomy is at an unprecedented stage, with a deluge of data from different telescopes. In contrast to conventional methods, today's astronomical discoveries are data-driven. The upcoming Square Kilometre Array (SKA) is expected to produce terabytes of data every hour (The SKA telescope). With this exponential growth of data, the challenges of data calibration, reduction, and analysis also increase (Aniyan & Thorat, 2017), making it difficult for astronomers to manually process and analyse the data (Yatawatta, 2020). Intelligent and automated systems are therefore required to overcome these challenges. One of the main issues in radio astronomy is determining the quality of observational data. Astronomical signals are very weak by the time they reach the Earth's surface. They are easily corrupted by atmospheric interference, incorrect observational parameters (e.g. telescope locations or telescope pointing parameters), malfunctioning signal receivers, interference from terrestrial man-made radio sources and tracking inaccuracies (Taylor et al., 1999). The observational data must therefore be properly corrected before further processing. Radio astronomers spend a considerable amount of time performing a series of preprocessing steps called calibration, which involves determining a set of parameters that correct the received data. These generally include instrumental as well as astronomical parameters. The general strategy for making these corrections relies on a calibrator source. Calibrator sources are well suited for determining astronomical parameters for data corrections because they have known characteristics such as brightness, shape, and frequency spectrum (Taylor et al., 1999). This calibration process is iterative and time-consuming.
During scientific observations, different external parameters such as atmospheric pressure, temperature, wind conditions, and relative humidity are collected through thousands of sensors attached to the telescope and its adjoining instrumentation. The data from these sensors may provide information about the external conditions that may have corrupted the observed data. This information is not always included in the conventional calibration steps. We propose to use machine learning methods to predict the calibration solutions from pointing and environmental sensor data. This is mainly motivated by the fact that calibration steps correct data that has been corrupted by environmental parameters. In this study, we make use of data from the Karoo Array Telescope (KAT-7), an array consisting of seven telescopes, which is a precursor to the MeerKAT radio telescope (The SKA telescope). We use eight types of sensor data recorded during observations of the calibrator source PKS1613-586 to generate the training and testing datasets. The overall generated dataset contains sensor data per telescope and calibration solutions for the signal received by each telescope in horizontal polarization (H-pol) and vertical polarization (V-pol). These calibration solutions are calculated using the astronomy software package Common Astronomy Software Applications (CASA).
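As a minimal sketch of the idea of predicting several calibration solutions at once from sensor readings, the following toy example fits a K-nearest-neighbour multi-output regressor in plain Python. It is illustrative only: the data are synthetic, the two sensor features (`temperature`, `wind_speed`), the linear dependence of the gain amplitudes on them, and all function names are assumptions made for this example and are not taken from the KAT-7 dataset or from the models used in this work.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=5):
    """Predict every output at once as the mean target of the k nearest samples."""
    order = sorted(range(len(train_x)), key=lambda n: math.dist(train_x[n], query))
    nearest = order[:k]
    n_outputs = len(train_y[0])
    return [sum(train_y[n][m] for n in nearest) / k for m in range(n_outputs)]


def rmse(truth, predicted):
    """Root-mean-square error for a single output column."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth))


def make_sample(rng):
    """Synthetic sample: two scaled sensor readings -> two gain amplitudes.

    The dependence below is invented purely for illustration.
    """
    temperature, wind_speed = rng.random(), rng.random()
    amp_h = 1.0 + 0.05 * temperature + rng.gauss(0.0, 0.005)  # hypothetical H-pol amplitude
    amp_v = 1.0 - 0.03 * wind_speed + rng.gauss(0.0, 0.005)   # hypothetical V-pol amplitude
    return [temperature, wind_speed], [amp_h, amp_v]


rng = random.Random(42)
train = [make_sample(rng) for _ in range(200)]
test = [make_sample(rng) for _ in range(50)]
train_x = [features for features, _ in train]
train_y = [targets for _, targets in train]

# One prediction vector (H-pol, V-pol) per test sample.
predictions = [knn_predict(train_x, train_y, features) for features, _ in test]

# RMSE is reported separately for each output, as is done per antenna and polarization.
rmse_per_output = [
    rmse([targets[m] for _, targets in test], [p[m] for p in predictions])
    for m in range(2)
]
print("RMSE (H-pol amplitude, V-pol amplitude):", [round(r, 4) for r in rmse_per_output])
```

The same pattern extends to more sensors and more outputs (for example, amplitude and phase per antenna and polarization); gain phases would additionally need care with wrapping at ±π before an RMSE is computed.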

2. CALIBRATION

In radio astronomy, one might ideally expect that, after obtaining the observed visibilities, the next step would be to retrieve the actual visibilities of the target source directly and perform imaging. However, the measured visibilities $V^{\mathrm{obs}}$ differ from the actual visibilities $V^{\mathrm{True}}$ because of instrumental and environmental effects (Thompson et al., 2017). Examples of these effects on the signal measured by a radio interferometer include antenna gains (the slowly and rapidly time-varying instrumental part), atmospheric effects, pointing errors (tracking inaccuracies) and incorrect observation parameters (antenna pointing parameters). Signal effects are classified into two types: direction-independent effects (affecting the signal from all directions equally) and direction-dependent effects (which vary with the sky position of the signal) (Taylor et al., 1999). These effects can be corrected by estimating the errors associated with the measured visibilities, thereby recovering the true visibilities. This process is called calibration. In its simplest form, calibration minimizes the error between observed and predicted (model) visibilities by estimating the correct complex instrumental gain response (Grobler et al., 2016). Suppose that, for baseline pair $(i, j)$, the observed visibility is $V^{\mathrm{obs}}_{i,j}(t)$ and the true visibility is $V^{\mathrm{True}}_{i,j}(t)$ at observation time $t$. The basic calibration formula is written as

$$V^{\mathrm{obs}}_{i,j}(t) = G_{i,j}(t)\, V^{\mathrm{True}}_{i,j}(t) + \epsilon_{i,j}(t),$$

where $G_{i,j}(t)$ denotes the complex gain for baseline $(i, j)$, which arises from unwanted effects and may vary with time (Thompson et al., 2001). The extra term $\epsilon_{i,j}(t)$ is a stochastic complex noise (Taylor et al., 1999). Most of the corruptions in the data occur before the signal is correlated, and the response associated with antenna $i$ does not depend on the response of antenna $j$.
Note that the sources that are the subject of astronomical investigation will be referred to as "target sources" to distinguish them from calibrator sources (Thompson et al., 2001).
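The calibration formula above can be illustrated with a small simulation. The sketch below assumes that the baseline gain factorizes into antenna-based terms, $G_{i,j} = g_i g_j^{*}$, which is the standard consequence of antenna responses being independent but is an assumption added here rather than a statement from the text. The gain values, the unit calibrator flux, and the function names are likewise hypothetical. A point-source calibrator at the phase centre is assumed, so that the true visibility is the same on every baseline.

```python
import cmath
import random
from itertools import combinations


def simulate_observed(v_true, gains, noise_sigma, rng):
    """Apply V_obs_ij = G_ij * V_true_ij + eps_ij on every baseline (i, j)."""
    v_obs = {}
    for (i, j), v in v_true.items():
        g_ij = gains[i] * gains[j].conjugate()  # assumed antenna-based factorization
        eps = complex(rng.gauss(0.0, noise_sigma), rng.gauss(0.0, noise_sigma))
        v_obs[(i, j)] = g_ij * v + eps
    return v_obs


def apply_gains(v_obs, gains):
    """Correct observed visibilities by dividing out the baseline gain."""
    return {
        (i, j): v / (gains[i] * gains[j].conjugate())
        for (i, j), v in v_obs.items()
    }


rng = random.Random(0)
n_ant = 7    # KAT-7 has seven dishes
flux = 1.0   # hypothetical calibrator flux density, arbitrary units

# Hypothetical per-antenna gains: amplitude near 1, small phase offset in radians.
gains = [cmath.rect(rng.uniform(0.8, 1.2), rng.uniform(-0.5, 0.5)) for _ in range(n_ant)]

# Point-source calibrator at the phase centre: same true visibility on all baselines.
v_true = {(i, j): complex(flux, 0.0) for i, j in combinations(range(n_ant), 2)}

# Noise-free case: dividing by the known gains recovers the true visibilities.
v_obs = simulate_observed(v_true, gains, noise_sigma=0.0, rng=rng)
v_corrected = apply_gains(v_obs, gains)
max_residual = max(abs(v_corrected[b] - v_true[b]) for b in v_true)
print(f"{len(v_true)} baselines, max residual after correction: {max_residual:.2e}")
```

In practice the gains are not known in advance; they are estimated from the calibrator observations (or, as proposed in this work, predicted from sensor data) and then applied in the same way.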

3. KAT-7 TELESCOPE

KAT-7 is a seven-dish interferometer that was built as an engineering prototype for techniques and technologies in preparation for the 64-dish Karoo Array Telescope (MeerKAT) (Foley et al., 2016). These instruments are located in the Northern Cape Karoo desert region and are operated remotely from Cape Town. The construction of KAT-7 began in 2008 with the writing of the telescope requirements specification and was completed in 2010. It was then operated in engineering (commissioning) mode until its shut-down in 2016 (Foley et al., 2016).

3.1. SENSOR DATA

During science observations, different external parameters such as atmospheric pressure, temperature, wind conditions, and relative humidity are also collected through thousands of sensors attached

