THE USE OF OPEN SOURCE BOARDS FOR DATA COLLECTION AND MACHINE LEARNING IN REMOTE DEPLOYMENTS

Abstract

Machine learning is being adopted in many walks of life to solve a variety of problems. This adoption is driven by the development of robust machine learning algorithms, the availability of large datasets and low-cost computation resources. Some machine learning applications require devices to be deployed off the grid for data collection and real-time monitoring. Such applications need systems that can operate autonomously throughout their deployment. Advances in technology have led to the development of low-cost, low-power open-source microcontrollers and single board computers. These boards can be interfaced with a wide array of sensors and can perform on-board computation. They are finding wide application in data collection and machine learning initiatives. This paper describes how the boards are leveraged for off-grid deployments.

1. INTRODUCTION

Machine learning is a discipline that comprises a wide range of algorithms and modeling tools used for diverse data processing tasks. Its main goal is to use computers to recognise patterns in data and provide informed insights into how to solve problems (Carleo et al., 2019). Machine learning is one of the most rapidly progressing technical fields. This progress can be attributed to the development of new robust learning algorithms, the availability of large volumes of data and low-cost computation resources. Machine learning has been adopted in many fields such as computer vision, pattern recognition, speech recognition, engineering, finance, the sciences and healthcare (El Naqa & Murphy, 2015). Data plays a central role in machine learning. The basic concept of machine learning algorithms is to build models that learn from input (training) data and then use this learnt experience to make predictions on new data (Lotfian et al., 2021). The growth of machine learning is giving rise to new applications that lack sufficient labelled data. Most modern machine learning techniques require a large amount of labelled data (Roh et al., 2019). This makes data collection an important step in any machine learning task. Additionally, most of the available datasets come from developed countries, and some of them are proprietary. These data may not adequately reflect all scenarios because of ecological and geographical differences across the world. This calls for localised data collection initiatives to build datasets that are appropriate for solving localised problems (Ooko et al., 2021). One source of data for machine learning applications is sensors. Sensors are devices that detect changes in their surroundings and convert these changes into an electrical signal. A microprocessor interfaced with the sensor processes the sensor's output signal and produces an output that corresponds to a set of measurements (Javaid et al., 2021).
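The step from a sensor's electrical signal to a measurement can be illustrated with a small sketch. It assumes a hypothetical analog sensor with a linear transfer function (0.5 V output at zero and 10 mV per unit, values typical of common analog temperature sensors) read through an assumed 10-bit analog-to-digital converter (ADC) with a 3.3 V reference. All names and parameter values here are illustrative assumptions, not taken from any specific board or from the works cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdcConfig:
    """Assumed ADC parameters (hypothetical, for illustration only)."""
    bits: int = 10        # ADC resolution in bits
    v_ref: float = 3.3    # reference voltage in volts


@dataclass(frozen=True)
class LinearSensor:
    """Hypothetical sensor whose output voltage is linear in the measured quantity."""
    offset_v: float = 0.5         # output voltage when the measured quantity is zero
    volts_per_unit: float = 0.01  # sensitivity: 10 mV per unit (e.g. per degree C)


def counts_to_volts(counts: int, adc: AdcConfig) -> float:
    """Convert a raw ADC reading into the voltage seen at the ADC input."""
    max_count = (1 << adc.bits) - 1
    if not 0 <= counts <= max_count:
        raise ValueError(f"ADC reading {counts} outside 0..{max_count}")
    return counts * adc.v_ref / max_count


def volts_to_measure(volts: float, sensor: LinearSensor) -> float:
    """Invert the sensor's linear transfer function to recover the measurement."""
    return (volts - sensor.offset_v) / sensor.volts_per_unit


def read_measure(counts: int,
                 adc: AdcConfig = AdcConfig(),
                 sensor: LinearSensor = LinearSensor()) -> float:
    """Full pipeline: raw ADC counts -> volts -> physical measurement."""
    return volts_to_measure(counts_to_volts(counts, adc), sensor)


if __name__ == "__main__":
    # A raw reading of 233 corresponds to roughly 0.75 V, i.e. about 25.2 units.
    print(round(read_measure(233), 1))
```

With these assumed parameters, a raw reading of 233 maps to about 0.75 V and hence about 25.2 units. On a real board, the ADC resolution, reference voltage and sensor calibration constants would come from the hardware datasheets.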
Some applications require sensors to be deployed in remote regions far from the grid. A good example is ecological data collection using sensors such as acoustic sensors and camera traps. Such off-grid deployments raise several issues, including the source of power, storage, on-board data processing, hardware cost, communication and the degree of autonomy needed for long-term deployments (Gibb et al., 2019). Advances in technology have led to the development of low-cost, open-source single board computers and microcontrollers. These boards consume little power, can be interfaced with a wide array of sensors and offer reasonable processing capabilities. The boards are finding applications in

