THE USE OF OPEN SOURCE BOARDS FOR DATA COLLECTION AND MACHINE LEARNING IN REMOTE DEPLOYMENTS

Abstract

Machine learning is being adopted in many fields to solve a wide range of problems. This adoption is driven by the development of robust machine learning algorithms and the availability of large datasets and low-cost computational resources. Some machine learning applications require devices to be deployed off-grid for data collection and real-time monitoring. Such applications require systems that can operate autonomously throughout their deployment. Advances in technology have led to the development of low-cost, low-power open-source microcontrollers and single board computers. These boards can be interfaced with a wide array of sensors and can perform computation on board. The boards are finding wide application in data collection and machine learning initiatives. This paper describes how the boards are leveraged for off-grid deployments.

1. INTRODUCTION

Machine learning is a discipline that comprises a wide range of algorithms and modelling tools used for diverse data processing tasks. The main goal of machine learning is to recognise patterns in data using computers and to give informed insights on how to solve problems (Carleo et al., 2019). Machine learning is one of the most rapidly progressing technical fields. This progress can be attributed to the development of new robust learning algorithms and the availability of large volumes of data and low-cost computational resources. Machine learning has been adopted in many fields such as computer vision, pattern recognition, speech recognition, engineering, finance, the sciences and healthcare (El Naqa & Murphy, 2015). Data plays a central role in machine learning. The basic concept of machine learning algorithms is to build models that learn from input (training) data and make predictions on new data based on the learnt experience (Lotfian et al., 2021). Most modern machine learning techniques require a large amount of labelled data (Roh et al., 2019), yet the development of machine learning is seeing the emergence of new applications that lack enough labelled data. This makes data collection an important step in any machine learning task. Additionally, most of the available datasets are from developed countries, and some of them are proprietary. These data may not adequately reflect all scenarios because of ecological and geographical differences across the world. This calls for localised data collection initiatives to build datasets that are appropriate for solving localised problems (Ooko et al., 2021). One source of data for machine learning applications is sensors. Sensors are devices that detect changes in their surroundings and convert them to an electrical signal. A microprocessor interfaced with the sensor processes the output signal of the sensor and gives an output that corresponds to a set of measures (Javaid et al., 2021).
Some applications require deployment of sensors in remote regions that are far from the grid. A good example is ecological data collection using sensors such as acoustic sensors and camera traps. Several issues arise in such off-grid deployments. They include the source of power, storage, on-board data processing, cost of hardware, communication and the degree of autonomy needed for long-term deployments (Gibb et al., 2019). Advances in technology have led to the development of low-cost open-source single board computers and microcontrollers. These boards consume little power, can be interfaced with a wide array of sensors and have adequate processing capabilities. The boards are finding applications in off-grid data collection and machine learning tasks. The majority of microcontrollers lack an operating system (OS). These microcontrollers are programmed directly using binary/assembly code or binary code compiled from C-style languages. Single board computers like the Raspberry Pi, the BeagleBone and the Jetson Nano have operating systems and can be operated like general-purpose desktop computers (Fitzpatrick et al., 2020; Güven et al., 2017). With TinyML technology, algorithms and software that can run on the resource-constrained open-access boards have been developed. In this paper we describe how open-access boards are used for off-grid data collection and machine learning tasks.

2. OPEN ACCESS BOARDS

According to the Tucson Amateur Packet Radio (TAPR) Open Hardware Licence, 'Open Hardware is a thing - a physical artefact, either electrical or mechanical - whose design information is available to, and usable by, the public in a way that allows anyone to make, modify, distribute, and use that thing' (TAPR.org). Over the years, several low-cost, low-power open-source hardware projects have emerged, leading to the development of standalone systems that can perform computation without additional hardware. This hardware includes microcontrollers like the Arduino and single board computers like the Raspberry Pi, and it is finding applications in diverse fields including industry, homes, education, health and research (Güven et al., 2017). The microcontrollers and single board computers form the heart of systems designed for data collection and processing. Manual data collection in outdoor field research can be time consuming, labour intensive, costly and irregular. With open-source hardware, however, the process can be automated, leading to efficient data collection and even enabling real-time monitoring (Daniel K & Peter J, 2012). Open-source hardware boards with high processing capabilities and interfaces for connecting a wide array of sensors exist in the market. These features make them well suited for data collection and real-time monitoring. The use of open-source hardware in remote data collection and real-time monitoring faces various constraints, which include: (1) source of power; (2) processing capabilities; (3) storage; and (4) communication requirements. The following subsections discuss how these hurdles are overcome when using open-access hardware for off-grid data collection and real-time monitoring.

2.1. POWERING OPEN SOURCE BOARDS OFF-THE-GRID

For off-grid sensor deployments, alternative methods of powering the sensors need to be devised. Generally, the sensors are powered using batteries because alternative sources like solar and wind are intermittent (Prauzek et al., 2018; 2016). Although batteries provide an uninterrupted supply, their charge is depleted after a given period of use. Coupling the batteries with energy harvested from the surroundings, such as solar and wind, allows them to be recharged in the field. Open-source boards have relatively low power requirements, in the range of milliwatts to a few watts, and they can be powered from DC sources like batteries. A photovoltaic and battery system can therefore be used to power the open-source boards for long-term deployments. The boards can also be programmed to follow flexible operation schedules, which reduces the energy requirements of the system.
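
As an illustration of how a flexible operation schedule reduces energy requirements, the following minimal Python sketch estimates the mean power draw of a duty-cycled board and the resulting runtime on a battery alone. The function names and all numerical values (active and sleep power, duty cycle, battery capacity and usable fraction) are assumptions chosen for illustration; they are not measurements from any specific board, and recharge from a photovoltaic panel is ignored.

```python
def average_power_mw(active_mw, sleep_mw, duty_cycle):
    """Mean power draw (mW) of a board that is active for a fraction
    `duty_cycle` of the time and asleep for the remainder."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * sleep_mw


def battery_life_hours(capacity_mwh, avg_power_mw, usable_fraction=0.8):
    """Hours of operation on the battery alone (no recharge).
    `usable_fraction` is an assumed derating for the capacity that
    can actually be drawn in practice."""
    if avg_power_mw <= 0:
        raise ValueError("avg_power_mw must be positive")
    return capacity_mwh * usable_fraction / avg_power_mw


if __name__ == "__main__":
    # Hypothetical values: 2 W when active, 50 mW asleep,
    # and a 37 Wh battery (for example 10 Ah at 3.7 V).
    capacity_mwh = 37_000
    always_on = average_power_mw(2000, 50, duty_cycle=1.0)
    scheduled = average_power_mw(2000, 50, duty_cycle=0.1)
    print(f"always on: {battery_life_hours(capacity_mwh, always_on):.1f} h")
    print(f"10% duty : {battery_life_hours(capacity_mwh, scheduled):.1f} h")
```

Under these assumed values, a 10% duty cycle lowers the mean draw from 2000 mW to 245 mW and extends the battery-only runtime from about 15 hours to about 121 hours, which shows why scheduling matters even before energy harvesting is taken into account.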

2.2. DATA COLLECTION, STORAGE AND PROCESSING

Most open-source boards can be interfaced with a wide array of sensors such as temperature sensors, cameras, microphones and ultrasonic sensors. The sensors are interfaced using the general purpose input output (GPIO) pins, CSI ports or USB ports that come with the boards. Some boards, such as certain Arduino models, also have on-board sensors. The sensors enable the boards to interact with the outside world and are used for data collection. The data collected by the boards can either be transmitted to a central hub or the cloud for storage, or be stored locally on media such as SD cards and USB flash drives. Sensor deployments lead to the collection of large volumes of data. The data need to be sorted, cleaned and labelled before they can be used for machine learning applications. These tasks are, however, difficult and time consuming. On-board processing can be used to filter out unnecessary data and save only the relevant data. This can be achieved by adding filters to the data acquisition pipeline. The filters are implemented by setting thresholds on certain parameters of the data that

