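A Year Acquiring and Publishing Drone Aerial Images in Research on Agriculture, Forestry, and Private Urban Gardens
Olli Niemitalo, Eero Koskinen, Jari Hyväluoma, Outi Tahvonen, Esa Lientola, Henrik Lindberg, Olli Koskela, Iivari Kunttu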

Precision farming technologies aim to optimize the use of farming inputs both spatially and temporally for improved economic outcomes and reduced environmental impacts of farming. In precision farming, a field is considered as a heterogeneous entity with variable topography, soil properties, weed infestation, and yield potential, whereby management practices are tailored spatially and temporally (Finger et al., 2019). Precision farming thus strongly relies on site-specific sensing of variables that are essential for management decisions. Georeferencing techniques and spatial mapping are important elements in precision farming.


Introduction
The development of digitalization and measurement technologies in recent decades has enabled digital devices and sensors to produce huge amounts of data that has great potential in optimizing processes related to production chains or service production. In the field of bioeconomy, the main production processes are related to food and biomass production. Digitalization provides a wide variety of opportunities to support, manage, and monitor production based on data collected from the field.
Image-based data collection and analysis provides a huge potential to support these goals. Visual data collected from agricultural fields enables automated analysis tasks and can provide real-time information on production status. To acquire suitable visual information, basically two different alternatives exist:
Drone imaging has been shown to have increasing value in monitoring and analysing different kinds of processes related to agriculture and forestry. In long-term monitoring and observation tasks, huge amounts of image data are produced and stored. Environmental drone image datasets may have value beyond the studies that produced the data. A collection of image datasets from multiple data producers can, for example, provide more diverse training input for a machine learning model for vegetation classification, compared with a single dataset limited in time and location. To ensure reproducible research, research data such as image datasets should be published in usable and undegraded form, with sufficient metadata. Timely storage in a stable research data repository is recommended, to avoid loss of data. This work presents research datasets of drone images acquired in 2020 from agricultural and forestry research sites of Häme University of Applied Sciences, and from Hämeenlinna urban areas. Those images that do not contain personal data are made freely available under a Creative Commons Attribution license. For images containing personal data, such as images of private homes, privacy-preserving forms of data sharing may be possible in the future.
A single observation that is inconsistent with some generalization points to the falsehood of the generalization, and thereby 'points beyond itself'.
Ian Hacking, philosopher of science
Spatial information can be collected with scanners mounted in tractors (Pallatino et al., 2019) or using satellite imagery (Segarra et al., 2020).
Drones have also increasingly been used to collect data on several features relevant for precision agriculture (Tsouros et al., 2019). Compared with satellite-based remote sensing, drone technology can produce images with considerably higher spatial resolution, in the centimeter range. The temporal resolution of drone-based imagery can also be decided by the user, which gives flexibility in comparison with satellite data. Drones have further been used for research purposes in monitoring field experiments (Viljanen et al., 2018; Dehkordi et al., 2020). Non-destructive monitoring of vegetation is a major benefit for practical agronomy as well as for research use.
Drone aerial imaging has been utilized in a wide range of agricultural applications. Some of the most common applications of drone imaging in precision agriculture are weed mapping and management, vegetation growth monitoring and yield estimation, vegetation health monitoring, and irrigation management (Tsouros et al., 2019). Imaging has been used in monitoring many vegetation traits, for example, biomass amount (ten Harkel et al., 2020), nitrogen status (Caturegli et al., 2016), moisture and plant water stress status (Hoffmann et al., 2016), temperature (Sagan et al., 2019), and various vegetation indices (Viljanen et al., 2018). Deep learning-based prediction of crop yield from drone aerial images has also shown promising results (Nevavuori et al., 2019; Nevavuori et al., 2020).
In the field of forestry, drone-based photogrammetric methods can be used in several different ways. The methods used may provide general forest inventory data that focuses on common stand variables such as volume and height (Tuominen et al., 2017). Practical forest planning in Finland based on drone-collected photogrammetric data is rapidly advancing and is currently being piloted.
Drones have proven especially useful in detection and inventory of various forest damage areas, such as windthrow areas (Mokros et al., 2017; Panagiotidis et al., 2019) and bark beetle outbreak areas (Näsi et al., 2015; Briechle et al., 2020). Drones have been successfully used in various forest fire suppression and prevention tasks for several years (Ollero et al., 2006; Akhloufi et al., 2020). Increasing demand to safeguard forest biodiversity has also encouraged the use of photogrammetry-based methods. These methods have proven to be a useful inventory tool when important structural factors, such as keystone species like aspen (Viinikka et al., 2020), standing dead trees (Briechle et al., 2020), or coarse woody debris (Thiel et al., 2020), are located in a forest landscape.
In the area of private urban gardening, drone-based imaging may provide new approaches to monitor the effects of gardening practices on the vegetation and on carbon sequestration. In low-density housing areas, the surface coverage pattern is typically very diverse, consisting of numerous individual plots and gardens. Homeowners reshape and modify private domestic gardens based on personal preferences and individual gardening practices. The role and meaning of vegetation and gardening practices vary, resulting in plot-to-plot variations in carbon sequestration and evapotranspiration, which affect stormwater management and the degree of reduction in the urban heat island phenomenon. Approaches to sustainable urban development have brought increasing interest to low-density housing areas, which cover large areas in cities. The main focus is not a single plot, but rather the entity that the plots form together: plot- and block-scale choices and elements define the attributes at the scale of the housing area. This raises the challenge of finding suitable methods for easily studying ongoing changes at multiple scales, to provide data on both the quality and quantity of vegetation.
This article describes vegetation monitoring-related aerial image acquisition by Häme University of Applied Sciences (HAMK) using drones in 2020, covering both the processes and the experiences gained. Apart from the image data, the main research in the three areas is to be published separately. In total, approximately 200,000 image files, approximately 1 TB in size, are in the process of being published openly (see Data Availability). Particular features of the presented datasets are the inclusion of the original image files, the use of a multispectral camera, and the fact that one of the research sites, Mustiala biochar field, was imaged several times over the growing season, with some near-simultaneous satellite imagery available from public sources. Based on a search by the authors using Google Dataset Search (https://datasetsearch.research.google.com/) at the time of writing, drone aerial image datasets are rapidly increasing in number, but are typically orders of magnitude smaller than the datasets presented in this work and do not usually include the original image files, although these may be available upon request. In this article, we also discuss the benefits and challenges of publishing drone image datasets.

Methods
Drone aerial image datasets were collected from multiple sites in Kanta-Häme, in southern Finland. Three of the sites (Fig. 1) are presented in this work:
1. Mustiala biochar field (Fig. 6) is located at the Mustiala educational and research farm of Häme University of Applied Sciences, in Tammela (60°49' N, 23°45'24'' E). The biochar experiment consists of 10 adjacent plots, each of size 10 m × 100 m (1000 m²), with a total area of 1 ha. Biochar soil amendment was applied on five of the plots at a rate of ca. 20 t/ha. The other five plots were control treatments without biochar amendment. Ground control points (GCPs) were placed covering the field in a roughly 100 m × 100 m die face-5 pattern (see Fig. 6). The GCPs were 29 cm × 29 cm black-and-yellow 2 × 2 checkerboard-style cut-outs of A3 prints. They were georeferenced with the aid of a real-time kinematic (RTK) capable Trimble Geo 7X (H-Star) hand-held GPS receiver.
2. Evo old forest (Fig. 3) consists of seven separate stands, all located in the Evo state forest. The stands, with a pooled area of 160 ha, are dominated by mature Norway spruce (Picea abies) with an age range of 80-120 years. All stands have a rather high amount of standing dead trees, known to be important for biodiversity, for example, for cavity-nesting birds. The standing dead trees were catalogued in 2019-2020 to function as reference data for photogrammetric methods.
3. Hämeenlinna private urban gardens consist of approximately 5-10 domestic gardens in the sparsely populated urban areas of Hämeenlinna.
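The geometry of the Mustiala biochar experiment described in item 1 can be checked with a few lines of arithmetic. The following is only a sketch in an idealized local coordinate system: the function names, the exact 100 m square, and the corner-plus-center coordinates are illustrative assumptions, since the GCPs were placed only roughly in this pattern. The total biochar amount is derived from the stated rate and plot size and is not a figure reported in the text.

```python
# Idealized sketch of the Mustiala biochar field layout (hypothetical names).
# Coordinates are in meters in an assumed local frame, not real map coordinates.

PLOT_COUNT = 10                # adjacent plots in the experiment
PLOT_WIDTH_M = 10.0            # plot width
PLOT_LENGTH_M = 100.0          # plot length
TREATED_PLOTS = 5              # plots that received biochar
BIOCHAR_RATE_T_PER_HA = 20.0   # approximate application rate ("ca. 20 t/ha")


def die_face_5_pattern(side_m=100.0):
    """Return five points: the four corners and the center of a square."""
    half = side_m / 2.0
    return [
        (0.0, 0.0),
        (side_m, 0.0),
        (half, half),
        (0.0, side_m),
        (side_m, side_m),
    ]


def plot_area_ha():
    """Area of one plot in hectares (1 ha = 10,000 m^2)."""
    return PLOT_WIDTH_M * PLOT_LENGTH_M / 10_000.0


def total_area_ha():
    """Total area of the experiment in hectares."""
    return PLOT_COUNT * plot_area_ha()


def total_biochar_t():
    """Approximate biochar mass applied over all treated plots, in tonnes."""
    return TREATED_PLOTS * plot_area_ha() * BIOCHAR_RATE_T_PER_HA


if __name__ == "__main__":
    print(die_face_5_pattern())
    print(total_area_ha(), "ha in total")
    print(total_biochar_t(), "t of biochar (approximate)")
```

Under these assumptions, the ten plots add up to the stated 1 ha, and the five treated plots together received roughly 10 t of biochar.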
The drone used in all of the imaging missions was a DJI Matrice 210 RTK V2 quadcopter camera drone. The camera payload for each mission was selected (see Fig. 2) from the following cameras:
Figure 6 were converted to the sRGB color space in GIMP version 2.10.18 by assigning an sRGB gamma=1 color profile, by adjusting brightness in the curves tool using a linear ramp crossing the origin, and by converting to sRGB color profile using a relative colorimetric rendering intent. Geographical illustrations were made in QGIS version 3.12.2 (QGIS Development Team 2020).
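The color conversion described above was done interactively in GIMP. As a rough numerical illustration, and not the authors' actual procedure, the same three steps can be approximated as follows: treat pixel values as linear light with sRGB primaries, scale them by a brightness gain (a linear ramp through the origin), and apply the standard sRGB transfer function. The function name `linear_to_srgb` and the `gain` parameter are hypothetical, and the sketch assumes that the ICC profile conversion reduces to the transfer function alone.

```python
import numpy as np


def linear_to_srgb(linear, gain=1.0):
    """Encode linear-light values in [0, 1] with the sRGB transfer function.

    `gain` models a brightness adjustment with a linear ramp through the
    origin; values are clipped to [0, 1] after scaling (an assumption).
    """
    scaled = np.clip(np.asarray(linear, dtype=np.float64) * gain, 0.0, 1.0)
    # Piecewise sRGB encoding: linear segment near black, power law elsewhere.
    low = 12.92 * scaled
    high = 1.055 * np.power(scaled, 1.0 / 2.4) - 0.055
    return np.where(scaled <= 0.0031308, low, high)


if __name__ == "__main__":
    # Hypothetical linear reflectance values, brightened by a factor of 2.
    reflectance = np.array([0.0, 0.05, 0.18, 0.5])
    print(linear_to_srgb(reflectance, gain=2.0))
```

The gain would in practice be chosen per image so that the brightest surfaces of interest do not clip.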
For a comparison with the aerial images, a Sentinel-2 satellite image (Fig. 6) of the Mustiala biochar field was manually selected and retrieved from the Copernicus Open Access Hub (Copernicus Sentinel Data 2020) for a cloud-free day that coincided with a drone imaging mission on 2020-05-22. For Figure 7, machine learning image segmentation of sRGB color space images at 0.1 m/pixel resolution was done using the DroneDeploy Aerial Segmentation Benchmark U-Net model "keras baseline" run gg1z3jjr by Stacey Svetlichnaya (DroneDeploy 2019), using a tile size of 300 × 300 pixels.
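Tile-based segmentation of the kind mentioned above requires cutting a large image into fixed-size tiles before inference. The following is a minimal sketch of that step only. It does not reproduce the benchmark code: the function names are hypothetical, and zero-padding of partial edge tiles is an assumption. At 0.1 m/pixel, a 300 × 300 pixel tile covers about 30 m × 30 m on the ground.

```python
import numpy as np


def split_into_tiles(image, tile_size=300):
    """Split an H x W x C image into square tiles.

    The image is zero-padded at the bottom and right edges so that its
    size is a multiple of `tile_size` (padding is an assumed choice).
    Returns a list of ((top, left), tile) pairs.
    """
    height, width = image.shape[:2]
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
    tiles = []
    for top in range(0, padded.shape[0], tile_size):
        for left in range(0, padded.shape[1], tile_size):
            tile = padded[top:top + tile_size, left:left + tile_size]
            tiles.append(((top, left), tile))
    return tiles


def tile_footprint_m(tile_size=300, resolution_m_per_px=0.1):
    """Ground side length of a square tile, in meters."""
    return tile_size * resolution_m_per_px


if __name__ == "__main__":
    # A hypothetical 650 x 900 pixel RGB image filled with zeros.
    image = np.zeros((650, 900, 3), dtype=np.uint8)
    tiles = split_into_tiles(image)
    print(len(tiles), "tiles, each about", tile_footprint_m(), "m wide")
```

The tile offsets returned with each tile allow the per-tile class masks to be stitched back into a mask covering the whole image.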

Results
The GCP location data and most of the acquired aerial images are being made publicly available (see section Data Availability). Figure 3 shows the camera locations for all individual images taken during the Evo old forest imaging missions. Figure 4 shows a sample image from each camera from a Mustiala biochar field imaging mission on 2020-05-22. Before that flight, an image was taken of the calibrated reflectance panel (Fig. 5). Near-simultaneous drone and satellite imagery from that day are shown for comparison in Figure 6. A single image and an orthomosaic built from multiple images are presented for comparison in Figure 7, together with their machine learning image segmentation. For the Mustiala biochar field, the locations of the GCPs were resolved with a horizontal and a vertical accuracy of 0.1 m, as reported in the shapefile from the hand-held RTK GPS receiver.

Discussion and Conclusions
We encountered many technical challenges during data collection. Interoperability problems between drone and camera systems from different manufacturers prevented camera triggering by the drone and tagging images with high-accuracy RTK GPS information. Automatic logs were insufficient to answer all questions arising in later interpretation of the collected data, which resulted in increased work wrangling the data. It would be advisable in future work to complement automatic logs with notes about the settings used and about both operator intent and actions.
Selecting the flight parameters turned into something of an "art in itself". Flight speed, altitude, side and frontal overlap, camera orientation, and triggering mode significantly influenced the image results. For example, flight altitude requires a compromise between image resolution and flight time. Lowering the altitude improves image resolution but increases the flight time, which raises the risk of in-flight battery depletion and gives the scene more time to change, complicating further use of the images.
For photometric applications, rapidly changing light conditions can be a problem even for short flights. In addition, vegetation movement in windy weather invalidates the assumption of a static scene.
Much consideration should be given to the process of publication of research data from high-resolution aerial imaging missions. Aspects to consider include data storage and availability, software compatibility, the rights of data producers, the rights of possible data subjects, and license agreements of processing software, among others.
Open publication of research data improves the reproducibility of science, and reduces barriers to participating in science and utilizing scientific data and results, especially for under-resourced and underrepresented participants. For academic data producers, the growing recognition of data as research output (see San Francisco Declaration on Research Assessment, 2012) may bring financial incentives to more widely publish research data. Incentives by science funders may be applied retroactively. A major practical reason to publish research data is the possibility that a dataset may have tremendous utility value outside the research project, far beyond the organization that produced it. Many possible later uses of data cannot be anticipated at the time of its creation and collection.
Not all research data that is openly published remains available. As an example, Khan et al. (2021) were only able to retrieve 94 out of 121 open-access medical ophthalmology imaging datasets. Storing research data in an established repository such as Zenodo (https://zenodo.org/) gives some assurance of data longevity. It also allows obtaining a Digital Object Identifier (DOI) for sharing and citing the data. Upon a recent successful storage quota application by HAMK, the image datasets presented in this article are being stored and published in Fairdata services, funded by the Ministry of Education and Culture (Finland). Publication of research data in a repository effectively forces the storage of the data together with metadata describing the data, ownership of the data, and its usage license. The additional information resolves many ambiguities when using the data. Structured metadata in repositories ensures the dataset is indexed in research databases.
A repository may also allow incremental publication of data. If the data is published this way, already during its collection instead of at the end of a research project, then use and citation of the data outside of the data-producing organization can start much earlier. Accelerated publication also ensures that the data or information of concern is not lost during the project or when personnel leave the project. Academic data producers typically have an interest in priority publication of their research. Early publication of general-use research data, such as the image datasets presented in this article, is less likely to conflict with that interest, compared to early open publication of all data vital to the main publication.
Location data from an image or other information about a person or object that can be associated with a person, either directly or using additional information, is likely to be considered as "personal data" by the EU's General Data Protection Regulation (GDPR). Examples of such objects in aerial imagery include vehicles, land, and buildings managed by a private person, or by their family. GDPR requires that before personal data can be handled for a purpose, the data subject's voluntary consent for that purpose must be obtained. A data subject also has the right to be forgotten, that is, to bindingly request that their personal data be erased without excessive delay. National laws may still allow legitimate scientific research that protects data privacy. Nevertheless, a data subject's rights may bar redistribution or open publication of personal data under an irrevocable license, such as any of the popular Creative Commons licenses, possibly even when a data subject proactively authors the data as free speech.
The image dataset concerning Hämeenlinna private urban gardens was collected with informed consent by the data subjects, the homeowners. It was deemed necessary to deposit the data in a fully closed manner, at least for the time being, while the legal landscape of personal data is still developing. The EU has proposed a Data Governance Act (COM/2020/767 final) following a European Strategy for Data (COM/2020/66 final). The Data Governance Act defines roles and mechanisms for altruistic data sharing in managed data ecosystems. Among other things, the act is intended to streamline handling of requests by authorized users to access data, on the condition that the data subject has consented to handling of the data for the requested purpose.
Returning to our case study, unlike the presented aerial images captured mostly 80 m above ground, images from a closer range (Fig. 5) would allow distinguishing not only trees, but also individual small plants. Likewise, it would be possible to identify and count the plants and analyze their physical characteristics (allometry) and health.
Drone image data is being increasingly utilized in machine learning, as exemplified by research cited above in the Introduction. The purpose of a machine learning model might be, for example, to segment an input image into different class labels (Fig. 7). Class label masks of drone imagery could be converted to lower-resolution ground truth class density data for interpreting satellite imagery. In semi-supervised learning, unlabeled images would also be included to help the model better capture the natural joint probability distribution of the images and segmentation. Such uses make general-use unlabeled data valuable as a research output, thereby complementing the existing situation and future of labeled and unlabeled data. Machine learning also benefits from more diverse data, for example, from image datasets collected at an ecologically diverse set of locations. Big data in such cases can come from many small data.
Data producers may have reasons not to disclose their raw data. In such cases, other, somewhat futuristic forms of information sharing may be possible. In federated learning, data holders combine their efforts in a coordinated fashion to train a shared machine learning model, without communicating their original data. Alternatively, a generative model could be used to produce non-private artificial samples from the approximate distribution of the original data, while preserving the privacy of the original data points, typically measured by differential privacy. Depending on the amount of original data and required degree of privacy, generated samples may be of sufficient quality to be used similarly to the original data. For a privacy-preserving generative method suitable for images, see Chen et al. (2020).
The author of a dataset consisting of drone images may wish to limit data use. A photograph taken for scientific purposes by equipment under automatic control is unlikely to be considered as creative work and would not be protected by copyright as such. In the United States, a dataset can be copyrighted as a compilation if it is sufficiently original in selection, coordination, or arrangement, but the copyright of the dataset does not extend to any non-copyrightable data items (U.S. Copyright Office, 2021). Similarly, for EU-based authors, national implementations of the Database Directive (96/9/EC) enable copyright of a dataset as a creative collection. Separately from copyright, the directive enables sui generis protection of non-creative datasets based on substantial investment, with a 15-year term of protection (European Commission, 2018). In any case, if a separate contract is made between the dataset holder and its retriever, binding clauses in the contract may limit redistribution and use of the dataset by the retriever. For example, commercial use of a dataset by its retriever may be prohibited.
Legal aspects aside, when collecting and sharing data, care should be taken to ensure that the data will be delivered in a format that preserves sufficient quality, preferably in a format that is openly standardized, that is, not a proprietary, software-specific format. Compression artifacts that arise from lossy image compression methods such as JPEG might interfere with radiometric analyses. On the other hand, compared to lossy compression, lossless image compression results in significantly larger image files, increasing the cost of storage and transfer. Storage space requirements of image datasets could be eased somewhat by re-encoding lossless TIFF files, using a more efficient lossless method than what is available from the camera. However, rewriting the image files might also detrimentally affect metadata or other extra data, reducing the usability of the files. Image file metadata could be restored by transferring it from the original file using tools such as ExifTool and Exiv2. Usability of any modified original files should be tested at least in the most likely processing pipelines available. Rewriting of image files may be necessary to mask out personal information.
For geographical image data, it is important to also publish metadata describing the radiometric quality and the location accuracy. Information about things that affect illumination, such as clouds and sun, as well as calibration images and light sensor data, can be important for future users. Processing pipelines should be described, and when possible, source code and information on the operating environment should likewise be included. For more information on quality assurance data and other important metadata, see Aasen et al. (2018) and Tmušić et al. (2020).
The image datasets presented in this work consist mainly of unaltered raw images directly from cameras. Publishing the raw data without embargo also became an effective way to distribute the data within the HAMK organization, as well as outside it. Another rationale behind the decision to publish not only post-processed data products, but also raw primary data, was that data users might wish to apply their own processing pipelines to ensure uniform processing of all their input data. An example of a data product is an orthomosaic (Fig. 6), which is straightforward to use in various applications and valuable in providing a visual overview of data. As demonstrated in Figure 7, the visual clarity of an orthomosaic might not be quite as high as that of the source material, the individual images, with differences that affect labeling by a machine learning model. In the future, novel photogrammetry pipelines will likely result in data products of higher quality than what is achievable using today's tools, provided that the raw data is still available.
Henrik Lindberg (M.Sc., Forestry) is a senior lecturer in HAMK whose field is forest ecology and silviculture. In his research activities, he has focused especially on nature management, forest biodiversity, and forest restoration.
Esa Lientola (Master of Natural Resources, Forestry) is a senior lecturer in forestry at HAMK, who specializes in remote sensing, forest planning, and GIS applications in forestry. In recent years, he has concentrated particularly on developing the practical use of drones for the study of natural resources.
Olli Koskela is currently working as a research manager at Häme University of Applied Sciences with a data science team. His research areas include many bioeconomic processes, such as dairy production, feed quality management, and soil maintenance. He holds a Master of Science in applied mathematics from Helsinki University, and is currently finalizing his PhD thesis in the field of biomedical engineering at Tampere University.
Viljanen, N., Honkavaara, E., Näsi