DOI: 10.22389/0016-7126-2018-935-5-54-63
1 Maiorov A.A.
2 Materuhin A.V.
3 Kondaurov I.N.
Year: 2018
No.: 935
Pages: 54-63

Moscow State University of Geodesy and Cartography (MIIGAiK) (affiliation of authors 1, 2, 3)
Abstract:
Geoinformation technologies are becoming “end-to-end” technologies of the new digital economy. This creates a need for efficient methods of processing spatial and spatial-temporal data that can be applied across the sectors of this economy. Such methods are needed, for example, in cyber-physical systems, whose essential components are high-performance, easily scalable data acquisition systems based on smart geosensor networks. This article discusses the problem of choosing a software environment for such systems and provides a review and comparative analysis of open-source software environments designed for processing large spatial data sets and spatial-temporal data streams on computer clusters. It is shown that the STARK software framework can be used to process spatial-temporal data streams. An extension of the STARK class system is proposed, based on a type system for spatial-temporal data streams developed by one of the authors. The models and data representations resulting from this extension can be used to process spatial-temporal data streams not only in data acquisition systems based on smart geosensor networks, but also in geoinformation systems for various purposes that process data on computer clusters.
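The abstract does not give the details of the type system or of STARK's classes. As a purely illustrative sketch (not the authors' type system and not STARK's API), the following plain-Python example assumes a stream element made of a sensor identifier, planar coordinates, a timestamp and a measured value; the names `Observation`, `Region`, `st_filter` and `tumbling_mean` are hypothetical. It shows two operations that spatial-temporal stream processing typically combines: a joint spatial and temporal predicate, and aggregation over non-overlapping (tumbling) time windows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


# Hypothetical stream element model, for illustration only.
@dataclass(frozen=True)
class Observation:
    """One element of a geosensor stream: who, where, when, what."""
    sensor_id: str
    x: float      # planar coordinate (e.g. easting)
    y: float      # planar coordinate (e.g. northing)
    t: float      # timestamp, seconds
    value: float  # measured quantity


@dataclass(frozen=True)
class Region:
    """Axis-aligned bounding box used as the spatial predicate."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, obs: Observation) -> bool:
        return self.min_x <= obs.x <= self.max_x and self.min_y <= obs.y <= self.max_y


def st_filter(stream: Iterable[Observation], region: Region,
              t_start: float, t_end: float) -> Iterator[Observation]:
    """Lazily keep elements that satisfy both predicates: inside the region
    and with a timestamp in the half-open interval [t_start, t_end)."""
    for obs in stream:
        if region.contains(obs) and t_start <= obs.t < t_end:
            yield obs


def tumbling_mean(stream: Iterable[Observation], width: float) -> Dict[int, float]:
    """Mean value per non-overlapping time window of `width` seconds,
    keyed by the window index floor(t / width)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for obs in stream:
        w = int(obs.t // width)
        sums[w] += obs.value
        counts[w] += 1
    return {w: sums[w] / counts[w] for w in sorted(sums)}


readings = [
    Observation("s1", 1.0, 1.0, 0.0, 10.0),
    Observation("s2", 5.0, 5.0, 1.0, 99.0),   # outside the region below
    Observation("s1", 1.5, 1.2, 4.0, 14.0),
    Observation("s3", 2.0, 2.0, 12.0, 20.0),
]
area = Region(0.0, 0.0, 3.0, 3.0)
print(tumbling_mean(st_filter(readings, area, 0.0, 20.0), width=10.0))
# prints {0: 12.0, 1: 20.0}
```

In a cluster framework such as Apache Spark, the same predicate and per-window aggregation would be applied in parallel across data partitions; the single-process loop here only illustrates the logic.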
The results were obtained as part of a state assignment of the Ministry of Education and Science of Russia (number for publications: 5.6972.2017/8.9).
References: 
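1.   Maiorov A. A., Materukhin A. V. Geoinformatsyonnye aspekty razrabotki informatsyonno-izmeritelnykh sistem na baze raspredelennykh setei intellektualnykh geosensorov [Geoinformation aspects of developing information and measurement systems based on distributed networks of smart geosensors]. Izvestiya vuzov. Geodeziya i aehrofotosˮyomka, 2017, no. 6, pp. 106–109.
2.   Materukhin A. V. Sistema tipov dlja potokov prostranstvenno-vremennykh dannykh v vide rasshirennoj signatury mnogosortnoj algebraicheskoj sistemy [A type system for spatio-temporal data streams in the form of an extended signature of a many-sorted algebraic system]. Izvestiya vuzov. Geodeziya i aehrofotosˮyomka, 2017, no. 2, pp. 121–125.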
3.   Tanenbaum A. S., Van Steen M. Distributed Systems: Principles and Paradigms (1st ed.). Upper Saddle River, NJ, USA: Prentice Hall PTR, 2001, 880 p.
4.   Aji A., Sun X., Vo H., Liu Q., Lee R., Zhang X., Saltz J., Wang F. (2013) Demonstration of Hadoop-GIS: a spatial data warehousing system over MapReduce. In Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL’13). ACM, New York, NY, USA. pp. 528–531. DOI: 10.1145/2525314.2525320.
5.   Apache Software Foundation. URL: www.apache.org
6.   Apache Hadoop. URL: hadoop.apache.org
7.   Apache Spark. URL: spark.apache.org
8.   Borthakur D., Gray J., Sarma J. S., Muthukkaruppan K., Spiegelberg N., Kuang H., Ranganathan K., Molkov D., Menon A., Rash S., Schmidt R., Aiyer A. (2011) Apache Hadoop goes realtime at Facebook. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD’11). ACM, New York, NY, USA. pp. 1071–1080. DOI: 10.1145/1989323.1989438.
9.   Borthakur D. (2013) Petabyte scale databases and storage systems at Facebook. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data (SIGMOD’13). ACM, New York, NY, USA. pp. 1267–1268. DOI: 10.1145/2463676.2463713.
10.   Lu J., Güting R. H. (2012) Parallel Secondo: Boosting database engines with Hadoop. In IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, December 17–19, 2012, pp. 738–743. DOI: 10.1109/ICPADS.2012.119.
11.   Eldawy A., Mokbel M. F. (2013) A demonstration of SpatialHadoop: an efficient MapReduce framework for spatial data. Proc. VLDB Endow. vol. 6, no. 12 (August 2013). pp. 1230–1233. DOI: 10.14778/2536274.2536283.
12.   Eldawy A. (2014) SpatialHadoop: towards flexible and scalable spatial processing using MapReduce. In Proceedings of the 2014 SIGMOD PhD Symposium (SIGMOD’14 PhD Symposium). ACM, New York, NY, USA. pp. 46–50. DOI: 10.1145/2602622.2602625.
13.   HBase. URL: hbase.apache.org
14.   Hagedorn S., Götze P., Sattler K.-U. (2017) The STARK Framework for Spatio-Temporal Data Analytics on Spark. In: Mitschang B., Nicklas D., Leymann F., Schöning H., Herschel M., Teubner J., Härder T., Kopp O., Wieland M. (eds.). Datenbanksysteme für Business, Technologie und Web (BTW 2017). Gesellschaft für Informatik, Bonn, pp. 123–142.
15.   Nishimura S., Das S., Agrawal D., Abbadi A. E. (2013) MD-HBase: design and implementation of an elastic data infrastructure for cloud-scale location services. Distributed and Parallel Databases, vol. 31, no. 2, pp. 289–319. DOI: 10.1007/s10619-012-7109-z.
16.   The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data. URL: github.com/Esri/gis-tools-for-hadoop
17.   Thusoo A., Sarma J. S., Jain N., Shao Z., Chakka P., Anthony S., Liu H., Wyckoff P., Murthy R. (2009) Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. vol. 2, no. 2 (August 2009). pp. 1626–1629. DOI: 10.14778/1687553.1687609.
18.   Spark Streaming Programming Guide. URL: spark.apache.org/docs/2.2.0/streaming-programming-guide.html
19.   Yu J., Wu J., Sarwat M. (2015) GeoSpark: a cluster computing framework for processing large-scale spatial data. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems (SIGSPATIAL’15). ACM, New York, NY, USA, Article 70. 4 p. DOI: 10.1145/2820783.2820860.
20.   Yu J., Wu J., Sarwat M. (2016) A demonstration of GeoSpark: A cluster computing framework for processing big spatial data. In Proceedings of the 2016 IEEE 32nd International Conference on Data Engineering (ICDE 2016). IEEE, pp. 1410–1413. DOI: 10.1109/ICDE.2016.7498357.
21.   You S., Zhang J., Gruenwald L. (2015) Large-scale spatial join query processing in cloud. In 2015 31st IEEE International Conference on Data Engineering Workshops (ICDEW). pp. 34–41. DOI: 10.1109/ICDEW.2015.7129541.
22.   Zaharia M., Chowdhury M., Franklin M. J., Shenker S., Stoica I. (2010) Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud’10). USENIX Association, Berkeley, CA, USA. p. 10.
23.   Zaharia M., Chowdhury M., Das T., Dave A., Ma J., McCauley M., Franklin M. J., Shenker S., Stoica I. (2012) Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, Berkeley, CA, USA. pp. 15–28.
Citation:
Maiorov A.A., Materuhin A.V., Kondaurov I.N. (2018) Using computer clusters for processing spatial-temporal data streams in data acquisition systems. Geodesy and cartography = Geodezia i Kartografia, 79(5), pp. 54-63. (In Russian). DOI: 10.22389/0016-7126-2018-935-5-54-63