Because I worked at an IoT company for a long time, I came across spatio-temporal big data. At the end of 2018, we were reverse-geocoding 2 billion vehicle coordinates per month, using a static Turkey dataset and OSM Europe data on PostGIS. Apart from processing this big data in real time, we did not do much visual analysis.
In late 2016, a customer asked us to create a heat map from the last month of their vehicles' coordinates. It was a cargo company, and they would use this data to find regions where they needed to open new branches or regions to which they should move existing branches. The logic was simple: there should be branches in the dense areas of the heat map. We visualized nearly 1 million location points. After some trials, I decided to load the data in Leaflet and show it with the heatmap plugin. In the process, I also tried MapServer and GeoServer on the server side, but a front-end solution turned out to be simpler. I also tried Fusion Tables, which Google no longer supports, but we did not find it appropriate to share the customer's location data with another service provider.
At my next job, at TÜBİTAK, our team got the chance to research NoSQL alternatives to PostGIS and to visualize large geospatial datasets on the web.
I had been reading a few blogs about how to handle geographic data on the NoSQL side. At the same time, I was following the location-based applications among the many products developed by Uber. We set out to see whether we could develop an application that blends the two.
While dealing with these issues, many people asked us, “Is the data that you have really big?” You may also hear this from people who work with database solutions: “Every customer claims that their data is big. For some that means 200 MB of data, for others terabytes.”
Generally, three features are used to decide whether data counts as big: volume, velocity and variety. (A definition is here: https://www.oracle.com/big-data/what-is-big-data.html) Volume can mean tens of terabytes of data for some organizations and hundreds of petabytes for others. As for velocity, the speed at which data is produced is increasing every day. Dealing with live data, especially from social media or IoT, means you may have to struggle with streaming data. The last feature is variety. This covers data that does not have a definable structure such as the database, schema, table and column of a relational database.
Enough of the story; let’s talk about what we found and used on the database and front-end sides.
Since we had worked with PostGIS before, we used the methods we already knew for storing, indexing and accessing data. Most of what we did is covered in the PostGIS frequently asked questions: spatial indexing, and using operators and functions such as ST_DWithin and ST_Intersects.
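To make the PostGIS part a bit more concrete, here is a minimal sketch of the kind of index and radius query I mean. It only builds SQL text in Python and does not connect to a database. The table name `vehicle_positions`, the columns `id` and `geog`, and the function names are hypothetical and not taken from our project. I also assume the column has the `geography` type so that the distance in ST_DWithin is in meters, and that the placeholders follow the named style used by drivers such as psycopg.

```python
# Hypothetical names, used only for illustration.
TABLE = "vehicle_positions"
GEOG_COLUMN = "geog"  # assumed type: geography(Point, 4326)


def create_index_sql(table: str = TABLE, column: str = GEOG_COLUMN) -> str:
    """Return the DDL for a GiST spatial index on the given column."""
    return f"CREATE INDEX {table}_{column}_gist ON {table} USING GIST ({column});"


def nearby_points_sql(table: str = TABLE, column: str = GEOG_COLUMN) -> str:
    """Return a parameterized query for rows within a radius (meters) of a lon/lat point.

    The %(lon)s, %(lat)s and %(radius_m)s placeholders are meant to be
    filled in by the database driver, never by string formatting.
    """
    return (
        f"SELECT id FROM {table} "
        f"WHERE ST_DWithin({column}, "
        "ST_SetSRID(ST_MakePoint(%(lon)s, %(lat)s), 4326)::geography, "
        "%(radius_m)s);"
    )


if __name__ == "__main__":
    print(create_index_sql())
    print(nearby_points_sql())
```

With a driver, the second statement would be executed with a parameter dictionary such as `{"lon": 32.85, "lat": 39.93, "radius_m": 500}`. ST_Intersects can be used the same way when you need to test against a polygon instead of a radius.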
On the NoSQL side we had GeoMesa and GeoWave. There is an excellent comparison of these tools: https://github.com/azavea/geowave-geomesa-comparative-analysis/blob/master/docs/report.md
Two quotes from that report made us choose GeoMesa: “GeoMesa API easier to work with” and “Geomesa’s documentation is more clear”. The OSGeo web site describes it as follows: GeoMesa is an open-source, distributed, spatio-temporal database built on a number of distributed cloud data storage systems, including Accumulo, HBase, Cassandra, and Kafka. Leveraging a highly parallelized indexing strategy, GeoMesa aims to provide as much of the spatial querying and data manipulation to these key-value stores as PostGIS does to Postgres.
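To give a feeling for how a key-value store can index locations at all, here is a simplified sketch of a Z-order (Morton) key, the kind of space-filling curve that GeoMesa's point indexes are based on. This is not GeoMesa's implementation. The function names, the 16-bit resolution and the normalization are my own assumptions for illustration.

```python
def _spread_bits(value: int, bits: int) -> int:
    """Move bit i of value to position 2*i, leaving a gap after every bit."""
    out = 0
    for i in range(bits):
        out |= ((value >> i) & 1) << (2 * i)
    return out


def z2_key(lon: float, lat: float, bits: int = 16) -> int:
    """Interleave discretized longitude and latitude into one sortable integer.

    Points that are close on the map tend to get close keys, so a range scan
    over sorted keys can answer many bounding-box style queries.
    """
    cells = 1 << bits
    # Scale lon from [-180, 180] and lat from [-90, 90] to cell numbers 0..cells-1.
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    # Longitude bits go to even positions, latitude bits to odd positions.
    return _spread_bits(x, bits) | (_spread_bits(y, bits) << 1)


if __name__ == "__main__":
    print(z2_key(32.85, 39.93))  # a point near Ankara
    print(z2_key(28.97, 41.01))  # a point near Istanbul
```

Because the key is a single integer, rows can be sorted and split across many servers by key range, which is where the parallelism comes from.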
As far as I can see, GeoMesa examples are mainly based on Accumulo. For us, however, installation and code development were easier with Cassandra, so we continued with Cassandra. GDELT data can be used with the example code.
See this presentation about GeoMesa that I came across while I was at FOSS4G 2019 Bucharest: https://media.ccc.de/v/bucharest-219-a-geo-spatial-big-data-infrastructure-for-asset-management
We used PostGIS and GeoMesa for storing spatial data. But if you have geographic objects at hand, you need to visualize them on a map. This calls for grids, bubble aggregations, scatter maps, heat maps and timeline bars.
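Grid aggregation, which is also the idea behind finding the dense areas of a heat map, can be sketched in a few lines. This is a minimal example under my own assumptions, not the code of our application: the function names are hypothetical, the cells are plain squares in degrees, and the sample coordinates are made up.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.1):
    """Count (lon, lat) points per square grid cell of cell_deg degrees."""
    counts = Counter()
    for lon, lat in points:
        # A cell is identified by the integer column and row it falls into.
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


def densest_cells(points, cell_deg=0.1, top=3):
    """Return the `top` most populated cells as ((col, row), count) pairs."""
    return grid_counts(points, cell_deg).most_common(top)


if __name__ == "__main__":
    # Made-up sample coordinates, roughly around Istanbul and Ankara.
    sample = [(29.01, 41.02), (29.03, 41.04), (32.85, 39.93)]
    print(densest_cells(sample, cell_deg=1.0, top=1))
```

Only the cell counts need to be sent to the browser, not every point, which is what keeps aggregated layers light for the front end.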
Uber has great solutions on the web side. The first one I came across was Deck.gl, a WebGL-powered framework for visual exploratory data analysis of large datasets. Its examples are mainly map-based, and it uses Mapbox as the base layer. It can also visualize point cloud data. My article on this subject: https://medium.com/@ibrahimsaricicek/visualizing-point-cloud-3d-data-on-web-8f9792385e68
There is also Kepler.gl. Built with Deck.gl, Kepler.gl uses WebGL to render large datasets quickly and efficiently. It is made for geospatial data analysis and allows both technical and non-technical users to visualize trends in a city or region. It is important to note that it can visualize a large amount of location data in the browser. Below you can see some analyses made with Kepler.gl in our application, using ~11k earthquake records for Turkey.
Besides Kepler.gl, we also tried one more library in our interface: AntV. It comes from China and, although it is not well known, it is really useful. It offers various graphics solutions, not only for maps but also for general data visualization. AntV L7 maps also use WebGL and work as well as Deck.gl, but L7 doesn’t have a ready-to-use interface like Kepler.gl. Its heat map, grid, bubble aggregation and scatter map methods are very similar. The documentation and examples are sufficient. It uses Mapbox and GaodeMap (from what I have seen in their code examples) as the base layer. A few images from their websites are below.
Below is the same Turkey earthquake data again, this time visualized with AntV.
At this point, let’s move on to charts. For graphical representations, big data has to be cleaned and shrunk on the server side into a form that can actually be visualized. This is the point we will keep working on while developing our projects. A lot of effort is required to display billions of records in a bar chart with five bars. I assume that you have already taken care of this step.
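As a small illustration of what shrinking means here, the sketch below reduces any number of values to five labeled counts on the server side, so that only ten small items travel to the browser. It is a minimal example with a hypothetical function name and made-up sample values, and it assumes that the value range (`lo`, `hi`) is known in advance.

```python
def bar_chart_buckets(values, lo, hi, n_bars=5):
    """Reduce an iterable of numbers to n_bars equal-width buckets.

    Returns (labels, counts). Values outside [lo, hi] are ignored.
    The input is consumed in a single pass, so it can be a generator
    or a database cursor instead of a list held in memory.
    """
    width = (hi - lo) / n_bars
    counts = [0] * n_bars
    for v in values:
        if v < lo or v > hi:
            continue
        # The upper edge hi belongs to the last bucket.
        idx = min(int((v - lo) / width), n_bars - 1)
        counts[idx] += 1
    labels = [f"{lo + i * width:g}-{lo + (i + 1) * width:g}" for i in range(n_bars)]
    return labels, counts


if __name__ == "__main__":
    # Made-up sample values standing in for a much larger stream.
    labels, counts = bar_chart_buckets([0, 1, 2.5, 4.9, 5, 9.99, 10], lo=0, hi=10)
    print(labels)
    print(counts)
```

The two returned lists map directly onto the labels and data arrays that chart libraries usually expect.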
Although it has fewer features than D3.js, Chart.js is really useful for creating charts with less code. There is also a React wrapper for Chart.js, react-chartjs-2. Here are a few examples of charts from their web site.
One last word: besides its map solutions, AntV’s charts and other graphics solutions seem strong. In particular, I will try AntV G2Plot, and I recommend using it instead of Chart.js.
Let me end with a closing like in the old days.