In this lecture, I will give you a brief introduction to open source software,
and then introduce QGIS for GIS,
PostGIS for spatial DBMS, R for spatial data analytics,
and Hadoop for Big Data systems.
What is open source software?
It is a computer program whose
source code is open and available to the public.
It comes with a license that grants the right to use,
study, change, and distribute the software.
There are different license types.
It should be noted that
open source software is open to use, but it is not always free of charge.
For example, if you develop a commercial solution based on open source software,
you may have to pay for legal protection.
Except for such cases,
open source software is freely available,
so cost is the major benefit.
It also tends to be updated actively thanks to collaborative development.
The downside is that maintenance and support are generally not guaranteed.
I'm not really a big fan of open source software. However,
for spatial data science and its applications,
I can say that open source software works,
and that it is credible enough.
QGIS, formerly known as Quantum GIS,
is the leading open source GIS software.
It provides the basic functionality to collect,
store, query, analyze, and visualize spatial data.
In comparison with commercial GIS software,
it has some weaknesses in advanced spatial analysis.
However, this can be resolved by connecting it to other software,
for example, R for advanced data analytics,
or even by programming directly in the Python console.
The figures show the QGIS interface.
Clockwise from the upper left corner:
buffering as a simple geoprocessing example,
k-nearest neighbor for spatial analysis,
programming in the Python console,
and a connection to PostGIS.
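To make the buffering and k-nearest neighbor ideas a little more concrete, here is a minimal sketch in plain Python rather than in QGIS itself. The point names and coordinates are made up for illustration, and within_buffer and k_nearest are hypothetical helper functions, not part of the QGIS API. It also assumes planar coordinates and a simple circular buffer around a single point.

```python
import math

# Hypothetical sample points (x, y) in a planar coordinate system.
POINTS = {
    "a": (0.0, 0.0),
    "b": (1.0, 1.0),
    "c": (3.0, 4.0),
    "d": (6.0, 8.0),
}


def within_buffer(center, points, radius):
    """Return the names of points that fall inside a circular buffer."""
    return sorted(
        name for name, xy in points.items()
        if math.dist(center, xy) <= radius
    )


def k_nearest(center, points, k):
    """Return the names of the k points closest to center, nearest first."""
    ranked = sorted(points, key=lambda name: math.dist(center, points[name]))
    return ranked[:k]
```

For example, within_buffer((0.0, 0.0), POINTS, 5.0) selects the points "a", "b", and "c", and k_nearest((0.0, 0.0), POINTS, 2) returns "a" and then "b". A real GIS does the same kind of distance test, but on full geometries and with proper coordinate reference systems.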
Now, the spatial DBMS.
PostGIS is an open source spatial DBMS,
which is an extension on top of PostgreSQL.
PostgreSQL is an object-relational DBMS in
which user-defined data types can be stored and managed in database tables.
PostGIS adds spatial data types,
spatial functions for querying and data management,
spatial indexing, and so on.
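As a rough illustration of what spatial indexing buys you, here is a toy grid index in plain Python. This is only a sketch of the general idea, under the assumption of point data in planar coordinates. It is not how PostGIS implements its indexes, and the GridIndex class and its method names are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Toy spatial index: points are bucketed into square grid cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Map a coordinate to the integer (column, row) of its grid cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def query_box(self, xmin, ymin, xmax, ymax):
        """Return names of points inside the box, scanning only nearby cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for name, x, y in self.cells.get((cx, cy), []):
                    # Exact test, since a cell may extend beyond the box.
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)
```

The point is that a box query only looks at the cells the box overlaps instead of scanning every row in the table. That is the same motivation behind the spatial indexes in a spatial DBMS.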
The origins of PostgreSQL and PostGIS go back to Ingres,
one of the first relational DBMSs, developed at UC Berkeley.
With a long history of development,
PostGIS is a very solid and reliable piece of open source software.
The figure shows the PostgreSQL interfaces. Clockwise from the upper left corner:
command window, administrator window,
query window, and database table view.
Now, the data analytics tool.
R is an open source software environment for
statistical and analytical computing and graphics.
Actually, it is undisputedly the leading open source software for data analytics.
It presents a simple and effective programming
language with built-in functions,
and also allows users to add new functionality, so there
are many packages built on R. For spatial analysis,
there are more than 100 packages available,
and sp is one of the leading and most popular among them.
The figures show the R interfaces.
Clockwise from the upper left corner:
a simple correlation analysis as a basic
statistical analysis, a decision tree example,
a database table through a connection to PostGIS, and
a geo-visualization example with the sp package.
Now, the Big Data system.
As discussed in the previous lecture,
the Hadoop ecosystem is designed for Big Data management and
processing based on the MapReduce programming model.
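As a reminder of what the MapReduce programming model looks like, here is a minimal single-machine sketch in plain Python. It only imitates the map, shuffle, and reduce steps in memory. It does not use Hadoop, and the function names map_phase, shuffle, reduce_phase, and run_mapreduce are hypothetical names chosen for this illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def run_mapreduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.split():
        yield (word, 1)


def count_reducer(key, values):
    # Sum the counts collected for one word.
    return sum(values)
```

Running run_mapreduce(["big data", "spatial big data"], word_mapper, count_reducer) counts "big" and "data" twice and "spatial" once. Notice that each record is mapped independently of all the others, which is what makes the model easy to distribute.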
The problem is that
Hadoop is not designed for spatial data. The values of spatial data are spatially correlated,
and this should be taken into account in data processing.
Spatial data structures are also tightly interwoven with each other to represent topology.
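To show what "spatially correlated" means in numbers, here is a small sketch that computes Moran's I, a common measure of spatial autocorrelation. Moran's I is my choice for this illustration and is not something specific to Hadoop. The chain_weights neighborhood, in which the cells sit in a row and each cell neighbors the ones next to it, is a hypothetical toy setup.

```python
def morans_i(values, weights):
    """Compute Moran's I.

    values:  list of observations, one per location.
    weights: dict mapping (i, j) index pairs to a spatial weight w_ij.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(weights.values())
    # Weighted sum of cross products of deviations between neighbors.
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


def chain_weights(n):
    """Toy neighborhood: n cells in a row, adjacent cells have weight 1."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights
```

With a smooth trend such as [1, 2, 3, 4], morans_i gives a positive value, about 0.33, because neighbors have similar values. With an alternating pattern such as [1, -1, 1, -1] it gives -1. The value at one location depends on its neighbors, so spatial records cannot simply be split up and processed as if they were independent.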
There are a few alternative Big Data systems
for spatial data, such as SpatialHadoop from the University of Minnesota
and Hadoop-GIS from Emory University.
Despite such efforts,
based on my experience,
the Hadoop framework works well only for independent and basic operations,
such as noise removal and data pre-processing.
Managing Spatial Big Data in the Hadoop framework raises many issues.
As mentioned before, the Hadoop ecosystem provides additional tools
to facilitate the processing and management of Big Data, such as Hive,
HBase, YARN, and many others.
Additionally, Python and Java can directly
access HDFS and process Big Data, or even Spatial Big Data.
R can also be integrated
with the Hadoop framework for direct analysis of Big Data and Spatial Big Data.
The figure illustrates examples of Hadoop processing.
Clockwise from the upper left corner:
data node information after Hadoop configuration,
the HDFS window, job tracking of MapReduce, and a Hive query result
in the command line console.
Now the integrated framework has become more
concrete, filled in with software for each discipline.
They are all open source software:
QGIS for GIS again,
PostGIS for spatial DBMS,
R for spatial data analytics,
and Hadoop framework for Big Data system.
Throughout my courses, I will give you hands-on experience with each of these tools,
and you will make use of them for your assignments and projects.