Big data <> Open data
“Big data” is another of those trending topics. You could link it to “open data”, which I blogged about some time ago. The idea behind open data is that organisations, companies, … give others access to their data so that developers can build something with it. And you can imagine that combining all those different databases results in one gigantic data source.
Big data is also the name for large data sets generated by a single source or device that collects lots of information and stores it for later use. Take for instance the new Boeing 787. This plane creates half a terabyte of data per flight: all the sensors and devices on board collect information during the flight, which can later be used to adjust fuel consumption, improve maintenance, etc.
Read more about this airplane here:
Another example is what some TV stations are doing: using Twitter messages to predict the popularity of election candidates. For that you not only have to count the number of tweets about every candidate, but also have to take into account whether those tweets are for or against that person.
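To make that counting step a bit more concrete, here is a minimal sketch in Python. It assumes every tweet has already been labelled with a candidate and a stance ("pro" or "contra"); working out that stance from the tweet text is the hard part and is not shown. The function name `net_support` and the sample data are made up for illustration and do not describe how any TV station actually does it.

```python
from collections import Counter

def net_support(tweets):
    """Return {candidate: (total tweets, pro count minus contra count)}.

    `tweets` is an iterable of (candidate, stance) pairs, where stance is
    "pro" or "contra" (hypothetical labels; any other value only counts
    towards the total).
    """
    totals = Counter()
    net = Counter()
    for candidate, stance in tweets:
        totals[candidate] += 1
        if stance == "pro":
            net[candidate] += 1
        elif stance == "contra":
            net[candidate] -= 1
    return {candidate: (totals[candidate], net[candidate]) for candidate in totals}

# Invented sample data: candidate "A" gets more tweets, but mostly negative ones.
sample = [
    ("A", "pro"), ("A", "contra"), ("A", "contra"),
    ("B", "pro"), ("B", "pro"),
]
result = net_support(sample)
```

In this made-up sample, candidate A is mentioned more often than candidate B but ends up with a lower net score, which is exactly why the raw tweet count alone is not enough.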
And the amount of available data is exploding. Apparently 90% of all the data in the world has been created in the last two years, according to IBM.
Those are only a few examples of “big data”, but on this site you will find 25 definitions for it, of which I want to highlight two:
- “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.” Cited from Wikipedia
- “The definition of big data? ‘Who cares? It’s what you’re doing with it.’” Cited from a 3/2013 FCW article, quoting Bill Franks.
As more and more devices get connected (see also the web of things), the amount of collected data will keep growing, and many challenges lie ahead of us. Not only will we need powerful devices and software to store all that data; we will also need ways to transport it to a central system if we want to combine it. And of course we will need the knowledge and tools to analyse all those data records and get valuable feedback out of them.
With all that new data being generated daily, I think Data Scientist is one of the jobs of the future…