Certainly, most of us are already used to providing basic data, handling data, gathering specific data, and sharing, analyzing or interpreting data. Suddenly, we realize that the volume of data handled in a given amount of time has reached the order of exabytes. This data is not only large in volume but also extremely diverse, and it moves at incredible speed.
What about the information contained in this data? Who can access it? What can they do with it? These are just some of the questions that arise when speaking about Big Data.
Getting mobile, switching to the cloud, being active on social media: all these behaviors drive the creation, sharing and circulation of data. At this point, we are not sure we fully understand the implications of this phenomenon, so we will try to highlight some of its aspects.
What does Big Data actually mean?
At first sight, we can describe Big Data as data sets so large and complex that they are impossible or hard to handle with classic data processing tools. The expression is used in its original English form; notably, French specialists currently translate it as “grosses données” (big data), “données massives” (massive data) or even “datamasse” (datamass, as in “biomass”). The novelty of the concept and its blurred definition prevent the term from being localized.
In 2012, Gartner (which had loosely outlined the term in the early 2000s) updated the definition: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
The above definition outlines the dimensions of Big Data, the well-known 3Vs: volume, velocity and variety. Yet the great thing about this formulation is that it opens multiple perspectives on the Big Data concept: a technology view, a process view and a business view.
The opportunities
When dealing with data at a completely new scale, its capture, storage, search, distribution, analysis and visualization must be redefined. The prospects opened by handling big data are enormous and still largely unsuspected!
Frequently cited possibilities include exploring information shared in the media, acquiring and assessing knowledge, analyzing trends and issuing forecasts, and managing risks of all kinds (commercial, insurance, industrial, natural) and phenomena of all kinds (social, political, religious, etc.). In geodynamics, meteorology, medicine and other exploratory fields, big data is expected to improve the way processes are deployed and data is interpreted.
Big Data Management
One of the biggest challenges at present is to build the proper tools and systems to manage big data. As real-time or near-real-time information delivery is one of the key features of big data analytics, research aims to set up database management systems able to meet these new requirements.
Technologies under development include the following:
Storage: For the storage and retrieval of data, NoSQL developments are best represented by MongoDB, DynamoDB, Couchbase, Cassandra, Redis and Neo4j. They are currently regarded as among the best-performing document, key-value, column, graph and distributed databases.
Software: The Apache Hadoop ecosystem includes distributions from Cloudera, Hortonworks and MapR. Their main goal is to extend the use of big data platforms to a broader and more diverse range of users. Secondly, these technologies focus on increasing the reliability of big data platforms and on improving their manageability and performance.
Data Exploration and Discovery: Big data analytic discovery is a hot research and innovation topic. Major developments have come from Datameer, Hadapt, Karmasphere, Platfora and Splunk.
Operating Big Data
The harder discussions also raise further questions: who will define data categories? Who will shape the analytics? Inevitably, this will be a human job, and it will belong to the new “data scientists”. These will be trained specialists able to handle data for a specific field. Some will have business domain expertise and be able to ask the right questions, while others will have technology expertise and understand the limitations of software and hardware.
Big Data Readiness
At this point, a few things must be stressed: there are already active research initiatives on how to better handle complex, fast and large amounts of data; the explosion of user-generated data is due to the way we have embraced social media platforms and M2M (machine-to-machine) technologies; a new generation of specialists is on the rise; and from an ethical and legislative point of view, many questions remain to be answered.
Probably the vast majority of people involved in the big data phenomenon do not yet understand it clearly. As it is still unclear what we are facing, we can hardly claim that we are ready for it. (D.C.)