We are presently living in a world ruled by data. Anything and everything we do is recorded as tiny bits of binary-coded information. The trend is so rampant that companies are recording terabytes of data without actually knowing why they collect it, what it contains, or how it is going to be used.
One of the major factors contributing to this alarming trend is the availability of cheaper storage. In March this year, Google announced a cold-storage service called Nearline, a competitor to Amazon's Glacier. Cold storage is the cloud-based version of an old-fashioned way of storing massive amounts of data that is too important to delete but not important enough to keep close at hand. The data isn't immediately available, usually because it is stored on tape drives or other offline media and filed away in a vault somewhere.
Scientists at UC San Diego estimate that by 2024, the world's enterprise servers will annually process the digital equivalent of a stack of books extending more than 4.37 light-years to Alpha Centauri, our closest neighboring star system in the Milky Way Galaxy (each book is assumed to be 4.8 centimeters thick and to contain 2.5 megabytes of information).
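To get a feel for the scale behind that image, here is a minimal back-of-envelope sketch in Python that converts the figures quoted above (a 4.37 light-year stack, 4.8 cm and 2.5 MB per book) into an annual data volume. The constants are simply the ones stated in the estimate, so the result is only as accurate as that estimate itself.

```python
# Back-of-envelope conversion of the "stack of books to Alpha Centauri"
# image into a data volume, using only the figures quoted in the text.

LIGHT_YEAR_M = 9.461e15      # metres in one light-year
STACK_LENGTH_LY = 4.37       # stated distance to Alpha Centauri
BOOK_THICKNESS_M = 0.048     # 4.8 cm per book
BOOK_CAPACITY_MB = 2.5       # megabytes per book

stack_length_m = STACK_LENGTH_LY * LIGHT_YEAR_M
books = stack_length_m / BOOK_THICKNESS_M
total_mb = books * BOOK_CAPACITY_MB
total_zb = total_mb / 1e15   # 1 zettabyte = 1e15 megabytes (decimal units)

print(f"Books in the stack: {books:.2e}")
print(f"Implied data volume: {total_zb:,.0f} zettabytes per year")
```

Running the sketch shows that the stack works out to roughly 10^18 books, i.e. an annual processing volume measured in thousands of zettabytes under these assumptions.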
However, this brings us to another question: how useful is all this collected data to its users? Data only becomes useful when it provides real value to consumers, citizens, and companies, i.e. when the data starts making sense. In reality, we are far from that point; we have barely begun to scratch the surface of the data iceberg. Real analytics and mining must deal with the part of the iceberg below the waterline: the unstructured and dirty data.
While the exponential growth in stored data is well known, data becomes truly valuable only when it is actively processed by the world's servers and delivered to users in a meaningful form. As the capacity of servers to process this data increases, that growth will bring a host of challenges and opportunities for data scientists and the data community as a whole.