Data without limits

Werner Vogels, CTO of Amazon, gave a fascinating talk on ‘Data without Limits’ to a packed house at NEXT11. The essential premise of his presentation was that, when it comes to data, bigger is better, and that Big Data can be described as the collection and analysis of large amounts of data to gain competitive advantage.

However, the idea that “the more data we have, the more effective we can be” creates its own set of problems. How and where can it all be stored? Vogels cited the example of Razorfish, a large data analysis company. Razorfish was amassing so much data that, even though they were constantly purchasing new storage hardware, their equipment couldn’t cope. As a result, they were unable to take on any more clients, and had become more focussed on data storage than data analysis, which was their core skill.

The situation was resolved when Razorfish handed their entire storage operation over to Amazon. Amazon’s cloud facility, aside from offering virtually limitless storage capacity, can also help companies with data collection, organisation, analysis and sharing, all in a totally secure environment. Furthermore, Amazon’s service also has a human element, known as Mechanical Turk: human workers who, among their many other tasks, scrutinise data to remove duplication and maintain consistency and accuracy while assigning metadata to plain text, thus reducing the level of ‘dirty data’.

Finally, Amazon’s cloud concept allows the sharing of an application or data set so that it is accessible to anyone and can be mixed and matched with your own data sets, which is particularly useful in a public-information context.
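As a rough illustration of that mix-and-match idea, suppose a shared public data set has been downloaded as CSV alongside a private one of your own. The file contents, field names and figures below are invented for the sketch; the point is simply that joining the two on a common key combines public and private information in one view:

```python
import csv
import io

# Hypothetical "public" data set, e.g. shared demographic figures (invented values).
public_csv = """city,population
London,8900000
Leeds,793000
"""

# Your own data set, e.g. customer counts per city (also invented).
own_csv = """city,customers
London,1200
Leeds,340
"""

def mix_and_match(public_text, own_text, key="city"):
    """Join two CSV data sets on a shared key column."""
    public_rows = {row[key]: row for row in csv.DictReader(io.StringIO(public_text))}
    combined = []
    for row in csv.DictReader(io.StringIO(own_text)):
        # Merge the matching public row (if any) with the private row.
        combined.append({**public_rows.get(row[key], {}), **row})
    return combined

result = mix_and_match(public_csv, own_csv)
# Each row now carries both public and private fields, e.g.
# {'city': 'London', 'population': '8900000', 'customers': '1200'}
```

In practice the public file would come from a shared cloud data set rather than an inline string, but the join itself is the same.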