Tuesday, May 11, 2010

Facebook has the world's largest Hadoop cluster

I like it when different themes begin to converge. Earlier this year, I wrote about the importance of mining enterprise data. But how do you organize, index, and search a large amount of mostly unstructured data (e.g. emails, blogs, documents, web pages, logs, videos)? Here comes Hadoop. This is a very good video that explains Hadoop as a new data platform.



Another of my key themes is social networking, and Facebook just released data about their Hadoop storage cluster: 21 PB of data. That's about 21 million gigabytes, and you can bet it will double in the next year or so. That's a lot of data to mine to get value for advertisers.
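
To put the 21 PB figure in perspective, here is a quick back-of-the-envelope sketch of the unit conversion and the doubling guess. It assumes decimal (SI) units, where 1 PB is one million GB, and the one-year doubling period is only my guess from above, not a number Facebook published. The function names are made up for illustration.

```python
# Decimal (SI) units: 1 PB = 10**15 bytes and 1 GB = 10**9 bytes,
# so 1 PB = 1,000,000 GB. (Binary units would give a slightly larger number.)
GB_PER_PB = 1_000_000


def petabytes_to_gigabytes(pb):
    """Convert petabytes to gigabytes using decimal units."""
    return pb * GB_PER_PB


def projected_size_pb(current_pb, years, doubling_period_years=1.0):
    """Project storage size assuming it doubles every `doubling_period_years`.

    The doubling period is an assumption for illustration only.
    """
    return current_pb * 2 ** (years / doubling_period_years)


cluster_pb = 21  # size reported for Facebook's Hadoop cluster

print(petabytes_to_gigabytes(cluster_pb))  # 21000000
print(projected_size_pb(cluster_pb, 1))    # 42.0
```

If the doubling guess holds, the cluster would be around 42 PB a year from now; stretch the doubling period to two years and the same function gives a more conservative estimate.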