Wednesday, January 28, 2009

Facebook Presentation

Facebook must handle large amounts of data and analyze both the data itself and how it is used. Analyzing the site's live data is hard because users are constantly pulling that same data, and pulling it again for analysis would slow down the site.

This is a general talk about data analysis involving very large amounts of data.

Facebook has built numerous tools for analyzing data, covering statistics, visual graphs, varying timescales, etc. Building these tools was necessary to automate the routine work, so that a small number of people could do the actual analysis.

Large-scale log file analysis is easier than analyzing a database that is updated frequently. Historical information does not need to be kept in the database because the site never retrieves it.
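
As a rough illustration of the log-based approach, here is a minimal Python sketch that counts events per day from an append-only log without touching the live database. The log format (timestamp, user id, and action separated by tabs), the function name count_events_per_day, and the sample data are all assumptions made for illustration; the talk did not describe Facebook's actual log format or tooling.

```python
from collections import Counter
from typing import Iterable


def count_events_per_day(lines: Iterable[str]) -> Counter:
    """Count (day, action) pairs from tab-separated log lines.

    Assumed, hypothetical line format: timestamp<TAB>user id<TAB>action,
    where the timestamp starts with a YYYY-MM-DD date.
    """
    counts: Counter = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines rather than fail the whole run
        timestamp, _user_id, action = parts
        day = timestamp[:10]  # the YYYY-MM-DD prefix of the timestamp
        counts[(day, action)] += 1
    return counts


# Hypothetical sample data, not real Facebook logs.
sample_log = [
    "2009-01-27T10:15:00\tu1\tphoto_view",
    "2009-01-27T10:16:30\tu2\tphoto_view",
    "2009-01-27T11:00:00\tu1\twall_post",
    "2009-01-28T09:00:00\tu3\tphoto_view",
    "garbled line",
]

daily_counts = count_events_per_day(sample_log)
for (day, action), n in sorted(daily_counts.items()):
    print(day, action, n)
```

Because the log is only ever appended to, a job like this can run on a separate machine and reread old entries at any time, which is the property that makes it easier than querying a frequently updated database.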

Facebook is looking at distributed databases. Commodity hardware is used for the data center.

Data analysis must provide answers to questions from finance, marketing, etc. Don't collect data without a purpose; the amount of data collected can become overwhelming. Focus on what you can learn from your data.

This presentation gave a lot of general principles for data analysis. These will become more important in the coming years as more companies need to deal with such large amounts of data, analyze it, and learn something from it.
