If you’re a programmer, think back for a moment to the first time you hear the career question. You know the one I mean, even if you don’t recognize it as the question: “do you see yourself on the architect track or the management track?” Caught off guard, you panic momentarily as you feel that you have about 5 seconds to decide whether your long term future involves lots of UML diagrams and flow charts or whether it involves lots of Power Point presentations and demanding TPS reports from underlings. If you’re like most, and you were to answer honestly, you’d probably say, “neither, really, because I kind of like writing code.” But you don’t give that answer I never did because you’d effectively be responding to a career development question with, “I have no interest in career development.” But let’s put a pin in that for a moment.
Hadoop++: Nowadays, working over very large data sets (Petabytes of information) is a common reality for several enterprises. In this context, query processing is a big challenge and becomes crucial. The Apache Hadoop project has been adopted by many famous companies to query their Petabytes of information. Some examples of such enterprises are Yahoo! and Facebook. Recently, some researchers from the database community indicated that Hadoop may suffer from performance issues when running analytical queries. We believe this is not an inherent problem of the MapReduce paradigm but rather some implementation choices done in Hadoop. Therefore, the overall goal of Hadoop++ project is to improve Hadoop’s performance for analytical queries. Already, our preliminary results show an improvement of Hadoop++ over Hadoop by up to a factor 20. In addition, we are currently investigating the impact of a number of other optimizations techniques.
It’s pretty well known that Zynga has a metrics driven culture. It’s true. We meticulously measure most areas of our business, from user survey results, to engagement levels in our games, to the actions our players take during a game session. As a result, we generate (and analyze) tens of billions of rows of data daily. The question is, how do we store it?
How to work with the HashTable collection in Visual C#
Scaling the Facebook data warehouse to 300 PB