Archive for October, 2004
Round 1: Yahoo 1 Google 0
Yahoo wins.
Google loses.
Surprisingly enough, I didn’t really want 4 of the first 10 results to be random phpinfo() pages.
Missed Party
Looks like I’ll miss the party in my hometown.
Google Desktop
Folks have been talking about a Google browser, but I didn’t really see what would be so great about it. After reading a thought-provoking post, I had a hunch that this is really where Google is going. I was primarily thinking about personal desktop search, but I wouldn’t be surprised to see Google move in the direction that Doug mentions. Just a couple of days ago I was at a seminar by a Google employee. Someone asked whether Google was doing anything with desktop search. The presenter either didn’t understand the question, didn’t have a clue, or was a really good actor, because he gave no indication that they were moving in this direction. Enter Google Desktop.
Google Talk
I went to visit Adam yesterday at umich to go to a Google seminar given by a Googler. It was fun and interesting. A bunch of the information was similar to a talk given by a different Googler a couple of years ago. I think the most notable new subject discussed was a technology Google employs to easily develop and run programs that analyze their huge stores of data. It was called MapReduce, I think. It sounded pretty slick. Along with this, he talked more about how Google manages these types of jobs and how they interact closely with the Google File System.
MapReduce allowed them to write simpler programs that compute results over their entire dataset without worrying about machines or tasks dying along the way. He showed some sample C++ code of a MapReduce program, maybe 20 lines long, that would sum up the number of links Google indexed per domain. I believe, based on the numbers he mentioned, that the job would have taken less than 20 minutes. Crazy talk! Maybe I’ll try to describe this a bit more in a different post. I’m not sure I remember everything exactly.
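To give a rough feel for the shape of such a job, here is a minimal single-machine sketch in Python. It is not the C++ code from the talk and not Google's API: the function names (map_links, shuffle, reduce_counts, links_per_domain) are made up for illustration, and I am assuming "links per domain" means counting each link under the host name of the URL it points to. The real system would spread the map and reduce steps across many machines and rerun whatever fails; this only shows the map, group, and reduce steps.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_links(page_links):
    """Map step: emit a (domain, 1) pair for every link found on one page."""
    for url in page_links:
        domain = urlparse(url).netloc.lower()
        if domain:  # skip strings that have no host part
            yield domain, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(domain, counts):
    """Reduce step: sum all the 1s emitted for a single domain."""
    return domain, sum(counts)


def links_per_domain(pages):
    """Run map, shuffle, and reduce in one process (hypothetical driver)."""
    pairs = (pair for links in pages for pair in map_links(links))
    return dict(reduce_counts(d, c) for d, c in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input: each inner list holds the links found on one page.
    pages = [
        ["http://example.com/a", "http://example.com/b"],
        ["http://example.org/"],
    ]
    print(links_per_domain(pages))
```

The part the programmer writes is just the map and reduce functions. The grouping in between, and the retrying when something dies, is what the framework is supposed to take care of.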
It was some good stuff. My favorite quote was when he said that early on in Google’s history, “We got really good at moving out of bankrupt data centers.” This was in regard to them putting 88(?) servers per rack and squeezing them in tight, which hurt the data centers, which at that time charged per square foot (I think for stuff like power and cooling and not network usage).
This talk kind of renewed my thought that there really should be a good, open source implementation of a Google File System type thinger. It seems GFS has some optimizations that would limit its applications. However, I wonder if we could sacrifice some of these optimizations to make it more flexible, yet still end up with something worthwhile. Maybe sometime I’ll get around to solidifying some thoughts on this: how it should work and what features it should have.
Update: MapReduce
Update 2: Looks like a very similar talk given by someone else (Jeff Dean).