Week 11 Readings
IES Ch 14
The simplest form of parallel query processing is replication: add more machines that accept queries, and give each one a full copy of the index. Query throughput then grows roughly in proportion to the number of machines, although each individual query takes just as long as before. Of course, if the index is too large to be stored on one machine, this is not possible. For large collections, intra-query parallelism is therefore necessary: each machine holds only part of the index. In term partitioning, for example, each machine stores the postings for a subset of the terms, and each query term is sent to the machine that holds it.
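To make the term-partitioned case concrete, here is a minimal sketch of routing a Boolean AND query across machines. The function names are hypothetical, the "machines" are just dictionaries in one process, and assigning terms with a CRC32 hash is an assumed scheme for illustration, not necessarily the one the book uses.

```python
import zlib
from collections import defaultdict


def machine_for_term(term, num_machines):
    """Assign a term to a machine with a deterministic hash (an assumed scheme)."""
    return zlib.crc32(term.encode("utf-8")) % num_machines


def build_term_partitions(inverted_index, num_machines):
    """Split an inverted index by term: each term's postings live on exactly one machine."""
    partitions = [{} for _ in range(num_machines)]
    for term, postings in inverted_index.items():
        partitions[machine_for_term(term, num_machines)][term] = postings
    return partitions


def route_query(query_terms, num_machines):
    """Group the query terms by the machine that stores their postings."""
    routes = defaultdict(list)
    for term in query_terms:
        routes[machine_for_term(term, num_machines)].append(term)
    return dict(routes)


def conjunctive_query(query_terms, partitions):
    """Fetch each term's postings from its machine and intersect them (Boolean AND)."""
    result = None
    for machine, terms in route_query(query_terms, len(partitions)).items():
        for term in terms:
            docs = set(partitions[machine].get(term, []))
            result = docs if result is None else result & docs
    return sorted(result) if result is not None else []


index = {"parallel": [1, 2, 5], "query": [2, 3, 5], "index": [1, 5]}
partitions = build_term_partitions(index, num_machines=3)
print(conjunctive_query(["parallel", "query"], partitions))  # [2, 5]
```

Only the machines that own a query's terms do any work, which is the appeal of term partitioning; the cost is that the postings for a multi-term query have to be shipped to one place to be intersected.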
Replication also seems like a very easy way to introduce fault tolerance. If every machine holds a full copy of the index, the chance that all of them fail at once is very small. If one machine fails, the next one in line can take over its queries with little or no delay. This redundancy also makes it easy to simply replace failed machines as needed.
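The "very small" chance can be quantified: if each of $n$ replicas fails independently with probability $p$, all of them are down at once with probability $p^n$. The sketch below computes that and shows a simple failover loop. The function names and the fake replicas are hypothetical, and treating a `ConnectionError` as "machine down" is an assumption made for illustration.

```python
def prob_all_replicas_fail(p_fail, replicas):
    """Chance that every replica is down at once, assuming independent failures."""
    return p_fail ** replicas


def query_with_failover(query, replicas):
    """Try each replica in turn; the next one in line answers if the current one fails."""
    for replica in replicas:
        try:
            return replica(query)
        except ConnectionError:
            continue  # this replica is down; fall through to the next copy
    raise RuntimeError(f"all {len(replicas)} replicas failed")


def dead_replica(query):
    """Hypothetical replica that is unreachable."""
    raise ConnectionError("machine unreachable")


def healthy_replica(query):
    """Hypothetical replica that answers normally."""
    return f"results for {query!r}"


print(prob_all_replicas_fail(0.01, 3))  # roughly 1e-06
print(query_with_failover("mapreduce", [dead_replica, healthy_replica]))  # results for 'mapreduce'
```

The independence assumption is optimistic: machines that share power, network switches, or the same software bug tend to fail together, so the real chance of losing every replica is usually higher than $p^n$.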
This reading also presents a fantastic buildup of MapReduce, introducing the framework in its simplest form before moving on to the more advanced problems encountered in a large search engine.
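As a reminder of the basic MapReduce shape (map, then group by key, then reduce), here is inverted-index construction simulated in a single Python process. This is a minimal sketch: the function names are hypothetical, tokenization is just lowercase whitespace splitting, and the shuffle step that a real framework performs across machines is done here with a dictionary.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit a (term, doc_id) pair for every token in the document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce: turn all doc ids seen for one term into a sorted postings list."""
    return term, sorted(set(doc_ids))


def map_reduce(documents):
    """Run map over every document, shuffle, then reduce each term's group."""
    intermediate = (
        pair for doc_id, text in documents.items() for pair in map_phase(doc_id, text)
    )
    grouped = shuffle(intermediate)
    return dict(reduce_phase(term, ids) for term, ids in sorted(grouped.items()))


docs = {1: "parallel query processing", 2: "query the index", 3: "parallel index"}
print(map_reduce(docs))
# {'index': [2, 3], 'parallel': [1, 3], 'processing': [1], 'query': [1, 2], 'the': [2]}
```

Because each map call sees one document and each reduce call sees one term, both phases can be spread over many machines without coordination; only the shuffle needs to move data between them.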