Efficiently Parsing Billions of Addresses on MapReduce
Xiang Li, Hakan Kardes, Xin Wang, Ang Sun
In this paper, we present a probabilistic address parsing system based on the Hidden Markov Model. We also introduce several novel approaches to build models for noisy real-world addresses, obtaining 95.6% F-measure. Furthermore, we demonstrate the viability and efficiency of this system for large-scale data by scaling it up to parse billions of addresses with Hadoop.
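To make the idea of Hidden Markov Model address parsing concrete, the following is a minimal sketch of Viterbi decoding over address tokens. It is an illustration only, not the system described in this paper: the state set, the coarse token classes, the `STREET_WORDS` list, and every probability are hypothetical values chosen by hand, whereas a real model would estimate them from labeled addresses.

```python
import math

# Hypothetical label set; a real address model would use many more states.
STATES = ["HOUSE_NUMBER", "STREET", "CITY"]

# Hand-picked illustrative probabilities (assumptions, not learned values).
START = {"HOUSE_NUMBER": 0.8, "STREET": 0.15, "CITY": 0.05}
TRANS = {
    "HOUSE_NUMBER": {"HOUSE_NUMBER": 0.05, "STREET": 0.9, "CITY": 0.05},
    "STREET": {"HOUSE_NUMBER": 0.05, "STREET": 0.55, "CITY": 0.4},
    "CITY": {"HOUSE_NUMBER": 0.05, "STREET": 0.05, "CITY": 0.9},
}
# Emissions are defined over coarse token classes rather than raw words.
EMIT = {
    "HOUSE_NUMBER": {"DIGITS": 0.9, "STREET_WORD": 0.05, "OTHER": 0.05},
    "STREET": {"DIGITS": 0.1, "STREET_WORD": 0.5, "OTHER": 0.4},
    "CITY": {"DIGITS": 0.05, "STREET_WORD": 0.05, "OTHER": 0.9},
}

STREET_WORDS = {"st", "street", "ave", "avenue", "rd", "road"}


def observe(token):
    """Map a raw token to one of the coarse observation classes."""
    t = token.lower().strip(".,")
    if t.isdigit():
        return "DIGITS"
    if t in STREET_WORDS:
        return "STREET_WORD"
    return "OTHER"


def viterbi(tokens):
    """Return the most likely state sequence for the tokens (log-space Viterbi)."""
    obs = [observe(t) for t in tokens]
    if not obs:
        return []
    # score[i][s]: best log-probability of any path ending in state s at position i.
    score = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []  # back[i][s]: best predecessor of state s at position i + 1
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            cur[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    last = max(STATES, key=lambda s: score[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    tokens = "123 Main St Seattle".split()
    for token, label in zip(tokens, viterbi(tokens)):
        print(f"{token}\t{label}")
```

Because each address is decoded independently of all others, a function like `viterbi` could in principle be applied per record inside a map task, which is one reason this kind of parser lends itself to MapReduce-style scaling.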