Ensuring Comprehensible Approaches for Duplicate Detection in Huge Datasets

Rachakonda Balaraju, Vadali Srinuvas

Abstract


In today's business world, datasets play an essential role. As enterprises expand, their datasets grow as well, and this growth leads to data duplication. We address the problem with duplicate detection, specifically a technique called progressive duplicate detection. Duplicate detection is the process of identifying and removing duplicate records from a data repository. We present two novel progressive duplicate detection algorithms that significantly increase the efficiency of finding duplicates when execution time is limited, producing quality data without disturbing the datasets. These algorithms dynamically adjust their behavior by choosing parameters such as window size and block size. Comprehensive experiments show that our progressive algorithms can double the efficiency over time of traditional duplicate detection and significantly improve upon related work.
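The abstract does not spell out the two algorithms. As a hedged illustration of what "progressive" can mean when time is limited, the sketch below reorders the comparisons of the classic sorted neighborhood method (sort the records by a key, then compare only records that fall within a window of each other). Instead of sliding the window record by record, it first compares every pair of direct neighbors in sort order, then every pair two positions apart, and so on, so that the most likely duplicates are reported early and the run can stop at a deadline. This is an assumed design for illustration, not the authors' implementation; the function names `progressive_sorted_neighborhood` and `similar`, the `difflib`-based matcher, the 0.85 threshold, and the sample names are all hypothetical.

```python
import time
from difflib import SequenceMatcher
from typing import Callable, Iterator, List, Optional, Tuple


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Hypothetical matcher: case-insensitive difflib similarity ratio >= threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def progressive_sorted_neighborhood(
    records: List[str],
    sort_key: Callable[[str], str],
    is_duplicate: Callable[[str, str], bool],
    max_window: int,
    deadline: Optional[float] = None,
) -> Iterator[Tuple[int, int]]:
    """Yield index pairs (i, j), i < j, of suspected duplicates, nearest neighbors first.

    Compares exactly the pairs a classic sorted neighborhood pass with window
    size `max_window` would compare, but ordered by rank distance in the sorted
    list: all direct neighbors first, then records two positions apart, etc.
    If `deadline` (a time.monotonic() value) passes, the search stops early and
    the pairs already yielded are the result.
    """
    # Sort record indices by the (hypothetical) sorting key.
    order = sorted(range(len(records)), key=lambda idx: sort_key(records[idx]))
    for distance in range(1, max_window):
        for rank in range(len(order) - distance):
            if deadline is not None and time.monotonic() >= deadline:
                return  # time budget exhausted: keep what was already yielded
            i, j = order[rank], order[rank + distance]
            if is_duplicate(records[i], records[j]):
                yield (min(i, j), max(i, j))


if __name__ == "__main__":
    people = ["John Smith", "Jon Smith", "Mary Jones", "John Smyth", "Mary Jonse"]
    budget = time.monotonic() + 0.5  # allow half a second of comparisons
    for pair in progressive_sorted_neighborhood(people, str.lower, similar, 3, budget):
        print(pair)
```

With the five sample names and a window of 3, the adjacent near-identical pairs `(0, 3)` and `(2, 4)` are reported before `(0, 1)`, whose records sit two positions apart in sort order; an earlier deadline simply truncates that list. A non-progressive sliding window would find the same set of pairs, but in record-position order rather than nearest-neighbors-first, which is what makes partial results after an early stop less complete.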






Copyright © 2012, All rights reserved. | ijmca.org

International Journal of Mechanical Engineering and Computer Applications by chief editor is licensed under a Creative Commons Attribution 3.0 Unported License. Permissions beyond the scope of this license may be available at www.ijmca.org.