An Iterative Approach to Record Deduplication

M. Roshini Karunya; S. Lalitha; B.Tech.; M.E.

Abstract

Record deduplication is the task of identifying, in a data repository, records that refer to the same real-world entity or object despite misspellings, typos, different writing styles, or even different schema representations or data types [1]. The existing system provides an Unsupervised Duplication Detection (UDD) method that identifies and removes duplicate records drawn from different data sources. For a given query, UDD can effectively identify duplicates among the query result records of multiple web databases. Two cooperating classifiers, a Weighted Component Similarity Summing (WCSS) classifier and a Support Vector Machine (SVM), are used to iteratively separate duplicate records from non-duplicate records. We also present a Genetic Programming (GP) approach to record deduplication. Since record deduplication is a time-consuming task even for small repositories, our aim is a method that finds a proper combination of the best pieces of evidence, yielding a deduplication function that maximizes performance while using only a small representative portion of the data for training. We propose two further algorithms, Particle Swarm Optimization (PSO) and the Bat Algorithm (BA), to improve the optimization.

Index Terms – Data mining, duplicate records, genetic algorithm
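
The weighted similarity summing idea behind the WCSS classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the field names, the weights, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` as the per-field similarity measure are all assumptions chosen for illustration. The abstract specifies none of them, and the iterative cooperation with the SVM is not shown.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two field values, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def weighted_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities, normalized by the total weight."""
    total = sum(weights.values())
    score = sum(
        w * field_similarity(r1.get(field, ""), r2.get(field, ""))
        for field, w in weights.items()
    )
    return score / total


def is_duplicate(r1: dict, r2: dict, weights: dict, threshold: float = 0.85) -> bool:
    """Flag a pair as a duplicate when its weighted similarity reaches the threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return weighted_similarity(r1, r2, weights) >= threshold


# Hypothetical field weights and records, for illustration only.
WEIGHTS = {"title": 0.5, "author": 0.3, "year": 0.2}
rec_a = {"title": "Record Deduplication", "author": "S. Lalitha", "year": "2014"}
rec_b = {"title": "Record De-duplication", "author": "S Lalitha", "year": "2014"}
rec_c = {"title": "xyz", "author": "qqq", "year": "0000"}

if __name__ == "__main__":
    print(weighted_similarity(rec_a, rec_b, WEIGHTS))
    print(is_duplicate(rec_a, rec_c, WEIGHTS))
```

In a GP, PSO, or BA setting, the weights and threshold would be the quantities the optimizer searches over, rather than constants fixed by hand as they are here.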

International Journal of Innovative Research in Computer and Communication Engineering