S. Visalakshi, V. Radha
Real-time datasets frequently contain missing values, which pose a serious problem in data mining and are a common data quality issue. Missing data may occur in either the training set or the testing set, and in both cases they reduce the accuracy of any learned classifier. Handling missing data is therefore important, because missing values adversely affect both the interpretation and the results of the data mining process. Techniques for handling missing values can be grouped into four categories: imputation methods, maximum likelihood, machine learning, and complete case analysis. K-Nearest Neighbor (KNN) is one of the imputation techniques used to treat missing values. The proposed KNN variant overcomes the drawback of standard KNN. The proposed technique is implemented and used to identify contamination in drinking water. Missing value handling is applied to the water dataset, which helps to improve an automatic water contamination detection scheme. The proposed technique is compared with several existing, popular imputation techniques, and the results show that it performs better than the other imputation techniques.
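For readers unfamiliar with KNN imputation, the following is a minimal sketch of the standard (baseline) approach, not the proposed variant described in this paper. The function name `knn_impute`, the choice of a Euclidean distance averaged over the columns observed in both rows, and the sample water readings (pH, turbidity, chlorine) are all illustrative assumptions.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Baseline KNN imputation only (an illustrative assumption, not the
    paper's proposed method). Distance is Euclidean over the columns
    observed in both rows, averaged over the number of shared columns.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbor must be a different row that has column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = candidates[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed


# Hypothetical readings: [pH, turbidity, chlorine]; None marks a missing value.
samples = [
    [7.0, 1.0, 0.5],
    [7.2, 1.2, 0.6],
    [8.5, 4.0, 0.1],
    [7.1, None, 0.5],
]
filled = knn_impute(samples, k=2)
print(filled[3])  # turbidity is filled from the two most similar rows
```

In this sketch the missing turbidity value is replaced by the average of the two rows closest in pH and chlorine, so the outlying third row does not influence the estimate. In practice the features would usually be scaled first so that no single column dominates the distance.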