Abstract

Balancing Privacy and Utility in Defect Prediction by Privatization Algorithm

Dr. S. Chitra, P. Sumathi, T. Kanchana

Building quality predictors requires data, yet early in the life cycle a project may lack the data needed to build them. Prior work assumed that relevant training data could be found nearest to the local project. To give owners of defect data-sets an effective means of privatizing their data prior to release, MORPH is designed to maintain the class boundaries in a data-set. MORPH is a data mutator that moves each data point a random distance, taking care not to cross class boundaries. The value of training on this MORPHed data is tested via a 10-way within-learning study and a cross-learning study using Random Forests, Naive Bayes, and Logistic Regression on ten object-oriented (OO) defect data-sets from the PROMISE data repository. Measured in terms of exposure of sensitive attributes, the MORPHed data was four times more private than the unMORPHed data. In terms of f-measures, there was little difference between the MORPHed and unMORPHed data (original data and data privatized by data-swapping) in both the cross and within studies. We conclude that, at least for the kinds of OO defect data studied in this project, data can be privatized without concerns for inference efficacy.
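
The mutation step described above can be illustrated with a minimal sketch. The abstract does not state the update rule, so everything here is an assumption: each row x is shifted by a random fraction r of its offset from its nearest unlike neighbor z (the closest row with a different class label), i.e. y = x ± (x − z)·r, with r drawn from a hypothetical range of 0.15 to 0.35. The function names `nearest_unlike_neighbor` and `morph` and the parameters `lo`, `hi`, and `seed` are illustrative and not taken from the paper.

```python
import math
import random


def nearest_unlike_neighbor(i, rows, labels):
    """Return the row closest to rows[i] whose class label differs, or None."""
    best, best_dist = None, math.inf
    for row, label in zip(rows, labels):
        if label == labels[i]:
            continue  # same class (this also skips rows[i] itself)
        d = math.dist(rows[i], row)
        if d < best_dist:
            best, best_dist = row, d
    return best


def morph(rows, labels, lo=0.15, hi=0.35, seed=None):
    """Shift every row by a random fraction of its offset from its
    nearest unlike neighbor (assumed rule: y = x +/- (x - z) * r)."""
    rng = random.Random(seed)
    mutated = []
    for i, x in enumerate(rows):
        z = nearest_unlike_neighbor(i, rows, labels)
        if z is None:
            # Only one class present: no boundary to respect, keep the row.
            mutated.append(list(x))
            continue
        r = rng.uniform(lo, hi)          # assumed range, not from the abstract
        sign = rng.choice((-1.0, 1.0))   # move toward or away from z
        mutated.append([xi + sign * (xi - zi) * r for xi, zi in zip(x, z)])
    return mutated


if __name__ == "__main__":
    # Toy data: two numeric attributes, binary defect label.
    rows = [[1.0, 2.0], [1.5, 1.8], [8.0, 9.0], [9.0, 8.5]]
    labels = [0, 0, 1, 1]
    for original, private in zip(rows, morph(rows, labels, seed=1)):
        print(original, "->", private)
```

Because `hi` is below 0.5 in this sketch, a mutated row always stays closer to its original position than to its nearest unlike neighbor, which is one simple way to read "not crossing class boundaries".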

Disclaimer: This abstract was translated using artificial intelligence tools and has not yet been reviewed or verified.