
Research on Parallelization of KNN Locally Weighted Linear Regression Algorithm Based on MapReduce

Tao Xue, Ting-Ting Li, and Bingshuai Sun
Department of Computer Science, Xi’an Polytechnic University, Xi’an 710048, China

Abstract—Linear regression analysis is an important data mining method that is widely used in scientific research and business data analysis. To improve the ability of linear regression to handle large datasets, we propose a new analysis method that combines the K-Nearest Neighbor (K-NN) algorithm with the Locally Weighted Linear Regression (LWLR) algorithm, referred to as the KNN-LWLR algorithm. We parallelize the KNN-LWLR algorithm with the MapReduce programming model and implement it on a Hadoop cluster. Experiments show that, compared with traditional regression analysis algorithms, the KNN-LWLR algorithm greatly improves prediction speed and scales well when dealing with large-scale datasets.
 
Index Terms—KNN-LWLR algorithm, linear regression analysis, locally weighted, MapReduce, parallelization
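
The abstract does not spell out how the KNN and LWLR steps are split across map and reduce tasks, so the following is only a minimal single-process sketch of one plausible arrangement, not the authors' implementation. It assumes a single numeric feature, a Gaussian weighting kernel with bandwidth `tau`, and hypothetical function names (`map_local_knn`, `reduce_merge_knn`, `lwlr_predict`, `knn_lwlr`). Each "map" call finds the k nearest neighbors of the query inside one data partition, the "reduce" step merges these candidates into the global k nearest, and a weighted line is then fitted to those neighbors only.

```python
import heapq
import math
from functools import reduce


def map_local_knn(partition, query, k):
    """Map step (assumed): k records of this partition nearest to the query."""
    # Each record is (x, y); emit (distance, x, y) so candidates sort by distance.
    return heapq.nsmallest(k, ((abs(x - query), x, y) for x, y in partition))


def reduce_merge_knn(a, b, k):
    """Reduce step (assumed): merge two candidate lists, keep the k nearest."""
    return heapq.nsmallest(k, a + b)


def lwlr_predict(neighbors, query, tau=1.0):
    """Fit a Gaussian-weighted line to the neighbors and evaluate it at query."""
    w = [math.exp(-(d * d) / (2.0 * tau * tau)) for d, _, _ in neighbors]
    sw = sum(w)
    if sw == 0.0:
        # All weights underflowed; fall back to the plain mean of the targets.
        return sum(y for _, _, y in neighbors) / len(neighbors)
    swx = sum(wi * x for wi, (_, x, _) in zip(w, neighbors))
    swy = sum(wi * y for wi, (_, _, y) in zip(w, neighbors))
    swxx = sum(wi * x * x for wi, (_, x, _) in zip(w, neighbors))
    swxy = sum(wi * x * y for wi, (_, x, y) in zip(w, neighbors))
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12:
        # Degenerate case (e.g. all neighbors share one x): weighted mean.
        return swy / sw
    slope = (sw * swxy - swx * swy) / denom
    intercept = (swy - slope * swx) / sw
    return intercept + slope * query


def knn_lwlr(partitions, query, k=5, tau=1.0):
    """Run the map step per partition, reduce to global neighbors, then predict."""
    candidates = [map_local_knn(p, query, k) for p in partitions]
    neighbors = reduce(lambda a, b: reduce_merge_knn(a, b, k), candidates)
    return lwlr_predict(neighbors, query, tau)


if __name__ == "__main__":
    # Toy data lying on y = 2x + 1, split into two partitions.
    parts = [[(float(x), 2.0 * x + 1.0) for x in range(0, 5)],
             [(float(x), 2.0 * x + 1.0) for x in range(5, 10)]]
    print(knn_lwlr(parts, 2.5, k=4))
```

Because each partition forwards at most k candidates, the merge step handles far fewer records than the full dataset, which is the property that makes this kind of split attractive for large inputs. On a real Hadoop cluster the map and reduce functions would run as separate tasks rather than as in-process calls.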

Cite: Tao Xue, Ting-Ting Li, and Bingshuai Sun, “Research on Parallelization of KNN Locally Weighted Linear Regression Algorithm Based on MapReduce,” Journal of Communications, vol. 10, no. 11, pp. 864-869, 2015. doi: 10.12720/jcm.10.11.864-869