Title: A MapReduce-based parallel K-means clustering for large-scale CIM data verification
Authors: Deng, C; Liu, Y; Xu, L; Yang, J; Liu, J; Li, S; Li, M
Keywords: CIM verification; Stochastic sampling; Clustering; MapReduce; Load balancing
Issue Date: 2015
Publisher: Wiley
Citation: Concurrency Computation, 27(6), pp. 1375-1638 (2015)
Abstract: The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With the rapid deployment of digitalized devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model, which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while achieving a high level of precision in data verification.
ISSN: 1532-0626
Appears in Collections: Dept of Electronic and Computer Engineering Research Papers

Files in This Item:
File: Fulltext.pdf
Size: 665.78 kB
Format: Adobe PDF

Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.