Please use this identifier to cite or link to this item:
|Title:||A MapReduce-based parallel K-means clustering for large-scale CIM data verification|
|Keywords:||CIM verification;Stochastic sampling;Clustering;MapReduce;Load balancing|
|Citation:||Concurrency Computation, 27, (6): pp.1375-1638, (2015)|
|Abstract:||The Common Information Model (CIM) has been heavily used in electric power grids for data exchange among a number of auxiliary systems such as communication systems, monitoring systems, and marketing systems. With a rapid deployment of digitalized devices in electric power networks, the volume of data continuously grows, which makes verification of CIM data a challenging issue. This paper presents a parallel K-means clustering algorithm for large-scale CIM data verification. The parallel K-means builds on the MapReduce computing model which has been widely taken up by the community in dealing with data-intensive applications. A genetic algorithm-based load-balancing scheme is designed to balance the workloads among the heterogeneous computing nodes for a further improvement in computation efficiency. The performance of the parallel K-means is initially evaluated in a small-scale in-house MapReduce cluster and subsequently evaluated in a commercial cloud computing platform. Finally, the parallel K-means is evaluated in large-scale simulated MapReduce environments. Both the experimental and simulation results show that the parallel K-means reduces the CIM data-verification time significantly compared with the sequential K-means clustering, while generating a high level of precision in data verification.|
|Appears in Collections:||Dept of Electronic and Computer Engineering Research Papers|
Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.