Abstract: Anomaly detection is an important research topic in data mining and machine learning. Many real-world applications, such as intrusion detection and credit card fraud detection, need an effective and efficient framework for identifying deviating data instances. A good anomaly detection method should accurately identify many types of anomalies, be robust, require relatively few resources, and perform detection in a timely manner. In this paper we propose combining two algorithms, Median-Based Outlier Detection and Online Oversampling PCA (osPCA), for effective anomaly detection in an online updating mode. Median-based outlier detection uses the interquartile range (IQR), a measure of statistical dispersion equal to the difference between the upper and lower quartiles. Online oversampling PCA, in turn, does not need to store the entire covariance matrix or data matrix, which makes it well suited to online and large-scale problems. Experimental comparisons with other anomaly detection algorithms verify the feasibility of the proposed method.
Keywords: Anomaly, Leave-One-Out, Median, Oversampling and Principal Component Analysis
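To illustrate the two ingredients the abstract names, the following standard-library Python sketch shows an IQR fence rule and a batch version of the oversampling-PCA outlier score. It is a sketch under stated assumptions, not the authors' implementation. The function names `iqr_outliers` and `ospca_scores`, the fence multiplier `k=1.5` (Tukey's usual convention), and the oversampling ratio `ratio=0.1` are hypothetical choices. Each point x_t is oversampled by giving it extra weight n·ratio. The leading principal direction is then recomputed, and the score is 1 - |cos| of the angle between it and the original direction. Unlike the online osPCA described above, this version recomputes the covariance matrix for every point, so it does not show the memory savings. The abstract also does not say how the two detectors are combined, so each is shown on its own.

```python
import math
import statistics


def iqr_outliers(values, k=1.5):
    """Return indices of values outside the IQR fences [Q1 - k*IQR, Q3 + k*IQR].

    k=1.5 is an assumed (Tukey-style) multiplier, not taken from the paper.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)  # lower quartile, median, upper quartile
    iqr = q3 - q1  # interquartile range: upper quartile minus lower quartile
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def _weighted_cov(points, weights):
    """Weighted covariance matrix; an integer weight w acts like w copies of a point."""
    total = sum(weights)
    d = len(points[0])
    mean = [sum(w * p[j] for p, w in zip(points, weights)) / total for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for p, w in zip(points, weights):
        c = [p[j] - mean[j] for j in range(d)]  # centered point
        for a in range(d):
            for b in range(d):
                cov[a][b] += w * c[a] * c[b] / total
    return cov


def _principal_direction(cov, iters=200):
    """Unit leading eigenvector of a symmetric matrix, via power iteration."""
    d = len(cov)
    u = [1.0] * d
    for _ in range(iters):
        v = [sum(cov[a][b] * u[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        u = [x / norm for x in v]
    return u


def ospca_scores(points, ratio=0.1):
    """Batch (not online) oversampling-PCA score per point: 1 - |cos(u, u_t)|.

    ratio is a hypothetical oversampling ratio; larger scores suggest outliers.
    """
    n = len(points)
    u = _principal_direction(_weighted_cov(points, [1.0] * n))
    scores = []
    for t in range(n):
        weights = [1.0] * n
        weights[t] += n * ratio  # oversample x_t (n*ratio extra copies, possibly fractional)
        u_t = _principal_direction(_weighted_cov(points, weights))
        # Both vectors are unit length, so the dot product is the cosine; clamp rounding error.
        cos = min(1.0, abs(sum(a * b for a, b in zip(u, u_t))))
        scores.append(1.0 - cos)
    return scores
```

For example, `iqr_outliers([10, 12, 11, 13, 12, 11, 10, 12, 50])` flags only index 8, because 50 lies above the upper fence Q3 + 1.5·IQR. In `ospca_scores`, a point lying along the dominant direction barely rotates that direction when oversampled and so scores near zero. A point off that direction rotates it more and scores higher.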