Anomaly Detection

What is Anomaly Detection?

In data analysis, anomaly detection (also called outlier detection) is the identification of rare items, events, or observations that raise suspicions by differing significantly from the majority of the data. It has use cases in many industries. For instance, if we are running a factory and need to identify defective or damaged products, anomaly detection comes into the picture.

We can use anomaly detection to find the defective products and make production more efficient. Anomaly detection is a boon to many industries because it reduces the manual effort needed to identify anomalies for a given business problem. Many banks, for example, use it for fraud detection.



Problem Motivation:

Let us assume we have a dataset x(1), x(2), x(3), …, x(m) and a new test example xtest, and we want to know whether that test example is anomalous or not.

We define a “model” p(x) that tells us the probability that an example is not anomalous. We also use a threshold ϵ (epsilon) as a dividing line, so we can say which examples are anomalous and which are not.

A very common application of anomaly detection is detecting fraud:

  • x(i) = features of user i’s activities
  • Model p(x) from the data.
  • Identify unusual users by checking which have p(x)<ϵ.

If our anomaly detector is flagging too many examples as anomalous, we need to decrease the threshold ϵ.
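To see how the threshold changes the number of flagged examples, here is a minimal sketch. The p(x) values and both thresholds are made up for illustration, and `count_flagged` is a hypothetical helper, not part of any library.

```python
def count_flagged(probabilities, epsilon):
    """Count the examples whose p(x) falls below the threshold epsilon."""
    return sum(1 for p in probabilities if p < epsilon)


# Hypothetical p(x) values for six users (illustrative only).
p_values = [0.20, 0.15, 0.08, 0.03, 0.004, 0.0002]

# A larger epsilon flags more users; a smaller epsilon flags fewer.
flagged_with_large_epsilon = count_flagged(p_values, epsilon=0.05)
flagged_with_small_epsilon = count_flagged(p_values, epsilon=0.001)

print(flagged_with_large_epsilon, flagged_with_small_epsilon)
```

Lowering ϵ shrinks the region that counts as anomalous, which is why decreasing it reduces the number of flagged examples.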

Algorithm:

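Given a training set of examples {x(1), x(2), …, x(m)}, where each example is a vector of n features, we model the probability of an example as the product of the probabilities of its individual features:

p(x) = p(x1; μ1, σ1²) · p(x2; μ2, σ2²) · … · p(xn; μn, σn²)

In statistics, this is called an independence assumption on the values of the features inside a training example x. Because we assume the feature values are independent, we can multiply their individual probabilities to obtain p(x).

To apply this, we first choose features xj that we think might be indicative of anomalous examples. We then fit the parameters μj and σj² of a Gaussian distribution for each feature on the training set, and compute p(x) for each new example.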
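For reference, the parameter fitting step can be written out as below. This assumes each feature is modeled by a univariate Gaussian, as the summary of this article indicates. The symbols μ_j and σ_j² (mean and variance of feature j) and the 1/m normalization are one common convention rather than something spelled out in the text above.

```latex
\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)},
\qquad
\sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m}\left(x_j^{(i)} - \mu_j\right)^2

p(x_j;\, \mu_j, \sigma_j^2)
  = \frac{1}{\sqrt{2\pi\sigma_j^2}}
    \exp\!\left(-\frac{(x_j - \mu_j)^2}{2\sigma_j^2}\right)
```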


We flag a data point as an anomaly only if p(x) < ϵ.
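The whole procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: it assumes per-feature Gaussians, the training data and the threshold are made up, and the helper names (`fit_gaussian`, `density`, `is_anomaly`) are hypothetical.

```python
import math


def fit_gaussian(examples):
    """Estimate the mean and variance of each feature from the training examples."""
    m = len(examples)
    n = len(examples[0])
    mu = [sum(x[j] for x in examples) / m for j in range(n)]
    var = [sum((x[j] - mu[j]) ** 2 for x in examples) / m for j in range(n)]
    return mu, var


def density(x, mu, var):
    """p(x): product of per-feature Gaussian densities (independence assumption)."""
    p = 1.0
    for xj, mj, vj in zip(x, mu, var):
        p *= math.exp(-((xj - mj) ** 2) / (2.0 * vj)) / math.sqrt(2.0 * math.pi * vj)
    return p


def is_anomaly(x, mu, var, epsilon):
    """Flag x as anomalous only if p(x) < epsilon."""
    return density(x, mu, var) < epsilon


# Tiny made-up training set with two features per example.
train = [[5.0, 10.0], [5.2, 9.5], [4.8, 10.5], [5.1, 10.2], [4.9, 9.8]]
mu, var = fit_gaussian(train)

epsilon = 1e-3  # assumed threshold, chosen only for this illustration
typical = is_anomaly([5.0, 10.0], mu, var, epsilon)
unusual = is_anomaly([6.0, 12.0], mu, var, epsilon)
print(typical, unusual)
```

Note that p(x) here is a product of density values, so it can exceed 1 when the variances are small. What matters for flagging is only how it compares with ϵ.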

Evaluating an Anomaly Detection System:

To evaluate our learning algorithm, we take some labeled data, categorized into anomalous and non-anomalous examples (y = 0 if normal, y = 1 if anomalous).

From that data, take a large proportion of the good, non-anomalous examples as the training set on which to train p(x).

Then, take a smaller proportion of mixed anomalous and non-anomalous examples (you will usually have many more non-anomalous examples) for your cross-validation and test sets.

For example, we may have a dataset in which 0.2% of the data is anomalous. We take 60% of the examples, all of which are good (y = 0), for the training set. We then take 20% of the examples for the cross-validation set (which includes half of the anomalous examples, 0.1% of the data) and the other 20% for the test set (which includes the remaining 0.1% that are anomalous).
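The split described above can be sketched as follows. The dataset sizes (10,000 normal and 20 anomalous examples, roughly 0.2% anomalous) are assumed for illustration, and `split_dataset` is a hypothetical helper.

```python
import random


def split_dataset(normal, anomalous, seed=0):
    """Return (train, cv, test).

    train: 60% of the normal examples, unlabeled (all y = 0).
    cv, test: 20% of the normal examples each, plus half of the anomalies each,
    stored as (example, label) pairs.
    """
    rng = random.Random(seed)
    normal = list(normal)
    anomalous = list(anomalous)
    rng.shuffle(normal)
    rng.shuffle(anomalous)

    n_train = len(normal) * 6 // 10
    n_cv = len(normal) * 2 // 10
    half = len(anomalous) // 2

    train = normal[:n_train]
    cv = [(x, 0) for x in normal[n_train:n_train + n_cv]]
    cv += [(x, 1) for x in anomalous[:half]]
    test = [(x, 0) for x in normal[n_train + n_cv:]]
    test += [(x, 1) for x in anomalous[half:]]
    return train, cv, test


# Placeholder examples; real data would be feature vectors.
normal = [("normal", i) for i in range(10000)]
anomalous = [("anomalous", i) for i in range(20)]

train, cv, test = split_dataset(normal, anomalous)
print(len(train), len(cv), len(test))
```

Keeping the anomalies out of the training set means p(x) is fitted only on normal behavior, while the labeled cross-validation and test sets are used to measure how well the threshold separates the two classes.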

Algorithm Evaluation:

On a cross validation/test example x, predict:

If p(x) < ϵ (anomaly), then y=1

If p(x) ≥ ϵ (normal), then y=0

Possible evaluation metrics:

  • True positive, false positive, false negative, true negative.
  • Precision/recall
  • F1 Score
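These metrics can be computed directly from the predictions on the cross-validation set. The sketch below uses made-up labels and p(x) values and an assumed threshold; the helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1, treating y = 1 (anomaly) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up cross-validation labels and model outputs (illustrative only).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
p_values = [0.30, 0.20, 0.25, 0.04, 0.18, 0.22, 0.01, 0.03, 0.09, 0.15]

epsilon = 0.05  # assumed threshold
y_pred = [1 if p < epsilon else 0 for p in p_values]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Because anomalies are rare, plain accuracy can look high even for a detector that never flags anything, which is why precision, recall, and F1 are more informative here.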

Advantages:

  1. Monitor any data source, including user logs, devices, networks, and servers.
  2. Rapidly identify zero-day attacks as well as unknown security threats.
  3. Find unusual behaviors across data sources that are not identified when using traditional security methods.

Disadvantages:

  1. Requires appropriate feature extraction.
  2. Requires defining what normal behavior is.
  3. Must handle an imbalanced distribution of normal and abnormal data.

Summary:

In this article, we studied anomaly detection using the Gaussian distribution. We looked at use cases in industry and banking, the advantages and disadvantages of anomaly detection, and, most importantly, how to evaluate an anomaly detection system.

Written By: Naveen Reddy

Reviewed By: Krishna Heroor
