Statistics is a vital tool in Machine Learning that helps developers and organizations describe their data with formulas and make the best use of it. Edureify, the best AI Learning App, has previously provided information on the Statistics of Machine Learning, the Descriptive Statistics of Machine Learning, Standard and Variable Deviation of Machine Learning, and the 3 Measures of Central Tendency for Machine Learning.
In this article, Edureify will provide information on the Range and the Interquartile Range, two measures of spread that help detect outliers in Machine Learning.
Read on to know more about the Range and Interquartile Range of Machine Learning and also learn from the best coding courses of Edureify to sharpen your knowledge.
What is Range?
In the simplest mathematical explanation, Range is the difference between the maximum value and the minimum value in a dataset. The Range is one of the Measures of Dispersion, which describe how widely the data is spread.
The formula to calculate Range is:
Range = Maximum value - Minimum value
Using this formula, a high Range signifies that the variability of the distribution is high, and a low Range signifies that the variability of the distribution is low.
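As a quick illustration, the Range formula can be sketched in a few lines of Python. The function name `value_range` and the sample values are assumptions made for this example and do not come from any library.

```python
def value_range(values):
    """Return the Range: maximum value minus minimum value."""
    return max(values) - min(values)

# Two made-up datasets: one tightly packed, one widely spread.
narrow = [48, 50, 51, 52, 49]
wide = [5, 10, 25, 30, 90]

print(value_range(narrow))  # 4
print(value_range(wide))    # 85
```

The second dataset has the higher Range, which matches its visibly wider spread.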
What is Interquartile Range?
Interquartile Range (IQR) is a measure of variability that is based on quartiles. To find the quartiles, the data is sorted in ascending order and split into 4 equal parts. The three values that separate these 4 parts are the first, second, and third quartiles, called Q1, Q2, and Q3. The IQR is the difference between the third and the first quartile, that is, IQR = Q3 - Q1.
The quartiles correspond to the following percentiles of the data-
- Q1 is the 25th percentile of the data
- Q2 is the 50th percentile of the data
- Q3 is the 75th percentile of the data
The IQR gives an insight into the width of the distribution, as the middle 50% of the data points lie between Q1 and Q3.
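Whether a dataset contains an even or an odd number of data points-
- Q2 is the median of the whole dataset
- Q1 is the median of the lower half of the data, that is, the points below the median
- Q3 is the median of the upper half of the data, that is, the points above the median
When the number of data points is odd, the median itself is left out of both halves.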
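The quartile rule can be sketched in Python as follows. This is a minimal sketch of the median-of-halves convention, in which the median is left out of both halves when the number of points is odd. The helper names `quartiles` and `iqr` and the sample datasets are hypothetical. Statistical libraries often use interpolation-based conventions instead, so their quartiles can differ slightly on small datasets.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower = ordered[:half]
    # For an odd count, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def iqr(values):
    """Return the Interquartile Range, Q3 - Q1."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

odd_data = [2, 4, 6, 8, 10, 12, 14]         # 7 made-up points
even_data = [3, 7, 8, 12, 13, 14, 18, 21]   # 8 made-up points

print(quartiles(odd_data), iqr(odd_data))    # (4, 8, 12) 8
print(quartiles(even_data), iqr(even_data))  # (7.5, 12.5, 16.0) 8.5
```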
Example of Calculating IQR
Example to find the IQR value in case of odd data points-
Consider the following dataset and calculate the IQR-
10, 25, 90, 30, 5
Solution:
Step 1- Arrange the dataset in increasing order: 5, 10, 25, 30, 90
Step 2- Identify the Median
5, 10, 25, 30, 90
Here, the median is 25
Step 3- Put the numbers before and after the median inside a bracket-
(5, 10), 25, (30, 90)
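The first bracket is the lower half, which gives Q1, and the second bracket is the upper half, which gives Q3
Step 4- Find the median of each bracket to get Q1 and Q3
(5, 10), 25, (30, 90)
Q1 = (5 + 10) / 2 = 7.5
Q3 = (30 + 90) / 2 = 60
Step 5- Subtract Q1 from Q3 to get the IQR-
Q3 - Q1 = 60 - 7.5 = 52.5
Therefore, IQR = 52.5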
Example to find the IQR with even data points-
Consider the following dataset and find the IQR-
50, 35, 25, 70
Solution:
Step 1- Arrange the dataset in increasing order: 25, 35, 50, 70
Step 2– Place a mark in the center of the dataset:
25, 35, – 50, 70
Step 3- Put a bracket around the data points before and after the mark:
(25, 35) – (50, 70)
Step 4- Find Q1 and Q3 as the median of each bracket:
Q1 = (25 + 35) / 2 = 30
Q3 = (50 + 70) / 2 = 60
Step 5- Subtract Q1 from Q3:
Q3 - Q1 = 60 - 30 = 30
Therefore, IQR = 30
Where is Interquartile Range Used in Machine Learning?
In Machine Learning, the best use of the Interquartile Range is to measure the variability of data that contains outliers. Because the IQR depends only on the middle 50% of the sorted data, a few extreme values barely affect it, unlike the Range. Hence, the IQR is a good basis for detecting outliers.
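To see this robustness in practice, the sketch below compares the Range and the IQR of a small dataset before and after one value is replaced by an extreme one. The datasets and the helper name `spread` are made up for illustration. The quartiles come from Python's `statistics.quantiles` with `method="inclusive"`, an interpolation-based convention that can give slightly different quartiles than a hand calculation based on the medians of the two halves.

```python
from statistics import quantiles

def spread(values):
    """Return (Range, IQR) for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1

clean = [10, 12, 14, 16, 18, 20, 22, 24]
# Same data, but the largest value is replaced by an extreme one.
with_outlier = [10, 12, 14, 16, 18, 20, 22, 240]

print(spread(clean))         # (14, 7.0)
print(spread(with_outlier))  # (230, 7.0)
```

The Range jumps from 14 to 230, while the IQR stays the same in this example.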
What are Outliers?
Outliers are data points that lie far away from the rest of the data. They can point to errors made in an experiment or during the measurement of variables, or to a genuine irregularity. One must note that not all outliers are bad and that some can be helpful too-
- Good Outliers- These outliers are genuine observations that are distinct or unique compared with the other data. For example, an outlier may point out an irregularity like an illegal occurrence or the spread of a virus.
- Bad Outliers- These outliers come from errors and need to be corrected or taken out of the dataset. For example, if a record has mistakenly noted the date of birth of a person as 1/09/1987 instead of 1/09/1997, the outlier points out the error so that it can be corrected immediately.
The following are some more features of Outliers-
- Outliers can distort the mean and standard deviation of a dataset, which can lead to statistically wrong results.
- For efficient and successful use of Machine Learning algorithms, one should first detect and handle the outliers, as many machine learning algorithms are sensitive to them.
- Outliers are useful for detecting irregularities like fraudulent actions.
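The effect on the mean and standard deviation can be checked with a short Python sketch. The values below are made up for illustration, and the standard library's `statistics` module is used here as one convenient option.

```python
from statistics import mean, median, stdev

clean = [50, 52, 49, 51, 48]
with_outlier = clean + [500]  # one extreme, made-up value added

for name, values in (("clean", clean), ("with outlier", with_outlier)):
    print(name,
          "mean:", mean(values),
          "median:", median(values),
          "stdev:", round(stdev(values), 2))
```

With these values, the single outlier moves the mean from 50 to 125 and raises the standard deviation from about 1.6 to over 180, while the median only moves from 50 to 50.5.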
Identification of Outliers using IQR
To identify the Outliers, one needs to first calculate the IQR. Once the IQR is calculated it becomes easier to point out the outliers.
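A data point is identified as an outlier if it fulfills one of the following conditions-
- It is above the 75th percentile + 1.5 × IQR, that is, above Q3 + 1.5 × IQR
- It is below the 25th percentile - 1.5 × IQR, that is, below Q1 - 1.5 × IQR
Using these two conditions, one can easily identify the outliers. The 1.5 coefficient is a convention tied to the normal distribution: for normally distributed data, these two limits lie about 2.7 standard deviations from the mean, so only about 0.7% of the points fall outside them. Quartiles and percentiles are well suited to finding outliers because they are based on the rank order of the data, unlike the standard deviation, which is itself distorted by extreme values.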
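The two conditions can be sketched in Python as follows. The function name `iqr_outliers`, its parameter `k`, and the sample data are hypothetical. The quartiles are taken from `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions, so the exact limits may differ slightly from a hand calculation.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_limit, upper_limit, outliers) using the k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    # Keep every point that falls outside the two limits.
    outliers = [v for v in values if v < lower_limit or v > upper_limit]
    return lower_limit, upper_limit, outliers

data = [12, 14, 15, 15, 16, 17, 18, 19, 21, 95]  # illustrative values
lower_limit, upper_limit, outliers = iqr_outliers(data)
print(lower_limit, upper_limit)  # 9.375 24.375
print(outliers)                  # [95]
```

Whether a flagged point is then removed, corrected, or kept depends on whether it is a bad or a good outlier.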
Machine Learning is widely used by developers and organizations for its dynamic qualities. Edureify, with its online coding courses, teaches all the important technicalities of Machine Learning like-
- The Algorithms of Machine Learning
- Azure of Machine Learning
- A-Z Statistics of Machine Learning
- Ways to Master No-Code Machine Learning
- Descriptive Statistics of Machine Learning
- Standard and Variable Deviation of Machine Learning
- 3 Measures of Central Tendency of Machine Learning
With Edureify’s coding courses students can also benefit from the following-
- Get 200+ learning hours
- Attend live lectures and learn from the industry experts
- Access the recorded lectures
- Have doubts cleared instantly
- Get professional career guidance and access to Edureify’s job portal
Join Edureify’s certified coding courses and learn all the hooks of Machine Learning and more.
Some FAQs on Range and Interquartile Range-
1. What is Range?
The Range is the difference between the maximum value and the minimum value in a dataset. It is one of the Measures of Dispersion, which describe how widely the data is spread.
2. What is the formula to calculate the Range?
Range = Maximum value - Minimum value
3. What is IQR?
Interquartile Range (IQR) is a measure of variability. The data is sorted in ascending order and split into 4 equal parts by the quartiles Q1, Q2, and Q3, and the IQR is the difference Q3 - Q1.
4. What are Outliers?
Outliers are data points that lie far away from the rest of the data. They can point to errors made in an experiment or during the measurement of variables, or to a genuine irregularity.
5. From where can I learn more about Range, Interquartile Range, and Outliers?
Edureify has the best online coding courses that teach students everything regarding Machine Learning like Range, Interquartile Range, and Outliers.