Introduction: Data Types in ML
Data types are a way of classifying values: a data type specifies which kind of value a variable can store and which mathematical, relational, or logical operations can be applied to it without causing an error. In machine learning, it is very important to know the appropriate data types of the independent and dependent variables, as this provides the basis for selecting classification or regression models. Incorrect identification of data types leads to incorrect modeling, which in turn leads to an incorrect solution.
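As a rough illustration of how the data type of the dependent variable points toward a model family, here is a minimal sketch. The function name suggest_task and its rule (numeric target means regression, anything else means classification) are assumptions made for this example, not a standard API.

```python
from numbers import Real


def suggest_task(target_values):
    """Suggest a model family from the values of the dependent variable.

    Heuristic assumed for illustration only: a purely numeric target
    suggests regression; labels or booleans suggest classification.
    """
    values = list(target_values)
    if not values:
        raise ValueError("target_values must not be empty")
    # bool is a subclass of int in Python, so exclude it explicitly.
    all_numeric = all(
        isinstance(v, Real) and not isinstance(v, bool) for v in values
    )
    return "regression" if all_numeric else "classification"


print(suggest_task([12.5, 7.0, 30.2]))        # numeric target
print(suggest_task(["spam", "ham", "spam"]))  # label target
```

Note that this heuristic would wrongly suggest regression for class labels stored as numbers (for example 0 and 1), which is exactly why the data type has to be identified by meaning and not only by how it is stored.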
Here I will discuss the different data types with suitable examples.
Different Types of data types
Data types are broadly classified into:
- Quantitative
- Qualitative
1. Quantitative data type: –
This data type consists of numerical values, that is, anything that is measured in numbers.
E.g., Profit, quantity sold, height, weight, temperature, etc.
This is again of two types:
A.) Discrete data type: –
Numeric data that takes discrete values, that is, whole numbers. A value of this type has no proper meaning if expressed as a decimal. Such values can be counted.
E.g.: – No. of cars you have, no. of marbles in containers, students in a class, etc.
B.) Continuous data type: –
Numerical measures that can take any value within a certain range. A value of this type keeps its meaning when expressed as a decimal. Such values cannot be counted, only measured, and the number of possible values within a range is infinite.
E.g.: – height, weight, time, area, distance, measurement of rainfall, etc.
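The distinction between discrete and continuous values can be sketched in a few lines of Python. The helper is_discrete and the sample lists are hypothetical and exist only for this example; the check simply tests whether every value is a whole number.

```python
def is_discrete(values):
    """Return True if every value is a whole number (a count)."""
    return all(float(v).is_integer() for v in values)


cars_owned = [0, 2, 1, 3]           # counts: discrete
heights_cm = [162.5, 170.2, 158.0]  # measurements: continuous

print(is_discrete(cars_owned))
print(is_discrete(heights_cm))
```

This is only a heuristic: a continuous quantity that happens to be recorded in whole numbers (for example height rounded to the nearest centimetre) would pass the check even though it is continuous in nature.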
2. Qualitative data type: –
These are data types that describe qualities and cannot be expressed as measured quantities. They describe categories or groups and are hence known as categorical data types.
This can be divided into:-
a. Structured Data:
This type of data consists of either numbers or words. It can take numerical values, but mathematical operations on those values are not meaningful. This type of data is expressed in tabular format.
E.g., Sunny=1, cloudy=2, windy=3, or binary data like 0 or 1, good or bad, etc.
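To see why arithmetic on coded categories is not meaningful, consider the weather codes from the example above. The sketch below first averages the codes, which gives a number with no interpretation, and then shows one common alternative, one-hot encoding. The function one_hot is written here for illustration and is not taken from any library.

```python
weather = ["sunny", "cloudy", "windy", "sunny"]

# Integer codes as in the example: Sunny=1, cloudy=2, windy=3.
codes = {"sunny": 1, "cloudy": 2, "windy": 3}
encoded = [codes[w] for w in weather]

# The mean of the codes is a valid number but has no meaning:
# there is no weather that is "1.75".
mean_code = sum(encoded) / len(encoded)
print(mean_code)


def one_hot(labels):
    """Return the sorted categories and one 0/1 row per label."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


categories, rows = one_hot(weather)
print(categories)
print(rows)
```

With one-hot encoding each category gets its own 0/1 column, so a model does not see a false ordering or distance between sunny, cloudy, and windy.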
b. Unstructured data:
This type of data does not have a fixed format and is therefore known as unstructured data. It comprises textual data, sounds, images, videos, etc.
Besides these, there are also other types, referred to as data type preliminaries or data measures:-
- Nominal
- Ordinal
- Interval
- Ratio
These are also referred to as different scales of measurement.
I. Nominal Data Type:
This is used to express names or labels that have no order and are not measurable.
E.g., male or female (gender), race, country, etc.
II. Ordinal Data Type:
This is also a categorical data type like nominal data but has some natural ordering associated with it.
E.g., Likert rating scale, Shirt sizes, Ranks, Grades, etc.
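The natural ordering of ordinal data usually has to be stated explicitly, because the labels themselves do not carry it. A minimal sketch with shirt sizes follows; the list SIZE_ORDER is an assumed ordering chosen for this example.

```python
# Assumed ordering for illustration: smallest to largest.
SIZE_ORDER = ["S", "M", "L", "XL"]
rank = {size: position for position, size in enumerate(SIZE_ORDER)}

sizes = ["L", "S", "XL", "M"]

# Plain sorting is alphabetical and ignores the real order of sizes.
alphabetical = sorted(sizes)

# Sorting by rank respects the ordinal scale.
by_size = sorted(sizes, key=lambda s: rank[s])

print(alphabetical)
print(by_size)
```

The ranks tell us that L comes after M, but not by how much, so comparisons are meaningful on an ordinal scale while differences and averages of the ranks are not.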
III. Interval Data Type:
This is numeric data that has a proper order, but its zero point is arbitrary. Here zero does not mean a complete absence; it still represents some value. This is the local scale.
E.g., temperature measured in degrees Celsius, time, SAT score, credit score, pH, etc. The difference between values is meaningful. In this case, there is no absolute zero.
IV. Ratio Data Type:
This quantitative data type is the same as the interval data type but has an absolute zero. Here zero means complete absence, and the scale starts from zero. This is the global scale.
E.g., Temperature in Kelvin, height, weight, etc.
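The practical difference between the interval and ratio scales shows up when values are divided. The sketch below compares 10 and 20 degrees Celsius with the same temperatures in Kelvin; the variable names are chosen for this example only.

```python
def celsius_to_kelvin(celsius):
    """Convert a temperature from degrees Celsius to Kelvin."""
    return celsius + 273.15


c_low, c_high = 10.0, 20.0

# Differences are meaningful on both scales and agree with each other.
difference_c = c_high - c_low
difference_k = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# Ratios are only meaningful on the ratio scale (Kelvin).
naive_ratio = c_high / c_low  # 2.0, but 20 C is not "twice as hot" as 10 C
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(difference_c, round(difference_k, 2))
print(naive_ratio, round(kelvin_ratio, 4))
```

In Kelvin the ratio is only about 1.035, which reflects the real physical relationship, while the value of 2.0 obtained in Celsius is an artifact of where that scale places its zero.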
Conclusion
In the above discussion, we have learned about the different data types and data preliminaries, which are very useful for building machine learning models.
I hope I was able to explain the concept well. If you have any queries, please post them in the comment section. I welcome your feedback and suggestions. Thank you for reading.
Written By: Nabanita Paul
Reviewed By: Krishna Heroor