Data Transformation for Machine Learning

Blog #1

Name : Devina
Student ID : D0731576
Source : https://insidebigdata.com/2020/05/07/data-transformation-for-machine-learning/
Written By : Damian Chan, Technical Success Manager, Matillion


Nowadays, everyone is talking about machine learning, including industry experts, competitors, and customers. Machine learning is usually understood to be a form of artificial intelligence: people build and train models to process data, which helps produce better predictions and lets computer systems learn from data and make decisions without being explicitly programmed to do so.

A machine learning model can learn from the data it is given and improve its analysis without being reprogrammed. Deep learning is a subset of machine learning that involves artificial neural networks.

To get good insight, you have to provide good data for the model to learn from. Real-world data can be really messy, and preparing it is often described as a daunting task. Data transformation can be time-consuming and tedious without the right technology stack in place.

Here are some key ways data transformation can improve machine learning:
1. Remove Unused and Repeated Columns: this improves speed not only when the model trains but also when you analyze the data
2. Change Data Types: this saves memory usage
3. Handle Missing Data: consider imputation, which replaces a missing value with a simple placeholder or another value, based on some kind of assumption
4. Remove String Formatting and Non-Alphanumeric Characters: although removing formatting and other characters makes the sentence less readable for humans, this approach helps the algorithm to better digest the data
5. Convert Categorical Data to Numerical: for example, convert values such as yes and no to 1 and 0
6. Convert Timestamps: define a specific date/time format and convert all timestamps to the defined format
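
The six steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library, not the method from the original article. The sample records, the column names (`id_copy`, `notes`, `age`, and so on), the list of accepted timestamp formats, and the choice of the median as the imputation value are all assumptions made up for this example. In practice a data-frame library or an ETL tool would usually do this work, and plain Python cannot pick a compact integer width the way such tools can, so step 2 here only converts strings to integers.

```python
import re
from datetime import datetime
from statistics import median

# Hypothetical raw records; every column name and value is invented for illustration.
raw_rows = [
    {"id": "1", "id_copy": "1", "notes": "n/a", "age": "34",
     "comment": "Great product!!!", "subscribed": "yes", "signup": "2020-05-07 14:30:00"},
    {"id": "2", "id_copy": "2", "notes": "n/a", "age": "",
     "comment": "Too  slow :(", "subscribed": "no", "signup": "07/05/2020"},
    {"id": "3", "id_copy": "3", "notes": "n/a", "age": "28",
     "comment": "OK, works.", "subscribed": "Yes", "signup": "2020/05/09"},
]

UNUSED_COLUMNS = {"id_copy", "notes"}  # assumed to be repeated / unused
TIMESTAMP_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y", "%Y/%m/%d")  # assumed inputs
TARGET_FORMAT = "%Y-%m-%d"  # the single format every timestamp is converted to


def parse_timestamp(text):
    """Try each assumed input format until one matches."""
    for fmt in TIMESTAMP_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def clean_text(text):
    """Drop non-alphanumeric characters, lowercase, and collapse whitespace."""
    stripped = re.sub(r"[^A-Za-z0-9 ]+", "", text)
    return " ".join(stripped.lower().split())


def transform(rows):
    # Imputation value for step 3: the median of the ages that are present.
    known_ages = [int(r["age"]) for r in rows if r["age"].strip()]
    fill_age = int(median(known_ages))

    cleaned = []
    for r in rows:
        # 1. Remove unused and repeated columns.
        row = {k: v for k, v in r.items() if k not in UNUSED_COLUMNS}
        # 2. Change data types: numeric strings become integers.
        row["id"] = int(row["id"])
        # 3. Handle missing data: fill an empty age with the median.
        row["age"] = int(row["age"]) if row["age"].strip() else fill_age
        # 4. Remove string formatting and non-alphanumeric characters.
        row["comment"] = clean_text(row["comment"])
        # 5. Convert categorical data to numerical: yes -> 1, anything else -> 0.
        row["subscribed"] = 1 if row["subscribed"].strip().lower() == "yes" else 0
        # 6. Convert timestamps to the one defined format.
        row["signup"] = parse_timestamp(row["signup"]).strftime(TARGET_FORMAT)
        cleaned.append(row)
    return cleaned


cleaned_rows = transform(raw_rows)
for row in cleaned_rows:
    print(row)
```

Filling the missing age with the median is just one possible assumption; a mean, a constant placeholder, or dropping the row could be equally reasonable depending on the data.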

Machine learning can help your business process data and understand its insights faster. However, transforming data for analysis can be hard given the growing volume, variety, and velocity of big data, so to overcome this you need ETL software that is purpose-built for the cloud.
