A Complete Tutorial to learn Data Science in R from Scratch


Student Name: Praewwanit Tiwasing (倪婉欣)

Student ID: M0798633

Blog#5: A Complete Tutorial to learn Data Science in R from Scratch
Website: https://www.analyticsvidhya.com/blog/2016/02/complete-tutorial-learn-data-science-scratch/?fbclid=IwAR3Zoh5-s5_0ttC8u3qbbKSsYmKUUXUFUb9qbkI_PKmypGsxpciXwDHYSGA

The author provides a tutorial on data exploration, data visualization, data manipulation, and building models with regression, decision tree, and random forest algorithms, which are used for predictive modeling in R.

Here is what I have learned so far from this blog:

1. Basics of R programming: Understand the benefits of R, learn how to install RStudio and R packages, and perform basic computations in R.

2. Essentials of R programming 

- Data Types: Understand the differences between the data types, which include vectors, matrices, data frames, and lists.

- Control Structures: Control the flow of code written inside a function, for example with if/else statements and for or while loops.

- Useful R Packages: The author suggests packages for importing data, data visualization, data manipulation, and modeling.

3. Exploratory Data Analysis: Get to know the meaning of the response variable (dependent variable) and the predictor variables (independent variables).

- Graphical Representation of Variables: Using graphs, we can analyze the data in two ways. Univariate analysis looks at one variable at a time, and bivariate analysis looks at the relationship between two variables.
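
To make the difference concrete, here is a minimal sketch of the numbers behind each kind of analysis. The tutorial itself works in R with plots; this uses plain Python only as an illustration, and the `hours` and `score` data and both function names are made up for the example.

```python
from statistics import mean, median, stdev


def univariate_summary(values):
    """Summarize a single variable on its own (univariate analysis)."""
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
        "sd": stdev(values),
    }


def pearson_correlation(xs, ys):
    """Measure the linear relationship between two variables (bivariate analysis)."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical toy data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]

print(univariate_summary(score))          # one variable at a time
print(pearson_correlation(hours, score))  # two variables together
```

A histogram or box plot would be the graphical counterpart of the first call, and a scatter plot the counterpart of the second.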

4. Data Manipulation

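- Label Encoding and One Hot Encoding: Two ways of numerically encoding the levels of a categorical variable. Label encoding assigns a number to each level, while one hot encoding creates a separate 0/1 column for each level.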
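
The two encodings can be sketched in a few lines. This is an illustration in plain Python, not the R code from the tutorial; the function names and the sample `fat_content` values are assumptions made for the example.

```python
def label_encode(values):
    """Map each distinct level to an integer (levels sorted alphabetically)."""
    levels = sorted(set(values))
    mapping = {level: index for index, level in enumerate(levels)}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values):
    """Create one 0/1 indicator per level for every observation."""
    levels = sorted(set(values))
    return [{level: int(v == level) for level in levels} for v in values]


# Hypothetical categorical variable with two levels.
fat_content = ["Low Fat", "Regular", "Low Fat"]

codes, mapping = label_encode(fat_content)
print(codes)    # integer code per observation
print(mapping)  # which level received which code
print(one_hot_encode(fat_content))
```

Label encoding keeps a single column but implies an order between levels, while one hot encoding avoids that implied order at the cost of one extra column per level.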

5. Predictive Modeling using Machine Learning

- Linear (Multiple) Regression: Suitable when the response variable is continuous and there are several predictors.

- Decision Trees: Use a complexity parameter to control the tradeoff between model complexity and accuracy on the training set. A smaller complexity parameter leads to a bigger tree, which might overfit. On the other hand, a larger complexity parameter leads to a smaller tree, which might underfit.

- Random Forest: A collection of decision trees that, according to the author, takes care of missing values, outliers, and other non-linearities in the data set.
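
The effect of the complexity parameter can be illustrated with a simplified penalized-error view: each candidate tree is scored as its training error plus the complexity parameter times its number of leaves. This is only a sketch in plain Python under that assumption, not the exact procedure used by the R packages in the tutorial, and the candidate trees and their error values are invented for the example.

```python
def penalized_cost(training_error, n_leaves, cp):
    """Training error plus a penalty that grows with tree size."""
    return training_error + cp * n_leaves


def pick_tree(candidates, cp):
    """Return the candidate tree with the lowest penalized cost for this cp."""
    return min(
        candidates,
        key=lambda tree: penalized_cost(tree["error"], tree["leaves"], cp),
    )


# Hypothetical candidates: bigger trees fit the training set better.
candidates = [
    {"leaves": 1, "error": 0.40},
    {"leaves": 3, "error": 0.25},
    {"leaves": 8, "error": 0.15},
    {"leaves": 20, "error": 0.10},
]

for cp in (0.001, 0.01, 0.1):
    chosen = pick_tree(candidates, cp)
    print(cp, chosen["leaves"])
```

With these made-up numbers, the smallest penalty selects the largest tree and the largest penalty selects the single-leaf tree, which matches the overfitting and underfitting tradeoff described above.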
