Unit No. 2 Supervised Machine Learning

 

1. The Core Concept: What is Supervised Learning?

Supervised Learning is a branch of machine learning, itself a subfield of artificial intelligence, in which algorithms are trained using labeled datasets. Think of it like a teacher supervising a student: the teacher provides example problems along with the correct answers, and over time the student learns the underlying pattern and can solve new, unseen problems.

In technical terms, the algorithm receives input features (often denoted as X) paired with their corresponding target outputs (denoted as Y). The goal of the model is to learn a mapping function from the input to the output, allowing it to accurately predict the targets for entirely new, unlabeled data points.

[Diagram: Labeled Data (X, Y) → Training Algorithm → Trained Model ← New Data]
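The train-then-predict flow can be sketched in a few lines of Python. This is a minimal illustration, not a standard algorithm: the learning rule (a threshold halfway between the two class means), the function names `train` and `predict`, and the "hours studied" data are all assumptions made up for this example.

```python
from statistics import mean

def train(features, labels):
    """Learn a decision threshold from labeled examples (X, Y)."""
    positives = [x for x, y in zip(features, labels) if y == 1]
    negatives = [x for x, y in zip(features, labels) if y == 0]
    # Toy learning rule (an assumption for this sketch):
    # place the threshold halfway between the two class means.
    return (mean(positives) + mean(negatives)) / 2

def predict(threshold, x):
    """Apply the learned mapping to a new, unlabeled input."""
    return 1 if x > threshold else 0

# Hypothetical labeled data: hours studied (X) and pass/fail outcome (Y).
X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
Y = [0, 0, 0, 1, 1, 1]

model = train(X, Y)        # training phase: learn from (X, Y) pairs
new_points = [2.5, 6.5]    # new data arrives without labels
predictions = [predict(model, x) for x in new_points]
print(model, predictions)  # threshold 4.5, predictions [0, 1]
```

Real algorithms use more principled learning rules, but the shape is the same: labeled pairs go in, a model comes out, and the model is then applied to inputs it has never seen.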

2. Types of Supervised Learning

Regression

Regression algorithms are used when the target variable is continuous (numerical). The goal is to predict a specific value based on patterns in historical data. Examples include predicting house prices, forecasting temperature, or estimating a company's future revenue.

Classification

Classification algorithms are used when the target variable is categorical or discrete. The goal is to sort data points into distinct classes. Examples include determining if an email is "Spam" or "Not Spam," or identifying whether an image contains a "Cat" or a "Dog."

3. Popular Supervised Learning Algorithms

Linear & Logistic Regression

Linear Regression attempts to model the relationship between a target variable and one or more input features by fitting a linear equation to the observed data. It is one of the most fundamental algorithms for regression tasks.
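For a single feature, the best-fitting line has a closed-form least squares solution. The sketch below is one way to compute it in plain Python; the function name `fit_line` and the house-size data are assumptions chosen so that the points lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical data: house size (arbitrary units) and price (arbitrary units).
sizes = [1.0, 2.0, 3.0, 4.0]
prices = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1

slope, intercept = fit_line(sizes, prices)
predicted = slope * 5.0 + intercept   # predict for an unseen size
print(slope, intercept, predicted)    # 2.0 1.0 11.0
```

With noisy real-world data the points will not sit exactly on a line, and the fit then minimizes the sum of squared vertical distances to it.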

Logistic Regression, despite its name, is a classification algorithm. It applies a mathematical transformation (the sigmoid function) to a linear model to output a probability between 0 and 1, making it well suited to binary classification (e.g., Yes/No outcomes).
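To make the sigmoid idea concrete, here is a minimal sketch of one-feature logistic regression trained by plain gradient descent. The training method, the hyperparameters (`learning_rate`, `epochs`), and the "hours studied" data are assumptions for illustration; libraries typically use more sophisticated solvers.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate=0.3, epochs=5000):
    """Fit weight w and bias b by gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Hypothetical data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)    # probability of passing after 1 hour
p_high = sigmoid(w * 3.5 + b)   # probability of passing after 3.5 hours
print(f"P(pass | 1.0 h) = {p_low:.3f}, P(pass | 3.5 h) = {p_high:.3f}")
```

A probability above 0.5 is usually read as the "Yes" class and one below 0.5 as "No", although the cutoff can be moved when one kind of mistake is costlier than the other.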

Decision Trees

This algorithm acts like a flowchart. It breaks down a dataset into smaller and smaller subsets by asking a series of "True/False" questions about the features, culminating in a prediction at the "leaves" of the tree. It is highly interpretable.

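[Diagram: a decision tree whose root asks "Income > 50k?", with "Yes" and "No" branches leading to an "Approve Loan" leaf and a further "Has Debt?" question.]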
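The flowchart behaviour can be sketched as a nested structure that is walked from the root to a leaf. This tree is written by hand rather than learned from data, and several details are assumptions: that the "Yes" branch of the income question leads to approval, the outcomes under "Has Debt?", and the names `tree` and `classify`.

```python
# Each internal node holds a True/False question; each leaf is a prediction.
tree = {
    "question": lambda applicant: applicant["income"] > 50_000,
    "yes": "Approve Loan",
    "no": {
        "question": lambda applicant: applicant["has_debt"],
        "yes": "Reject Loan",    # hypothetical outcome, not from the diagram
        "no": "Approve Loan",    # hypothetical outcome, not from the diagram
    },
}

def classify(node, applicant):
    """Follow the answers down the tree until a leaf (a string) is reached."""
    while isinstance(node, dict):
        branch = "yes" if node["question"](applicant) else "no"
        node = node[branch]
    return node

print(classify(tree, {"income": 60_000, "has_debt": True}))   # Approve Loan
print(classify(tree, {"income": 40_000, "has_debt": True}))   # Reject Loan
print(classify(tree, {"income": 40_000, "has_debt": False}))  # Approve Loan
```

A decision tree algorithm chooses the questions and their order automatically from the training data. The path taken for any single prediction can be read off directly, which is where the interpretability comes from.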

Support Vector Machines (SVM)

SVMs are powerful models that attempt to find the optimal "hyperplane" (a line in 2D, a plane in 3D) that separates different classes with the maximum possible margin, that is, the largest possible distance between the boundary and the nearest data points of each class. They work well in high-dimensional spaces and are effective when clear boundaries exist between groups.
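The margin idea can be illustrated by measuring it for two candidate separators. The sketch below does not train an SVM; it only computes the margin of hyperplanes that are given by hand. The points, the labels (encoded as -1 and +1), and both candidates are assumptions for this example.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    A positive result means every point lies on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical 2D data with two well-separated classes.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Two hand-picked candidate separators, each given as (w, b).
vertical = ((1.0, 0.0), -3.0)    # the line x1 = 3
diagonal = ((1.0, 1.0), -5.5)    # the line x1 + x2 = 5.5

margin_vertical = margin(*vertical, points, labels)
margin_diagonal = margin(*diagonal, points, labels)
print(margin_vertical, margin_diagonal)
```

Both lines separate the classes, but the diagonal one keeps the nearest points farther away, so it is the better of the two by the SVM criterion. An actual SVM solves an optimization problem to find the separator with the largest margin among all possible ones.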

4. Model Evaluation & Overfitting

Training vs. Testing

We never test a machine learning model on the exact same data it used to learn. Doing so is like giving a student the exam paper to study from. Instead, we split our labeled dataset into two distinct parts:

  • Training Set: (Usually 70-80% of data) Used by the algorithm to learn the patterns.
  • Testing Set: (Remaining 20-30%) Used to evaluate how well the model generalizes to completely new, unseen information.
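A split along these lines can be sketched with the standard library alone. The helper name `train_test_split`, its parameters, and the toy dataset are assumptions for this example; shuffling before splitting guards against any ordering present in the original data.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data, then carve off a fraction for testing."""
    shuffled = list(examples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled dataset of (x, y) pairs.
data = [(x, 2 * x + 1) for x in range(10)]

train_set, test_set = train_test_split(data, test_fraction=0.2)
print(len(train_set), len(test_set))   # 8 2
```

The model is fitted on `train_set` only, and `test_set` is held back until the final evaluation.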

The Danger of Overfitting

Overfitting occurs when a model fits the training data too closely, memorizing the noise and random fluctuations rather than the true underlying trend. When an overfitted model faces new test data, its performance drops drastically.

Conversely, underfitting happens when the model is too simple to capture the underlying pattern in the data, resulting in poor performance on both training and testing data.
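The contrast between the two failure modes can be sketched by comparing three models on the same noisy data. Everything here is an assumption for illustration: the underlying trend y = 2x + 1, the noise level, the seed, and the choice of models (a constant predictor as the underfit, a least squares line as the reasonable fit, and a nearest-neighbour lookup as the memorizer).

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

def make_data(n):
    """Draw n noisy samples from the assumed trend y = 2x + 1."""
    samples = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        samples.append((x, 2 * x + 1 + rng.gauss(0, 2)))
    return samples

def mse(model, data):
    """Mean squared error of a model (a function x -> y) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train_set = make_data(30)
test_set = make_data(30)

def memorizer(x):
    """Overfit: return the target of the nearest training point, noise included."""
    nearest = min(train_set, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def make_constant(data):
    """Underfit: always predict the mean training target, ignoring x."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y

def make_line(data):
    """Reasonable fit: least squares line through the training data."""
    n = len(data)
    x_mean = sum(x for x, _ in data) / n
    y_mean = sum(y for _, y in data) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in data)
             / sum((x - x_mean) ** 2 for x, _ in data))
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept

models = [("constant", make_constant(train_set)),
          ("line", make_line(train_set)),
          ("memorizer", memorizer)]
for name, model in models:
    print(f"{name:10s} train MSE = {mse(model, train_set):7.2f}   "
          f"test MSE = {mse(model, test_set):7.2f}")
```

The memorizer reproduces every training target exactly, so its training error is zero, yet it cannot do the same on the test set because it has copied the noise along with the trend. The constant model is poor on both sets, and the line typically lands in between on training error while holding up on test data. A large gap between training and test error is the usual warning sign of overfitting.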
