Supervised learning is a type of machine learning where the model is trained on labeled data, meaning each training example includes an input and its corresponding correct output. The algorithm learns a mapping from inputs to outputs, which it then uses to make predictions on unseen data. Common tasks include classification and regression.
Unsupervised learning is a type of machine learning where the model is trained on unlabeled data and must discover hidden patterns or structures on its own. Common tasks include clustering (grouping similar data points), dimensionality reduction, and anomaly detection. Examples include k-means clustering and PCA.
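As a minimal sketch of dimensionality reduction, the hypothetical toy data below is 3-D but varies almost entirely along one direction; PCA discovers that hidden structure without any labels:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 3-D points that really vary along a single direction,
# plus a little noise. PCA should recover that one underlying component.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = np.hstack([t, 2 * t, -t]) + rng.normal(0, 0.01, (50, 3))

pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)          # shape (50, 1): the discovered structure
print(pca.explained_variance_ratio_)  # close to [1.0]: one direction explains nearly everything
```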
Classification predicts a discrete label (e.g., spam or not spam), while regression predicts a continuous value (e.g., house price). Classification outputs categories, whereas regression outputs numerical quantities. Both are supervised learning tasks.
Linear regression is a supervised learning algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a straight line (or hyperplane). The model minimizes the sum of squared residuals between predicted and actual values. In scikit-learn, it is implemented as LinearRegression().
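A minimal sketch with hypothetical toy data generated from the line y = 2x + 1, so the fitted slope and intercept are easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 2x + 1.
X = np.array([[0], [1], [2], [3]])   # one feature, as a column
y = np.array([1, 3, 5, 7])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope ~2, intercept ~1
print(model.predict([[4]]))              # ~[9]
```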
Logistic regression is a supervised learning algorithm used for binary classification despite its name containing "regression." It uses the sigmoid function to map predictions to probabilities between 0 and 1. A threshold (typically 0.5) is applied to convert probabilities into class labels.
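A minimal sketch on hypothetical 1-D data, showing the probability output and the 0.5 threshold applied by hand:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: class 1 for large feature values.
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
p = clf.predict_proba([[9]])[0, 1]   # sigmoid output: P(class 1), in (0, 1)
label = int(p >= 0.5)                # apply the 0.5 threshold by hand
print(p, label)
```

`clf.predict` applies the same threshold internally.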
A decision tree is a supervised learning model that makes predictions by learning if-then-else rules from the data, forming a tree-like structure of decisions. Each internal node represents a test on a feature, each branch represents an outcome, and each leaf represents a class label or value. They are easy to interpret but prone to overfitting.
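A minimal sketch with hypothetical one-feature data; `export_text` prints the learned if-then rules, which is what makes trees easy to interpret:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: label 1 when the single feature is large.
X = [[1], [2], [8], [9]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree))  # prints the learned if-then rules as text
```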
A random forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mode (classification) or mean (regression) of the individual trees' predictions. It reduces overfitting by using bagging (bootstrap aggregating) and random feature selection. In scikit-learn: RandomForestClassifier().
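A minimal sketch on hypothetical toy data: 25 trees are trained on bootstrap samples, and classification is by majority vote across them:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: label 1 for large feature values.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
print(forest.predict([[1], [9]]))  # each prediction is a vote across the 25 trees
print(len(forest.estimators_))     # the 25 individual decision trees
```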
A support vector machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane that maximally separates classes in feature space. The data points closest to the hyperplane are called support vectors. SVMs can handle non-linear boundaries using the kernel trick (e.g., RBF, polynomial kernels). In scikit-learn: SVC().
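A minimal sketch with hypothetical linearly separable 2-D data; after fitting, `support_vectors_` holds the points that define the margin:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated 2-D clusters.
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[2, 2], [6, 5]]))
print(clf.support_vectors_)  # the points closest to the separating hyperplane
```

Swapping `kernel="linear"` for `kernel="rbf"` applies the kernel trick for non-linear boundaries.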
K-means is an unsupervised learning algorithm that partitions data into k clusters by iteratively assigning each point to the nearest centroid and then updating centroids to the mean of assigned points. It requires the number of clusters k to be specified in advance. In scikit-learn: KMeans(n_clusters=k).
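A minimal sketch on two hypothetical well-separated blobs, with k specified in advance as `n_clusters=2`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated 2-D blobs.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [7.8, 8.2]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # each centroid is the mean of its assigned points
```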
Gradient descent is an optimization algorithm used to minimize the loss function by iteratively updating model parameters in the direction of the negative gradient. The learning rate controls the step size of each update. Variants include batch, stochastic (SGD), and mini-batch gradient descent.
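The update rule can be sketched in a few lines. This is batch gradient descent minimizing mean squared error for a 1-D linear model on hypothetical data generated from y = 2x + 1:

```python
import numpy as np

# Hypothetical toy data from the line y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * X + 1

w, b = 0.0, 0.0
lr = 0.05  # learning rate: step size of each update
for _ in range(2000):
    pred = w * X + b
    grad_w = 2 * np.mean((pred - y) * X)  # dL/dw for MSE loss
    grad_b = 2 * np.mean(pred - y)        # dL/db for MSE loss
    w -= lr * grad_w                      # step in the negative gradient direction
    b -= lr * grad_b

print(w, b)  # converges toward w ~2, b ~1
```

Computing the gradient on the full dataset each step makes this the batch variant; SGD would use one example per step, mini-batch a small subset.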
Overfitting occurs when a model learns the noise and details of the training data too well, resulting in high accuracy on training data but poor performance on unseen data. Signs include a large gap between training and validation accuracy. Remedies include regularization, cross-validation, pruning, and using more training data.
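The train/validation gap can be demonstrated with a sketch on hypothetical noisy data: an unrestricted decision tree memorizes the training noise, while limiting `max_depth` (a form of pruning) constrains it:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical noisy data: a sine signal plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 40)

X_train, X_test = X[:30], X[30:]
y_train, y_test = y[:30], y[30:]

deep = DecisionTreeRegressor().fit(X_train, y_train)  # unlimited depth: memorizes noise
print(deep.score(X_train, y_train))  # 1.0: perfect fit to the training data
print(deep.score(X_test, y_test))    # noticeably lower: the overfitting gap

pruned = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)  # depth limit as a remedy
```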
Underfitting occurs when a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data. This often happens when using an overly simple model or insufficient features. Remedies include using a more complex model, adding features, or reducing regularization.