Supervised learning is a type of machine learning where the model is trained on labeled data, meaning each training example includes an input and its corresponding correct output. The algorithm learns a mapping from inputs to outputs, which it then uses to make predictions on unseen data. Common tasks include classification and regression.
Unsupervised learning is a type of machine learning where the model is trained on unlabeled data and must discover hidden patterns or structures on its own. Common tasks include clustering (grouping similar data points), dimensionality reduction, and anomaly detection. Examples include k-means clustering and PCA.
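As a minimal sketch of dimensionality reduction, the hypothetical toy data below is 3-D but varies almost entirely along one direction; PCA discovers that hidden structure without any labels:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical toy data: 3-D points that really vary along a single direction,
# plus a little noise. PCA should recover that one underlying component.
rng = np.random.default_rng(0)
t = rng.normal(size=(50, 1))
X = np.hstack([t, 2 * t, -t]) + rng.normal(0, 0.01, (50, 3))

pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)          # shape (50, 1): the discovered structure
print(pca.explained_variance_ratio_)  # close to [1.0]: one direction explains nearly everything
```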
Classification predicts a discrete label (e.g., spam or not spam), while regression predicts a continuous value (e.g., house price). Classification outputs categories, whereas regression outputs numerical quantities. Both are supervised learning tasks.
Linear regression is a supervised learning algorithm that models the relationship between a dependent variable and one or more independent variables by fitting a straight line (or hyperplane). The model minimizes the sum of squared residuals between predicted and actual values. In scikit-learn, it is implemented as LinearRegression().
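A minimal sketch with hypothetical toy data generated from the line y = 2x + 1, so the fitted slope and intercept are easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data lying exactly on y = 2x + 1.
X = np.array([[0], [1], [2], [3]])   # one feature, as a column
y = np.array([1, 3, 5, 7])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)  # slope ~2, intercept ~1
print(model.predict([[4]]))              # ~[9]
```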
Logistic regression is a supervised learning algorithm used for binary classification despite its name containing "regression." It uses the sigmoid function to map predictions to probabilities between 0 and 1. A threshold (typically 0.5) is applied to convert probabilities into class labels.
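A minimal sketch on hypothetical 1-D data, showing the probability output and the 0.5 threshold applied by hand:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: class 1 for large feature values.
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
p = clf.predict_proba([[9]])[0, 1]   # sigmoid output: P(class 1), in (0, 1)
label = int(p >= 0.5)                # apply the 0.5 threshold by hand
print(p, label)
```

`clf.predict` applies the same threshold internally.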
A decision tree is a supervised learning model that makes predictions by learning if-then-else rules from the data, forming a tree-like structure of decisions. Each internal node represents a test on a feature, each branch represents an outcome, and each leaf represents a class label or value. They are easy to interpret but prone to overfitting.
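A minimal sketch with hypothetical one-feature data; `export_text` prints the learned if-then rules, which is what makes trees easy to interpret:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: label 1 when the single feature is large.
X = [[1], [2], [8], [9]]
y = [0, 0, 1, 1]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree))  # prints the learned if-then rules as text
```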
A random forest is an ensemble learning method that constructs multiple decision trees during training and outputs the mode (classification) or mean (regression) of the individual trees' predictions. It reduces overfitting by using bagging (bootstrap aggregating) and random feature selection. In scikit-learn: RandomForestClassifier().
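A minimal sketch on hypothetical toy data: 25 trees are trained on bootstrap samples, and classification is by majority vote across them:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: label 1 for large feature values.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)
print(forest.predict([[1], [9]]))  # each prediction is a vote across the 25 trees
print(len(forest.estimators_))     # the 25 individual decision trees
```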
A support vector machine (SVM) is a supervised learning algorithm that finds the optimal hyperplane that maximally separates classes in feature space. The data points closest to the hyperplane are called support vectors. SVMs can handle non-linear boundaries using the kernel trick (e.g., RBF, polynomial kernels). In scikit-learn: SVC().
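A minimal sketch with hypothetical linearly separable 2-D data; after fitting, `support_vectors_` holds the points that define the margin:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated 2-D clusters.
X = np.array([[1, 1], [2, 1], [1, 2], [6, 6], [7, 6], [6, 7]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([[2, 2], [6, 5]]))
print(clf.support_vectors_)  # the points closest to the separating hyperplane
```

Swapping `kernel="linear"` for `kernel="rbf"` applies the kernel trick for non-linear boundaries.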
K-means is an unsupervised learning algorithm that partitions data into k clusters by iteratively assigning each point to the nearest centroid and then updating centroids to the mean of assigned points. It requires the number of clusters k to be specified in advance. In scikit-learn: KMeans(n_clusters=k).
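A minimal sketch on two hypothetical well-separated blobs, with k specified in advance as `n_clusters=2`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two well-separated 2-D blobs.
X = np.array([[1.0, 1.0], [1.5, 2.0], [1.2, 0.8],
              [8.0, 8.0], [8.5, 9.0], [7.8, 8.2]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # each centroid is the mean of its assigned points
```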
Gradient descent is an optimization algorithm used to minimize the loss function by iteratively updating model parameters in the direction of the negative gradient. The learning rate controls the step size of each update. Variants include batch, stochastic (SGD), and mini-batch gradient descent.
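The update rule can be sketched in a few lines. This is batch gradient descent minimizing mean squared error for a 1-D linear model on hypothetical data generated from y = 2x + 1:

```python
import numpy as np

# Hypothetical toy data from the line y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * X + 1

w, b = 0.0, 0.0
lr = 0.05  # learning rate: step size of each update
for _ in range(2000):
    pred = w * X + b
    grad_w = 2 * np.mean((pred - y) * X)  # dL/dw for MSE loss
    grad_b = 2 * np.mean(pred - y)        # dL/db for MSE loss
    w -= lr * grad_w                      # step in the negative gradient direction
    b -= lr * grad_b

print(w, b)  # converges toward w ~2, b ~1
```

Computing the gradient on the full dataset each step makes this the batch variant; SGD would use one example per step, mini-batch a small subset.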
Overfitting occurs when a model learns the noise and details of the training data too well, resulting in high accuracy on training data but poor performance on unseen data. Signs include a large gap between training and validation accuracy. Remedies include regularization, cross-validation, pruning, and using more training data.
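The train/validation gap can be demonstrated with a sketch on hypothetical noisy data: an unrestricted decision tree memorizes the training noise, while limiting `max_depth` (a form of pruning) constrains it:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical noisy data: a sine signal plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.3, 40)

X_train, X_test = X[:30], X[30:]
y_train, y_test = y[:30], y[30:]

deep = DecisionTreeRegressor().fit(X_train, y_train)  # unlimited depth: memorizes noise
print(deep.score(X_train, y_train))  # 1.0: perfect fit to the training data
print(deep.score(X_test, y_test))    # noticeably lower: the overfitting gap

pruned = DecisionTreeRegressor(max_depth=3).fit(X_train, y_train)  # depth limit as a remedy
```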
Underfitting occurs when a model is too simple to capture the underlying patterns in the data, resulting in poor performance on both training and test data. This often happens when using an overly simple model or insufficient features. Remedies include using a more complex model, adding features, or reducing regularization.