Binary Classification
Binary classification is a core machine learning task that sorts data into one of two distinct categories, answering yes/no questions such as whether an email is spam, a patient has a disease, or a transaction is fraudulent. The model learns from labeled examples and typically outputs a probability between 0 and 1; a decision threshold then assigns the final class.
More information: https://en.wikipedia.org/wiki/Binary_classification
Details
Binary classification is a fundamental machine learning task where the goal is to sort data into one of two distinct categories. Unlike regression, which predicts continuous numbers, binary classification answers yes/no questions: Is this email spam or not spam? Does this patient have a disease or not? Is this transaction fraudulent or legitimate? These two categories are often called the "positive" and "negative" classes.
The process works by training a model on labeled examples: data points where we already know the correct answer. The model learns patterns that distinguish the two classes, then applies these patterns to new, unseen data to make predictions. A key concept is the decision threshold. The model typically outputs a probability between 0 and 1, and we choose a cutoff (commonly 0.5) to decide which class to assign. For instance, if a medical test model outputs a 0.8 probability of disease, a 0.5 cutoff classifies the case as positive, while a stricter cutoff of 0.9 would classify the same case as negative.
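To make the training and thresholding steps concrete, here is a minimal pure-Python sketch. It assumes a single numeric feature and uses logistic regression fitted by plain batch gradient descent as the model; the helper names (`sigmoid`, `train_logistic_1d`, `predict_label`), the toy data, and the learning-rate and epoch values are illustrative choices, not part of any library. Treating a probability exactly equal to the threshold as positive is also just a convention adopted here.

```python
import math


def sigmoid(z: float) -> float:
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_1d(xs, ys, lr=0.1, epochs=1000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on the mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For log loss, the gradient is driven by (predicted probability - true label).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict_label(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (positive class) if probability >= threshold, else 0 (negative class)."""
    return 1 if probability >= threshold else 0


# Hypothetical labeled examples: one feature, negatives on the left, positives on the right.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_1d(xs, ys)  # lr and epochs are illustrative values

# Apply the learned pattern to new, unseen inputs.
for x_new in (-3.0, 0.2, 3.0):
    p = sigmoid(w * x_new + b)
    print(f"x={x_new:+.1f}  p={p:.3f}  label={predict_label(p)}")

# The medical-test example: the same 0.8 probability under two cutoffs.
print(predict_label(0.8))                 # 1 (positive) with the default 0.5 cutoff
print(predict_label(0.8, threshold=0.9))  # 0 (negative) with a stricter 0.9 cutoff
```

Real projects would normally rely on a library implementation rather than hand-written gradient descent; the point of the sketch is only that the model produces a probability, and the threshold, not the model, turns it into a label.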
Binary classification powers many real-world applications across healthcare, finance, and technology. Common algorithms include logistic regression (despite its name, it is a classification method), decision trees, and support vector machines. Models are evaluated with metrics such as accuracy, precision, and recall, which reveal not just how often the model is right but what kinds of mistakes it makes: false positives (for example, a legitimate email flagged as spam) versus false negatives (for example, a disease the test misses).
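The metrics named above can be computed directly from the four outcome counts of a confusion matrix. The sketch below is a hedged, dependency-free illustration: the function names `confusion_counts` and `evaluate` and the sample labels are hypothetical, labels are assumed to be 1 (positive) or 0 (negative), and returning 0.0 when a metric's denominator is zero is an assumed convention (libraries handle that case in their own ways).

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), assuming labels are 1 = positive and 0 = negative."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # correctly flagged positive
        elif actual == 0 and predicted == 1:
            fp += 1  # false alarm (false positive)
        elif actual == 1 and predicted == 0:
            fn += 1  # missed positive (false negative)
        else:
            tn += 1  # correctly left negative
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall; 0.0 when a denominator is zero (assumed convention)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real positives, how many were found
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical labels for eight examples.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 2, 1, 2)
print(evaluate(y_true, y_pred))          # {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

Here accuracy (0.625) hides the fact that two of the five positive predictions are wrong (precision 0.6), while one of the four actual positives is missed (recall 0.75). Raising the decision threshold tends to trade recall for precision, and lowering it does the reverse.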