The MNIST Dataset
The MNIST dataset contains images of handwritten digits and can be used to train handwriting recognition systems.
https://csvbase.com/mdfarragher/mnist-handwriting
10,000
Details
Optical Character Recognition (OCR) systems are machine learning models trained to recognize written text. These systems have many real-world applications, such as scanning books, printed documents and receipts, processing bank checks and forms, and reading car license plates.
Processing handwriting is an especially hard challenge, because letters and numbers are not always the same size and writing style differs from person to person. The MNIST dataset is famous in this field: since its release in 1999, this classic dataset of handwritten digits has served as a basis for benchmarking OCR systems.
The dataset was created in 1999 by mixing handwriting samples from American Census Bureau employees and American high school students. The black and white images of handwritten digits were normalized to fit into a 28x28 pixel bounding box and anti-aliased to introduce grayscale levels.
Data Schema
The dataset can be downloaded in CSV, Parquet, XLSX or JSONL format and has the following schema:
| Column name | Column type | Missing data? |
|---|---|---|
| Row ID | Integer | Not allowed |
| label | Integer | Allowed |
| 1x1 | Integer | Allowed |
| 1x2 | Integer | Allowed |
| 1x3 | Integer | Allowed |
| 1x4 | Integer | Allowed |
| 1x5 | Integer | Allowed |
| 1x6 | Integer | Allowed |
| ... | ... | ... |
| 1x27 | Integer | Allowed |
| 1x28 | Integer | Allowed |
| 2x1 | Integer | Allowed |
| 2x2 | Integer | Allowed |
| ... | ... | ... |
| 28x26 | Integer | Allowed |
| 28x27 | Integer | Allowed |
| 28x28 | Integer | Allowed |
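The schema above implies 28 x 28 = 784 pixel columns per record, plus the label. As a rough illustration of how one CSV record might be turned back into a 28x28 grid, here is a minimal sketch using only Python's standard library. It rests on a few assumptions that are not stated in the schema: that a column named `RxC` holds the pixel at row R, column C; that a missing pixel can be treated as 0; and that a missing label can be represented as `None`. The helper names (`pixel_columns`, `parse_row`, `read_digits`) are hypothetical, and the tiny 2x2 sample at the end is made-up data used only to show the shape of the result.

```python
import csv
import io

SIDE = 28  # images are 28x28 pixels, per the schema


def pixel_columns(side=SIDE):
    """Pixel column names in row-major order: 1x1, 1x2, ..., 28x28."""
    return [f"{r}x{c}" for r in range(1, side + 1) for c in range(1, side + 1)]


def parse_row(row, side=SIDE):
    """Turn one CSV record (a dict of strings) into (label, image).

    image is a list of `side` rows, each a list of `side` ints.
    Assumption: an empty or absent pixel cell is treated as 0,
    and an empty or absent label becomes None.
    """
    raw_label = row.get("label")
    label = int(raw_label) if raw_label not in (None, "") else None
    image = [
        [int(row.get(f"{r}x{c}") or 0) for c in range(1, side + 1)]
        for r in range(1, side + 1)
    ]
    return label, image


def read_digits(fileobj, side=SIDE):
    """Yield (label, image) pairs from an open CSV file object."""
    for row in csv.DictReader(fileobj):
        yield parse_row(row, side)


# Made-up 2x2 sample (not real MNIST data); the 2x1 cell is left empty.
sample = "label,1x1,1x2,2x1,2x2\n7,0,255,,128\n"
digits = list(read_digits(io.StringIO(sample), side=2))
```

For the real dataset, the same `read_digits` function could be pointed at a downloaded CSV file opened with `open(path, newline="")`, leaving `side` at its default of 28. Treating missing pixels as 0 is only one possible choice; skipping incomplete records would be an equally reasonable alternative.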