Dataset

The MNIST Dataset

The MNIST dataset contains images of handwritten digits (0 through 9) and can be used to train handwriting recognition systems.


Number of rows: 10,000

Details

Optical Character Recognition (OCR) systems are machine learning models that are trained to recognize written text. These systems have many real-world applications, for example in scanning books, printed documents and receipts, processing bank checks and forms, reading car license plates and much more.

Processing handwriting is an especially hard problem, because letters and numbers are not always the same size and writing style differs from person to person. The MNIST dataset is a well-known benchmark in this field. Since its release in 1999, this classic dataset of handwritten digits has served as the basis for benchmarking OCR systems.

The dataset was created in 1999 by mixing handwriting samples from U.S. Census Bureau employees and American high school students. The black and white images of handwritten digits were normalized to fit into a 28x28 pixel bounding box and anti-aliased, which introduced grayscale levels.

Data Schema

The dataset can be downloaded in CSV, Parquet, XLSX or JSONL format and has the following schema:

Column name    Column type    Missing data?
Row ID         Integer        Not allowed
label          Integer        Allowed
1x1            Integer        Allowed
1x2            Integer        Allowed
1x3            Integer        Allowed
1x4            Integer        Allowed
1x5            Integer        Allowed
1x6            Integer        Allowed
...            ...            ...
1x27           Integer        Allowed
1x28           Integer        Allowed
2x1            Integer        Allowed
2x2            Integer        Allowed
...            ...            ...
28x26          Integer        Allowed
28x27          Integer        Allowed
28x28          Integer        Allowed
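
As an illustration of how the schema maps back to an image, the following minimal sketch reads records in the CSV format and rebuilds each one as a 28x28 grid. It rests on several assumptions that are not stated in the schema: that a column named "RxC" holds the pixel at row R and column C, that the CSV header uses the column names exactly as listed, and that a missing value appears as an empty cell (mapped to None here). The helper names pixel_columns, row_to_image and read_images are hypothetical, and the sample record is synthetic rather than taken from the dataset.

```python
import csv
import io

SIDE = 28  # each image is 28x28 pixels, i.e. 784 pixel columns


def pixel_columns(side=SIDE):
    """Pixel column names in row-major order: 1x1, 1x2, ..., 28x28."""
    return [f"{r}x{c}" for r in range(1, side + 1) for c in range(1, side + 1)]


def row_to_image(row, side=SIDE):
    """Convert one CSV record (a dict) into (label, side x side grid).

    Assumption: empty or absent cells become None, since the schema
    allows missing data in the label and pixel columns.
    """
    def parse(value):
        return int(value) if value not in (None, "") else None

    label = parse(row.get("label"))
    image = [
        [parse(row.get(f"{r}x{c}")) for c in range(1, side + 1)]
        for r in range(1, side + 1)
    ]
    return label, image


def read_images(handle):
    """Yield (label, image) pairs from an open CSV file or file-like object."""
    for row in csv.DictReader(handle):
        yield row_to_image(row)


# Synthetic one-record CSV, used only to show the expected layout.
header = ["Row ID", "label"] + pixel_columns()
record = ["0", "7"] + ["0"] * (SIDE * SIDE)
sample_csv = ",".join(header) + "\n" + ",".join(record) + "\n"

for label, image in read_images(io.StringIO(sample_csv)):
    print(label, len(image), len(image[0]))
```

To use this with a downloaded file, pass an open file handle to read_images instead of the in-memory sample, for example open("mnist.csv", newline=""), where the file name is a placeholder.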

Labs Exploring The MNIST Dataset