
The MNIST Dataset

The MNIST dataset contains images of handwritten digits (0 to 9) and can be used to train handwriting recognition systems.

Download link: https://csvbase.com/mdfarragher/mnist-handwriting

Number of rows: 10,000

Details

Optical Character Recognition (OCR) systems are machine learning models that are trained to recognize written text. These systems have many real-world applications, such as scanning books, printed documents and receipts, processing bank checks and forms, and reading car license plates.

Processing handwriting is an especially hard problem, because letters and digits vary in size and writing styles differ from person to person. In this field, MNIST is one of the best-known datasets: since its release in 1999, this classic collection of handwritten digits has served as a standard benchmark for OCR systems.

The dataset was created in 1999 by mixing handwriting samples from American Census Bureau employees and American high school students. The original black-and-white images were size-normalized to fit into a 20x20 pixel box, which introduced grayscale levels through anti-aliasing, and were then centered in a 28x28 pixel image.
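The minimal Python sketch below illustrates why shrinking a black-and-white image produces gray levels: each output pixel takes the fraction of inked pixels in the block of input pixels it covers. It uses a plain box filter (area averaging), which is a simplification we chose for illustration, not the normalization algorithm actually used to build MNIST; the helper name `box_downsample` is hypothetical.

```python
def box_downsample(bitmap, factor):
    """Shrink a black-and-white bitmap by an integer factor using area averaging.

    Illustrative box filter only (an assumption), not the original MNIST algorithm.
    bitmap: list of equal-length rows containing 0 (background) or 1 (ink).
    Returns rows of grayscale intensities in 0..255, where 255 is full ink.
    """
    height, width = len(bitmap), len(bitmap[0])
    if height % factor or width % factor:
        raise ValueError("bitmap dimensions must be multiples of factor")
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Count the ink pixels inside this factor x factor block.
            ink = sum(bitmap[y][x]
                      for y in range(top, top + factor)
                      for x in range(left, left + factor))
            # A partially inked block becomes an intermediate gray level.
            row.append(round(255 * ink / (factor * factor)))
        out.append(row)
    return out
```

For example, `box_downsample([[1, 1], [1, 0]], 2)` returns `[[191]]`: three of the four source pixels are inked, so the single output pixel is a dark gray rather than pure black or white.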

Data Schema

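The dataset can be downloaded in CSV, Parquet, XLSX or JSONL format. Each row holds one image: the label column contains the digit shown, and the 784 pixel columns, named <row>x<column> from 1x1 to 28x28, contain the grayscale value of each pixel. The schema is as follows: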

Column name	Column type	Missing data?
Row ID	Integer	Not allowed
label	Integer	Allowed
1x1	Integer	Allowed
1x2	Integer	Allowed
1x3	Integer	Allowed
1x4	Integer	Allowed
1x5	Integer	Allowed
1x6	Integer	Allowed
...	...	...
1x27	Integer	Allowed
1x28	Integer	Allowed
2x1	Integer	Allowed
2x2	Integer	Allowed
...	...	...
28x26	Integer	Allowed
28x27	Integer	Allowed
28x28	Integer	Allowed
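As a rough sketch of how the schema maps back to a picture, the Python function below reads CSV text and rebuilds each row as a label plus a 28x28 grid. It assumes that the CSV header contains a `label` column and pixel columns named exactly as in the table above, that the first number in a column name is the row and the second the column (row-major order), and that any other column, such as the row ID, can be ignored. Empty cells become `None`, since the schema allows missing data. The name `parse_mnist_csv` is hypothetical.

```python
import csv
import io

SIZE = 28  # each image is 28 x 28 pixels


def parse_mnist_csv(text):
    """Parse MNIST CSV text into (label, image) pairs.

    Assumes a 'label' column and pixel columns named '<row>x<column>'
    (1-based, '1x1' .. '28x28'); other columns such as the row ID are ignored.
    Missing cells are returned as None.
    """
    def to_int(cell):
        # DictReader yields None for absent trailing fields and '' for empty ones.
        cell = (cell or "").strip()
        return int(cell) if cell else None

    samples = []
    for record in csv.DictReader(io.StringIO(text)):
        label = to_int(record["label"])
        # Assumed layout: column 'RxC' holds the pixel at row R, column C.
        image = [[to_int(record[f"{r}x{c}"]) for c in range(1, SIZE + 1)]
                 for r in range(1, SIZE + 1)]
        samples.append((label, image))
    return samples
```

With the CSV export saved locally, a call such as `parse_mnist_csv(open(path, encoding="utf-8").read())` (where `path` is a placeholder for your file) returns a list whose entries can be indexed as `image[row][column]`, with both indices starting at 0.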


Labs Exploring MNIST

Recognize Handwriting: a lab in the course Supervised Machine Learning with C# and ML.NET