Dataset

The California Housing Dataset

This real-world housing dataset from Google contains census data for census block groups across the state of California.

Download link: https://csvbase.com/mdfarragher/California-Housing

Number of rows: 17,000

Details

In machine learning circles, the California Housing dataset is a bit of a classic. It’s the dataset used in the second chapter of Aurélien Géron’s excellent machine learning book Hands-On Machine Learning with Scikit-Learn and TensorFlow.

The dataset serves as an excellent introduction to building machine learning apps because it requires only rudimentary data cleaning, has an easily understandable list of variables, and is small enough for fast training and experimentation. It was compiled by R. Kelley Pace and Ronald Barry for their 1997 paper titled Sparse Spatial Autoregressions. They built it using the 1990 California census data.

The dataset contains one record per census block group, with a census block group being the smallest geographical unit for which the U.S. Census Bureau publishes sample data. A census block group typically has a population of around 600 to 3,000 people.

Data Schema

The dataset can be downloaded in CSV, Parquet, XLSX or JSONL format and has the following schema:

Column name	Column type	Missing data?
Row ID	Integer	Not allowed
longitude	Float	Allowed
latitude	Float	Allowed
housing_median_age	Integer	Allowed
total_rooms	Integer	Allowed
total_bedrooms	Integer	Allowed
population	Integer	Allowed
households	Integer	Allowed
median_income	Float	Allowed
median_house_value	Integer	Allowed
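
The schema above can be turned into a small typed loader. The following is a minimal sketch in Python using only the standard library, not the method used in the labs. It assumes the CSV export has a header row with the column names shown in the table; the names SCHEMA, parse_row and load_rows are hypothetical, and the two sample rows are made-up placeholder values, not real records. Because every data column allows missing values, blank cells are mapped to None instead of raising an error.

```python
import csv
import io

# Column types copied from the schema table above.
SCHEMA = {
    "longitude": float,
    "latitude": float,
    "housing_median_age": int,
    "total_rooms": int,
    "total_bedrooms": int,
    "population": int,
    "households": int,
    "median_income": float,
    "median_house_value": int,
}


def parse_row(raw):
    """Convert one CSV row (a dict of strings) into typed values.

    Blank or absent cells become None, since the schema allows missing data.
    """
    typed = {}
    for name, kind in SCHEMA.items():
        cell = (raw.get(name) or "").strip()
        if cell == "":
            typed[name] = None
        elif kind is int:
            # int(float(...)) also tolerates integers written as "15.0".
            typed[name] = int(float(cell))
        else:
            typed[name] = float(cell)
    return typed


def load_rows(handle):
    """Read all rows from an open CSV file or file-like object."""
    return [parse_row(raw) for raw in csv.DictReader(handle)]


# Made-up placeholder rows for illustration only (NOT real dataset records).
SAMPLE = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value
-120.0,35.0,20,1000,,500,180,3.5,150000
-121.5,38.25,15.0,2400,480,1200,450,4.25,210000
"""

rows = load_rows(io.StringIO(SAMPLE))
missing_bedrooms = sum(1 for r in rows if r["total_bedrooms"] is None)
```

To use this with the downloaded file, replace io.StringIO(SAMPLE) with an open file handle, for example open(path, newline="") where path points to the local CSV. Columns not listed in SCHEMA, such as the row ID, are ignored by this sketch.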

Labs Exploring California Housing

Supervised Machine Learning with C# and ML.NET

Lab

Predict House Prices In California

In course: Supervised Machine Learning with C# and ML.NET

Lab

Process The California Housing Dataset

In course: Supervised Machine Learning with C# and ML.NET