Boston Data Set Analysis

Hi! Today we will walk through another example of data exploration, analysis and visualization. The Boston data set is a collection of house prices together with attributes such as the age of the houses, the crime rate, the average number of rooms per dwelling, the tax rate and so on. It also has a ‘target’, which is the price. It consists of 506 rows and 14 columns.

Here is the plan: load the data, convert it into a data frame, explore it, visualize it, and perform data wrangling, i.e. cleaning the data (of null values and empty entries) so that it is ready to use.

Let's get started!

Import the basic libraries

pandas is a Python package for practical, real-world data analysis. It is built on top of NumPy, meaning pandas needs NumPy to operate. pandas provides an easy way to create, manipulate and wrangle data, and it is also an elegant solution for time series data.

pandas is used to organize data into rows and columns. It provides fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
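The "convert it into a data frame" step can be sketched roughly as below. The helper name to_frame, the column names (CRIM, NOX, DIS, PRICE) and all of the numbers are assumptions made up for illustration; they are not taken from the real data. The sketch builds the frame from plain arrays rather than a loader, since scikit-learn removed its load_boston function in version 1.2.

```python
import numpy as np
import pandas as pd


def to_frame(data, feature_names, target, target_name="PRICE"):
    """Combine a 2-D feature array and a 1-D target into one DataFrame."""
    df = pd.DataFrame(data, columns=feature_names)
    df[target_name] = target  # append the target as the last column
    return df


# Made-up values for illustration only (not real Boston rows).
data = np.array([[0.02, 0.45, 4.1],
                 [0.09, 0.52, 3.2],
                 [3.50, 0.70, 1.8]])
feature_names = ["CRIM", "NOX", "DIS"]
target = np.array([24.0, 21.5, 13.0])

df = to_frame(data, feature_names, target)
print(df.shape)  # (number of rows, number of columns)
```

With the full data set, the same call would produce a frame with 506 rows and 14 columns.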

df.head() returns the first five rows (by default) of the dataframe.

df.tail() returns the last five rows (by default) of the dataframe. Here we can see that the last row has index 505, because the 506 rows are numbered from 0, along with its column values.

df.describe() gives summary statistics for the data set: the count, mean, standard deviation, minimum and maximum of each column, along with the 25%, 50% and 75% quartiles.
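As a small sketch of head() and tail(), the frame below is a placeholder with 506 rows holding a running counter instead of real prices; the column name PRICE is an assumption. It shows why the last row label is 505 when the default index starts at 0.

```python
import numpy as np
import pandas as pd

# Placeholder frame with 506 rows, matching the row count of the data set.
# The values are just a running counter, not real prices.
df = pd.DataFrame({"PRICE": np.arange(506, dtype=float)})

first_rows = df.head()    # first 5 rows by default
last_rows = df.tail()     # last 5 rows by default
first_ten = df.head(10)   # pass n to change the number of rows

print(last_rows.index.tolist())  # labels of the last five rows
```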
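To make the describe() output concrete, here is a minimal sketch on a made-up column of five prices (illustrative values only, and the column name PRICE is an assumption). The result is itself a DataFrame, so single statistics can be looked up by row label.

```python
import pandas as pd

# Made-up prices for illustration only.
df = pd.DataFrame({"PRICE": [10.0, 20.0, 30.0, 40.0, 50.0]})

summary = df.describe()
print(summary)

# Individual statistics can be looked up by row label and column name.
median_price = summary.loc["50%", "PRICE"]
mean_price = summary.loc["mean", "PRICE"]
```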

We check for null values in the data set and find that there are none.

We check the data type of the attributes, i.e. the columns, and find that all values are floats.
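A null check of this kind might look like the sketch below. The frame is made up and deliberately contains one missing value so the counts are not all zero; the column names are assumptions. Dropping rows with dropna() is only one possible cleaning choice.

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value, for illustration only.
df = pd.DataFrame({"NOX": [0.45, np.nan, 0.70],
                   "PRICE": [24.0, 21.5, 13.0]})

nulls_per_column = df.isnull().sum()   # count of nulls in each column
has_nulls = df.isnull().values.any()   # True if any cell is null
print(nulls_per_column)

# One common cleaning choice: drop every row that contains a null.
cleaned = df.dropna()
```

On a data set with no nulls, every entry of nulls_per_column is 0 and dropna() leaves the frame unchanged.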
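The data type check can be sketched as follows, again on a made-up frame with assumed column names. df.dtypes lists one type per column, and the helper pd.api.types.is_float_dtype turns that into a single yes/no answer.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({"NOX": [0.45, 0.52], "PRICE": [24.0, 21.5]})

print(df.dtypes)  # one dtype per column

# True only if every column holds floating-point values.
all_float = all(pd.api.types.is_float_dtype(t) for t in df.dtypes)
```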

Data Visualization

A histogram gives the frequency count. Here the largest number of houses falls in the price range of 18-23, which gives us a rough idea of how the houses are priced.

We observe from the joint plot of price against DIS (weighted distance to five Boston employment centers) that the points are densest where the price is in the range 15-23 on the Y axis and DIS is in the range 1-3.

One possible reading is that people want houses near their employment centers. The reasons might be to reduce commuting costs, save time, and spend more time with family, among others.

We observe that the higher the concentration of pollutants (NOX), the lower the price. Where pollutant levels are lower, people tend to buy more houses and prices tend to be higher. In other words, we see a negative correlation between NOX and price.

We observe that house prices drop where the crime rate is higher and rise where crime is low. People want a secure environment so they can live peacefully and comfortably, so crime also affects the pricing of houses.
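The frequency counts that a histogram draws can also be computed directly, which makes it easy to read off the busiest price range. The sketch below uses NumPy's histogram function on made-up prices with bin edges chosen for illustration; neither is taken from the real data.

```python
import numpy as np

# Made-up prices for illustration only.
prices = np.array([12.0, 18.5, 19.0, 20.5, 21.2, 22.8, 24.0, 33.1, 50.0])
bin_edges = [0, 10, 20, 30, 40, 50]

# counts[i] is the number of prices in [bin_edges[i], bin_edges[i + 1]);
# only the last bin also includes its right edge.
counts, edges = np.histogram(prices, bins=bin_edges)

# Locate the bin with the most houses.
busiest = int(np.argmax(counts))
busiest_range = (edges[busiest], edges[busiest + 1])
print(counts, busiest_range)
```

With matplotlib installed, plt.hist(prices, bins=bin_edges) draws these same counts as bars.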
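What the joint plot shows visually, the region where points pile up, can be cross-checked with a two-dimensional frequency count. This is a sketch using NumPy's histogram2d as a stand-in for the plot, not the plotting call itself. The DIS and price values and the bin edges are made up for illustration.

```python
import numpy as np

# Made-up (DIS, price) pairs for illustration only.
dis = np.array([1.2, 1.8, 2.1, 2.5, 2.9, 4.5, 6.0, 8.2])
price = np.array([16.0, 19.5, 21.0, 22.4, 18.3, 27.0, 30.5, 24.1])

dis_edges = [1, 3, 5, 7, 9]
price_edges = [7, 15, 23, 31]

# counts[i, j] = number of points in DIS bin i and price bin j.
counts, _, _ = np.histogram2d(dis, price, bins=[dis_edges, price_edges])

# Index of the densest cell, then its DIS and price ranges.
i, j = np.unravel_index(counts.argmax(), counts.shape)
densest_dis = (dis_edges[i], dis_edges[i + 1])
densest_price = (price_edges[j], price_edges[j + 1])
print(densest_dis, densest_price)
```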
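A negative correlation like this can be quantified with the Pearson correlation coefficient, which pandas computes through Series.corr. The values below are made up so that price falls as NOX rises, and the column names are assumptions; the real data will give a different number.

```python
import pandas as pd

# Made-up values for illustration only: price falls as NOX rises.
df = pd.DataFrame({"NOX": [0.40, 0.45, 0.52, 0.60, 0.70],
                   "PRICE": [30.0, 27.0, 22.0, 18.0, 13.0]})

# Pearson correlation: close to -1 is a strong negative relationship,
# close to +1 a strong positive one, and near 0 little linear relationship.
r = df["NOX"].corr(df["PRICE"])
print(round(r, 3))
```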

This is how we explore and visualize the data. You can try it with more features and combinations.
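One quick way to try more features at once is to correlate every column with the price and sort the result. This is a sketch on made-up values; the column names (CRIM for crime rate, NOX, RM for average rooms, PRICE) are assumptions for illustration.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "CRIM": [0.02, 0.09, 0.50, 3.50, 8.00],
    "NOX": [0.40, 0.45, 0.52, 0.60, 0.70],
    "RM": [7.0, 6.5, 6.2, 5.8, 5.4],
    "PRICE": [30.0, 27.0, 22.0, 18.0, 13.0],
})

# Pairwise Pearson correlations, keeping only the column for PRICE
# and dropping the trivial correlation of PRICE with itself.
corr_with_price = df.corr()["PRICE"].drop("PRICE").sort_values()
print(corr_with_price)
```

Features near the top of the sorted output move against the price, and those near the bottom move with it.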

Thanks for reading! Happy Learning.
