Cities need good housing that is affordable to residents relative to their mean annual income. However, this is an ideal, and it is often the exception rather than the rule. In most cases, safe housing with access to public utilities, roads, and enough living space is affordable only for the wealthy and elite of the community.
In this study, I want to explore which factors affect the price of a house and how strongly they are correlated with each other. Examples of questions I wish to explore:
- What factors are most correlated with the price of a house?
- Do extraneous variables like the year or month of sale affect the price?
- Which neighbourhoods have the most expensive houses?
- Do houses with elevators, pools, or other "luxury" items tend to have all the other attributes of good housing, such as access to amenities?
- What is the distribution of houses across different price categories?
- Do all houses have basic amenities like heaters?
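As a rough illustration of the first two questions, the sketch below ranks numeric columns by the strength of their Pearson correlation with price. The column names (`price`, `living_area`, `bedrooms`, `month_sold`) and all values are made up for the example and are not taken from the actual dataset; the helper `correlation_with_price` is likewise hypothetical.

```python
import pandas as pd

# Toy data with hypothetical column names and invented values;
# this is NOT the real housing dataset.
houses = pd.DataFrame({
    "price": [100, 150, 200, 250, 300],
    "living_area": [50, 70, 90, 120, 140],
    "bedrooms": [1, 2, 2, 3, 4],
    "month_sold": [3, 11, 6, 1, 8],
})


def correlation_with_price(df, target="price"):
    """Rank numeric columns by absolute Pearson correlation with `target`."""
    numeric = df.select_dtypes("number")
    # Correlation of every numeric column with the target, target itself removed.
    corr = numeric.corr()[target].drop(target)
    # Order by strength of the relationship, keeping the original sign.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


ranking = correlation_with_price(houses)
print(ranking)
```

In this toy data the size-related columns rank highest and the month of sale ranks last. Correlation only measures linear association, so a low value for month or year of sale would not by itself rule out a seasonal effect.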
The data, collected and cleaned in 2011, provide indicators of housing condition such as access to public utilities, housing cost, the condition of the house, number of bedrooms, and lot size.
- I am concerned that the housing data across Ames might not be balanced and might over-represent houses from more urban locations, where data is easier to collect. This might skew the results of the data analysis.
- One consideration is what I should do with NaN values. Should I discard them, or should I turn them into a dummy variable where NaN means 0?
- If certain factors are highly correlated with each other, they can be reduced to a single representative factor.
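The last two considerations can be sketched as follows, under stated assumptions. The column names and values are invented for illustration, the 0.9 threshold is an arbitrary choice, and `drop_highly_correlated` is a hypothetical helper rather than a library function. Treating NaN as "feature absent" is itself an assumption that only holds for columns where a missing value really means the house lacks the feature.

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical column names and invented values.
houses = pd.DataFrame({
    "pool_area": [np.nan, 40.0, np.nan, np.nan, 25.0, np.nan],
    "living_area_m2": [50.0, 70.0, 90.0, 120.0, 140.0, 160.0],
    "living_area_sqft": [538.0, 753.0, 969.0, 1292.0, 1507.0, 1722.0],
    "lot_size": [700.0, 400.0, 650.0, 500.0, 800.0, 450.0],
})

# Option A: discard every row that contains a NaN.
# Here this throws away most of the data.
dropped = houses.dropna()

# Option B: treat NaN as "no pool" (an assumption about what NaN means).
# Add a 0/1 dummy variable and fill the missing area with 0.
filled = houses.copy()
filled["has_pool"] = filled["pool_area"].notna().astype(int)
filled["pool_area"] = filled["pool_area"].fillna(0)


def drop_highly_correlated(df, threshold=0.9):
    """Keep one column from each pair whose absolute correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


size_features = houses[["living_area_m2", "living_area_sqft", "lot_size"]]
pruned = drop_highly_correlated(size_features)
print(len(houses), "rows before dropna,", len(dropped), "after")
print(list(pruned.columns))
```

In this toy example, dropping rows keeps only the two houses with a pool, which shows why blanket deletion can bias the sample. The pruning step keeps the first column of each highly correlated pair; which column to keep is a judgment call that may depend on interpretability.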