
The perils of analyzing big real estate data

Author: Brian

Two leaders of Zillow recently wrote Zillow Talk: The New Rules of Real Estate, which is a sort of Freakonomics look at all the real estate data they have. While it is an interesting book, it also illustrates the difficulties of analyzing big data:

1. The key to the book is all the data Zillow has harnessed to track real estate prices and make predictions on current and future prices. They don’t say much about their models. This could be for two good reasons: this is aimed at a mass market and the models are their trade secrets. Yet, I wanted to hear more about all the fascinating data – at least in an appendix?

2. Problems of aggregation: the data is usually analyzed at a metro area or national level. There are hints at smaller markets – a chapter on NYC, for example, and another looking at some unusual markets like Las Vegas – but there are no separate chapters on cheaper/starter homes or luxury homes. An unanswered question: is real estate more similar within markets or across them? Put another way, are the features of the Chicago market unique and strongly patterned, or are cheaper homes in the Chicago region more like comparable homes in Atlanta or Los Angeles than they are like more expensive homes in their own market?

3. Most provocative argument: in Chapter 24, the authors suggest that pushing homeownership for lower-income Americans is a bad idea as it can often trap them in properties that don’t appreciate. This was a big problem in the 2000s: Presidents Clinton and Bush pushed homeownership, but after housing values dropped in the late 2000s, poorer neighborhoods were hit hard, leaving many homeowners in default or seriously underwater. Unfortunately, unless demand picks up in these neighborhoods (and gentrification is pretty rare), these homes are not good investments.

4. The individual chapters often discuss small effects that may be statistically significant but are not substantively large. For example, there is a section on male vs. female real estate agents. The effects for each gender are small: at most, a difference of a few percentage points in selling price as well as slight variations in speed of sale. (Women are better in both categories: higher prices, faster sales.)

5. The authors are pretty good at repeatedly pointing out that correlation does not mean causation. Yet, they don’t catch all of these moments and at other times present patterns on distorted axes. For example, consider a chart from page 202 (chart not reproduced here):


These two things may be correlated (as one goes up so does the other and vice versa), but why set the axes so that half-percentage-point increments on one are compared to five-percentage-point increments on the other?

6. Continuing #4, I suppose a buyer and seller would want to use all the tricks they can, but the tips here mean that those in the real estate market are supposed to string together all of these small effects to maximize what they get. On the final page, they write: “These are small actions that add up to a big difference.” Maybe. With margins of error on the effects, some buyers and sellers aren’t going to get the effects outlined here: some will benefit more but some will benefit less.

7. The moral of the whole story? Use data to your advantage even though it is not a guarantee:

In the new realm of real estate, everyone faces a rather stark choice. The operative question now is: Do you wield the power of data to your advantage? Or do you ignore the data, to your peril?

The same is true of the housing market writ large. Certainly, many macro-level dynamics are out of any one person’s control. And yet, we’re better equipped than ever before to choose wisely in the present – to make the kinds of measured judgments that can prevent another coast-to-coast bubble and calamitous burst. (p.252)

In the end, this book is aimed at the mass market where a buyer or seller could hope to string together a number of these small advantages. Yet, there are no guarantees and the effects are often small. Having more data may be good for markets and may make participants feel more knowledgeable (or perhaps more overwhelmed) but not everyone can take advantage of this information.
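To make the point about stringing together small effects a bit more concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the number of tips, the 1% average effect, the 2-percentage-point spread, the normal distribution, and the independence of the tips are made-up inputs, not figures or methods from the book. The function name `simulate_combined_uplift` is likewise hypothetical.

```python
import random
import statistics


def simulate_combined_uplift(effects, n_sellers=10_000, seed=0):
    """Return the simulated total price uplift (as a fraction) for each seller.

    effects: list of (mean, sd) pairs, one per tip, as fractions of sale price.
    Each seller's realized effect for a tip is drawn from Normal(mean, sd),
    independently of the other tips, and the realized effects are compounded
    multiplicatively.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sellers):
        multiplier = 1.0
        for mean, sd in effects:
            multiplier *= 1.0 + rng.gauss(mean, sd)
        totals.append(multiplier - 1.0)
    return totals


# Hypothetical numbers, NOT taken from the book: five tips, each worth about
# 1% on average, with a spread of 2 percentage points from seller to seller.
HYPOTHETICAL_EFFECTS = [(0.01, 0.02)] * 5

totals = simulate_combined_uplift(HYPOTHETICAL_EFFECTS)
mean_uplift = statistics.mean(totals)
share_losing = sum(t < 0 for t in totals) / len(totals)
deciles = statistics.quantiles(totals, n=10)  # 9 cut points: 10th..90th percentile

print(f"average combined uplift: {mean_uplift:.1%}")
print(f"10th to 90th percentile: {deciles[0]:.1%} to {deciles[-1]:.1%}")
print(f"share of sellers ending up worse off: {share_losing:.1%}")
```

Under these assumed inputs, the average seller comes out ahead, but the range between the 10th and 90th percentiles is wide and some simulated sellers end up below where they started. Changing the assumed spread or the number of tips shifts those results, which is the point: the average effect says little about what any one buyer or seller will actually get.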
