By: Andreas M. Olligschlaeger, Ph.D.
Extrapolating forecasts too far into the future is, IMHO, not always
a good idea, especially when it comes to crime data. Crime forecasting
models, both univariate and multivariate, are really only good for
one-step-ahead forecasts. In other words, if your time frame is
monthly, then your best forecast is only going to be good for the
next month. Extrapolating further across the years can be quite
risky. Crime statistics are subject to many short-term changes (for
a variety of reasons).
To take a simple example, let's (just for argument's sake) assume
that in 1994 there were 10 million juveniles. In 1994 these juveniles
committed 10,000 homicides. Now, let's say that in 2000 there were
15 million juveniles, and these juveniles in 2000 committed 9,000
homicides, i.e., the homicide coefficient for juveniles was much
smaller in 2000 (0.6 homicides per 1,000 juveniles) than in 1994
(1.0 per 1,000). Holding the 1994 coefficient fixed would have
predicted 15,000 homicides for 2000 instead of the 9,000 that
occurred. The key here is that while it is reasonable
to project demographic trends six years into the future, it is entirely
unreasonable to project coefficients six years into the future, i.e.,
to assume that they remain the same. Why? Because demographic trends
tend to be stable over relatively long periods of time. If you
know the death rate, and you know the birth rate, and you know the
proportion of the population within a certain age group you can
quite safely project 10 or even 20 years into the future. However,
crime data tend to be much less stable over time, as do the relationships
between crime and demographic data.
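The arithmetic behind the juvenile example can be written out as a short Python sketch. The figures are the made-up ones from the example above, not real statistics, and the variable and function names are purely illustrative.

```python
# Hypothetical figures from the example in the text (not real statistics).
juveniles_1994 = 10_000_000
homicides_1994 = 10_000
juveniles_2000 = 15_000_000
homicides_2000 = 9_000


def rate_per_1000(homicides, population):
    """Homicides per 1,000 juveniles: the 'coefficient' in the example."""
    return 1000 * homicides / population


rate_1994 = rate_per_1000(homicides_1994, juveniles_1994)
rate_2000 = rate_per_1000(homicides_2000, juveniles_2000)

# Naive long-range projection: project the population forward (reasonable),
# but hold the 1994 coefficient fixed for six years (the risky part).
naive_forecast_2000 = homicides_1994 * juveniles_2000 / juveniles_1994
forecast_error = naive_forecast_2000 - homicides_2000

print(f"1994 rate per 1,000: {rate_1994:.1f}")
print(f"2000 rate per 1,000: {rate_2000:.1f}")
print(f"Naive forecast for 2000: {naive_forecast_2000:,.0f}")
print(f"Overestimate: {forecast_error:,.0f}")
```

The population projection is exactly right in this sketch, yet the forecast is still far off, because the whole error comes from assuming the coefficient did not move.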
Let's say we had 100 years of demographic and crime related data.
If we run a multiple regression on this data to forecast, say, homicides,
we would have a pretty good chance of fairly accurately forecasting
the 101st year's homicide numbers. What we are doing, though, is
assuming that the regression coefficients are stable across time.
In my experience, in most cases they are not. If instead of using
all data at once, we used a moving window of, say, 20 years, and
ran a regression on each of those windows (100 - 20 + 1 = 81 windows
= 81 regressions), then in all likelihood we would see a change in
the coefficients over time. The implication is that forecasting
the 101st year's homicide number based on the last regression window
(i.e., years 81-100) would likely yield more accurate results than
using coefficients arrived at by regressing all of the data because
the last regression window contains the most recent trends. So,
the more recent the data used to estimate your coefficients, the
more accurate your one-step-ahead forecast is likely to be.
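The moving-window idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the data are synthetic and noise-free, there is a single regressor (a stand-in for a demographic variable), and the coefficient is made to drift by construction. It shows the mechanism only and is not evidence about real crime data.

```python
def ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic series (hypothetical): years 1..101, with year 101 held out.
years = list(range(1, 102))
population = [100 + t for t in years]            # steadily growing regressor
coefficient = [2.0 - 0.01 * t for t in years]    # relationship drifts over time
homicides = [b * x for b, x in zip(coefficient, population)]

WINDOW = 20
train_x, train_y = population[:100], homicides[:100]

# One regression per 20-year window: years 1-20, 2-21, ..., 81-100.
window_fits = [
    ols(train_x[s:s + WINDOW], train_y[s:s + WINDOW])
    for s in range(len(train_x) - WINDOW + 1)
]
window_slopes = [slope for _, slope in window_fits]

full_a, full_b = ols(train_x, train_y)   # all 100 years at once
last_a, last_b = window_fits[-1]         # years 81-100 only

# One-step-ahead forecast for year 101 from each set of coefficients.
x_101, actual_101 = population[100], homicides[100]
full_error = abs(full_a + full_b * x_101 - actual_101)
window_error = abs(last_a + last_b * x_101 - actual_101)

print(f"windows fitted: {len(window_fits)}")
print(f"slope, first vs last window: {window_slopes[0]:.2f} vs {window_slopes[-1]:.2f}")
print(f"abs error, full sample: {full_error:.2f}")
print(f"abs error, last window: {window_error:.2f}")
```

With these made-up inputs the slope changes sign between the first and the last window, and the forecast from the last window misses year 101 by far less than the full-sample forecast does. Real data are noisy, so a shorter window also means noisier coefficient estimates; the window length is a trade-off, and 20 is simply the figure used in the text.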
Now, let me throw a monkey wrench into everything: coefficients
do not just change over time, but they also vary across space! The
previous example used national data to arrive at aggregate predictions
for the entire US. Now let's take the regression coefficients and
estimate homicide numbers for each state. The likely result would
be a large variation in forecast accuracy, even if the aggregate
forecast is quite good. I have, in fact, observed this phenomenon
not just for state-to-state forecasts, but also within smaller areas,
such as counties or even beat sectors. So, for really accurate local
forecasts of crime you need to use space/time forecasting models
with temporally AND spatially varying parameters.
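A toy Python illustration of the spatial problem, using three hypothetical regions with invented populations and homicide counts. The aggregate coefficient reproduces the aggregate total by construction, but applying it region by region gives badly wrong local numbers.

```python
# Hypothetical regions: name -> (population, observed homicides).
# The numbers are invented purely for illustration.
regions = {
    "A": (4_000_000, 2_000),
    "B": (3_000_000, 6_000),
    "C": (3_000_000, 2_000),
}

total_population = sum(pop for pop, _ in regions.values())
total_homicides = sum(hom for _, hom in regions.values())

# Apply the single aggregate coefficient (total homicides / total population)
# to every region, as if the relationship were the same everywhere.
local_forecast = {
    name: total_homicides * pop / total_population
    for name, (pop, _) in regions.items()
}

# Percentage error of each local forecast against the observed count.
local_error_pct = {
    name: 100 * (local_forecast[name] - hom) / hom
    for name, (_, hom) in regions.items()
}

aggregate_forecast = sum(local_forecast.values())

print(f"aggregate: forecast {aggregate_forecast:,.0f}, observed {total_homicides:,}")
for name in regions:
    print(f"region {name}: forecast {local_forecast[name]:,.0f}, "
          f"error {local_error_pct[name]:+.0f}%")
```

The aggregate forecast is exact here, while the regional forecasts are off by 50 to 100 percent in either direction, which is the pattern described above.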
The bottom line is this: if you try to predict more than one time
period into the future, you're in trouble. Worse, if you try to
predict not just too far into the future, but use aggregate findings
to predict at the local level, then you're in big trouble. The latter
is known as the "Ecological Fallacy", a mistake frequently
made not just by criminologists, but also by economists and other
social scientists.
The policy implications are obvious: using national figures, or
studies conducted at the national level to formulate long-term local
public policy is almost never a good idea. In fact, it can make
things worse rather than better. It is essential to factor in local
context, as well as short- and long-term changes in the relationship
between crime and other variables.