Simultaneous Spatial and Functional Form
Transformations
By
R. Kelley Pace
Louisiana Real Estate Commission Chair
of Real Estate
E.J. Ourso College of Business
Administration
Louisiana State University
Baton Rouge, LA 70803
(225)-388-6256
FAX: (225)-334-1227
Ronald Barry
Associate Professor of Statistics
Department of Mathematical Sciences
University of Alaska
Fairbanks, Alaska 99775-6660
(907)-474-7226
FAX: (907)-474-5394
FFRPB@aurora.alaska.edu
V. Carlos Slawson, Jr.
Assistant Professor of Real Estate
E.J. Ourso College of Business
Administration
Louisiana State University
Baton Rouge, LA 70803
(225)-388-6238
FAX: (225)-388-6366
clawson@lsu.edu
C.F. Sirmans
Professor of Real Estate and Urban
Economics
Director, Center for Real Estate and
Urban Studies
368 Fairfield Rd, Rm 426, U-41RE
Storrs, CT 06269-2041
(860) 486-3227
FAX: (860) 486-0349
cf@sba.uconn.edu
Forthcoming, Advances in Spatial
Econometrics, edited by Raymond Florax and Luc Anselin.
Abstract:
Parsimonious regression models using
locational data often yield non-normal, heteroskedastic, and spatially dependent
residuals. This manuscript develops a model which simultaneously performs spatial and
functional form transformations to mitigate these problems. We apply the model to 11,006
observations on housing prices in Baton Rouge. For these data, the model reduces the
interquartile range of the errors in the untransformed variables space by 38.38%
relative to a simple model using the untransformed variable. In addition, the pattern of
residuals improves dramatically and the generalized additive model exhibits interesting
transformations. Finally, the manuscript documents the computational innovations which
make it possible to perform the maximization of the likelihood in under 10 seconds.
Keywords: Splines, Spatial
Statistics, Non-parametric regression, Jacobians, Generalized Additive Models, GIS,
Spatial Autoregression, House prices.
Acknowledgements:
We would like to thank Paul Eilers
and Brian Marx for their comments, as well as the LSU Statistics Department Seminar
participants. In addition, Pace and Barry would like to thank the University of Alaska for
its generous research support. Pace and Sirmans would like to thank the Center for Real
Estate and Urban Studies, University of Connecticut for their support. Pace and Slawson
would like to thank Louisiana State University and the Greater Baton Rouge Association of
Realtors for their support.
Simultaneous Spatial and Functional Form
Transformations
I. Introduction
Technological advances such as the global positioning system (GPS) and low-cost, high-quality geographic information systems (GIS) have led to an explosion in the volume of large data sets with locational coordinates for each observation. For example, the Census provides large amounts of data for over 250,000 locations in the US (block groups). Moreover, geographic information systems can often provide approximate locational coordinates for street addresses (geocoding). Given the volume of business information which contains a street address field, this allows the creation of extremely large spatial data sets. Such data, as well as other types of spatial data, often exhibit spatial dependence and thus require spatial statistical methods for efficient estimation, valid inference, and optimal prediction.
Several barriers exist to performing spatial statistics with large data sets. Spatial statistical methods require the computation of determinants or inverses of n by n matrices. Allowing for space does not necessarily cure all of the problems encountered in typical data. For example, simple models fitted to housing and other economic data often exhibit heteroskedasticity, visible problems of misspecification for extreme observations, and non-normality (e.g., Goodman and Thibodeau (1995), Subramanian and Carson (1988), Belsley, Kuh and Welsch (1980)). Simultaneously attacking these problems along with spatial dependence for large data sets presents a challenge.
Functional form transformations provide one technique which can simultaneously ameliorate all of these problems. For example, better specification of the functional form could reduce spatial autocorrelation of errors given spatial clustering of similar observations. While not guaranteed, functional form transformations often simultaneously reduce heteroskedasticity and residual non-normality. Because of the potential interaction between the spatial transformation and the functional form transformation, it seems desirable to fit these simultaneously.
Accordingly, we wish to examine the following transformation of the dependent variable,
![]()
where D
represents a n by n spatial weight matrix, a represents the
spatial autoregressive parameter, and
represents the dependent variable transformation
parameterized by a vector of o parameters,
. Least squares would not work for this problem as it would
reduce the sum-of-squared errors by reducing the range of the transformed variable. As an
extreme case OLS could choose
to make
almost constant for a sufficiently flexible form
and a regression with an intercept term would yield almost no error. Hence, this problem
requires maximum likelihood with a Jacobian for the spatial transformation and a Jacobian
for the functional form transformation.
The above form of the problem involves transformation of the
functional form of the dependent variable first and the spatial transformation second.
This seems a more natural formulation than transformation of the functional form of
since the
functional form of the dependent variable often has an interesting subject matter
interpretation. However, spatial transformation first followed by functional form
transformation is feasible and may offer some advantages.
The Box-Cox transformation is the most frequently used in regression. Recently, Griffith, Paelinck, and van Gastel (1998) discussed the importance of transformations for spatial data and examined bivariate Box-Cox/Box-Tidwell transformations of the dependent and independent variable in a spatial autoregression. The use of a parameter for the dependent variable as well as a parameter for the independent variable provided substantial flexibility in the potential transformation. Note, the Box-Cox/Box-Tidwell approach has an additional overhead in spatial problems as one must compute the spatially lagged value of the new transformed variables at each iteration.
We take a different route in modeling the functional form of the variables in a spatial autoregression. Specifically, we use B-splines (De Boor (1978), Ramsay (1988)) which are piecewise polynomials with conditions enforced among the pieces. The knots specify where each local polynomial begins and ends and the degree specifies the amount of smoothness among the pieces. A spline of degree 0 has no smoothness, a spline of degree 1 is piecewise linear, a spline of degree 2 is piecewise quadratic, and so forth.
Relative to the common Box-Cox transformation, the B-spline transformations do not require strictly positive untransformed variables and can assume more complicated shapes (Box and Cox (1964)). The standard one-parameter Box-Cox transformation either has a concave or convex shape. The B-spline transformation can yield convex shapes over part of the domain and concave shapes over the rest of the domain. Moreover, B-splines can yield more severe transformations of the dependent variable than the Box-Cox transformation. Burbidge (1988) discusses the deficiencies of the Box-Cox transformation and the need for more severe transformations of the extreme values of the untransformed dependent variable. These beneficial features do have a price. Relative to transformations such as the Box-Cox, splines may require substantially more degrees-of-freedom. This could create problems for small data sets or those with low amounts of signal-to-noise (i.e., low R2).
Computationally, there are three components to the log-likelihood for this problem. These include (1) a spatial Jacobian; (2) a functional form Jacobian; and (3) the log of the sum-of-squared errors term.
To address the spatial Jacobian part of the log-likelihood, we
use the techniques proposed by Pace and Barry (1997a,b,c) to
quickly compute the Jacobian of the spatial transformation (
).
This involves the computation of
across a grid of values of
.
With sparse D, special techniques exist which
make this computational tractable.
To address the functional form Jacobian part of the likelihood, we employ two additional techniques to greatly accelerate computational speed. First, we use an intermediate transformation of the dependent variable. Intermediate transformations are often used in nonparametric regression (regression with very flexible functional forms). By adopting a transformation which partially models the nonlinearity, it requires less flexibility (fewer degrees-of-freedom) to model the remaining nonlinearity. The goal of our particular intermediate transformation is to make the dependent variables histogram approximately symmetric.
Second, given an approximately symmetric dependent variable, we can employ evenly spaced knots. Equally spaced knots result in more observations between the central knots and fewer observations between the extreme knots. This makes the spline transformation more flexible in the tails and less flexible in the center, a desirable result. Such evenly spaced knots have often been used with B-splines (Hastie and Tibshirani (1990, p. 24)). Evenly spaced knots lead to a very simple functional form Jacobian (Eilers and Marx (1996), Shikin and Plis (1995, p. 44)) suitable for rapid computation.
To address the log of sum-of-squared errors portion of the log-likelihood, we use the linearity of the B-spline and spatial transformations to write the overall sum-of-squared errors as a series of the sum-of-squared errors from regressions on the individual parts of the transformation. This allows us to recombine the sum-of-squared errors from a set of regressions rather than recompute the sum-of-squared errors fresh each iteration.
Cumulatively, these computational techniques accelerate the log-likelihood computations so that each iteration takes little time. Each estimate requires around 1,000 iterations. Yet, these could be computed in less than 10 seconds on a 200 megahertz Pentium Pro computer, even though the data set had 11,006 observations.
We apply this to a housing data set from Baton Rouge, Louisiana. The Real Estate Research Institute at Louisiana State University estimates regressions periodically to form an index of real estate prices over time. Since each house does not sell each quarter, the regression controls for the differences in sample composition over time by using a variety of independent variables such as age, living area, other area, number of bathrooms, number of bedrooms, and date of sale. In addition, variants of these data have been used to examine prediction accuracy of regression models (e.g., Knight, Sirmans and Turnbull (1994)).
In real estate, predictions of the price of unsold homes have been extensively used for tax assessments. In fact, the majority of the districts in the country (and many foreign countries) use some form of statistical analysis to predict the prices of unsold homes (Eckert (1990)). In addition, the secondary mortgage markets have begun exploring the use of statistical appraisal for determining the value of collateral for loans (Gelfand et al. (1998), Eckert and O'Connor (1992)). Note, both of these applications give rise to very large spatial data sets.
To handle these needs, we estimated a general model which includes the previously discussed transformations of the dependent variable, transformations of the independent variables, spatially lagged independent variables, time indicator, and miscellaneous variables. As an illustration of the efficacy of the proposed techniques, the general model reduced the interquartile range of the residuals by 38.38% relative to a simple model using the untransformed dependent variable. Moreover, the resulting dependent variable transformation greatly improved the pattern of the residuals.
Most estimates of the Box-Cox parameters yield a model somewhere between a linear and logarithmic transformation. The estimated dependent variable transformation also fell between a linear and a logarithmic transformation it was close to a linear transformation for low-priced properties but approached a logarithmic transformation for the high-priced properties. In fact, it actually provided more damping than the logarithmic transformation for extremely high-priced properties. Finally, the estimated functional forms of the independent variables seemed plausible and of interest.
Section II develops the joint spatial and dependent variable transformation estimator while section III applies the estimator to the Baton Rouge data. Section IV concludes the paper.
II. Simultaneous Spatial and Variable Transformations
This overall section presents the estimator and the various techniques facilitating computation. Section A sets up the log-likelihood, section B discusses the application of splines to the problem, section C shows how to simplify the SSE, section D provides a computational simplification of the spatial Jacobian, section E gives a simple way of computing the functional form Jacobian, and section F extends the model to transformations of the independent variables.
A. A Transformed Dependent Variable with Spatial Autoregression
Suppose the transformed variable follows a spatial autoregressive process,
where
denotes the transformed dependent variable n element vector which depends upon the o element vector of parameters q. In
addition, X denotes an n by p
matrix of the independent variables, D denotes
an n by n
spatial weight matrix,
represents the autoregressive parameter (
), b denotes the
p element vector of regression parameters, u denotes the spatially autocorrelated error term,
while e denotes a
normal iid error term.
The spatial weight matrix D
has some special structure. First, it has zeros on the main diagonal which prevents an
observation from predicting itself. Second, it is a non-negative matrix and positive
entries in the jth column of the ith row means observation j directly affects observation i. We do not assume symmetry and so the converse
does not necessarily hold. Third, we assume each observation is only directly affected by
its m closest neighbors. This makes D very sparse (high proportion of zeros) which
greatly aids computational performance. Fourth, D
is row-stochastic and so each row sums to 1. This gives D a smoothing or linear filter interpretation (Davidson and MacKinnon (1993)).
Intuitively,
provides a construct similar to a lag in time
series for
.
To estimate (1), we rewrite it as,
For a known
and q, one
could proceed to apply OLS to (2). Unfortunately, estimating
and q via OLS
results in biased estimates.
To motivate the defect in using OLS to estimate the parameters in
this situation, consider momentarily the very simple model
where
represents a scalar parameter. If we employ OLS to
estimate both
and
, the
estimated value of the parameter
would equal 0 for any value of
.
This would turn the dependent variable vector
into a vector of zeros that a model with an
intercept would fit perfectly.
To prevent this form of extreme behavior, one must employ maximum
likelihood which explicitly penalizes such pathological transformations using the Jacobian
of the transformation. The Jacobian of the transformation measures the n-dimensional volume change caused by stretching or
compressing any or all of the potential n
dimensions. By premultiplying Y via the matrix
, we are performing a linear transformation. In this case we
are compressing or stretching each of the n
dimensions of Y by a factor
.
Relative to a unit value for
, values of
correspond to
more singular transformations. The Jacobian of the transformation is the determinant of
the matrix of derivatives which in this instance is
(
).[1] To make the
example even simpler, we are dealing with a cube when n is 3. If we multiply each dimension of the cube
by a factor of 2, we increase the volume of the cube by a factor of 8 (
). The need for the Jacobian is not specific to the normal
maximum likelihood, but arises whenever making transformations with proper, continuous
densities (Davidson and MacKinnon (1993, p. 489), Freund and Walpole (1980, p. 230-252)).
Assuming normality, the profile log-likelihood for this example
equals a constant plus the log of the Jacobian less
. Taking as a
reference point the sum-of-squared error when
(
), then
. As an example, multiplying Y by a constant of
would multiply SSE by a constant of
.
Hence, the profile log-likelihood becomes
. A simple
expansion shows that the likelihood will be the same for any choice of
. Hence, the maximum likelihood choice for
does not depend
upon
. Thus, one cannot affect the maximum likelihood estimate by
a simple scaling of the dependent variable, a highly desirable result.[2]
In this simple case, the role of the Jacobian in maximum
likelihood is clear. The Jacobian continues to play a similar role in more complicated
transformations such as those arising from spatial transformations or from functional form
transformations. Successive transformations result in Jacobians multiplied by each other
in the multivariate density. Hence, for simultaneous transformations the log of the
Jacobian of
would equal the
sum of the logs of the individual Jacobians (e.g.,
where J denotes the relevant Jacobian).
Hence, the profile log-likelihood for estimation using a spatial and a functional form transformation equals,
where
and
represent the Jacobians of the spatial and
dependent variable transformations and
represents an arbitrary constant.
Attacking the maximization of the above log-likelihood in the most straightforward way would likely result in very long execution times. We show methods for greatly accelerating the computation of each of these terms. We detail these computational accelerations in the sections below.
B. Linear Expansions of Non-Linear Functions
If one computed
and subsequently
for every iteration of the maximization of the
log-likelihood, this could greatly reduce the speed of the algorithm as
is an n by n
(albeit sparse) matrix. Hence, we first seek ways of avoiding this step. Fortunately, if
we can expand
linearly, we can avoid this set of computations. A
number of ways of linearly expanding a function exist. We could use indicator variables,
polynomials, or splines. For our computations we chose B-splines (De Boor (1978, 1999)).
In fact, splines generalize both indicator variables and polynomials. Indicator variables provide a locally constant fit to a function for their non-zero portions. B-splines of degree 0 yield indicator variables. The advantage of indicator variables or B-splines of degree 0 is their local fit. Their disadvantage is that locally constant approximations are not necessarily continuous or smooth.
Polynomials, however, exhibit both continuity and smoothness. Polynomials attempt to approximate a function globally and gyrations of the function over parts of the domain can cause the polynomial to poorly fit other parts of the domain. A polynomial equates to a high degree B-spline with few knots.
Specifically, B-splines are piecewise polynomials with conditions enforced among the pieces. The knots specify where each local polynomial begins and ends and the degree specifies the amount of smoothness among the pieces. A spline of degree 0 has no smoothness, a spline of degree 1 is piecewise linear, a spline of degree 2 is piecewise quadratic, and so forth.
To provide some physical intuition, a spline was a thin strip of wood used in constructing ships. The spline attached to two points separated by less than its length would cause the spline to produce a curve. By introducing supports (ribs of the ship), the curve could be modified into many shapes. Hence, the spline knots act similar to the ships ribs. Moreover, the flexibility of the strip of wood would determine the smoothness (affect the degree of the spline). The piecewise linear splines used here correspond to laying a string across the ribs of the ship.
Also, one can restrict B-splines to yield strictly monotonic transformations. One must have monotonic transformations of dependent variables for prediction in the original dependent variable space (Ramsay (1988)). Finally, B-splines can interpolate a given set of values (assuming satisfaction of the Schoenburg-Whitney conditions (De Boors (1999, p. 1.10)). The Schoenburg-Whitney conditions essentially require that each of the B-spline basis vectors have at least one non-zero value. Hence, given a set of values, some weighting of the associated B-spline basis vectors could return the same set of values.
To explain splines in detail is beyond the scope of this article. However, a specific example greatly aids in understanding some of their features. In example 1 we consider four values for the dependent variable Y of 1, 1.5, 2.25, and 3.0. Given knots of 1, 2, and 3 (with 1 and 3 being repeated), we used the SPCOL function in the Matlab Spline Toolbox 2.01 to produce the following matrix B(Y) comprised of three basis vectors. The exact values of the basis vectors depend upon Y and hence we emphasize this by writing B(Y).
Example 1 |
|||
Y |
B(Y) |
||
1.00 |
1.00 |
0.00 |
0.00 |
1.50 |
0.50 |
0.50 |
0.00 |
2.25 |
0.00 |
0.75 |
0.25 |
3.00 |
0.00 |
0.00 |
1.00 |
In example 1, B(Y) illustrates a couple of B-spline features. First, B(Y), the
collection of basis vectors, contains only non-negative numbers. Second, each row sums to
one. Third, the basis vectors have zero elements for elements of Y sufficiently far away from the knots. If we
compute
for
, we find it yields Y
exactly. For other
such that
, the plots of
versus Y show a monotonically increasing piecewise linear
relations. Figures 1a, 1b, 1c, and 1d show four such plots. In every case, the selected
satisfied the
monotonicity constraints. Figure 1a shows how the function
exactly replicated the original Y (interpolated). Figure 1b shows a slightly
concave transformation while 1c shows a more severe concave transformation. Figure 1d
shows a convex transformation. With more points, one could generate combinations of convex
and concave transformations (over different domains).
Assuming satisfaction of the Schoenburg-Whitney conditions, with B-splines our transformed dependent variable becomes,
where
represents the n by o
matrix containing the basis vectors and
represents the o by 1 parameter vector. The number of basis
vectors, o, depends upon the number of knots and
the degree of smoothness required. As o rises,
the transformed dependent variable
can assume progressively more flexible forms.
Substituting into yields .
Hence, one can linearly expand the joint spatial and dependent variable into the product of a n by 2o matrix and a 2o by 1 parameter vector.
C. SSE Simplifications
Let M represent the
idempotent least squares matrix
. We can write
the residuals from the regression of
on X as ,
where the n by 2o matrix E
contains all the residuals from the individual regressions and the vector r
represents the 2o element parameter vector. The
linearity of the problem means the least squares residuals e on the overall transformed variable
are simply a
linear combination of the least squares residuals from regressing each basis vector in
and their spatial lags
on X.
Hence, forming parameterized sum-of-squared errors yields .
Note, the 2o by 2o error cross-product matrix
is only computed once. Subsequent iterations of
involve only
order of
operations, a
very small number which does not depend upon n,
the number of observations or k, the number of
regressors. Moreover, o is usually much less
than k and strictly less than n. This reduction in the dimensionality of the
sum-of-squared errors leads to an low dimensional profile likelihood (Meeker and Escobar (1995)).
D. Spatial Jacobian Simplifications
Historically, the spatial Jacobian,
,
constituted the main barrier to fast computation of spatial estimators (e.g., Li (1995)). However, the
use of a limited number of spatial neighbors lead to sparse matrices. Pace and Barry (1997b, c)
show how various permutations of the rows and columns of such sparse matrices
can vastly
accelerate the computation of
. Although
computation of
is inexpensive for a particular value of
, one can further accelerate the computations by computing
for a large
numbers of values of
(e.g.,
100) and interpolating intermediate values. Insofar as
has a limited range (for stochastic D) and the function
is quite smooth, the interpolation exhibits very
low error.
Moreover, these computations are performed only when changing the weight matrix D. Hence, one can reuse the grid of values (and interpolated points) when fitting different models involving Y and X for a given D.
Pace and Barry have released a public domain Matlab-based package, Spatial Toolbox 1.0 available at www.finance.lsu.edu/re which implements these spatial Jacobian simplifications and contains copies of the articles which describe the implementation details.
E. Functional Form Jacobian Simplifications
The functional form log-Jacobian has a particularly simple form for piecewise linear splines with evenly spaced knots,