Simultaneous Spatial and Functional Form Transformations

By

 

R. Kelley Pace

Louisiana Real Estate Commission Chair of Real Estate

E.J. Ourso College of Business Administration

Louisiana State University

Baton Rouge, LA 70803

(225)-388-6256

FAX: (225)-334-1227

kelley@pace.am

www.spatial-statistics.com

 

Ronald Barry

Associate Professor of Statistics

Department of Mathematical Sciences

University of Alaska

Fairbanks, Alaska 99775-6660

(907)-474-7226

FAX: (907)-474-5394

FFRPB@aurora.alaska.edu

 

V. Carlos Slawson, Jr.

Assistant Professor of Real Estate

E.J. Ourso College of Business Administration

Louisiana State University

Baton Rouge, LA 70803

(225)-388-6238

FAX: (225)-388-6366

clawson@lsu.edu

 

C.F. Sirmans

Professor of Real Estate and Urban Economics

Director, Center for Real Estate and Urban Studies

368 Fairfield Rd, Rm 426, U-41RE

Storrs, CT 06269-2041

(860) 486-3227

FAX: (860) 486-0349

cf@sba.uconn.edu

 

 

Forthcoming, Advances in Spatial Econometrics, edited by Raymond Florax and Luc Anselin.


 

Abstract:

Parsimonious regression models using locational data often yield non-normal, heteroskedastic, and spatially dependent residuals. This manuscript develops a model which simultaneously performs spatial and functional form transformations to mitigate these problems. We apply the model to 11,006 observations on housing prices in Baton Rouge. For these data, the model reduces the interquartile range of the errors in the untransformed variable’s space by 38.38% relative to a simple model using the untransformed variable. In addition, the pattern of residuals improves dramatically and the generalized additive model exhibits interesting transformations. Finally, the manuscript documents the computational innovations which make it possible to perform the maximization of the likelihood in under 10 seconds.

 

Keywords: Splines, Spatial Statistics, Non-parametric regression, Jacobians, Generalized Additive Models, GIS, Spatial Autoregression, House prices.

Acknowledgements:

We would like to thank Paul Eilers and Brian Marx for their comments, as well as the LSU Statistics Department Seminar participants. In addition, Pace and Barry would like to thank the University of Alaska for its generous research support. Pace and Sirmans would like to thank the Center for Real Estate and Urban Studies, University of Connecticut for their support. Pace and Slawson would like to thank Louisiana State University and the Greater Baton Rouge Association of Realtors for their support.


Simultaneous Spatial and Functional Form Transformations

I. Introduction

Technological advances such as the global positioning system (GPS) and low-cost, high-quality geographic information systems (GIS) have led to an explosion in the volume of large data sets with locational coordinates for each observation. For example, the Census provides large amounts of data for over 250,000 locations in the US (block groups). Moreover, geographic information systems can often provide approximate locational coordinates for street addresses (geocoding). Given the volume of business information which contains a street address field, this allows the creation of extremely large spatial data sets. Such data, as well as other types of spatial data, often exhibit spatial dependence and thus require spatial statistical methods for efficient estimation, valid inference, and optimal prediction.

Several barriers exist to performing spatial statistics with large data sets. Spatial statistical methods require the computation of determinants or inverses of n by n matrices. Allowing for space does not necessarily cure all of the problems encountered in typical data. For example, simple models fitted to housing and other economic data often exhibit heteroskedasticity, visible problems of misspecification for extreme observations, and non-normality (e.g., Goodman and Thibodeau (1995), Subramanian and Carson (1988), Belsley, Kuh and Welsch (1980)). Simultaneously attacking these problems along with spatial dependence for large data sets presents a challenge.

Functional form transformations provide one technique which can simultaneously ameliorate all of these problems. For example, better specification of the functional form could reduce spatial autocorrelation of errors given spatial clustering of similar observations. While not guaranteed, functional form transformations often simultaneously reduce heteroskedasticity and residual non-normality. Because of the potential interaction between the spatial transformation and the functional form transformation, it seems desirable to fit these simultaneously.

Accordingly, we wish to examine the following transformation of the dependent variable,

                                                                  

where D represents a n by n spatial weight matrix, a represents the spatial autoregressive parameter, and  represents the dependent variable transformation parameterized by a vector of o parameters, . Least squares would not work for this problem as it would reduce the sum-of-squared errors by reducing the range of the transformed variable. As an extreme case OLS could choose  to make  almost constant for a sufficiently flexible form and a regression with an intercept term would yield almost no error. Hence, this problem requires maximum likelihood with a Jacobian for the spatial transformation and a Jacobian for the functional form transformation.

The above form of the problem involves transformation of the functional form of the dependent variable first and the spatial transformation second. This seems a more natural formulation than transformation of the functional form of  since the functional form of the dependent variable often has an interesting subject matter interpretation. However, spatial transformation first followed by functional form transformation is feasible and may offer some advantages.

The Box-Cox transformation is the most frequently used in regression. Recently, Griffith, Paelinck, and van Gastel (1998) discussed the importance of transformations for spatial data and examined bivariate Box-Cox/Box-Tidwell transformations of the dependent and independent variable in a spatial autoregression. The use of a parameter for the dependent variable as well as a parameter for the independent variable provided substantial flexibility in the potential transformation. Note, the Box-Cox/Box-Tidwell approach has an additional overhead in spatial problems as one must compute the spatially lagged value of the new transformed variables at each iteration.

We take a different route in modeling the functional form of the variables in a spatial autoregression. Specifically, we use B-splines (De Boor (1978), Ramsay (1988)) which are piecewise polynomials with conditions enforced among the pieces. The knots specify where each local polynomial begins and ends and the degree specifies the amount of smoothness among the pieces. A spline of degree 0 has no smoothness, a spline of degree 1 is piecewise linear, a spline of degree 2 is piecewise quadratic, and so forth.

Relative to the common Box-Cox transformation, the B-spline transformations do not require strictly positive untransformed variables and can assume more complicated shapes (Box and Cox (1964)). The standard one-parameter Box-Cox transformation either has a concave or convex shape. The B-spline transformation can yield convex shapes over part of the domain and concave shapes over the rest of the domain. Moreover, B-splines can yield more severe transformations of the dependent variable than the Box-Cox transformation. Burbidge (1988) discusses the deficiencies of the Box-Cox transformation and the need for more severe transformations of the extreme values of the untransformed dependent variable. These beneficial features do have a price. Relative to transformations such as the Box-Cox, splines may require substantially more degrees-of-freedom. This could create problems for small data sets or those with low amounts of signal-to-noise (i.e., low R2).

Computationally, there are three components to the log-likelihood for this problem. These include (1) a spatial Jacobian; (2) a functional form Jacobian; and (3) the log of the sum-of-squared errors term.

To address the spatial Jacobian part of the log-likelihood, we use the techniques proposed by Pace and Barry (1997a,b,c) to quickly compute the Jacobian of the spatial transformation (). This involves the computation of  across a grid of values of . With sparse D, special techniques exist which make this computational tractable.

To address the functional form Jacobian part of the likelihood, we employ two additional techniques to greatly accelerate computational speed. First, we use an intermediate transformation of the dependent variable. Intermediate transformations are often used in nonparametric regression (regression with very flexible functional forms). By adopting a transformation which partially models the nonlinearity, it requires less flexibility (fewer degrees-of-freedom) to model the remaining nonlinearity. The goal of our particular intermediate transformation is to make the dependent variable’s histogram approximately symmetric. 

Second, given an approximately symmetric dependent variable, we can employ evenly spaced knots. Equally spaced knots result in more observations between the central knots and fewer observations between the extreme knots. This makes the spline transformation more flexible in the tails and less flexible in the center, a desirable result. Such evenly spaced knots have often been used with B-splines (Hastie and Tibshirani (1990, p. 24)). Evenly spaced knots lead to a very simple functional form Jacobian (Eilers and Marx (1996), Shikin and Plis (1995, p. 44)) suitable for rapid computation.

To address the log of sum-of-squared errors portion of the log-likelihood, we use the linearity of the B-spline and spatial transformations to write the overall sum-of-squared errors as a series of the sum-of-squared errors from regressions on the individual parts of the transformation. This allows us to recombine the sum-of-squared errors from a set of regressions rather than recompute the sum-of-squared errors fresh each iteration.

Cumulatively, these computational techniques accelerate the log-likelihood computations so that each iteration takes little time. Each estimate requires around 1,000 iterations. Yet, these could be computed in less than 10 seconds on a 200 megahertz Pentium Pro computer, even though the data set had 11,006 observations.

We apply this to a housing data set from Baton Rouge, Louisiana. The Real Estate Research Institute at Louisiana State University estimates regressions periodically to form an index of real estate prices over time. Since each house does not sell each quarter, the regression controls for the differences in sample composition over time by using a variety of independent variables such as age, living area, other area, number of bathrooms, number of bedrooms, and date of sale. In addition, variants of these data have been used to examine prediction accuracy of regression models (e.g., Knight, Sirmans and Turnbull (1994)).

In real estate, predictions of the price of unsold homes have been extensively used for tax assessments. In fact, the majority of the districts in the country (and many foreign countries) use some form of statistical analysis to predict the prices of unsold homes (Eckert (1990)). In addition, the secondary mortgage markets have begun exploring the use of statistical appraisal for determining the value of collateral for loans (Gelfand et al. (1998), Eckert and O'Connor (1992)). Note, both of these applications give rise to very large spatial data sets.

To handle these needs, we estimated a general model which includes the previously discussed transformations of the dependent variable, transformations of the independent variables, spatially lagged independent variables, time indicator, and miscellaneous variables. As an illustration of the efficacy of the proposed techniques, the general model reduced the interquartile range of the residuals by 38.38% relative to a simple model using the untransformed dependent variable. Moreover, the resulting dependent variable transformation greatly improved the pattern of the residuals.

Most estimates of the Box-Cox parameters yield a model somewhere between a linear and logarithmic transformation. The estimated dependent variable transformation also fell between a linear and a logarithmic transformation – it was close to a linear transformation for low-priced properties but approached a logarithmic transformation for the high-priced properties. In fact, it actually provided more damping than the logarithmic transformation for extremely high-priced properties. Finally, the estimated functional forms of the independent variables seemed plausible and of interest.

Section II develops the joint spatial and dependent variable transformation estimator while section III applies the estimator to the Baton Rouge data. Section IV concludes the paper.

II. Simultaneous Spatial and Variable Transformations

This overall section presents the estimator and the various techniques facilitating computation. Section A sets up the log-likelihood, section B discusses the application of splines to the problem, section C shows how to simplify the SSE, section D provides a computational simplification of the spatial Jacobian, section E gives a simple way of computing the functional form Jacobian, and section F extends the model to transformations of the independent variables.

A. A Transformed Dependent Variable with Spatial Autoregression

Suppose the transformed variable follows a spatial autoregressive process,

                                                                                                                                  

where  denotes the transformed dependent variable n element vector which depends upon the o element vector of parameters q. In addition, X denotes an n by p matrix of the independent variables, D denotes an n by n spatial weight matrix,  represents the autoregressive parameter (), b denotes the p element vector of regression parameters, u denotes the spatially autocorrelated error term, while e denotes a normal iid error term.

The spatial weight matrix D has some special structure. First, it has zeros on the main diagonal which prevents an observation from predicting itself. Second, it is a non-negative matrix and positive entries in the jth column of the ith row means observation j directly affects observation i. We do not assume symmetry and so the converse does not necessarily hold. Third, we assume each observation is only directly affected by its m closest neighbors. This makes D very sparse (high proportion of zeros) which greatly aids computational performance. Fourth, D is row-stochastic and so each row sums to 1. This gives D a smoothing or linear filter interpretation (Davidson and MacKinnon (1993)). Intuitively,  provides a construct similar to a lag in time series for .

To estimate (1), we rewrite it as,

                                                                                                                  

For a known  and q, one could proceed to apply OLS to (2). Unfortunately, estimating  and q via OLS results in biased estimates.

To motivate the defect in using OLS to estimate the parameters in this situation, consider momentarily the very simple model  where  represents a scalar parameter. If we employ OLS to estimate both  and , the estimated value of the parameter  would equal 0 for any value of . This would turn the dependent variable vector  into a vector of zeros that a model with an intercept would fit perfectly.

To prevent this form of extreme behavior, one must employ maximum likelihood which explicitly penalizes such pathological transformations using the Jacobian of the transformation. The Jacobian of the transformation measures the n-dimensional volume change caused by stretching or compressing any or all of the potential n dimensions. By premultiplying Y via the matrix , we are performing a linear transformation. In this case we are compressing or stretching each of the n dimensions of Y by a factor . Relative to a unit value for , values of  correspond to more singular transformations. The Jacobian of the transformation is the determinant of the matrix of derivatives which in this instance is  ().[1] To make the example even simpler, we are dealing with a cube when n is 3. If we multiply each dimension of the cube by a factor of 2, we increase the volume of the cube by a factor of 8 (). The need for the Jacobian is not specific to the normal maximum likelihood, but arises whenever making transformations with proper, continuous densities (Davidson and MacKinnon (1993, p. 489), Freund and Walpole (1980, p. 230-252)).

Assuming normality, the profile log-likelihood for this example equals a constant plus the log of the Jacobian less . Taking as a reference point the sum-of-squared error when  (), then . As an example, multiplying Y by a constant of  would multiply SSE by a constant of . Hence, the profile log-likelihood becomes . A simple expansion shows that the likelihood will be the same for any choice of . Hence, the maximum likelihood choice for  does not depend upon . Thus, one cannot affect the maximum likelihood estimate by a simple scaling of the dependent variable, a highly desirable result.[2]

In this simple case, the role of the Jacobian in maximum likelihood is clear. The Jacobian continues to play a similar role in more complicated transformations such as those arising from spatial transformations or from functional form transformations. Successive transformations result in Jacobians multiplied by each other in the multivariate density. Hence, for simultaneous transformations the log of the Jacobian of  would equal the sum of the logs of the individual Jacobians (e.g.,  where J denotes the relevant Jacobian).

Hence, the profile log-likelihood for estimation using a spatial and a functional form transformation equals,

                                                              

where  and  represent the Jacobians of the spatial and dependent variable transformations and  represents an arbitrary constant.

Attacking the maximization of the above log-likelihood in the most straightforward way would likely result in very long execution times. We show methods for greatly accelerating the computation of each of these terms. We detail these computational accelerations in the sections below.

B. Linear Expansions of Non-Linear Functions

If one computed  and subsequently  for every iteration of the maximization of the log-likelihood, this could greatly reduce the speed of the algorithm as  is an n by n (albeit sparse) matrix. Hence, we first seek ways of avoiding this step. Fortunately, if we can expand  linearly, we can avoid this set of computations. A number of ways of linearly expanding a function exist. We could use indicator variables, polynomials, or splines. For our computations we chose B-splines (De Boor (1978, 1999)).

In fact, splines generalize both indicator variables and polynomials. Indicator variables provide a locally constant fit to a function for their non-zero portions. B-splines of degree 0 yield indicator variables. The advantage of indicator variables or B-splines of degree 0 is their local fit. Their disadvantage is that locally constant approximations are not necessarily continuous or smooth.

Polynomials, however, exhibit both continuity and smoothness. Polynomials attempt to approximate a function globally and gyrations of the function over parts of the domain can cause the polynomial to poorly fit other parts of the domain. A polynomial equates to a high degree B-spline with few knots.

Specifically, B-splines are piecewise polynomials with conditions enforced among the pieces. The knots specify where each local polynomial begins and ends and the degree specifies the amount of smoothness among the pieces. A spline of degree 0 has no smoothness, a spline of degree 1 is piecewise linear, a spline of degree 2 is piecewise quadratic, and so forth.

To provide some physical intuition, a spline was a thin strip of wood used in constructing ships. The spline attached to two points separated by less than its length would cause the spline to produce a curve. By introducing supports (ribs of the ship), the curve could be modified into many shapes. Hence, the spline knots act similar to the ship’s ribs. Moreover, the flexibility of the strip of wood would determine the smoothness (affect the degree of the spline). The piecewise linear splines used here correspond to laying a string across the ribs of the ship.

Also, one can restrict B-splines to yield strictly monotonic transformations. One must have monotonic transformations of dependent variables for prediction in the original dependent variable space (Ramsay (1988)). Finally, B-splines can interpolate a given set of values (assuming satisfaction of the Schoenburg-Whitney conditions (De Boors (1999, p. 1.10)). The Schoenburg-Whitney conditions essentially require that each of the B-spline basis vectors have at least one non-zero value. Hence, given a set of values, some weighting of the associated B-spline basis vectors could return the same set of values.

To explain splines in detail is beyond the scope of this article. However, a specific example greatly aids in understanding some of their features. In example 1 we consider four values for the dependent variable Y of 1, 1.5, 2.25, and 3.0. Given knots of 1, 2, and 3 (with 1 and 3 being repeated), we used the SPCOL function in the Matlab Spline Toolbox 2.01 to produce the following matrix B(Y) comprised of three basis vectors. The exact values of the basis vectors depend upon Y and hence we emphasize this by writing B(Y).

 

Example 1

Y

B(Y)

1.00

1.00

0.00

0.00

1.50

0.50

0.50

0.00

2.25

0.00

0.75

0.25

3.00

0.00

0.00

1.00

 

In example 1, B(Y) illustrates a couple of B-spline features. First, B(Y), the collection of basis vectors, contains only non-negative numbers. Second, each row sums to one. Third, the basis vectors have zero elements for elements of Y sufficiently far away from the knots. If we compute  for , we find it yields Y exactly. For other  such that , the plots of  versus Y show a monotonically increasing piecewise linear relations. Figures 1a, 1b, 1c, and 1d show four such plots. In every case, the selected  satisfied the monotonicity constraints. Figure 1a shows how the function  exactly replicated the original Y (interpolated). Figure 1b shows a slightly concave transformation while 1c shows a more severe concave transformation. Figure 1d shows a convex transformation. With more points, one could generate combinations of convex and concave transformations (over different domains).

Assuming satisfaction of the Schoenburg-Whitney conditions, with B-splines our transformed dependent variable becomes,

                                                                                                                                  

where  represents the n by o matrix containing the basis vectors and  represents the o by 1 parameter vector. The number of basis vectors, o, depends upon the number of knots and the degree of smoothness required. As o rises, the transformed dependent variable  can assume progressively more flexible forms.

Substituting into yields .

                                                              

Hence, one can linearly expand the joint spatial and dependent variable into the product of a n by 2o matrix and a 2o by 1 parameter vector.

C. SSE Simplifications

Let M represent the idempotent least squares matrix . We can write the residuals from the regression of  on X as ,

                                              

where the n by 2o matrix E contains all the residuals from the individual regressions and the vector r represents the 2o element parameter vector. The linearity of the problem means the least squares residuals e on the overall transformed variable  are simply a linear combination of the least squares residuals from regressing each basis vector in and their spatial lags  on X. Hence, forming parameterized sum-of-squared errors yields .

                                                                                                            

Note, the 2o by 2o error cross-product matrix  is only computed once. Subsequent iterations of  involve only order of  operations, a very small number which does not depend upon n, the number of observations or k, the number of regressors. Moreover, o is usually much less than k and strictly less than n. This reduction in the dimensionality of the sum-of-squared errors leads to an low dimensional profile likelihood (Meeker and Escobar (1995)).

D. Spatial Jacobian Simplifications

Historically, the spatial Jacobian,, constituted the main barrier to fast computation of spatial estimators (e.g., Li (1995)). However, the use of a limited number of spatial neighbors lead to sparse matrices. Pace and Barry (1997b, c) show how various permutations of the rows and columns of such sparse matrices  can vastly accelerate the computation of . Although computation of  is inexpensive for a particular value of , one can further accelerate the computations by computing  for a large numbers of values of  (e.g., 100) and interpolating intermediate values. Insofar as  has a limited range (for stochastic D) and the function  is quite smooth, the interpolation exhibits very low error.

Moreover, these computations are performed only when changing the weight matrix D. Hence, one can reuse the grid of values (and interpolated points) when fitting different models involving Y and X for a given D.

Pace and Barry have released a public domain Matlab-based package, “Spatial Toolbox 1.0” available at www.finance.lsu.edu/re which implements these spatial Jacobian simplifications and contains copies of the articles which describe the implementation details.


E. Functional Form Jacobian Simplifications

The functional form log-Jacobian has a particularly simple form for piecewise linear splines with evenly spaced knots,