The use of species distribution models (SDMs) to map and monitor animal and plant distributions has become increasingly important as awareness of environmental change and its ecological consequences has grown. Although increasingly sophisticated statistical methods are being used in SDMs, the vast majority of these models have been developed without considering spatial autocorrelation in the data. When spatial autocorrelation is ignored and nonspatial statistical methods are used, coefficient estimates are less precise and the models overall can be poorly specified. Explicitly spatial statistical methods not only can mitigate these model calibration issues, but can also incorporate information on spatial processes such as competition, dispersal, and disturbance. Although the number of SDM studies that use explicitly spatial statistical methods has recently increased, their results have been incongruous and difficult to synthesize. The location-, data-, or scale-specific nature of these studies has impeded efforts to disentangle the effects of spatial structure in the data, sampling strategy, the scale of the study, and the statistical methods used. This project addresses each of these issues through the general research question: how does spatial autocorrelation affect species distribution models? The research uses multi-resolution simulated distribution maps and novel assessment measures to analyze how differences in each of the four issues (spatial structure, sampling strategy, scale, and statistical methods) affect SDMs, both separately and in concert.
The main outcome of this project will be a framework that can guide all aspects of model conceptualization and development (sampling strategy, statistical methods used, and appropriate spatial scale) when working with spatially autocorrelated binary data. This research will contribute to a greater understanding of how spatial autocorrelation affects inductive models fitted to binary response data. Beyond predicting species distributions, these models have become an important and widely used decision-making tool for a variety of biogeographical applications, such as studying the effects of climate change, identifying potential protected areas, determining locations potentially susceptible to invasion, and mapping the spread and risk of vector-borne disease. Outside of biogeography, similar binary response models are used in medical and health applications (e.g., diagnostic tests) and in the economic and social sciences (e.g., labor market status, credit scoring, voting behavior).
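
To make the notion of spatial autocorrelation in binary presence/absence data concrete, the following is a minimal sketch that computes Moran's I, a standard global measure of spatial autocorrelation, on two simulated binary grids. It is illustrative only and is not the project's simulation or assessment method: the function names `morans_i` and `simulate_presence`, the rook (four-neighbour) weights, the grid size, and the smoothing-and-threshold procedure used to create a clustered map are all assumptions chosen for brevity.

```python
import numpy as np


def morans_i(grid):
    """Moran's I for a 2-D array with binary rook (4-neighbour) weights.

    Assumes the grid is not constant (otherwise the variance term is zero).
    """
    z = grid.astype(float) - grid.mean()
    # Cross-products over every horizontally and vertically adjacent pair.
    pair_sum = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    # With symmetric 0/1 weights each pair is counted twice in both the
    # numerator and the weight total, so the factors of 2 cancel.
    return z.size * pair_sum / (n_pairs * (z ** 2).sum())


def simulate_presence(shape, smooth_passes, rng):
    """Hypothetical simulator: smooth Gaussian noise, then threshold at the median.

    More smoothing passes give a more strongly clustered presence/absence map.
    """
    field = rng.standard_normal(shape)
    for _ in range(smooth_passes):
        # Average each cell with its four neighbours (edges wrap around).
        field = (
            field
            + np.roll(field, 1, axis=0)
            + np.roll(field, -1, axis=0)
            + np.roll(field, 1, axis=1)
            + np.roll(field, -1, axis=1)
        ) / 5.0
    return (field > np.median(field)).astype(int)


rng = np.random.default_rng(0)
random_map = simulate_presence((50, 50), smooth_passes=0, rng=rng)
clustered_map = simulate_presence((50, 50), smooth_passes=5, rng=rng)

i_random = morans_i(random_map)
i_clustered = morans_i(clustered_map)
print(f"Moran's I, unstructured map: {i_random:.3f}")
print(f"Moran's I, clustered map:    {i_clustered:.3f}")
```

Values near zero indicate no spatial structure, while values approaching 1 indicate that neighbouring cells tend to share the same presence/absence state. A nonspatial model fitted to samples from the clustered map would treat such neighbouring observations as independent, which is the situation the project sets out to examine.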