Spatial Regression Using Panel Data

I am analyzing a spatial panel dataset using the XSMLE package in Stata. My units are a subset of US states (11) and my panel is strongly balanced.

  • The package returns estimations for Main, Spatial, Variance, Direct and Indirect betas. Is there a resource someone can recommend explaining how to interpret these statistics? I've read the presentation that Belotti, Hughes and Mortari (2013) created on XSMLE, but they don't discuss why the estimations are presented in this fashion.
  • If a standard Hausman test indicates that an RE model is appropriate, does that result extend to a spatial panel?
  • What's the best way to assess high leverage observations when running a spatial panel? Should I just estimate a pooled OLS model and then calculate DFBETA and Cook's D or is there another approach?

Urban Spatial Decision Support System for Municipal Solid Waste Management of Nagpur Urban Area Using High-Resolution Satellite Data and Geographic Information System

Solid waste generation is undoubtedly related to anthropogenic activities and culture. The management of solid waste becomes more challenging in densely populated heterogeneous cultures in developing countries. Proper management of solid waste may be done only after proper characterization of the waste, which varies with anthropogenic activities and population densities, which in turn are related to land use patterns of urban areas. The spatial characterization of waste in urban areas is mandatory for designing management strategies, including waste segregation, collection, transportation, and disposal. In this paper, solid waste is characterized within the Nagpur urban area, and coefficients are generated through regression analysis for spatially estimating quantities of waste components for various land use classes. Satellite data are used to generate population indexes for various land use classes. Simple linear regression analysis is used to generate indexes for computing physical and chemical characteristics of solid waste for residential land use classes. Based on the indexes generated, a spatial representation of solid waste components using a geographical information system (GIS) is obtained. This spatial representation acts as a tool to understand the status of parameters within any urban area. A spatial decision support system has also been developed, which can assist a decision maker involved in management of solid waste in urban areas.

Geography (GEOG)

This is an archived copy of the 2018-2019 catalog. To access the most recent version of the catalog, please visit

GEOG 1101 Phys Geography Meteorology Lab 1 SCH (2)

A laboratory experience that focuses on laboratory techniques, data collection and analysis. The experience reinforces and promotes greater understanding of concepts of meteorology presented in GEOG 1301. Prerequisite: credit or registration in GEOG 1301.

GEOG 1102 Phys Geog Climate and Mankind 1 SCH (2)

A laboratory experience that focuses on laboratory techniques, data collection and analysis. The experience reinforces and promotes greater understanding of concepts of climatology and its effect on human civilization, as presented in GEOG 1302. Prerequisite or corequisite: GEOG 1302.

GEOG 1301 Physical Geography Meteorology 3 SCH (3)

Earth motions and their meanings; systems of location and time; composition and structure of the earth's atmosphere. Meteorology and weather prediction, including storms. Air pollution meteorology. Field trips will be arranged.

GEOG 1302 Phy Geog Climate and Mankind 3 SCH (3)

Climatic classification, types and world regions. Climatic change, fluctuations and their effects on human ecology (e.g., droughts). Agricultural and urban climatology. Microclimates. The distribution of soils and natural vegetation as related to climate. Field trips will be arranged.

GEOG 1303 World Geography 3 SCH (3)

Major geographic regions of the world. Landscapes and peoples of continents; major culture realms and nations, resources, land use and industries. Contrasts between developed and emerging nations.

GEOG 2472 Intro to Geographic Info Sys 4 SCH (3-3)

Principles and experience of Geographic Information Systems. Acquisition, management, processing, and interpretation of geographic data. Spatial data structure and the display, manipulation, and analysis of geographic data. Field trip required. Field trip fee required. Prerequisite: 3 hours of natural science or permission of instructor.

GEOG 3302 Intro to Broadcast Meteorology 3 SCH (3)

Principles and practice of broadcast meteorology, a joint effort of the Department of Physics and Geosciences and Department of Communication and Theatre Arts. Introduction to television weather broadcasting with emphasis on creating accurate forecasts and on the techniques of communicating weather information to the public. Prerequisites: GEOG 1301 with GEOG 1302 recommended.

GEOG 3305 Environmental Geography 3 SCH (3)

The nature, geographic distribution, use and misuse of global resources with emphasis on those of North America. Ecosystems, air, water, soil, mineral and energy resources will be considered. Prerequisites: 3 semester credit hours of Geography or a science course (see General Education Requirements natural sciences component).

GEOG 3310 World in Chg Crucial Topics 3 SCH (3)

Intensive study of the geography of selected world "crisis" regions. Examples include the Middle East, Sub-Saharan Africa and the former U.S.S.R. and Eastern Europe. May be repeated for credit as the topic changes. Prerequisite: 6 semester hours of geography or 12 semester hours of social sciences.

GEOG 3331 United States and Canada 3 SCH (3)

The regional aspects of landforms, climate, resources and peoples of United States and Canada. Prerequisite: 6 hours of geography or 12 hours of social science.

GEOG 3421 Geomorphology 4 SCH (3-3)

Description, classification and quantitative analysis of landforms and surface processes in relation to human development. Regional physiography of the United States and topographic map interpretation. May be used as geology credit. Field trip required. Field trip fee required. Prerequisite: GEOL 1302/GEOL 1102 or GEOL 1303/GEOL 1103, MATH 1316.

GEOG 3450 Field Mapping Cartography 4 SCH (3-3)

The principles and practice of plane surveying and the global positioning system (GPS) and their interface with geographic information systems (GIS). Basic principles of cartography and use of cartographic tools and software. Management of cartographic data and GPS data. Local field trips required. Field trip fee required. Prerequisite: MATH 1316 or MATH 1324.

GEOG 3460 GIS in Nat Res and Envir Mgmt 4 SCH (3-3)

GIS and other geospatial technologies (including GPS and remote sensing) as applied to natural resources and environmental management. Technologies and techniques used to acquire geographic information, spatial data and location analysis, and applications of geospatial technology within the natural and environmental sciences. Case studies, labs, and field exercises. Prerequisite: GEOG 2472 (preferred), or six hours of physical or life science, or permission of instructor.

GEOG 3470 Quant. Methods in Geography 4 SCH (3-3)

Quantitative methods commonly used to describe, characterize, model, and analyze geo-spatial data. Geographic data description and summary, use of inferential statistics as exploratory and descriptive tools, different spatial statistics to explore geographic patterns, geographical correlation analysis, and geo-spatial regression analysis. Prerequisite: junior standing.

GEOG 4305 Geographic Research Methods 3 SCH (3)

Review of scientific techniques used in geographic research, independent review of literature, and a research problem yielding a formal report on the research. Prerequisites: senior standing and 12 hours of Geography and Geology.

GEOG 4420 Spec Topics in Geoscience 1-4 SCH (1-4)

Concepts, developments or discoveries in geography. May be repeated for a maximum of six semester hours credit. Prerequisite: 12 semester hours of geography and/or geology.

GEOG 4429 Advanced GIS 4 SCH (3-3)

Advanced techniques and applications of Geographic Information Systems. GIS data structure and conversions, advanced spatial analysis, data visualization, hydrological modeling. Basic and intermediate GIS programming for customizing and manipulating GIS applications. May be used as a geology credit. Prerequisite: GEOG 2472 or permission of instructor.

GEOG 4435 Remote Sensing 4 SCH (3-3)

The technology and interpretation of aerial photography and satellite imagery, including multi-spectral, thermal and radar images. Digital image processing using a raster geographic information system. Applications of remote sensing and guided projects in areas of student interest. May be used as a geology credit. Prerequisite: MATH 1314 and 6 hours of science, engineering or agriculture.

GEOG 4436 Advanced Remote Sensing 4 SCH (3-3)

Advanced topics in remote sensing. Recently emerged remote sensing systems, including high-resolution multi-spectral imaging systems, thermal remote sensing and airborne LiDAR remote sensing systems. Cutting-edge remote sensing data processing and analysis techniques. May be used as Geology credit. Prerequisite: GEOG 4435 or equivalent.

GEOG 4441 GIS for Business 4 SCH (3-3)

GIS and spatial analysis applied to organizations. Geographic information, locational decision-making, spatial data, investment in and value of GIS, ethical aspects, and GIS strategies. Case studies and lab practice with spatial data. Prerequisite: GEOG 2472.


is knowledge about where they are located, how they are connected, and how their locations possibly impact buying decisions and behavior.

The data needed for this – geospatial data – is all around us. Just think of weather reports, suggested routes on Google Maps, or geotagged posts on Facebook. Simply put, geospatial data is any data with a spatial identifier referring to a position on the earth. Check out this video for some use cases.

GIS + ERP = A Game Changer

To change the game, you need real-time analysis, reporting, forecasting, threat detection, and planning. To do this, geospatial data stored in a geographic information system (GIS) needs to be combined with data from the Internet of Things and sensor data, data from your partners and ecosystem, and data from your ERP system – operations, customers, finance and marketing. This way, you can outpace competitors through smarter use of your business data and geospatial data to gain an unbeatable competitive edge.

Geospatial analysis brings GIS, the system of record for maps, and ERP, the system of record for business data, closer together. It involves gathering, displaying, and using GIS data, geographic coordinates, street addresses, postal codes, and other identifiers to create geographical models: data visualizations that help make complex relationships more understandable. These geographical models can reveal historical changes, show which shifts are underway and where they are taking place, and even predict what’s about to happen.

Business applications can be enriched with geographic data from GIS. Business data on maps with multiple layers of different types of data can be overlaid with detailed geographic information, like topography and satellite imagery. This allows business users to visualize information in various ways, such as heat map layers that show data density or highlight statistically significant geographic areas. This example predicts landslides, including impact to key points of interest.

Munich RE, one of the world’s largest reinsurance companies, uses the spatial data processing, predictive analytics, and simulation capabilities in SAP HANA to assess risk “in the moment”. The company also uses the cloud-based Earth Observation Analysis Service powered by SAP HANA to analyze natural disaster data with its customer data to make more informed decisions about insurance risks. Its customers benefit as the company pushes down costs based on accurate, timely, historical and real-time information.

To further help companies leverage the power of location to uncover new insights from business, social, and sensor data, SAP and Esri, the global leader in geographic information systems (GIS) and spatial analytics, teamed up. Both companies provide the platforms, applications, and ecosystems, covering the breadth and depth of geo-enabled business processes needed by organizations of all sizes and across all industries, greatly reducing or eliminating the need for multiple business or geospatial systems. Developers can extend those applications or create entirely new ones that work easily within the SAP and Esri environments. Interested in learning more? Check out our solution brief here.

Geospatial data moves to SAP HANA

Building on this long-standing partnership, SAP just recently announced that Esri now supports SAP HANA as a certified enterprise geodatabase – an achievement both companies have been striving for.

Together, SAP and Esri are now making GIS, mapping, advanced visualizations and spatial analytics available to everyone across the enterprise. Adding SAP HANA to Esri allows customers to gain deeper insights, take smarter business decisions, and innovate rapidly. Whether on premise or in the cloud, with the Esri geodatabase powered by SAP HANA, spatial data can be integrated and delivered across organizations and accessed from one place for true IT landscape consolidation. Customers running SAP, non-SAP and Esri solutions can now streamline their IT architecture with one underlying platform powered by SAP HANA. You can expect more innovation and integration from both companies throughout the year, allowing you to build a modern data foundation that can rapidly extend to new use cases, including machine learning and graph databases.

First customers are already benefitting from spatial analytics, advanced visualizations and embedding geospatial data into core business processes. The Metropolitan Utilities District (M.U.D.), a political subdivision and public corporation, is deploying Esri’s ArcGIS on SAP HANA to perform real-time analytics on their business data. By eliminating the manual tasks of data preparation and conversion processes, employees can concentrate on more creative and strategic tasks.

Geospatial analysis has arrived, but we have only just begun to realize its potential. With both volume and variety of geospatial data growing, companies choosing to harness the power of various forms of spatial data can outpace competitors by tapping into new methodologies to analyze existing data. Geospatial analysis can lead to improved decision making, better results, new revenue opportunities, and overall a better view of the company’s data, thus opening up new areas for growth.

If you are a savvy developer interested in testing your skills in geospatial, stay tuned to news on the SAP + Esri Spatial Hackathon. Due to unprecedented demand, we’ve already reached maximum capacity. However, you can use the learning resources and Ask the Experts sessions to learn more. Also, we’re looking at having more hackathons across the globe; keep an eye on the Twitter #SAPEsri tag, or follow the Hackathon website to stay in the know. For more details, check out the blog.

Data and Methods

Study area

Our study domain spanned 36,500 km² of wildland and developed areas in Southern California within Santa Barbara, Ventura, Los Angeles, San Bernardino, Orange, Riverside and San Diego counties. Southern California's Mediterranean-type climate is characterized by a dry summer followed by a relatively brief and mild rainy season (Bailey 1966). Spatial gradients of temperature and rainfall result in a variety of vegetation habitats (Franklin 1998). Widespread vegetation types include chaparral shrubland, coastal sage shrubland, valley grassland, open oak woodland, oak woodland, and coniferous forest (Di Castri et al. 1981, Arroyo et al. 1995, Davis and Richardson 1995). Southern California has experienced intense population pressure and urban growth around the major metropolitan areas during the past five decades; this has created widespread urban communities interspersed with wildland areas and connected by an extensive road network. Over 22 million people lived in Southern California in 2010 (source: US Census Bureau 2012). We focused our analysis on predicting the regional burned area patterns throughout Southern California after excluding dense urban areas and deserts. Urban areas in the study domain represented less than 8% of the total land area, while the wildland–urban interface (WUI) accounted for 17%. Two-thirds of the area within the WUI consisted of housing in the vicinity of contiguous wildland vegetation and the remaining third was interspersed housing and vegetation.

Data sets: Wildfire data

We assessed burned area using the digitized perimeters for all reported fires >40 ha compiled by the California Department of Forestry, Fire and Resource Assessment Program (FRAP 2010). We focused on the 50-yr period from 1960 to 2009; the fire records during this period were more reliable than earlier records, and the period overlapped with the availability of information on human and biophysical factors.

We carried out our analysis at a 3 × 3 km resolution to match the spatial resolution of complementary downscaled meteorological data sets that were important for characterizing regional variations in fire weather (Faivre et al. 2014). A sensitivity test was done during a preliminary analysis stage to quantify the effect of spatial resolution. We found that the 3-km resolution did not produce results that were systematically different from those using a finer resolution of 1 km. The 3-km resolution resulted in a sample total of 3590 grid cells that had a large and well-distributed range of burned area fractions, which aided model development. The dependent variable was the burned area fraction, defined as the total area burned during 1960–2009 within each 3 × 3 km grid cell divided by the grid cell area. Multiple human and environmental variables, which are described below, were the predictors. The ArcGIS overlay geoprocessing tool was used to intersect the polygon layer of the grid cell boundaries with all fire polygons during 1960–2009, and the areas of all new intersected polygons within each individual grid cell were then summed. For example, if two different fires during the study period each burned half of the grid cell area, the resulting burned area fraction would be one. We classified the historic record of fire perimeters into Santa Ana (SA) fires and non-SA fires using the start date reported in the FRAP database and a continuous historic time series of days with Santa Ana conditions (Jin et al. 2014, Fig. 2). Santa Ana days were determined using a downscaled meteorological time series that was obtained by driving the MM5 mesoscale model with the ERA-40 and North American Regional Reanalysis data sets. Santa Ana days were identified when the northeasterly component of the daily mean wind speed was greater than 6 m/s at the exit of the largest gap across the Santa Monica Mountains (Hughes and Hall 2010).
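The burned area fraction described above reduces to a sum of intersected fire areas divided by the cell area. The following is a minimal sketch of that arithmetic only, assuming the polygon intersection has already been done in a GIS; the function name and the 9 km² default are illustrative and not taken from the authors' workflow.

```python
def burned_area_fraction(fire_areas_km2, cell_area_km2=9.0):
    """Sum the burned areas that intersect one grid cell and divide by the cell area.

    fire_areas_km2: areas (km2) of the fire polygons clipped to this cell,
    one entry per fire over the whole study period (hypothetical input).
    """
    if cell_area_km2 <= 0:
        raise ValueError("cell area must be positive")
    return sum(fire_areas_km2) / cell_area_km2


# Two fires that each burned half of a 3 x 3 km cell (4.5 km2 each).
fraction = burned_area_fraction([4.5, 4.5])
print(fraction)  # 1.0
```

Because overlapping burns from different years are summed rather than merged, a cell that burned repeatedly can have a fraction greater than one under this definition.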

Data sets: Human factors

Humans can influence wildland fire regimes through several different pathways (Hammer et al. 2007, Radeloff et al. 2010). WUI areas and road networks, for example, influence fuel continuity, the patterns of ignition and access for suppression (Lloret et al. 2002, Rollins et al. 2002, Ryu et al. 2007). We defined the WUI as areas with less than 50% vegetation and at least 6.2 houses/km² (1 house per 40 acres) that are located within 2.4 km of a 5 km² (or greater) area that is more than 75% vegetated (Stewart et al. 2007).
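As a quick check on the housing threshold, and to make the three-part WUI rule concrete, the definition above can be sketched as follows. The function names and inputs are hypothetical; in particular, the distance to the nearest qualifying wildland patch is assumed to have been computed beforehand in a GIS.

```python
ACRE_IN_KM2 = 0.0040468564224  # one international acre expressed in km2


def houses_per_km2(houses, acres):
    """Convert a housing density given per acre count into houses per km2."""
    return houses / (acres * ACRE_IN_KM2)


def is_wui(veg_fraction, housing_density_km2, dist_to_wildland_km):
    """Apply the three WUI criteria stated in the text.

    dist_to_wildland_km: distance to the nearest area of at least 5 km2 that
    is more than 75% vegetated (assumed precomputed, not derived here).
    """
    return (
        veg_fraction < 0.50
        and housing_density_km2 >= 6.2
        and dist_to_wildland_km <= 2.4
    )


print(round(houses_per_km2(1, 40), 1))  # 6.2
print(is_wui(0.30, 10.0, 1.0))          # True
print(is_wui(0.60, 10.0, 1.0))          # False
```

The unrounded conversion is about 6.18 houses/km², so the 6.2 figure is a rounded threshold.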

We considered seven variables to describe the human influence on burned area: (1) distance of cell center to a major road, (2) distance of cell center to a minor road, (3) road density, (4) population density, (5) distance of cell center to low-density housing, (6) wildland–urban land fragmentation, and (7) ignition frequency. We derived these seven variables using the best available statewide data. Topologically Integrated Geographic Encoding and Referencing (TIGER) road data (US Census Bureau 2000) were used to calculate the road density per grid cell and the distance to the nearest road from the cell centroid. We computed average population and housing density per 3 × 3 km grid cell for 1960–2009 using the 1990 and 2000 U.S. decennial census spatial data, along with consistent decadal projections of past growth trends for 1960, 1970, and 1980 (see Hammer et al. 2004, 2007 for details). We used the distance from the cell centroid to the nearest housing area with a density greater than 6.2 housing units/km² as an indicator of the proximity to low-density housing within the WUI.

Wildland–urban land fragmentation was calculated using an edge density metric that represented the degree of spatial heterogeneity in the landscape. We used a land cover data set at 100 m resolution from the California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP 2002) to aggregate vector-based layers describing urban and non-urban land cover types. The resulting binary map was then processed using the FRAGSTATS software package (McGarigal and Marks 1995) to analyze the spatial arrangement of wildland–urban patterns. We tested several landscape metrics including the patch density, mean patch size, mean shape index, edge density, mean nearest neighbor distance between similar patches and the interspersion and juxtaposition index (see McGarigal and Marks 1995 for a definition of metrics). We found that edge density was the best proxy for quantifying the complexity of wildland patches imbricated within urban areas. Edge density (ED) is a shape index that indicates whether the wildland–urban boundary is simple and compact (low value) or irregular and convoluted (high value). We computed the mean for each predictor within each 3 × 3 km grid cell by applying the zonal statistics tool in ArcGIS Spatial Analyst (Fig. 3).
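To illustrate why edge density separates compact from convoluted boundaries, here is a minimal sketch on a binary raster. It uses the usual definition of edge density as total edge length divided by landscape area (metres per hectare) and counts only edges between cells inside the grid; it is not a reimplementation of FRAGSTATS, and the scaling to the 0–100 index reported in the paper is not reproduced.

```python
def edge_density(grid, cell_size_m):
    """Edge density (m/ha) of a binary raster given as a list of equal-length rows.

    An edge is counted wherever two horizontally or vertically adjacent cells
    belong to different classes. Edges along the outer border are ignored
    (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    edges = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                edges += 1
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                edges += 1
    edge_length_m = edges * cell_size_m
    area_ha = rows * cols * cell_size_m ** 2 / 10_000
    return edge_length_m / area_ha


# 1 = urban, 0 = wildland; 100 m cells as in the land cover data set.
compact = [[1, 1, 0, 0] for _ in range(4)]                          # one straight boundary
convoluted = [[(r + c) % 2 for c in range(4)] for r in range(4)]    # checkerboard

print(edge_density(compact, 100))     # 25.0
print(edge_density(convoluted, 100))  # 150.0
```

The same total amounts of urban and wildland cover produce a sixfold difference in edge density, which is the property the metric is chosen for.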

Consistent region-wide ignition data outside of the National Forests were unavailable, and we estimated ignition frequency for each 3 × 3 km grid cell using the spatial modeling approach developed in Faivre et al. (2014). Poisson regression analyses were used to model ignition frequency as a function of the dominant human and biophysical covariates (Syphard et al. 2008, Faivre et al. 2014).

Data sets: Vegetation and biophysical factors

We used a set of 12 environmental variables that were expected to influence the physical characteristics of fuel, including continuity, moisture, or loading (Fig. 1). These variables can be sorted into three main categories: topography (1-elevation, 2-slope), land cover (fractional cover of 3-forest, 4-shrubland, 5-grassland, and 6-other), and meteorology (annual average daily 7-maximum and 8-minimum temperature, 9-cumulative winter precipitation, 10-wind speed, 11-relative humidity, and 12-Fosberg fire weather index (FFWI) (Table 1)). The FFWI is a non-linear construct of meteorological conditions (i.e., temperature, relative humidity, and wind speed), which is widely used to infer wildfire potential from the short-term weather conditions (Fosberg 1978). FFWI values range from 0 to 100; values ≥50 indicate a significant threat of wildfire incidence and spread.
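For readers unfamiliar with the index, the sketch below shows one commonly cited formulation of the FFWI, in which an equilibrium moisture content is derived from temperature and relative humidity and then combined with wind speed. The coefficients, the imperial input units (°F, %, mph) and the clamping to the 0–100 range are assumptions of this sketch; the exact implementation used with the MM5 output is not given in the text.

```python
import math


def equilibrium_moisture(temp_f, rh_pct):
    """Equilibrium moisture content (%) from temperature (F) and relative humidity (%)."""
    if rh_pct < 10:
        return 0.03229 + 0.281073 * rh_pct - 0.000578 * rh_pct * temp_f
    if rh_pct <= 50:
        return 2.22749 + 0.160107 * rh_pct - 0.01478 * temp_f
    return (21.0606 + 0.005565 * rh_pct ** 2
            - 0.00035 * rh_pct * temp_f - 0.483199 * rh_pct)


def ffwi(temp_f, rh_pct, wind_mph):
    """Fosberg fire weather index, clamped to [0, 100] (clamping is an assumption)."""
    m = equilibrium_moisture(temp_f, rh_pct) / 30.0
    moisture_damping = 1 - 2 * m + 1.5 * m ** 2 - 0.5 * m ** 3
    value = moisture_damping * math.sqrt(1 + wind_mph ** 2) / 0.3002
    return max(0.0, min(100.0, value))


hot_dry_windy = ffwi(temp_f=90, rh_pct=10, wind_mph=30)
cool_humid_calm = ffwi(temp_f=60, rh_pct=80, wind_mph=5)
print(hot_dry_windy >= 50, cool_humid_calm < 50)  # True True
```

The index rises with wind speed and falls with fuel moisture, so hot, dry, windy conditions of the kind associated with Santa Ana events push it past the threshold of 50 mentioned above.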

| Burned area drivers and input variables | Variable name | Data resolution | Data source |
|---|---|---|---|
| Human accessibility | | | |
| Distance to major roads (km) | d.majR | 1:100,000 | Census Bureau's TIGER road data (Topologically Integrated Geographic Encoding and Referencing) (US Census Bureau 2000) |
| Distance to minor roads (km) | d.minR | 1:100,000 | Census Bureau's TIGER road data (Topologically Integrated Geographic Encoding and Referencing) (US Census Bureau 2000) |
| Distance to low-density housing (km) | d.hou | NA | Census block-group data for 2000 (US Census Bureau 2001) |
| Urban development | | | |
| Population density (Mpers./km²) | pop.den | NA | Census block-group data for 2000 (US Census Bureau 2001) |
| Ignition frequency (No. ignitions/km²) | pred.ign | 3 km | Ignition frequency estimates for Southern California (Faivre et al. 2014) |
| Land fragmentation | | | |
| Edge density index (0–100) | ed.den | 30 m | WUI maps computed from the 1990 and 2000 US Census block datasets (Radeloff et al. 2005) |
| Road density (km roads/km²) | rd.den | 1:100,000 | Census Bureau's TIGER road data (Topologically Integrated Geographic Encoding and Referencing) (US Census Bureau 2000) |
| Topography | | | |
| Elevation (m) | elev | 90 m | Digital elevation data from the United States Geological Survey, National Elevation Dataset |
| Slope (%) | slope | 90 m | Digital elevation data from the United States Geological Survey, National Elevation Dataset |
| Land cover | | | |
| Tree cover (%) | tree | 100 m | California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP 2002) |
| Shrub cover (%) | shrub | 100 m | California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP 2002) |
| Grass cover (%) | grass | 100 m | California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP 2002) |
| Meteorology | | | |
| Temperature maximum (°C) | tmax | 800 m | Monthly estimates of average daily maximum temperature from PRISM (Daly et al. 2008) |
| Temperature minimum (°C) | tmin | 800 m | Monthly estimates of average daily minimum temperature from PRISM (Daly et al. 2008) |
| Precipitation (mm/yr) | prec | 800 m | Monthly estimates of mean cumulative precipitation from PRISM (Daly et al. 2008) |
| Fosberg fire weather index | ffwi | 6 km | Daily estimates from the Mesoscale Model version 5 (MM5), Penn State/National Center for Atmospheric Research |
| Relative humidity | rel.h | 6 km | Daily estimates from the Mesoscale Model version 5 (MM5), Penn State/National Center for Atmospheric Research |
| Wind speed (m/s) | wind.s | 6 km | Daily estimates from the Mesoscale Model version 5 (MM5), Penn State/National Center for Atmospheric Research |

The topographic variables (elevation and slope) were calculated for each 3 × 3 km grid cell using the three arc-second digital elevation model from the U.S. Geological Survey National Elevation Dataset (NED). We assessed vegetation characteristics using the recent and comprehensive land cover data set at 100 m resolution from the California Department of Forestry and Fire Protection's Fire and Resource Assessment Program (FRAP 2002). We classified the mixed vegetation of wildland areas into three major types: “shrubland” (comprising 52% of the study area), “forest/woodland” (19%), and “grassland” (8%). The remaining non-vegetated land cover types (21%) were grouped as “other”; this category included agricultural land, urban, desert, wetland, water, and barren soil. We calculated the fraction of each class within each 3 × 3 km grid cell.

We derived several of the meteorological variables from the monthly gridded Parameter-Elevation Regressions on Independent Slopes Model (PRISM) data set, which has a native resolution of 800 m (Daly et al. 2002; Oregon State University PRISM Group). Winter precipitation was estimated using the monthly mean of precipitation during September through March for each 3 × 3 km cell over the 1960–2009 period. Similarly, variables representing the annual mean of daily maximum temperature and the annual mean of daily minimum temperature were calculated over the period by averaging all the available monthly files.

To capture the spatial pattern of meteorological conditions that typically occur during SA and non-SA fires, we estimated daily relative humidity, wind speed, and the Fosberg Fire Weather Index using 3-hourly model outputs from the Mesoscale Model version 5 (MM5) forced with reanalysis data sets as described by Jin et al. (2014). Santa Ana days were identified using winds at the exit of the largest gap in the Santa Monica Mountains (Hughes and Hall 2010). Most of the Santa Ana events occurred in late autumn and early winter in Southern California, and most SA fires occurred in a 3-month window from September to November (Jin et al. 2014). We therefore quantified the meteorological conditions that typically occur during SA fires by averaging each of these three variables from the MM5 daily time series during Santa Ana days from September to November. For non-SA fires, we calculated these same variables during non-Santa Ana days from June to August. We resampled all meteorological data to the common 3 × 3 km grid.
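The compositing step for a single grid cell can be sketched as a filter followed by an average. The record layout, field names, and all numbers below are synthetic and purely illustrative; only the 6 m/s threshold and the two seasonal windows come from the text.

```python
from datetime import date
from statistics import mean

SA_WIND_THRESHOLD = 6.0  # m/s, northeasterly wind component at the gap exit


def seasonal_mean(records, months, santa_ana):
    """Average 'value' over days in the given months that are (or are not) Santa Ana days.

    records: list of dicts with keys 'day' (date), 'ne_wind' (m/s at the gap
    exit) and 'value' (e.g. FFWI at one grid cell). Hypothetical layout.
    """
    selected = [
        rec["value"]
        for rec in records
        if rec["day"].month in months
        and (rec["ne_wind"] > SA_WIND_THRESHOLD) == santa_ana
    ]
    return mean(selected) if selected else None


# Synthetic daily records for one grid cell.
records = [
    {"day": date(2001, 10, 21), "ne_wind": 9.5, "value": 60.0},
    {"day": date(2001, 10, 22), "ne_wind": 8.0, "value": 70.0},
    {"day": date(2001, 10, 23), "ne_wind": 2.0, "value": 20.0},
    {"day": date(2001, 7, 4), "ne_wind": 1.0, "value": 15.0},
    {"day": date(2001, 7, 5), "ne_wind": 7.0, "value": 40.0},
]

sa_value = seasonal_mean(records, months={9, 10, 11}, santa_ana=True)
non_sa_value = seasonal_mean(records, months={6, 7, 8}, santa_ana=False)
print(sa_value, non_sa_value)  # 65.0 15.0
```

Repeating this for every cell and for each of the three variables would yield the SA and non-SA predictor maps described above.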

Modeling approaches to predict burned area

We built, tested, and compared five modeling approaches separately for SA and non-SA fires: multiple linear regression (MLR), generalized additive models (GAMs), GAMs incorporating spatial autocorrelation (GAMspA), non-linear multiplicative models (NMM), and random forest models (RF).

MLR has been used extensively to analyze the relationship between burned area and environmental controls (Larsen 1996, Carvalho et al. 2008, Camia and Amatulli 2009). Empirical studies often predict a high proportion of the variation in burned area using MLR (Flannigan and Harrington 1988, Turner and Romme 1994, Turner et al. 1994, Larsen 1996). However, linear regression assumes the variance of the response variable is constant across observations and the errors follow a normal (Gaussian) distribution; these assumptions may be invalid for the estimation of burned area or other ecological variables (Viegas and Viegas 1994, Li et al. 1997, McCarthy et al. 2001). We therefore also considered generalized additive models (GAMs), which are comparatively flexible and often are better suited for analyzing ecological data based on non-linear responses to predictor variables (Hastie and Tibshirani 1986).

These modeling approaches assume spatial stationarity (i.e., effects of environmental correlates are constant across the region) and isotropic spatial autocorrelation (i.e., the process resulting in spatial autocorrelation acts in the same way in all directions). Anisotropic spatial autocorrelation arises when the variables of interest in nearby sample units are not independent of each other (Griffith 1987), a common situation in ecological data. Such spatial patterns are usually explained by environmental features such as climatic or habitat structure variables that are themselves spatially structured (e.g., directionality and intensity of wind patterns). It is often impossible to measure all spatially structured variables, and this issue affects the uncertainty of statistical models (Legendre 1993, Legendre et al. 2002). Positive spatial autocorrelation (i.e., closer locations having more similar residual values than others) tends to cause the true standard errors of parameters to be underestimated, which in turn overstates the significance of the regression coefficients.
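One standard way to check whether model residuals show the positive spatial autocorrelation described above is Moran's I. The text does not say which diagnostic the authors used, so the following is an illustrative sketch only, assuming residuals on a regular grid and rook (edge-sharing) neighbors with unit weights.

```python
def morans_i(grid):
    """Moran's I for values on a regular grid, using rook contiguity and unit weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean_v = sum(values) / n
    denom = sum((v - mean_v) ** 2 for v in values)
    cross_products = 0.0
    weight_sum = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross_products += (grid[r][c] - mean_v) * (grid[rr][cc] - mean_v)
                    weight_sum += 1
    return (n / weight_sum) * cross_products / denom


# Synthetic residual maps: one clustered, one alternating.
clustered = [[1.0, 1.0, -1.0, -1.0] for _ in range(4)]
alternating = [[1.0 if (r + c) % 2 == 0 else -1.0 for c in range(4)] for r in range(4)]

print(morans_i(clustered) > 0)    # True: similar residuals sit next to each other
print(morans_i(alternating) < 0)  # True: neighbors have opposite residuals
```

Values near zero suggest no spatial structure in the residuals; clearly positive values indicate the situation in which standard errors are likely to be too small.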

We thus constructed a version of the GAM model accounting for spatial autocorrelation to better represent gradually changing spatial variability in environmental correlates. We implemented these autocovariate GAMs by calculating locally weighted regressions within a moving window spanning the entire study domain. We included a two-dimensional smoothing function f(x_i, y_i) in the GAMs, using the two geographic coordinates (i.e., latitude and longitude) as a single variable, along with the other terms in the model (Wood and Augustin 2002, Wood 2003).

Variable selection and model validation

We performed initial univariate regressions between the response variable and all predictors with the goal of identifying the relative importance of each predictor independent of its interactions with others (Table 2). We also examined the correlation matrix among explanatory variables for high pairwise correlations to detect multicollinearity issues and to narrow the selection of useful covariates.

  • Table shows the explained variance by each predictor independently from the influence of other explanatory variables.

We used the following methods to select the most relevant predictors from the entire set. The selection of terms for deletion from the MLR model was based on Akaike's Information Criterion (AIC). The selection of terms for the GAM analysis used the automatic term selection procedure (Wood and Augustin 2002), which imposed a penalty on smooth functions and thus effectively removed terms from the model. The selection of terms in the multiplicative models relied on sequentially adding terms based on an incremental improvement to model fit (i.e., maximizing cross-validated R²).
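To make the AIC-based deletion step concrete, the sketch below compares a full and a reduced least-squares model. It assumes Gaussian errors and uses the standard definition AIC = 2k − 2 ln(L). The function name `gaussian_aic` and the residual sums of squares are hypothetical and are not taken from the study.

```python
import math


def gaussian_aic(rss, n, k):
    """AIC = 2k - 2 ln(L) for a least-squares fit with Gaussian errors.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated parameters (coefficients plus error variance)
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    # Maximised Gaussian log-likelihood, using the ML variance estimate rss / n.
    log_likelihood = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    return 2 * k - 2 * log_likelihood


# Hypothetical numbers: dropping one term raises the RSS only slightly.
full = gaussian_aic(rss=120.0, n=200, k=9)
reduced = gaussian_aic(rss=121.0, n=200, k=8)
drop_term = reduced < full  # delete the term when the reduced model has lower AIC
```

With these illustrative values the small loss of fit is outweighed by the penalty saved, so the term would be dropped.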

We used 70% of the data (n = 2495), randomly selected, for the development of each model. The remaining data in reserve (30%, n = 1095) were used to quantify model performance, using cross-validated R² values (model predictions against the validation data subset), root mean square errors (RMSE), percent bias and AIC values. We repeated this process 500 times for each model type (except for RF, where the iteration process is integrated) while maintaining the 70:30 ratio, so that the mean performance metrics and their variability could be estimated reliably. Finally, we estimated the number of degrees of freedom (Tables 3 and 4). Model building and statistical analyses were carried out using R software (R Development Core Team 2012; “mgcv” package for GAM, “rpart” and “randomForest” packages for RF).
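The repeated 70:30 validation described above can be sketched as follows. This is a minimal illustration, not the authors' R code: a one-predictor least-squares line stands in for the fitted model, R² is computed as 1 − SS_res/SS_tot, percent bias is taken as 100 × Σ(predicted − observed)/Σ(observed), and all function names are hypothetical.

```python
import math
import random


def fit_simple_ols(train):
    """Stand-in model: least-squares line y = a + b*x; returns a predictor."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def r_squared(obs, pred):
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def percent_bias(obs, pred):
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)


def repeated_holdout(rows, fit, n_repeats=500, train_frac=0.7, seed=0):
    """Average validation metrics over repeated random train/validation splits."""
    rng = random.Random(seed)
    cut = int(round(train_frac * len(rows)))
    totals = {"r2": 0.0, "rmse": 0.0, "pbias": 0.0}
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        train, valid = shuffled[:cut], shuffled[cut:]
        predict = fit(train)
        obs = [y for _, y in valid]
        pred = [predict(x) for x, _ in valid]
        totals["r2"] += r_squared(obs, pred)
        totals["rmse"] += rmse(obs, pred)
        totals["pbias"] += percent_bias(obs, pred)
    return {name: total / n_repeats for name, total in totals.items()}


# Noise-free toy data, so the held-out fit should be essentially perfect.
toy_rows = [(float(x), 2.0 * x + 1.0) for x in range(1, 51)]
summary = repeated_holdout(toy_rows, fit_simple_ols, n_repeats=20)
```

Keeping the seed fixed makes the sequence of splits reproducible, which is useful when the same splits should be reused across model types.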

MLR Var. | MLR Coef ± SE | MLR Var. imp. | GAM Var. | GAM dff | GAM Var. imp. | GAMspA Var. | GAMspA dff | GAMspA Var. imp. | NMM Var. | NMM Coef ± SE | NMM Var. imp. | RF Var. | RF Var. imp.
Intercept | −5.31 ± 0.33 | NA | Intercept | 1 | NA | Intercept | 1 | NA | C1 | 1.36 ± 0.08 | NA | ffwi | 42%
ffwi | 0.31 ± 0.01 | 28% | s(ffwi) | 5.3 | 34% | s(lon,lat) | 8.9 | 31% | ffwi | −0.59 ± 0.07 | 26% | elev | 20%
wind.s | −0.83 ± 0.03 | 19% | s(wind.s) | 5.9 | 31% | s(ffwi) | 5.5 | 20% | wind.s | 1.73 ± 0.22 | 19% | rel.h | 16%
rel.h | 0.08 ± 0.005 | 10% | s(rel.h) | 3.3 | 12% | s(wind.s) | 5.8 | 16% | prec | −2.66 ± 0.26 | 16% | d.hou | 12%
shrub | 0.28 ± 0.04 | 9% | s(shrub) | 2.8 | 6% | s(rel.h) | 3.7 | 3% | shrub | −1.78 ± 0.22 | 6% | shrub | 11%
d.hou | −0.01 ± 0.001 | 13% | s(d.hou) | 3.9 | 10% | s(shrub) | 2.9 | 10% | d.hou | 0.23 ± 0.02 | 14% | tmin | 7%
tree | 0.19 ± 0.05 | 5% | s(prec) | 3.5 | 5% | s(d.hou) | 3.9 | 13% | tmax | −0.02 ± 0.03 | 6% | pred.ign | 3%
prec | −0.07 ± 0.009 | 15% | s(tree) | 3.3 | 1% | s(prec) | 3.5 | 3% | rel.h | 0.07 ± 0.01 | 12% | | 
 | | | | | | s(tree) | 2.8 | 4% | | | | | 


  • Variable importance (Var. imp.) is specified as the contribution to the explained variance for the Multiple Linear Regression (MLR), as the reduction in the generalized cross-validation (GCV) estimate of error for Generalized Additive Models (GAM), as the contribution to model deviance for the Non-linear Multiplicative Model (NMM) or as the decrease in Mean Square Error (MSE) for Random Forest (RF). Please refer to Table 1 for a full description of the explanatory variables retained in the models. For the MLR, AIC = 3696, adjusted R² = 0.39 [0.34, 0.42], percent bias = 0.21, RMSE = 0.49, df = 8; for the GAM, AIC = 3474, adjusted R² = 0.43 [0.39, 0.46], percent bias = 0.05, RMSE = 0.46, df = 29; for the GAMspA, AIC = 3046, adjusted R² = 0.51 [0.48, 0.54], percent bias = 0.06, RMSE = 0.43, df = 38; for the NMM, AIC = 3399, adjusted R² = 0.44 [0.39, 0.45], percent bias = 0.013, RMSE = 0.26, df = 8; for the RF model, adjusted R² = 0.63, percent bias = 0.023, RMSE = 0.2.
MLR Variable | MLR Coef. ± SE | MLR Var. imp. | GAM Variable | GAM dff | GAM Var. imp. | GAMspA Variable | GAMspA dff | GAMspA Var. imp. | NMM Variable | NMM Coef. ± SE | NMM Var. imp. | RF Variable | RF Var. imp.
Intercept | 2.05 ± 0.15 | NA | Intercept | 1 | NA | Intercept | 1 | 29% | C1 | 12.9 ± 4.7 | NA | rel.h | 17%
shrub | 0.41 ± 0.03 | 34% | s(shrub) | 1 | 14% | s(lon,lat) | 8.2 | 28% | shrub | 1.04 ± 0.08 | 20% | shrub | 13%
rel.h | −0.02 ± 0.001 | 23% | s(rel.h) | 4.4 | 32% | s(shrub) | 1 | 2% | rel.h | −0.05 ± 0.005 | 17% | tmin | 13%
tmin | 0.07 ± 0.006 | 12% | s(tmin) | 3.9 | 23% | s(rel.h) | 3.5 | 15% | tmin | 0.15 ± 0.01 | 14% | wind.s | 12%
rd.den | −0.05 ± 0.008 | 16% | s(rd.den) | 1.8 | 7% | s(tmin) | 3.8 | 9% | rd.den | −0.13 ± 0.02 | 2% | ed.den | 10%
wind.s | −0.06 ± 0.01 | 3% | s(wind.s) | 5.6 | 8% | s(rd.den) | 1.1 | 9% | wind.s | −0.16 ± 0.02 | 18% | d.hou | 11%
tmax | 0.03 ± 0.005 | 3% | s(d.hou) | 4.8 | 4% | s(wind.s) | 5.5 | 6% | tmax | −0.06 ± 0.01 | 8% | prec | 10%
d.hou | 0.01 ± 0.001 | 7% | s(tmax) | 4.8 | 10% | s(d.hou) | 4.5 | 4% | d.hou | 0.02 ± 0.002 | 21% | pred.ign | 13%
pred.ign | 0.03 ± 0.01 | 2% | s(pred.ign) | 3.6 | 1% | s(tmax) | 4.6 | 3% | pred.ign | 0.10 ± 0.02 | 1% | | 
 | | | | | | s(pred.ign) | 3.8 | 3% | | | | | 


  • Variable importance (Var. imp.) is specified as the contribution to the explained variance for the Multiple Linear Regression (MLR), as the reduction in the generalized cross-validation (GCV) estimate of error for Generalized Additive Models (GAM), as the contribution to model deviance for the Non-linear Multiplicative Model (NMM) or as the decrease in Mean Square Error (MSE) for Random Forest (RF). Please refer to Table 1 for a full description of the explanatory variables retained in the models. For the MLR model, AIC = 3892, adjusted R² = 0.21 [0.16, 0.24], percent bias = 0.28, RMSE = 0.52, df = 9; for the GAM, AIC = 3714, adjusted R² = 0.27 [0.25, 0.34], percent bias = 0.29, RMSE = 0.49, df = 31; for the GAMspA, AIC = 3585, adjusted R² = 0.32 [0.27, 0.36], percent bias = 0.30, RMSE = 0.48, df = 37; for the NMM, AIC = 3655, adjusted R² = 0.23 [0.18, 0.28], percent bias = 0.22, RMSE = 0.51, df = 37; for the RF model, adjusted R² = 0.48, percent bias = 0.28, RMSE = 0.31.

Evaluation of relative importance of variables

We estimated the contribution of predictors by analyzing the deviance (AIC value) of nested models (i.e., models successively excluding the least relevant predictor) for all modeling approaches except Random Forest. In the RF approach, we used the 70:30 ratio to split the data sets for model calibration and validation (Breiman 2001), and used the percent decrease in accuracy (i.e., decrease in mean square error) as a measure of variable importance. Then, we conducted several analyses to better understand the relationship between driver variables, important splitting points, and the predicted spatial pattern of burned area. First, we ran an additional regression tree using the average (final) predictions from the random forest as input data. We then pruned this tree using a complexity parameter of 0.01 (see the documentation of R package “rpart” for an explanation of this parameter). This “summary tree” explained significantly more variance in the input data (P < 0.001) than any random regression tree of equal complexity generated from the random forest (Rejwan et al. 1999). The tree structure enabled us to investigate the explanatory nature of the dominant controls on burned area. We analyzed the splits and nodes of this regression tree and determined the combinations of human and biophysical conditions resulting in high and low burned area fractions across the region. Finally, we used predictive maps to spatially characterize the combined influence of climate, fuel, and human conditions.
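The accuracy-based importance measure used for RF can be illustrated with a model-agnostic permutation sketch: shuffle one predictor, re-predict, and record how much the mean square error rises. This is an analogue under stated assumptions, not the out-of-bag procedure of the “randomForest” package; the function names and the toy predictor are hypothetical.

```python
import random


def mean_square_error(predict, rows, targets):
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in MSE after shuffling each predictor column in turn.

    rows is a list of lists (one inner list of predictor values per sample).
    """
    rng = random.Random(seed)
    baseline = mean_square_error(predict, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increase += mean_square_error(predict, permuted, targets) - baseline
        importances.append(increase / n_repeats)
    return importances


# Toy predictor that depends on the first variable only.
def toy_predict(row):
    return 3.0 * row[0]


toy_rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
toy_targets = [3.0 * r[0] for r in toy_rows]
importance = permutation_importance(toy_predict, toy_rows, toy_targets)
```

A predictor the model never uses gets an importance of exactly zero here, while the informative predictor shows a clear rise in error.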


Exposure science has developed rapidly and there is an increasing call for greater precision in the measurement of individual exposures across space and time. Social science interest in an individual’s environmental exposure, broadly conceived, has arguably been quite limited conceptually and methodologically. Indeed, in social science, we appear to lag behind our exposure science colleagues in our theories, data, and methods. In this article, we discuss a framework based on the concept of spatial polygamy to demonstrate the need to collect new forms of data on human spatial behavior and contextual exposures across time and space. Adopting new data and methods will be essential if one wants to better understand social inequality in terms of exposure to health risks and access to health resources. We discuss the opportunities and challenges, focusing on the potential offered by human mobility and, specifically, the use of activity space concepts and data. A goal of the article is to spatialize social and health science concepts and research practice vis-à-vis the complexity of exposure. The article concludes with some recommendations for future research, focusing on theoretical and conceptual development promoting research on new types of places and human movement, the dynamic nature of contexts, and training.

Crowdsourced Data Mining for Urban Activity: Review of Data Sources, Applications, and Methods

The penetration of devices integrated with location-based services and internet services has generated massive data about the everyday life of citizens and tracked their activities happening in cities. Crowdsourced data, such as social media data, points of interest (POIs) data, and collaborative websites, generated by the crowd, have become fine-grained proxy data of urban activity and are widely used in urban studies research. However, due to the heterogeneity of crowdsourced data types and the tendency of previous studies to focus on a specific application, a systematic review of crowdsourced data mining for urban activity is still lacking. In order to fill the gap, this paper conducts a literature search in the Web of Science database, selecting 226 highly related papers published between 2013 and 2019. Based on these papers, the review first conducts a bibliometric analysis identifying underpinning domains, pivot scholars, and papers around this topic. The review also synthesizes previous research into three parts: main applications of different data sources and data fusion; applications of spatial analysis in mobility patterns, functional areas, and event detection; and applications of sociodemographic and perception analysis in city attractiveness, demographic characteristics, and sentiment analysis. The challenges of this type of data are also discussed at the end. This study provides a systematic and current review for both researchers and practitioners interested in the applications of crowdsourced data mining for urban activity.

The development of technologies such as Information and Communications Technology (ICT) and Web 2.0 technology has brought a data revolution to the world (Kitchin 2014, p. 26). As an emerging type of big data, crowdsourced data have attracted growing interest in many disciplines (Gray et al. 2015; Garcia-Molina et al. 2016). Two core technologies supporting crowdsourced data have emerged from the multitude of approaches and clustered around two main themes: device/platform-captured data and user/system-interaction data. The former is the current wave of ICTs, such as digital devices, mobile phones, and the Internet of Things, which have penetrated into almost every aspect of daily activities such as work, residency, commuting, communication, consumption, leisure, travel, and so forth, and which capture these activities with explicit or implicit content at unprecedented spatial and temporal resolutions (Kitchin 2014, p. xv). The second one is the emergence of Web 2.0 technology, which encourages internet users to generate and interact with, rather than only consume, online content (Batty 2012). This allows internet users to create, modify, and supply content to websites, boosting the production of user-generated content related to activities of the public. The penetration of these technologies undoubtedly has led to the explosion of crowdsourced data that are highly related to people's everyday life behavior (Kitchin 2014, p. 80). Consequently, crowdsourced data have been used in a large body of research and have quickly become an essential source for data-driven analysis in geography and urban studies (Miller and Goodchild 2015).

In the field of geography and urban studies, several scholars have added different perspectives to the basic concept of crowdsourced data, and therefore, it is essential to place crowdsourced data in their context. For example, one perspective adds to the discussion by Crooks et al. (2015) stating that the term crowdsourcing, coined by Howe (2006), implied a coordinated bottom-up grassroots effort to contribute information, which is not necessarily limited to geographical information. Adopting this principle, Goodchild (2007) introduced the term volunteered geographical information (VGI) to refer to the geographical content generated by nonexpert users. However, Harvey (2013, p. 34) questioned the misuse by researchers who use VGI to refer to data sets that are contributed rather than volunteered by people. He argued that both volunteered and contributed data should be aggregated into the concept of crowdsourced data. Sui et al. (2013, p. 2) also pointed out that VGI is referred to as a type of crowdsourced data for geographic knowledge production. In the book The Data Revolution, Kitchin (2014, p. 96) reviews concepts of various data types in the context of humanities and social sciences and mentions that data that are sourced from a large group of people could be recognized as being crowdsourced, for example, social media data. When applied to urban studies, Crooks et al. (2015) argued that crowdsourced data include explicit sources of collaborative, user-generated mapping and an implicit source such as social media. Over time, given the spread and depth of the type of data that have been generated from devices and platforms, the definition has broadened. Supporting this, See et al. (2016) reviewed the abstracts of 25,338 scientific papers about citizen-derived geographic information published between 1990 and 2015.
The literature described this phenomenon using a multitude of terms, which have emerged from different disciplines: some focused on the spatial nature of the data, such as volunteered geographic information (VGI) and neogeography, while other terms have much broader applicability, e.g., crowdsourcing, citizen science, and user-generated content, to name but a few. After identifying the sharp rise of the term crowdsourcing among 27 other relevant terms in academia, See et al. used the term crowdsourced geographic information as an umbrella term to represent the different types of terms mentioned previously. Building on research by See et al. (2016), the concept of crowdsourced data in this paper refers to data both volunteered and contributed by individuals through ICT-integrated devices and user/system interaction with Web 2.0 technology. The term crowdsourced emphasizes the process of data collection, which refers to data sourced by the crowd, rather than the process of data generation. In this context, the main types of crowdsourced data in this paper cover social networking data, points of interest (POIs), and collaborative websites.

In the age of big data, digital data and cities have formed a wide-ranging, diverse, and complex relationship (Kitchin et al. 2017a, p. 44). Crowdsourced data have shown potential in understanding urban activity and its underlying patterns and have been used to solve complex problems or fill important gaps in data analysis that traditional data sets could not cover in urban analysis (Long and Liu 2016; Thakuriah et al. 2016). First of all, since crowdsourced data emerged with location-based services, they are able to provide geographic information such as geotag or geolocation, which is the most rudimentary and vital attribute for urban spatial analysis (Kitchin et al. 2017b, p. 6; Thatcher et al. 2018, p. 123). Second, crowdsourced data are high-frequency: they are updated continually and reflect what is happening at present. Furthermore, crowdsourced data are far more cost-effective than traditional data such as surveys or government censuses. Most importantly, this type of data has been collected from individual volunteers, and their content includes rich information related to urban activity. It should be noted that although the aforementioned advantages of crowdsourced data have been widely perceived by scholars, scholars are still far from abandoning traditional data sets, such as census and questionnaire-based data, for understanding urban activity. The users and producers of crowdsourced data represent only a small fraction of the population; therefore, it would be erroneous to even consider replacing robust census data collection methods with crowdsourced data harvesting as a solution for all data problems.

Although the advantages of crowdsourced data have been widely recognized and applied, it is apparent that a systematic understanding of how crowdsourced data contribute to urban activity analysis is still lacking. Previous studies either examined crowdsourced data in a general context or focused on a specific application of crowdsourced data in urban studies. It is still difficult for researchers in urban studies to have an overall understanding of crowdsourced data in terms of data types, metrics, and methodologies, and furthermore to apply the data in their studies (Shelton et al. 2015; Chen et al. 2017; Xu et al. 2017). Particularly in the current context of big data, how to engage powerful techniques from computer science in terms of data mining is also an obstacle for the majority of researchers in urban dynamics (French et al. 2017). Therefore, this paper aims to investigate how crowdsourced data mining helps understand urban activity and understand how the established perception of crowdsourced data will replace other types of data collection. In order to achieve these goals, this paper not only focuses on the types and characteristics of crowdsourced data but also critically presents how the methods are applied to data processing. It is anticipated that this paper will offer urban researchers the opportunity to develop more robust applications while analyzing urban activity. This study reviews the literature on crowdsourced data applications in the domain of urban activity since 2013. It first introduces review methods, especially for literature inclusion and bibliometric analysis. Based on the cocitation analysis of included papers, it then identifies the fundamental domains, key researchers, and papers on the topic of crowdsourced data. This is followed by a qualitative review synthesizing data sources, applications, and methods engaged in spatial analysis and in sociodemographic and perception analysis.
This review also summarizes the potential challenges of crowdsourced data mining.

Master of Computer Science

The Master of Science in Computer Science aims at equipping students with advanced skills in Computer Science.

The objectives of the program are:

  1. To provide students with in-depth knowledge of the theoretical and practical aspects of Computer Science so as to satisfy the technological needs in the private and public sector.
  2. To provide students with advanced knowledge and specialized skills in the key areas of computer security, computer programming, data science, and cloud computing.
  3. To equip students with the knowledge and skills necessary to meet the ever-evolving demands of the Computer Science profession.
  4. To provide students with skills to deploy and manage Computer infrastructure in organizations so as to improve their effectiveness.
  5. To provide students with research skills which will help them grow with the technological advancements as well as help them participate in the development of new technologies.

Main Features in the program

Areas of Specialization

The MSc in Computer Science program is an evening program that is completely privately sponsored. The curriculum has two areas of specialization:

  • Software and Systems Security
  • Artificial Intelligence and Data Science

A student pursuing an MSc in Computer Science will be required to specialize in one of these tracks. The choice of the areas of specialization was dictated by the current trends and needs in the computing field in the region and internationally.

Software and Systems Security

Uganda and the rest of the African continent have witnessed a tremendous increase in the adoption and use of automated computing systems. The region has also seen an increase in the usage of the Internet and online IT systems. Computerization increases precision, speed, reliability, and availability, and reduces cost. Computerization has been applied in sensitive/critical areas like finance (e.g., mobile money and online banking), records keeping, monitoring and tracking.

Designing and implementing secure computer systems is an ever-increasing challenge worldwide. Unfortunately, most organizations put emphasis on the functionality of computerized systems but pay less attention to the susceptibility of those systems to malicious attack by intruders. Without proper implementation of security, organizations could suffer from high security risks including financial losses. In some cases, businesses can be thrown several years back and rendered uncompetitive.

There is a shortage of computer security professionals in Uganda and internationally. The Software and Systems Security track therefore aims at producing computer security experts who will be able to design, develop, implement and manage secure computing systems and networks. The graduates will also be able to critically evaluate threats and vulnerabilities and integrate appropriate security strategies in computing systems and networks.

Artificial Intelligence and Data Science

The Artificial Intelligence and Data Science option aims at producing graduates equipped with skills to process, analyse and extract insight from huge amounts of data. It draws upon our world-leading expertise in the areas of machine learning, computer vision and image processing, visual analytics, high-performance computing, data mining and information retrieval.

There is a growing demand for professionals with this skill set because individuals and organizations are continuously producing vast amounts of real-time heterogeneous data (known as Big Data). Big Data poses challenges in areas such as health, business, security, intelligent transport, energy efficiency, education, retail and the creative industries.

This option will equip students with advanced knowledge and hands-on experience in algorithms, tools, and techniques for managing and processing big data.

Emphasis on Research, Problem Solving and Transdisciplinarity

The MSc in Computer Science program puts strong emphasis on research, transdisciplinarity and problem solving using advanced computational thinking skills.

This is because Computer Science is an ever-evolving field that demands keeping up with the most up-to-date research and advancements not only in Computer Science but also in other fields. Most graduates of Computer Science work in fast-changing and technically challenging environments that require continuous research and learning.

In order to produce graduates who can work successfully in the field of Computer Science, the curriculum ensures that the two offered options have a heavy component of research and problem solving. The modes of delivery and research problems are designed to equip students with the skills to tackle inter- and transdisciplinary research problems.

Career Options Arising from the MSc in Computer Science program

Graduates of the MSc. in Computer Science find themselves in different kinds of environments, for example academia, research, industry, government, and private and business organizations. The list below provides some of the possible career options for a graduate of the MSc. in Computer Science:

  • Computer/Cyber Security Expert
  • Software Engineers/Programmers
  • Data Scientist
  • ICT Project Consultants
  • Systems Security Analyst
  • Researcher
  • Systems Analyst, and Business Intelligence Analyst
  • Database, Systems and Network Administrators

The Program

Target Group

The program is designed for graduates from computing (Computer Science, Computer Engineering, and Software Engineering) and closely related fields, who wish to gain advanced knowledge in Computer Science. The broad target groups include, but are not limited to:

  • those interested in pursuing both academic and professional careers requiring advanced Computer Science knowledge.
  • professionals interested in pursuing careers in the fields of computer security, data science, network security, analytics, programming, software engineering/software development, and cloud computing, among others.
  • those interested in pursuing PhD research in Computer Science.

Program Duration

The duration of the program is four semesters spread over two years. Each semester has fifteen weeks of studying and two weeks of examinations.

Tuition Fees

Tuition fees for privately sponsored students are 5,000,000 Uganda Shillings per academic year for Ugandans and 12,780,000 Uganda Shillings for international students.


Admission requirements

To qualify for admission to the MSc. in Computer Science, a candidate must fulfill the general Makerere University entry requirements for Masters Degrees, and in addition, the candidate must be a holder of either:

  1. A minimum of Second Class (lower division) undergraduate degree in Computer Science, Computer Engineering, Software Engineering, or a closely related field from a recognized university/institution.
  2. A minimum of Second Class (lower division) postgraduate diploma in Computer Science, Computer Engineering, Software Engineering or a closely related field from a recognized university/institution.

Candidates from closely related fields should have taken core computer science courses in undergraduate or postgraduate diploma studies including: compiler design, automata and complexity, object-oriented programming languages, data structures and algorithms, computer architecture, mathematics particularly in linear algebra, statistics and calculus.

Upgrading from Postgraduate Diploma

If a candidate holds a Postgraduate Diploma in Computer Science of Makerere University of at least a Lower Second class, he/she may apply to join in the second year of the Master of Science in Computer Science provided they have followed equivalent courses in the post graduate Diploma. In such a case, the applicant is expected to undertake research in the second year and any remaining course units to meet the minimum requirement for the award of the MSc. in Computer Science Degree.

The upgrade of the PGD Computer Science to the MSc. Computer Science described above must be supported by relevant academic documents attained for the PGD Computer Science of Makerere University. This must be done for purposes of analyzing the relevant academic courses that must have been attempted as per the current MSc. in Computer Science curriculum. Any courses in the first-year course load of the current MSc. in Computer Science curriculum that the applicant has not attempted must be taken. When a student graduates with a Postgraduate Diploma in Computer Science of Makerere University with a classification of Pass, s/he can apply for the Master of Science in Computer Science but is admitted to the first year of the MSc. in Computer Science program.

The curriculum is made up of two plans - Plan A and Plan B.

  • Plan A: Plan A is made up of two semesters of coursework (40 credit units) and two semesters of research and writing of a dissertation (20 credit units) and Seminar Series (2 credit units). The minimum total credit units (graduation load) for Plan A is 62.
  • Plan B: Plan B is made up of three semesters of coursework (58 credit units) and one semester of developing a project (10 credit units). The minimum total credit units (graduation load) for Plan B is 68.

Weighting System

Semester Load

The normal load for Year One is 20 credit units per semester. The normal load for Year Two, Semester One is 22 credit units for Plan A and 18 credit units for Plan B. The minimum load for Year Two, Semester Two is 20 credit units for Plan A and 10 credit units for Plan B.

Minimum Graduation Load

The minimum graduation load for Plan A is 62 credit units while the minimum graduation load for Plan B is 68.

Evaluation and Grading

Every course unit will be assessed during and at the end of the semester in which it is covered. The progressive assessment will constitute 40% of the overall mark and the final examination will constitute 60%. The form of progressive assessment and final examination (e.g., project-based/research-based or written) may vary depending on the course unit. Grade points will be allocated to the final mark obtained in every course unit according to the table below:

Marks | Letter Grade | Grade Point (GP) | Interpretation
90 - 100 | A+ | 5 | Exceptional
80 - 89 | A | 5 | Excellent
75 - 79 | B+ | 4.5 | Very good
70 - 74 | B | 4 | Good
65 - 69 | C+ | 3.5 | Fairly good
60 - 64 | C | 3 | Pass
55 - 59 | D+ | 2.5 | Marginal fail
50 - 54 | D | 2 | Clear fail
45 - 49 | E | 1.5 | Bad fail
40 - 44 | E- | 1 | Qualified fail
Below 40 | F | 0 | Qualified fail
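As an illustration of how the grading table can be applied, the sketch below maps a final mark to its letter grade and grade point. It assumes non-overlapping bands, with E- covering 40 to 44 and F covering marks below 40; the names `GRADE_BANDS` and `grade_for` are hypothetical.

```python
# (lower bound of band, letter grade, grade point), highest band first.
# Assumption: bands do not overlap; E- is 40-44 and F is below 40.
GRADE_BANDS = [
    (90, "A+", 5.0),
    (80, "A", 5.0),
    (75, "B+", 4.5),
    (70, "B", 4.0),
    (65, "C+", 3.5),
    (60, "C", 3.0),
    (55, "D+", 2.5),
    (50, "D", 2.0),
    (45, "E", 1.5),
    (40, "E-", 1.0),
    (0, "F", 0.0),
]


def grade_for(mark):
    """Return (letter grade, grade point) for a final mark between 0 and 100."""
    if not 0 <= mark <= 100:
        raise ValueError("mark must be between 0 and 100")
    for lower_bound, letter, grade_point in GRADE_BANDS:
        if mark >= lower_bound:
            return letter, grade_point


passing = grade_for(62)   # a mark in the 60-64 band
failing = grade_for(44)   # a mark in the 40-44 band
```

Ordering the bands from highest to lowest lets the first matching lower bound decide the grade, so no upper bounds need to be stored.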

The following additional letters are used where appropriate:

  • W - Withdraw from Course
  • I - Incomplete
  • AU - Audited course only
  • P - Pass
  • F - Failure

Calculation of Cumulative Grade Point Average (CGPA)

The CGPA is calculated as follows:

$$\mathrm{CGPA} = \frac{\sum_{i=1}^{n} GP_i \times CU_i}{\sum_{i=1}^{n} CU_i}$$

Where GP_i is the Grade Point score of a particular course unit i, CU_i is the number of Credit Units of course unit i, and n is the number of course units done so far.
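A short worked example may help. The sketch below assumes the CGPA is the usual credit-weighted average of grade points, which matches the symbols defined above; the function name `cgpa` and the sample course units are hypothetical.

```python
def cgpa(course_units):
    """Credit-weighted average of grade points.

    course_units: list of (grade_point, credit_units) pairs, one per course
    unit done so far.
    """
    total_credit_units = sum(cu for _, cu in course_units)
    if total_credit_units == 0:
        raise ValueError("at least one credit unit is required")
    weighted_points = sum(gp * cu for gp, cu in course_units)
    return weighted_points / total_credit_units


# Three hypothetical course units: (grade point, credit units).
example = cgpa([(5.0, 4), (4.0, 4), (3.0, 2)])
# (5.0*4 + 4.0*4 + 3.0*2) / (4 + 4 + 2) = 42 / 10 = 4.2
```

Because the average is weighted by credit units, a low grade in a small course unit lowers the CGPA less than the same grade in a large one.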


Progression is regarded as normal, probationary or discontinuation as per the standard Makerere University senate guidelines:

  1. Normal Progress: This occurs when a student passes each course unit taken with a minimum Grade Point of 3.0.
  2. Probationary: This is a warning stage and occurs if the Cumulative Grade Point Average (CGPA) is less than 3.0 and/or the student has failed a core course unit. Probation is waived when these conditions cease to hold.
  3. Discontinuation: When a student accumulates three consecutive probations based on the CGPA or the same core course unit(s), s/he shall be discontinued. A student who has failed to obtain at least the pass mark of 60% or Grade Point of 3.0 after the third attempt in the same course unit(s) s/he had retaken shall be discontinued from his/her studies at the University. A student who has overstayed in an academic program by more than two (2) years shall be discontinued from his/her studies at the University.

Retaking a Course Unit

A student will retake all courses where he/she has a grade point less than 3.0.

Data Collections

Here are the subcategories under “Data Collections” on this page.

  • Access Tools to Multiple Data Series — tools (often query-based) for gathering data from a variety of sources
  • Statistical Compendia — handy single volume databooks, often downloadable
  • Indices, Rankings, and Comparisons — indices, rankings, and comparisons of states and metro areas by various criteria
  • Economic Analyses and Forecasts — reviews of recent and projections of future economic conditions and trends
  • Guides to Data on the Web — sites (in addition to ours!) with annotated links to sources of socioeconomic data
  • Data Intermediaries — organizations assisting users in regional data access and interpretation
  • Search Engines — tools for searching government data sources
  • Microdata — data series with observations for individual firms and people
  • Mapping Resources — GIS and related tools for mapping socioeconomic data and geographic features
  • Geographic Classifications and Codes — Classifications, definitions, and numerical codes for states, metro areas, counties, places, and smaller geographic units.

Access Tools to Multiple Data Series

MapStats, Interagency Council on Statistical Policy
Access to data from multiple federal statistical agencies, for states, counties, federal judicial districts, and Congressional districts.

  • Clickable-map access to data profiles for states and counties. Data topics include population, income and poverty, housing, business activity, and geography. Also provides access to data tables from Economic Census, County Business Patterns, USA Counties, Small Area Poverty and Income Estimates, and Decennial Census.
  • Query-based access to wide variety of Census Bureau data, including Annual Survey of Manufactures, residential building permits, Census Tract Street Locator, Consolidated Federal Funds Report, County and Zip Business Patterns, USA Counties, and occupation by race and sex.
  • Query-based access to data from 1990 Decennial Census, 1997 Economic Census, and American Community Survey, for states and metro areas.

Bureau of Labor Statistics

    — Labor force and nonfarm wage and salary employment for states and metro areas, latest six months.
    — Summaries of press releases concerning employment, wages and benefits, and union membership for states and metro areas.

Geospatial and Statistical Data Center, University of Virginia
Query-based access to several Federal regional economic databases, including County and City Data Book, County Business Patterns, Regional Economic Information System, Regional Economic Projections, and 1990 Census Public Use Microdata Samples.

National Atlas of the United States, U.S. Geological Survey
On-line tool for mapping physical, social, and economic data from multiple federal sources.

Regional Economic Conditions, Federal Deposit Insurance Corporation
Data on labor force, employment, income, housing, and real estate, for states, metro areas, and counties, updated eight times annually. Tools provided to build maps, tables, and charts.

State Economic Data, Northeast Midwest Institute
State tables on a variety of economic topics, including demographics, prices, economic growth, employment, income, housing, energy, and federal spending. Based primarily on federal data.

Cluster Mapping Project, Harvard Business School
Economic profiles of states, metro areas, and economic regions regarding overall performance and the composition of the economy. State profiles, with narrative, also available in .pdf format (February 2002).

Socioeconomic Data, Population Reference Bureau
Access to wide variety of U.S. and international data and articles on socioeconomic topics. Of particular interest:

    — Access to regional data and analyses on a variety of topics.
    — Query-based access to data on multiple topics, by state.
    — State-level analyses of population, labor force participation, unemployment, health insurance coverage, and other social and economic characteristics.

Focus on States, Center on Budget and Policy Priorities
Access to state-level reports on multiple topics of concern to low-income families (e.g., tax and budget, health, welfare, housing, food).

Child Trends DataBank, Child Trends
State and local indicators on child and youth well-being. Topics include health; social and emotional development; income, assets, and work; education and skills; demographics; and family and community.

Regional Information Clearinghouse, National Association of Regional Councils
Variety of socioeconomic data provided for substate regions (typically, areas covered by regional councils).

State of the Cities Data Systems, Department of Housing and Urban Development
Query-based access to metro, central city, and suburban data from Decennial Census, County Business Patterns, and Local Area Unemployment Statistics.

Census Reports, Federal Financial Institutions Examination Council
Population, housing and income data by census tract, for metro and balance-of-state areas, based on Decennial Census and the annual Department of Housing and Urban Development estimates of median family incomes.

ERsys
Information on demographics, environmental factors, economic indicators, housing, media, schools, and transportation for over 1,300 cities.

Community Demographics, Council for Urban Economic Development and Conway Data, Inc.
Query-based access to about 150 single-point data elements for states, metro areas, and counties.

American Community Network
Query-based access to 200 single-point data elements for counties and metro areas.

RealEstateJournal, Wall Street Journal

    — Wide variety of data on the 100 largest U.S. cities. Topics include housing, weather, crime, cost of living, taxes, health care, education, and transportation. Data drawn from public and private sources.
    — Access to data tools for use in community relocation choices, covering cost of living, schools, crime, and moving costs.

National Neighborhood Indicators Partnership, The Urban Institute
Links to multi-source neighborhood and city data provided by NNIP partners. The purpose of NNIP is to further the development and use of neighborhood information systems in local policymaking and community building. (As of 2002, partners in 12 major cities.)

Regional Reports and Maps, Metropolitan Area Research Corporation
Analytic reports and geographically detailed color-coded maps describing social, racial, public educational, fiscal, land use, and political trends, for each of the largest 25 metropolitan areas.

Network
A series of free and fee-based web sites provide data on numerous topics for states and metro areas. Of particular interest:

    — Access to’s library of time series data at the national, state, and metro level from a wide variety of federal sources.
    — On-line access to wide variety of historical time series of economic data, for states, metro areas, and counties. $
    — On-line time series and forecasts for states and metro areas, for employment and occupation. $

Demographics Daily, American City Business Journals
Data and analysis for states, metro areas, counties, and zipcodes on various socioeconomic topics, including economic conditions, business activity, population, race, wealth, education, environment, families and children, health, housing and construction, and transportation.

Economic Time Series, $
Free access to a variety of Federal government regional data time series, including those provided by the Bureau of Labor Statistics (labor force, employment and unemployment, Consumer Price Index), the Bureau of the Census (building permits), the Bureau of Economic Analysis (Gross State Product), and Regional Federal Reserve Banks. For fee, forecasts provided on-line for any variable.

EconData, University of Maryland
Access to .zip files of Gross State Product and state personal income data from Bureau of Economic Analysis, and state employment, hours and earnings data (CES/790) from BLS.

Regional Federal Reserve Banks
Each of 12 Federal Reserve Banks provides access to economic data for its region.

Economic Insight, Economic Information Systems, Inc. $
Clickable-map access to economic profiles of states and metro areas. Detailed analysis and performance indicators available for purchase, including personal income and average wages; employment and earnings; concentration, displacement, and competitiveness by detailed industry; and employment and income growth and volatility. Data from 1970-present, with twenty-year trend forecast of Gross State Product, employment, income, and population. For states, 330 metro areas, and counties.

The Right Site, Easy Analytic Software, Inc. $
Wide variety of demographic, economic, consumer, and quality of life data and forecasts, for states, metro areas, counties, cities, zip codes, and census tracts. Provided on CD-ROM.

Socioeconomic Estimates and Projections, Woods and Poole $
Estimates and projections for a wide variety of demographic and economic variables, 1970 to 2020, for states, metro areas, and counties. Provided on CD-ROM and in print.

Economic Data, Global Insight $
On-line access to a wide variety of economic databases at all levels of geography.

Social Sciences Data Collection, University of California at San Diego
Search engine and web links for reaching a very large number of on-line socioeconomic data bases. Links provided to over 400 sites for specific data series, 100 on-line data archives and libraries, 150 data gateways (i.e., sites with links to on-line data sites), searchable catalogs of data vendors and libraries, and a list of data distributors and vendors.

Center for International Earth Science Information Network, Columbia University
Access to a variety of demographic, economic, and land use data bases, as well as mapping resources.

Inter-University Consortium for Political and Social Research, University of Michigan
Access to wide range of socioeconomic data and reports, covering several decades.

Statistical Compendia

U.S. Statistical Abstract, Bureau of the Census
Wide variety of social and economic data for states, metro areas, and cities. Tables available in .pdf format, 1995-latest edition.

Guide to State Statistical Abstracts, Bureau of the Census $
Bibliography, and on-line access where available, of statistical abstracts produced by states.

Small Business State Profiles, Small Business Administration
Data on each state’s small business economy – number of firms, small business income, industrial composition, job growth, and data on minority and women-owned businesses, 1995-latest year.

Green Book, Ways and Means Committee, U.S. House of Representatives
Historical state data on a wide variety of social and economic topics, including Social Security, employment, earnings, welfare, child support, health insurance, the elderly, families with children, poverty and taxation. Biannual publication (latest 2000).

State and Metropolitan Area Data Book, Bureau of the Census
Wide variety of social and economic data for states and 273 metro areas. State and metro area rankings, with data, available in .html format. Full publication available in .pdf format.

County and City Data Book, Bureau of the Census $
Wide variety of social and economic data for all 3,141 counties, 1,078 cities with 25,000 or more inhabitants, and 11,097 places of 2,500 or more inhabitants. Rankings, with data, available on-line for cities of 200,000 or more. Full publication available in print or on CD-ROM.

USA Counties
Query-based access to wide variety of social and economic data for all counties.

State of the Nation’s Cities, Center for Urban Policy Research, Rutgers University
Data on employment and economic development, demographic measures, housing and land use, income and poverty, fiscal conditions, and other health, social, and environmental variables, for 77 cities and their suburbs. Downloadable file only.

Community Profiles, Knight Foundation
In-depth data analyses of 26 communities in which the foundation makes grants. Data from secondary and primary sources. Communities include major metropolitan areas such as Miami, San Jose, Detroit and Philadelphia, as well as smaller communities such as Boulder, Colo., State College, Pa., Myrtle Beach, S.C., and Grand Forks, N.D. (Released March 2000.)

Industry and Trade Outlook, International Trade Administration and DRI/McGraw-Hill
Data, analysis, and forecasts regarding U.S. competitiveness, employment, productivity, investment, and trade, by industry. Primarily national data.

Data Tables, Economic Report of the President 2000, Council of Economic Advisors
Access to statistical tables concerning income, employment and production, for the nation as a whole.

Indices, Rankings, and Comparisons

Ranking Tables, American Community Survey, Bureau of the Census
Annual tables ranking states, counties, and places on a variety of demographic, social, economic, and housing variables (2000-latest year).

Census 2000 Population and Housing Rankings, Bureau of the Census
Ranking tables of Census 2000 data, for states, metro areas, counties, and incorporated places of 100,000 or more.

KIDS COUNT Census Data Online, Annie E. Casey Foundation
Rankings of demographic data regarding children and their families, for states, counties, cities, metro areas, and Congressional districts. Topics include age, sex, race and ethnicity, and living arrangements. Based on Census 2000 short form data.

State and Metropolitan Area Data Book: 1997-98, Bureau of the Census
Wide variety of social and economic data for states and 273 metro areas. State and metro area rankings, with data, available in .html format. Full publication available in .pdf format.

Economic Insight, Economic Information Systems, Inc. $
Clickable-map access to comparative analyses of states and metro areas. For subscription, on-line access to data analysis, comparative analysis, and forecasts for states, 330 metro areas, and all counties. Data include employment, income, population, and Gross State Product.

Demographics Daily, American City Business Journals
Rankings of states, metro areas, and counties on various socioeconomic topics, including business, education, environment, families and children, health, housing and construction, and transportation.

    — Monthly economic ratings for 276 metro areas, based on population, employment, and income trends.
    — Monthly economic ratings for states and DC, based on population, employment, and income trends.
    — Listing of comparisons of states, metro areas, and counties.

Area Vitality Profiles, Brandow Company, Inc. $
Series of data reports ranking states, metro areas, and counties regarding high growth firms, startup activity, business retention, and business attraction. Annual lists of top-ranked areas available free.

State Ranking Publications, Morgan Quitno Press $
Annual, monthly, and customized publications that rank states on a variety of characteristics, including agriculture, crime, defense, government finance, health, economy, education, energy, environment, geography, housing, population, social welfare, and transportation.

Economic Climate Study, Office of the Forecast Council, Washington State
Annual study ranking states in a wide variety of categories, including economic performance, quality of life, education, workforce skills, infrastructure, and the cost of doing business. (September 2001)

Development Report Card for the States, Corporation for Enterprise Development
Indicators and rankings of economic performance, business vitality, development capacity, and tax and fiscal system, for states. Annual report, includes over 50 measures.

State Competitiveness Report, The Beacon Hill Institute, Suffolk University
Index of state climate for business competitiveness, examined in nine areas, including government and fiscal policy, institutions, infrastructure, human resources, technology, finance, openness, domestic competition, and environmental policy. (December 2001)

State Asset Development Report Card, Corporation for Enterprise Development
Comparison of states on asset outcomes and policies, 68 indicators. Examples of topics include financial assets, homeownership, human capital, health insurance, wage protection, and business development. (October 2002)

Asset Index, Asset Development Institute, Center on Hunger and Poverty, Brandeis University
State-by-state comparative study of individual assets important for economic success. Categories for the 39 indicators are job-based and related income assets, human capital, and financial assets. (September 2002)

State of Caring Index, United Way of America
State-level indicators on well-being, covering economic and financial well-being, education, health, civic engagement, safety, and the environment. Reports and query-based tables and graphs available.

State Science and Engineering Profiles and R&D Patterns, National Science Foundation
Rankings of states on a variety of technology-related indicators.

State Science and Technology Indicators, Office of Technology Competitiveness, Technology Administration
Indicators of science and technology activity, by state (.pdf file). (October 2001) Published biannually, prior edition available.

State New Economy Index, Progressive Policy Institute
Comparison of states in terms of 17 indicators of New Economy assets and activity. Categories include knowledge jobs, globalization, economic dynamism and competition, transformation to digital economy, and technological innovation capacity.

State New Economy Index, Milken Institute
Ranking of states in terms of 12 indicators of New Economy assets and activity.

Advanced Technology Statistics, American Electronics Association $

  • CyberStates and CyberCities — Data on advanced technology activity for states and metro areas. See press releases for free data summary by state and metro area. — Data-based assessment, by state (with rankings), of extent to which K-12 systems and universities are preparing students for technology-related jobs. (January 2002)

State Technology and Science Index, Milken Institute $
Ranking of states on 73 measurements in five categories. Summary available free, detail for fee. (September 2002)

Education Watch Online, The Education Trust
For states, data, charts, and comparisons on educational achievement, curricula, teacher qualifications, and expenditures. Data disaggregated by race, ethnicity, and income. Data available through query-based access and .pdf reports.

State Report Cards, Education Week
Grades and rankings for public education systems, by state, in the areas of student achievement, standards and accountability, teacher quality, school climate, resources, and equity.

Measuring Up, National Center for Public Policy and Higher Education
Biennial state report card for higher education. Rankings in five categories, including preparation, participation, affordability, completion, and benefits.

Grading the States, Governing
Letter grades for each state in five areas — financial management, capital management, human resources, managing for results, and information technology. (February 2001)

Economic Freedom in the States, Clemson University
Index of economic freedom by state, 1999. Areas examined include fiscal, social welfare, size of government, regulation, and judiciary.

Digital State Survey, Center for Digital Government and Government Technology
Annual ranking of state governments in eight dimensions of digital technology utilization: law enforcement and the courts, social services, e-commerce/business regulation, taxation/revenue, digital democracy, management/administration, education and GIS/transportation. (January 2002)

State Health Facts Online, The Henry J. Kaiser Family Foundation
State comparisons on demographics, health status, and health policy, including health coverage, access, financing, and state legislation.

State Health Rankings, UnitedHealth Group
Annual rankings of states on various dimensions of health status, including lifestyle, mortality, disease, occupational safety, and access to health care.

Children in the States, Children’s Defense Fund
State rankings regarding the condition of children and their families. Rankings cover health insurance and health care, natality, poverty, child support, and education.

KIDS COUNT, Annie E. Casey Foundation
A national and state-by-state effort to track the status of children in the United States. Includes on-line access to annual KIDS COUNT Data Book, which provides state data on the educational, social, economic, and physical well-being of children.

The Right Start, Annie E. Casey Foundation and Child Trends
Indicators of the well-being of infants, for states and metro areas, with rankings. Annual report.

Profiles of Individual Charitable Contributions, National Center for Charitable Statistics, The Urban Institute
Rankings of charitable giving by state, including percentage of returns with contributions, average contribution per return, and contributions as a percentage of income.

State of the States, Center for Policy Alternatives
Index of state indicators concerning voter turnout, welfare, women, Medicaid enrollment, and Hispanic population.

Energy Data Rankings, Energy Information Administration
Rankings of energy production, prices, and consumption, by state.

Workers’ Compensation Premium Rates, Workers Compensation Division, Oregon Department of Consumer and Business Services
Biannual ranking of states by workers’ compensation insurance premium rates.

State Business Climate Rankings, Site Selection
Annual state rankings based on measures concerning new and expanded corporate facilities. (November 2002)

County and City Data Book, Bureau of the Census
Wide variety of social and economic data for all 3,141 counties, 1,078 cities with 25,000 or more inhabitants, and 11,097 places of 2,500 or more inhabitants. Rankings, with data, available on-line for cities of 200,000 or more. Full publication available in print or on CD-ROM.

Best Places to Live, Money Magazine
Annual ranking of 300 metro areas regarding quality of life. Data provided on 20 factors, including those pertaining to pollution, crime, weather, arts and culture, housing, and the economy.

Yahoo! City Comparison
Comparisons and rankings of cities in terms of quality of life (e.g., crime, weather, pollution), real estate (e.g., property tax and home purchase cost), and the economy (e.g., unemployment, cost of living).

The City Report,
Comparison of cities in terms of cost of living, crime, and other information.

Sperling’s BestPlaces, Fast Forward, Inc.
Tools for comparing metro areas regarding cost of living, schools, crime, climate, housing, health, and other topics.

Ranking the Nation’s Big Cities, Cleveland Plain-Dealer
Rankings of the 65 cities with more than 250,000 population for seven characteristics: median household income, college education, homeownership, percent income spent on housing, poverty, and commute time. Based on Census 2000 Supplemental Survey data.

Metropolitan Racial and Ethnic Change, Lewis Mumford Center, State University of New York at Albany
Rankings of 331 metro areas in terms of racial and ethnic integration and segregation, for whole population and children.

State of the Cities (National Urban Policy Report), Department of Housing and Urban Development
Tables on jobs, business establishments, and average annual pay, for 77 cities and their suburbs. Annual report.

Economic Strength of Metro Areas, Policom Corporation
Annual rankings of 318 metro areas regarding economic strength, defined in terms of speed and consistency of growth over the last 25 years.

Places Left Behind in the New Economy, Department of Housing and Urban Development
Examination of unemployment, poverty, and population loss in 539 central cities.

Metropolitan New Economy Index, Progressive Policy Institute
Comparison of the 50 largest metro areas in terms of 16 indicators of New Economy assets and activities. Indicator categories include knowledge jobs, globalization, economic dynamism and competition, transformation to digital economy, and technological innovation capacity. Published April 2001.

Entrepreneurial Hot Spots, Inc.
Annual ranking of metro areas as location to start and grow a new business. Separate rankings for 50 large and 50 small metro areas. Prepared with Cognetics, Inc. Published December 2000.

Best Cities for Entrepreneurship, Entrepreneur
Annual ranking of metro areas as location for entrepreneurs. Separate rankings for mid-sized and large metro areas, five regions of country, and individual criteria (e.g., entrepreneurial activity, failure rate). Prepared with Dun & Bradstreet.

Best Places for Business and Career, Forbes and Milken Institute
Annual rankings of top 200 metro areas as places to start a new business or career. Rankings based on salary and job growth and high technology activity. Milken Institute press release. Published May 2001.

High Tech and Info Tech Metros, Hubert H. Humphrey Institute of Public Affairs, University of Minnesota
Rankings of metro areas in terms of high tech and information technology industry activities (.pdf file). Published August 2001.

High-Growth Company Index, National Commission on Entrepreneurship
Index of high-growth business activity in 394 Labor Market Areas (covering the entire country), 1992-1997. Published July 2001.

America’s High-Tech Economy: Growth, Development, and Risks for Metropolitan Areas, Milken Institute
Rankings of 315 metro areas regarding high technology activity. Published July 1999.

High-Tech Hot Spots, Entrepreneur’s Business Start-ups
Annual ranking of top 50 metro areas as location for new high technology businesses. Prepared with Dun & Bradstreet. Published November 2000.

Cincinnati’s Benchmarking Technology Study, Institute of Advanced Manufacturing Sciences, Inc.
Comparison of the technology assets and performance of 24 urban areas, using 17 indicators (.pdf file).

America’s Most Wired Cities, Yahoo Internet Life
Annual ranking of 86 metro areas regarding various dimensions of Internet use (e.g., percent households online, domains registered per 1,000 businesses, quality of local government sites). (May 2002)

Best States for E-Commerce, Progressive Policy Institute
Ranking of states regarding extent to which state governments facilitate Internet use for e-commerce, through regulatory, tax, and administrative policy.

World Class Communities for Manufacturing, Industry Week
Annual rankings of 315 metro areas as locations for manufacturing. Prepared by Cleveland State University. Published April 2001.

Social Capital Community Benchmark Survey, Kennedy School of Government, Harvard University
For 39 regions, scores and discussion regarding the presence of 11 dimensions of social capital, for example, civic leadership, association involvement, social trust, inter-racial trust, diversity of friendships, informal socializing, and faith-based engagement. Published March 2001.

Kid-Friendly Cities Report Card, Zero Population Growth
Data and grades for 239 cities regarding the health and well-being of children. Indicators cover health, education, public safety, economics, environment, and community life. Published 2001.

Competitive Cities Report Card, Reason Public Policy Institute
Rankings of 44 cities regarding the efficiency of 11 types of public services (e.g., libraries, street repair, police). (April 2001)

Economic Analyses and Forecasts

Current Economic Conditions (Beige Book), Federal Open Market Committee, Board of Governors of the Federal Reserve
By Federal Reserve District, summary of anecdotal information on current economic conditions through reports from Federal Reserve Bank directors and interviews with key business contacts, economists, market experts, and other sources. Prepared eight times a year.

Regional Outlook, Federal Deposit Insurance Corporation
Quarterly analysis of current national and regional trends that may affect the risk exposure of insured depository institutions. Analyses prepared for each of eight FDIC regions.

State of Latinos, William C. Velasquez Institute
Profiles of the socioeconomic well-being of Latinos, in terms of education, employment, entrepreneurship, housing, and health, for states with large Latino populations.

Regional Economic Analysis and Forecasts, Global Insight
Region-specific reports and forecasts covering a variety of economic topics, for states, metro areas, and counties. Individual quarterly reports for states and metro areas can be purchased on-line from Northern Light.

Regional Economic Analysis and Forecasts, [email protected] $
State and metro area economic analyses and employment and occupational forecasts for states and 315 metro areas.

Economic Insight, Economic Information Systems, Inc. $
Data analysis, comparative analysis, and 20-year forecasts for states, 330 metro areas, and all counties. Data include employment, income, population, and Gross State Product.

The Right Site, Easy Analytic Software, Inc. $
Wide variety of demographic, economic, consumer, and quality of life data and forecasts, for states, metro areas, counties, cities, zip codes, and census tracts. Provided on CD-ROM.

Socioeconomic Estimates and Projections, Woods and Poole $
Estimates and projections for a wide variety of demographic and economic variables, 1970 to 2020, for states, metro areas, and counties. Provided on CD-ROM and in print.

Economic Forecasts, $
Mechanical forecasts provided on-line for labor force, employment and unemployment, Consumer Price Index, building permits, and Gross State Product.

Area Profiles: Business Migration, Startup, and Retention, Brandow Company $
Customized reports on regional business migration, startup, and retention patterns.

Guides to Data on the Web

FedStats, Federal Interagency Council on Statistical Policy
Access to Web sites of over 70 Federal statistical agencies.

Guide to On-Line Sources for Economic Development Data, University of Minnesota
Links to economic data sites maintained by federal and state agencies.

Business and Economics Numeric Data, Mansfield University
Links to a substantial number of economic data Web sites.

State Workforce Development Agencies, ICESA
Links to state workforce development agency Web sites.

State Economic Data Sources, Association of University Business & Economic Research Centers
Links to sources of state-specific data, including labor market information agencies, economic development organizations, and university-based business and economic research centers.

Census State Data Centers, Bureau of the Census
Links to Census Bureau-sponsored State Data Centers providing Web access to state-specific data.

State and Local Government on the Net, Piper Resources
Links to state and local government Web sites, as well as to national trade associations in public administration.

Resources for Economists on the Internet, University of Southern Mississippi
Links to over 700 economic data and information sources.
Links to economics-related Web sites.

Data Intermediaries

State Economic Data Centers, Association of University Business & Economic Research Centers
List of university business & economic research centers that aid data users in accessing and using federal and state socioeconomic data. Also State Data Centers and Business and Industry Data Centers, which aid in accessing and using Census Bureau data.

    — State Data Centers provide training and technical assistance in accessing and using Census Bureau data. Includes Business and Industry Data Centers, created to meet needs of local businesses for economic data. Additional information available through SDC/BIDC Network.
    — Listing of non-profit organizations that serve as repositories of Census data and reports in underserved communities.
    — Contact information for the 12 regional offices of the Census Bureau. Offices provide access to libraries with agency publications going back several decades, and to personnel available to provide information on data sources and uses.

Bureau of Labor Statistics

    — Links to state workforce development agencies, which provide assistance accessing and using a variety of socioeconomic data, particularly those prepared in cooperation with BLS.
    — Contact information for the eight BLS regional offices. Offices provide access to libraries with agency publications going back several decades, and to personnel available to provide information on data sources and uses.

BEA User Group List, Bureau of Economic Analysis
Links to state agencies, universities, and Census State Data Centers that disseminate BEA regional data.

Federal Depository Libraries
Listing of Federal Depository Libraries, which provide free public access to a wide variety of federal government information in both print and electronic formats, and have expert staff available to assist users. There are 1,400 depository libraries nationwide.

List of members of ACCRA, typically in chambers of commerce and economic development agencies. Members can provide advice on data sources and uses.

Economics Departments, Institutes and Research Centers (EDIRC)
Listing of economics departments, institutes, and research centers, by state, with links.

Search Engines

FirstGov, General Services Administration
Search engine for accessing web pages and documents across U.S. government.

Government Information Locator Service (GILS), U.S. Government Printing Office
Access to socioeconomic data through search engine that identifies federal information resources.

Google US search
Access to federal government and military web sites.

Northern Light $
Access to federal government and military web sites.


Microdata

Bureau of the Census
The Census Bureau offers several tools for accessing microdata from the various surveys it carries out. These include:

    (Federal Electronic Research and Review Extraction Tool) — Access to microdata from the Current Population Survey, the Survey of Income and Program Participation, and the National Health Interview Survey. Provided in conjunction with the Bureau of Labor Statistics.
    — Access to Public Use Microdata Samples (PUMS) from the Decennial Census, the American Housing Survey, and the Current Population Survey.
    — Sample of responses to Decennial Census and American Community Survey, available on CD-ROM, disk, and tape. $
    — A longitudinal file of all U.S. private business establishments and firms (less farms, railroads, Postal Service), 1989-latest year. Data include employment and payroll. Co-funded by Census and the Small Business Administration. Access in Washington and several regional data centers available to qualified researchers; analysis also available on fee-for-service basis from Center for Economic Studies.
    — AHS microdata for metropolitan areas, on CD-ROM. $
    — Access to a number of Census Bureau longitudinal data bases describing corporate characteristics and behavior. Surveys include the Census of Manufactures, the Annual Survey of Manufactures, Enterprise Statistics, Quarterly Financial Reports, and Survey of Manufacturing Technology. Access in Washington and several regional data centers available to qualified researchers; analysis also available on fee-for-service basis.

Integrated Public Use Microdata Series, University of Minnesota
Access to microdata samples from 15 Decennial Censuses, from 1850 through 1990.

SESTAT, National Science Foundation
Public use data files on over 100,000 college graduates with an education and/or occupation in a natural science, social science, or engineering field, currently representing about 12 million scientists and engineers in the United States.

Health Microdata, National Center for Health Statistics
Public use data files and documentation from NCHS, with analytic tools.

Community Tracking Study, Center for Studying Health System Change
Public use files from a set of national biennial surveys regarding health systems in local markets and the nation as a whole. Data are collected in 60 randomly selected communities, with telephone surveys of households, employers, and physicians.

Mapping Resources

Geography Network, ESRI
Comprehensive on-line portal to maps and GIS data at all levels of geography.

GIS Data Depot, ThinkBurst Media, Inc. $

Spatial Data on the Web, Massachusetts Institute of Technology
Links to primary on-line sources of maps and GIS data.

Maps and Cartographic Resources, Bureau of the Census
Access to reference and thematic maps, cartographic boundary files, and on-line mapping resources. Also of interest from the Census Bureau:

    — Digital database of geographic features, such as roads, railroads, rivers, lakes, political boundaries, census statistical boundaries, etc., covering the entire United States. Contains information about these features such as their location in latitude and longitude, the name, the type of feature, address ranges for most streets, the geographic relationship to other features, and other related information. Maps available on-line through TIGER Map Service.
    — Access to Census 2000 geographic products.
    — On-line map preparation, using clickable maps, and including political and Census boundaries and major features, down to street level. Downloadable.

National Mapping Information, U.S. Geological Survey
Access to a variety of mapping resources. Of particular interest, the National Atlas, providing small-scale, customized maps detailing geospatial (e.g., soils, watersheds) and geostatistical (e.g., crime patterns, population distribution) data.

National Geospatial Data Clearinghouse, Federal Geographic Data Committee
Access to collection of over 100 spatial data servers that have digital geographic data primarily for use in Geographic Information Systems (GIS), image processing systems, and other modelling software.

Web Links to State and Regional GIS Resources, Federal Geographic Data Committee
Links to individual state and regional GIS resource centers associated with the National Geospatial Data Clearinghouse.

Directory of On-Line Maps, MapDigger
Links to free on-line maps, by topic. Also, links to GIS data by state.

Geographic Information Systems and Spatial Data, U.S. Fish & Wildlife Service
Access to variety of data, tools, standards and metadata from the Fish & Wildlife Service and other organizations. Emphasis on topographical and environmental data.

Environmental GIS Tools, Environmental Protection Agency
Links to various on-line GIS resources.

GIS Products, GeoLytics, Inc. $
CDs of Census TIGER street, boundary, and zipcode geographic data.

Geographic Classifications and Codes

Geographic Classifications, Codes, and Resources, Bureau of the Census
Reference page regarding Census geographic concepts, definitions, classifications (e.g., urban/rural), and FIPS codes for states, counties, and places. Current and historical lists of metropolitan areas provided.

Geographic Correspondence Engine, Missouri Census Data Center
Tool for determining geocodes within specified geographic area. Codes available for states, metro areas, counties, places, census tracts, ZCTAs, urban/rural, legislative districts, and school districts. Corresponds to Census 2000 geography.

March 29, 2015

New Spatial Aggregation Tutorial for GIS Tools for Hadoop

Sarah motivates you to learn about spatial aggregation, aka spatial binning, with two visualizations of New York taxi data:

Now that I have your attention ;-), from the post:

This spatial analysis ability is available using the GIS Tools for Hadoop. There is a new tutorial posted that takes you through the steps of aggregating taxi data. You can find the tutorial here.
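For readers who have not met spatial binning before, the idea can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the tutorial's method: GIS Tools for Hadoop does the aggregation on a Hadoop cluster, while the sketch below counts points per square grid cell in memory. The function name `bin_points`, the cell size, and the sample coordinates are all hypothetical, invented for illustration.

```python
import math
from collections import Counter


def bin_points(points, cell_size):
    """Count points per square grid cell.

    points: iterable of (x, y) pairs, e.g. (longitude, latitude).
    cell_size: side length of a cell, in the same units as the coordinates.
    Returns a Counter mapping (column, row) cell indices to point counts.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    counts = Counter()
    for x, y in points:
        # floor() rather than int() so negative coordinates land in the right cell
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical pickup locations (longitude, latitude); not real taxi data.
pickups = [
    (-73.985, 40.758),
    (-73.984, 40.759),
    (-73.975, 40.765),
    (-74.005, 40.712),
]
cell_counts = bin_points(pickups, cell_size=0.01)
for cell, n in sorted(cell_counts.items()):
    print(cell, n)
```

Each cell's count can then be mapped to a color to produce a binned map. A square grid in raw degrees is the simplest choice; at city scale the distortion is small, but a projected coordinate system would be preferable for larger areas.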

4. Summary and conclusions

Public health syndromic surveillance allows stakeholders and policymakers to estimate the magnitude and distribution of a potential infectious disease outbreak in real time or near real time. Testing some surveillance systems against yearly disease cycles (e.g., influenza) has shown that the systems validly model and predict these outbreaks (Platt and Boccino et al. 2003, Lombardo and Buckeridge 2007). A growing outbreak cluster can be recognized through a series of spatial detections over expanding time frames. Typically, syndromic surveillance systems and their associated stakeholders cut across jurisdictional boundaries (hospital, city, county, public health unit, province, federal) and therefore require the cooperation of these various entities (Gestland et al. 2003). Such cooperation and facilitation are required for a system to function efficiently. This challenge has gradually been overcome in many jurisdictions across Canada, the United States, and Europe in recent years (Moore 2004, Rolfhamre and Grabowska et al. 2004, Lombardo and Buckeridge 2007).

Disease processes produce patterns that are complex in nature and may operate over a range of spatial and temporal scales, yielding an array of intricate patterns of disease incidence during an outbreak (Graham et al. 2004). Although monitoring changes in the spatial pattern of diseases is extremely useful for early, rapid detection of infectious disease outbreaks, many of the methods reviewed above have not become routine in current surveillance systems. Interpreting results from spatial and temporal algorithms can be tricky, as different algorithms may reach different conclusions for the same disease pattern. Most surveillance systems focus more on temporal aberration detection and GIS-based spatial mapping and visualization (Moore 2004, Moore et al. 2008). The continued lack of effective spatial detection algorithms leaves a real deficit in the potential of syndromic surveillance systems.

This article reviews various spatial and spatial–temporal aberration detection methods that have been used in various systems or have the potential to be used in GIS-based surveillance systems. It should be noted that disease detection algorithms cannot guarantee accurate detection of an epidemic disease outbreak; what they provide is a timely, relatively cheap, and approximate alarm. Thus, syndromic surveillance cannot replace the groundwork of epidemiologists in tracking down clinical and exposure information.

With the increasing availability of geographic information in syndromic systems, spatial and spatial–temporal cluster detection should play a more important role in providing early warning of disease outbreaks. Algorithm sensitivity and specificity are ongoing issues in the use of spatial and spatial–temporal detection algorithms in syndromic surveillance systems (Dafni et al. 2004). Most of these spatial and spatial–temporal techniques are sensitive to the spatial and/or temporal units used in the analysis. Although many spatial and temporal detection approaches exist, there is no definitive proof that they can catch all early disease outbreaks, owing to data quality and other issues in the systems. Algorithm sensitivity needs to be considered and tested before these algorithms are implemented in GIS-based syndromic systems, so that the balance between early outbreak detection and false alarms is optimized; false alarms are costly to investigate in terms of financial and human resources. More research on the impact of different algorithm parameters, spatial units, and spatial and temporal uncertainty on the performance of detection methods will be needed before they can be practically implemented in surveillance systems for disease monitoring and health policy decisions.
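The trade-off between early detection and false alarms is usually summarized with sensitivity and specificity. As a minimal sketch, assuming each time period (or spatial unit) has been labeled after the fact as outbreak or no outbreak, the two measures can be computed from an algorithm's alarms as follows. The function name and the sample alarm and outbreak sequences are hypothetical and do not come from any system reviewed here.

```python
def sensitivity_specificity(alarms, outbreaks):
    """Compare alarm flags against known outbreak labels.

    alarms, outbreaks: equal-length sequences of booleans, one entry per
    evaluated unit (e.g. one day or one region-week).
    Returns (sensitivity, specificity); a value is None when its
    denominator is zero (no outbreak units or no non-outbreak units).
    """
    if len(alarms) != len(outbreaks):
        raise ValueError("alarms and outbreaks must have the same length")
    tp = fp = tn = fn = 0
    for alarm, outbreak in zip(alarms, outbreaks):
        if alarm and outbreak:
            tp += 1  # outbreak correctly flagged
        elif alarm and not outbreak:
            fp += 1  # false alarm
        elif not alarm and outbreak:
            fn += 1  # missed outbreak
        else:
            tn += 1  # quiet unit correctly left alone
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical evaluation over six days.
alarms = [True, True, False, False, True, False]
outbreaks = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(alarms, outbreaks)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Re-running such an evaluation while varying the algorithm's parameters or the spatial and temporal units shows how the balance between missed outbreaks and false alarms shifts.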

As surveillance systems expand to cover larger regions and to incorporate more data sources, future systems will need more robust temporal and spatial detection approaches that can integrate information from multiple data sources. Some nontraditional approaches developed for spatial–temporal analysis in other geographic fields, such as artificial neural networks in machine learning, Bayesian spatial–temporal approaches, knowledge-based data mining, and data-fusion approaches, should be a direction for future research.