Chlorine Transportation Optimization for the Chemical Security Analysis Center
The overall goal of the census-tools project is to provide the Applied Math Lab (AML) with a set of libraries and utilities necessary to complete its assignment.
One of the foremost goals that must be achieved is to model the average population density throughout the United States. Using this data, we would like to be able to calculate the risk associated with an event taking place somewhere in the United States. This will, in general, be an accident or other unexpected event that causes some damage to the surrounding population and environment.
Our canonical example (for which the project was created) is that of a chlorine spill. If chlorine is spilled or otherwise released somewhere in the United States, we would like to be able to calculate the number of people affected.
The following software is required to utilize census-tools in full: a PostgreSQL database with the PostGIS extensions, Python (for the import scripts), and GNU Make (for the build automation). It should be possible to reuse the code independently of the other software, but many of the features, such as database integration and build system automation, require these third-party programs.
There are two Census databases (obtained as flat files) that we require for our purposes. The first is Summary File 1, which contains the block-level demographic information. The second database, TIGER/Line, gives us the geometry of those blocks.
There are a number of tables contained within the Summary File 1 database. We require only one of these: the geographic header records. The Summary File 1 geographic header records provide, among other things, the state/county/tract/block codes that identify each block, its total population, and its land area.
The TIGER/Line information comes in the form of ESRI shapefiles. These shapefiles contain the geometric information about the Census blocks.
Each TIGER/Line record contains a field, blkidfp00, which is the concatenation of the state/county/tract/block identifiers. This is our unique identifier.
We need to correlate the TIGER/Line geometric information with the demographic information contained in the Summary File 1 geographic header records. To do this, we rely on the unique blkidfp00 identifier.
For the TIGER table, this field has already been calculated. We calculate it manually for the geographic header records. Then, we can define a one-to-one correspondence between geographic header records and TIGER/Line records. This satisfies one of our goals: correlating the two tables allows us to determine the population density at any point.
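As a sketch of what this correlation might look like in SQL (the table and column names here, sf1_blocks, tiger_blocks, pop100, arealand, and the_geom, are assumptions for illustration, not the actual schema, and ST_-prefixed functions assume a reasonably recent PostGIS), the population density of the block containing a given point could be found along these lines:

    -- Find the population density of the block containing a point,
    -- joining the demographic and geometric tables on blkidfp00.
    SELECT sf1.pop100 / sf1.arealand AS density
    FROM sf1_blocks sf1
      JOIN tiger_blocks tiger
        ON sf1.blkidfp00 = tiger.blkidfp00
    WHERE ST_Contains(tiger.the_geom,
                      ST_SetSRID(ST_MakePoint(-76.61, 39.29), 4269));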
Note: the makefile provides a task for creation/import of the databases, but its use is not strictly required.
A Postgres/PostGIS database is required to store our Census data. The database name is unimportant (default: census), but several of the scripts refer to the table names. For simplicity, we will call the database census from now on.
Once the database has been created, we need to load two PostGIS files so that we can support the GIS functionality: lwpostgis.sql and spatial_ref_sys.sql. See the makefile for an example of their import.
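For example, the creation and setup could be performed from a psql session along these lines (a sketch only; the paths to the two PostGIS scripts vary by installation):

    -- Create the database, connect to it, and load the PostGIS
    -- function and spatial reference definitions.
    CREATE DATABASE census;
    \c census
    \i lwpostgis.sql
    \i spatial_ref_sys.sql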
Before we import any data, we need to develop a database schema that models our Census data. The TIGER/Line schema is provided for us, so we have modeled the Summary File 1 schema in a similar fashion.
The Summary File 1 table schema is defined in a standard SQL script. The following steps will all require that a database exist, and contain the table defined by this script.
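For illustration only, a minimal version of such a schema might look like the following; the actual SQL script is authoritative, and all names here are assumptions:

    -- Hypothetical subset of the Summary File 1 block table.
    CREATE TABLE sf1_blocks (
      blkidfp00 varchar(15) PRIMARY KEY, -- state||county||tract||block
      state     varchar(2)  NOT NULL,
      county    varchar(3)  NOT NULL,
      tract     varchar(6)  NOT NULL,
      block     varchar(4)  NOT NULL,
      pop100    integer     NOT NULL,    -- total population of the block
      arealand  numeric                  -- land area of the block
    );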
The Summary File 1 data are obtained as flat files. We therefore utilize a Python script to parse and import the blocks. The source code of the script is documented, but briefly, it performs three tasks: it parses each record from the geographic header record file, filters out any non-block records, and writes out a SQL statement to insert each remaining record into the Postgres/PostGIS database.
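The statements the script writes are plain SQL INSERTs; a representative (entirely hypothetical) example, using the illustrative schema above:

    -- One emitted statement per block record; all values are invented.
    INSERT INTO sf1_blocks
      (blkidfp00, state, county, tract, block, pop100, arealand)
    VALUES
      ('240054011002000', '24', '005', '401100', '2000', 37, 52000);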
Since the shapefiles are in a standard format, we can use pre-existing tools to import the data into our SQL database. PostGIS provides a binary, shp2pgsql, that will parse and convert the shapefiles to SQL.
There is one caveat here: the shp2pgsql program requires an SRID as an argument; this SRID is assigned to each record it imports. We have designated an SRID of 4269, which denotes NAD83, or the North American Datum (1983). There may be some possibility of improvement here; since we are only considering the Mid-Atlantic, there may be coordinate systems (SRIDs) which are more accurate in our subset of the United States.
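Candidate coordinate systems can be inspected directly in the spatial_ref_sys table loaded earlier; for example:

    -- Look up the NAD83 entry; other SRIDs could be compared the same way.
    SELECT srid, auth_name, auth_srid
    FROM spatial_ref_sys
    WHERE srid = 4269;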
There are a number of possible optimizations that can be made should performance ever become prohibitive. To date, these have been eschewed for lack of flexibility and/or development time.
Currently, the TIGER/Line block data is stored in a separate table from the Summary File 1 block data. The two are combined at query time via SQL JOINs. Since we import the TIGER data first, and use a custom import script for SF1, we could de-normalize this design to increase query speed.
This would slow down the SF1 import, of course; but the import only needs to be performed once. The procedure would look like the following: import the TIGER data as usual, add the demographic columns to the TIGER table, and then have the SF1 import script update those columns in place rather than insert rows into a separate table.
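In SQL, the idea might look like this (again using the illustrative, assumed names from above):

    -- Performed once, after the TIGER import:
    ALTER TABLE tiger_blocks ADD COLUMN pop100 integer;

    -- The SF1 import script would then emit UPDATEs instead of INSERTs:
    UPDATE tiger_blocks
    SET pop100 = 37
    WHERE blkidfp00 = '240054011002000';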
When the TIGER data is imported via shp2pgsql, a GiST index is added to the geometry column by means of the -I flag. This improves the performance of the population calculations by a (wildly estimated) order of magnitude.
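The -I flag is equivalent to creating the index by hand, along the lines of the following (shp2pgsql names the geometry column the_geom by default; the table name is again an assumption):

    -- GiST index on the block geometry, as created by shp2pgsql -I.
    CREATE INDEX tiger_blocks_geom_idx
      ON tiger_blocks
      USING GIST (the_geom);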
Postgres, however, offers another, similar type of index: the GIN index. If performance degrades beyond what is acceptable, it may be worth evaluating the benefit of a GIN index versus the GiST one.
TRAGIS is developed by the Geographic Information Science and Technology Group at Oak Ridge National Laboratory. From the abstract:
The Transportation Routing Analysis Geographic Information System (TRAGIS) model is used to calculate highway, rail, or waterway routes within the United States. TRAGIS is a client-server application with the user interface and map data files residing on the user’s personal computer and the routing engine and network data files on a network server. The user’s manual provides documentation on installation and the use of the many features of the model.
It is of interest to us because it provides routing subject to HAZMAT and many other restrictions. Essentially, one provides TRAGIS with a starting point A and an ending point B; it then calculates the best route between A and B, subject to a number of constraints.
Unfortunately, TRAGIS is a client/server application. Most of the interesting logic and data are stored on the server, to which we don't have access. The User Manual does provide some clues as to where they have obtained their data, though.