X-Git-Url: http://gitweb.michael.orlitzky.com/?p=dead%2Fcensus-tools.git;a=blobdiff_plain;f=doc%2Fproject_overview%2Findex.xhtml;fp=doc%2Fproject_overview%2Findex.xhtml;h=051e56782b7203b6967033559ea4ec44a477c8b9;hp=0000000000000000000000000000000000000000;hb=935a6ead0912829a7e0f153aa7aac7494977e69c;hpb=9a1f0d95adff6a2ee308f417b2788735d88c03b8 diff --git a/doc/project_overview/index.xhtml b/doc/project_overview/index.xhtml new file mode 100644 index 0000000..051e567 --- /dev/null +++ b/doc/project_overview/index.xhtml @@ -0,0 +1,273 @@ + + + + + + + + Project Overview + + + + + + + +

Project Overview

+

+ Chlorine Transportation Optimization for the Chemical Security + Analysis Center +

+ +

General Goals

+

+ The overall goal of the census-tools project is to provide the + applied math lab (AML) with a set of libraries and utilities + necessary to complete its assignment. +

+ +

+ One of the foremost goals that must be achieved is to model the + average population density throughout the United States. Using + this data, we would like to be able to calculate the risk + associated with an event taking place somewhere in the + United States. This will, in general, be an accident or other + unexpected event that causes some damage to the surrounding + population and environment. +

+ +

+ Our canonical example (for which the project was created) is that + of a chlorine spill. If chlorine is spilled or otherwise released + somewhere in the United States, we would like to be able to + calculate the number of people affected. +

+ +

Specific Goals

+ + +

Software Requirements

+

+ The following software is required to utilize census-tools in + full. It should be possible to reuse the code independently of the + other software, but many of the features such as database + integration and build system automation require third-party + programs. +

+ +
    +
  1. + The build system utilizes GNU Make. The + makefile simply automates some processes which can be run + manually: executing tests, downloading data, importing + shapefiles, etc. Other (non-GNU) versions of Make should also + work. + +
      +
    1. + The Make system assumes that the user has a POSIX-compatible + shell. The author has tested the makefiles with Bash, but other + shells such as Cygwin should + suffice. +
    2. + +
    3. + wget is used + to download the Census shapefiles. +
    4. +
    +
  2. + +
  3. + The underlying database management system is Postgresql. +
  4. + +
  5. + The GIS capabilities are provided (to Postgresql) by PostGIS. +
  6. +
+ +

Census Databases

+

+ There are two Census databases (obtained as flat files) which we + require for our purposes. The first is Summary + File 1, which contains the block-level demographic + information. The second database, TIGER/Line, gives + us the geometry of those block. +

+ +

Summary File 1

+ +

+ There are a number of tables contained within the Summary File 1 + database. We require only one of these: the geographic header + records. The Summary File 1 geographic header records provide, + among other things, the following: +

+ + + + +

TIGER/Line

+ +

+ The TIGER/Line information comes in the form of ESRI + shapefiles. These shapefiles contain the geometric information + about the Census blocks. +

+ +

+ Each TIGER/Line record contains, +

+ + + + +

Summary File 1 / TIGER Integration

+ +

+ We need to correlate the TIGER/Line geometric information with the + demographic information contained in the Summary File 1 geographic + header records. To do this, we need to rely on the unique + blkidfp00 identifier. +

+ +

+ For the TIGER table, this field has already been calculated. We + calculate it manually for the geographic header records. Then, we + can define a one-to-one correspondence between geographic header + records and TIGER/Line records. This satisfies one of our goals: + correlating the two tables allows us to determine the population + density at any point. +

+ + +

Importing the Data

+ +

+ + Note: the makefile provides a task + for creation/import of the databases, but its use is strictly + required. + +

+ + +

Creation of the Database

+ +

+ A Postgres/PostGIS database is required to store our Census + data. The database name is unimportant (default: census), + but several of the scripts refer to the table names. For + simplicity, we will call the database census from now on. +

+ +

+ Once the database has been created, we need to import two PostGIS + tables so that we can support the GIS functionality. These two + files are lwpostgis.sql and + spatial_ref_sys.sql. See the makefile for an example of their import. +

+ +

+ Before we import any data, we need to develop a database schema + that models our Census data. The TIGER/Line schema is provided for + us, so we have modeled the Summary File 1 schema in a similar + fashion. +

+ +

+ The Summary File 1 table schema is defined in a standard + SQL script. The following steps will all require that a + database exist, and contain the table defined by this script. +

+ + +

Importing the Summary File 1 Block Records

+ +

+ The Summary File 1 data are obtained as flat files. We therefore + utilize a Python script, , + to parse and import the blocks. The source code to the script is + documented, but basically, it performs three tasks: it parses each + record from the geographic header record file, filters out any + non-block records, and then writes out a SQL statement to insert + that record in to the Postgres/PostGIS database. +

+ + +

Importing the TIGER/Line Shapefiles

+ +

+ Since the shapefiles are in a standard format, we can use + pre-existing tools to import the data in to our SQL + database. PostGIS provides a binary, shp2pgsql, that will + parse and convert the shapefiles to SQL. +

+ +

+ There is one caveat here: the shp2pgsql program requires + an SRID as an argument; this SRID is assigned to each record it + imports. We have designated an SRID of 4269, which denotes + NAD83, or the North American Datum (1983). There may be + some possibility of improvement here; since we are only + considering the Mid-Atlantic, there may be coordinate systems + (SRIDs) which are more accurate in our subset of the United + States. +

+ + +