X-Git-Url: http://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=doc%2Fproject_overview%2Findex.xhtml;h=8591cb4417bc8cd991d61440f15d2c30a779260b;hb=3e49aabe1fc6fe281ab47d7fd1e64aa6ab430874;hp=051e56782b7203b6967033559ea4ec44a477c8b9;hpb=935a6ead0912829a7e0f153aa7aac7494977e69c;p=dead%2Fcensus-tools.git
diff --git a/doc/project_overview/index.xhtml b/doc/project_overview/index.xhtml
index 051e567..8591cb4 100644
--- a/doc/project_overview/index.xhtml
+++ b/doc/project_overview/index.xhtml
@@ -30,7 +30,7 @@
One of the foremost goals that must be achieved is to model the
average population density throughout the United States. Using
this data, we would like to be able to calculate the risk
- associated with an event taking place somewhere in the
+ associated with an event taking place somewhere in the
United States. This will, in general, be an accident or other
unexpected event that causes some damage to the surrounding
population and environment.
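The following Python sketch roughly illustrates how block-level density could feed such a risk estimate. It assumes that population is spread uniformly within each census block, so a block that is only partly covered by an affected region contributes in proportion to the overlapping area. The `Block` type and `affected_population` function are hypothetical, and the overlap areas are taken as given (in practice they would presumably come from a PostGIS intersection). This is not necessarily how these scripts compute it.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Block:
    """A census block (hypothetical type): total population and area."""
    population: int
    area: float  # any unit, as long as overlap areas use the same one


def affected_population(overlaps: Iterable[Tuple[Block, float]]) -> float:
    """Estimate how many people live inside an affected region.

    Each item pairs a block with the area of that block lying inside the
    region. Assuming uniform density within a block, its contribution is
    (population / area) * overlap_area.
    """
    total = 0.0
    for block, overlap_area in overlaps:
        if block.area <= 0:
            continue  # skip degenerate blocks instead of dividing by zero
        density = block.population / block.area
        total += density * overlap_area
    return total
```

For example, `affected_population([(Block(1000, 2.0), 1.0), (Block(300, 1.0), 1.0)])` evaluates to `800.0`. Half of the first block and all of the second are inside the region.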
@@ -75,6 +75,22 @@
blkidfp00, which contains the concatenation of the state, county, tract, and block codes. This is our unique identifier.
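The layout of this identifier can be sketched in Python. The field widths below follow the Census 2000 block identifier: a 2-digit state FIPS code, a 3-digit county FIPS code, a 6-digit tract code, and a 4-digit block number, for 15 digits in total. The helper names `make_blkidfp00` and `split_blkidfp00` are hypothetical and are not part of these scripts. The example codes are illustrative only.

```python
# Widths of the Census 2000 block identifier components, in order.
FIELDS = (("state", 2), ("county", 3), ("tract", 6), ("block", 4))


def make_blkidfp00(state, county, tract, block) -> str:
    """Concatenate zero-padded state, county, tract and block codes."""
    values = dict(state=state, county=county, tract=tract, block=block)
    parts = []
    for name, width in FIELDS:
        code = str(values[name])
        if not code.isdigit() or len(code) > width:
            raise ValueError(f"{name} code {code!r} is not a number of at most {width} digits")
        parts.append(code.zfill(width))  # e.g. county 31 -> "031"
    return "".join(parts)


def split_blkidfp00(blkid: str) -> dict:
    """Split a 15-digit identifier back into its component codes."""
    expected = sum(width for _, width in FIELDS)
    if len(blkid) != expected or not blkid.isdigit():
        raise ValueError(f"expected {expected} digits, got {blkid!r}")
    codes, pos = {}, 0
    for name, width in FIELDS:
        codes[name] = blkid[pos:pos + width]
        pos += width
    return codes
```

For example, `make_blkidfp00(24, 31, 700101, 1000)` returns `"240317001011000"`, and `split_blkidfp00` recovers the four codes as zero-padded strings.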
@@ -208,16 +224,16 @@
A Postgres/PostGIS database is required to store our Census
- data. The database name is unimportant (default: census),
+ data. The database name is unimportant (default: census),
but several of the scripts refer to the table names. For
- simplicity, we will call the database census from now on.
+ simplicity, we will call the database census from now on.
Once the database has been created, we need to import two PostGIS
tables so that we can support the GIS functionality. These two
- files are lwpostgis.sql and
- spatial_ref_sys.sql. See the makefile for an example of their import.
+ files are lwpostgis.sql and
+ spatial_ref_sys.sql. See the makefile for an example of their import.
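In practice, the import means passing each file to `psql` in order. The Python sketch below only builds those command lines. `-v ON_ERROR_STOP=1` makes psql exit with a non-zero status on the first SQL error, `-d` selects the database, and `-f` runs a file of SQL. The file paths are assumptions, because their location depends on the PostGIS installation. lwpostgis.sql is listed first on the assumption that it creates the spatial_ref_sys table that spatial_ref_sys.sql then populates.

```python
import subprocess
from typing import List, Sequence

# Assumed paths: the real location depends on how PostGIS was installed.
POSTGIS_SQL_FILES = ("lwpostgis.sql", "spatial_ref_sys.sql")


def postgis_setup_commands(database: str = "census",
                           sql_files: Sequence[str] = POSTGIS_SQL_FILES) -> List[List[str]]:
    """Build one psql invocation per support file, preserving order."""
    # ON_ERROR_STOP=1: abort (non-zero exit) on the first failing statement.
    # -d <db> selects the database; -f <file> executes the SQL in <file>.
    return [["psql", "-v", "ON_ERROR_STOP=1", "-d", database, "-f", path]
            for path in sql_files]


def run_commands(commands: List[List[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Once the census database exists, you would call `run_commands(postgis_setup_commands())`.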
Since the shapefiles are in a standard format, we can use
pre-existing tools to import the data into our SQL
- database. PostGIS provides a binary, shp2pgsql, that will
+ database. PostGIS provides a binary, shp2pgsql, that will
parse and convert the shapefiles to SQL.
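Below is a sketch of how shp2pgsql might be driven from Python, with its SQL output piped into psql. `-s` passes the SRID that shp2pgsql requires (this document uses 4269, NAD83), and `-I` asks it to also create a GiST index on the geometry column. The function names are hypothetical, and so are the shapefile and table names in the usage line.

```python
import subprocess
from typing import List

NAD83_SRID = 4269  # SRID assigned to every imported record


def shp2pgsql_command(shapefile: str, table: str,
                      srid: int = NAD83_SRID, gist_index: bool = True) -> List[str]:
    """Build the shp2pgsql command line that converts a shapefile to SQL."""
    cmd = ["shp2pgsql", "-s", str(srid)]
    if gist_index:
        cmd.append("-I")  # also emit a GiST index on the geometry column
    return cmd + [shapefile, table]


def import_shapefile(shapefile: str, table: str, database: str = "census") -> None:
    """Pipe shp2pgsql's output into psql, like `shp2pgsql ... | psql`."""
    producer = subprocess.Popen(shp2pgsql_command(shapefile, table),
                                stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        ["psql", "-v", "ON_ERROR_STOP=1", "-d", database],
        stdin=producer.stdout)
    producer.stdout.close()  # so shp2pgsql sees a broken pipe if psql exits early
    consumer.wait()
    producer.wait()
    for proc in (producer, consumer):
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

A call might look like `import_shapefile("blocks.shp", "tiger_blocks")`, where both names are placeholders.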
- There is one caveat here: the shp2pgsql program requires
+ There is one caveat here: the shp2pgsql program requires
an SRID as an argument; this SRID is assigned to each record it
imports. We have designated an SRID of 4269, which denotes
NAD83, or the North American Datum (1983). There may be
@@ -269,5 +285,79 @@
States.
+ There are a number of possible optimizations that can be made
+ should performance ever become prohibitive. To date, these have
+ been forgone because they would cost flexibility and/or
+ development time.
+
+ Currently, the TIGER/Line block data is stored in a separate table
+ from the Summary File 1 block data. The two are combined at query
+ time via SQL JOINs. Since we import the TIGER data first, and use a
+ custom import script for SF1, we could de-normalize this design to
+ increase query speed.
+
+ This would slow down the SF1 import, of course; but the import
+ only needs to be performed once. The procedure would look like the
+ following:
+ +
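Whatever the exact steps are, the trade-off can be illustrated with a toy in-memory model in Python, where each table is a dictionary keyed by blkidfp00. This is only an analogy for the SQL tables and not the import procedure itself. The column names (`geom`, `population`) and function names are hypothetical.

```python
from typing import Dict

Row = Dict[str, object]
Table = Dict[str, Row]  # rows keyed by blkidfp00


def query_normalized(tiger: Table, sf1: Table, blkid: str) -> Row:
    """Normalized design: every query looks in both tables (the JOIN)."""
    return {**tiger[blkid], **sf1[blkid]}


def denormalize(tiger: Table, sf1: Table) -> Table:
    """Merge SF1 columns into the TIGER rows once, at import time."""
    # Blocks without an SF1 row simply keep their TIGER columns.
    return {blkid: {**row, **sf1.get(blkid, {})} for blkid, row in tiger.items()}


def query_denormalized(merged: Table, blkid: str) -> Row:
    """De-normalized design: a query touches a single table."""
    return merged[blkid]
```

The merge cost is paid once, inside `denormalize`. After that, every lookup reads a single table and returns the same row that the query-time join would have produced.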
+ When the TIGER data is imported via shp2pgsql, a GiST
+ index is added to the geometry column by means of the
+ -I flag. This improves the performance of the
+ population calculations by a (wildly estimated) order of
+ magnitude.
+
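If a table were ever loaded without `-I`, an equivalent index could presumably be created by hand with a standard `CREATE INDEX ... USING GIST` statement. The Python helper below only builds that statement. The index-naming scheme is an assumption. The default column name `the_geom` is the one older shp2pgsql releases used (newer releases default to `geom`), so check it against the actual table.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def gist_index_sql(table: str, column: str = "the_geom") -> str:
    """Return a CREATE INDEX statement for a GiST index on `column`."""
    # Only plain identifiers are accepted, so nothing needs quoting.
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    index_name = f"{table}_{column}_gist"
    return f"CREATE INDEX {index_name} ON {table} USING GIST ({column});"
```

After creating the index, running `ANALYZE` on the table gives the query planner fresh statistics to work with.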
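+ Postgres, however, offers another index type: the GIN
+ (Generalized Inverted Index). If performance degrades beyond what
+ is acceptable, it may be worth evaluating the benefit of a GIN
+ index versus the GiST one. Note that GIN is designed chiefly for
+ composite values such as arrays and full-text documents, while
+ PostGIS spatial indexing is built on GiST (newer releases add
+ SP-GiST and BRIN), so it would first need to be confirmed that a
+ GIN index can be used on the geometry column at all.
+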
+