From: Michael Orlitzky Date: Sat, 3 Oct 2009 19:20:58 +0000 (-0400) Subject: Added some possible optimizations to the project overview. X-Git-Url: http://gitweb.michael.orlitzky.com/?p=dead%2Fcensus-tools.git;a=commitdiff_plain;h=5bdd6e837232451fe5ae6dd9f726763587eb631e Added some possible optimizations to the project overview. --- diff --git a/doc/project_overview/index.xhtml b/doc/project_overview/index.xhtml index 051e567..3201609 100644 --- a/doc/project_overview/index.xhtml +++ b/doc/project_overview/index.xhtml @@ -269,5 +269,79 @@ States.

+

Possible Optimizations

+

+ There are a number of possible optimizations that can be made + should performance ever become prohibitive. To date, these have + been eschewed for lack of flexibility and/or development time. +

+ +

De-normalization of TIGER/SF1 Block Data

+

+ Currently, the TIGER/Line block data is stored in a separate table + from the Summary File 1 block data. The two are combined at query + time via SQL + JOINs. Since we import the TIGER data first, and use a custom + import script for SF1, we could de-normalize + this design to increase query speed. +

+ +

+ This would slow down the SF1 import, of course; but the import + only needs to be performed once. The procedure would look like the + following: +

+ +
    +
  1. + Add the SF1 columns to the TIGER table, allowing them to be + nullable initially (since they will all be NULL at first). +
  2. + +
  3. + Within the SF1 import, we would, +
      +
    1. Parse a block
    2. +
    3. + Use that block's blkidfp00 to find the corresponding row in + the TIGER table. +
    4. +
    5. + Update the TIGER row with the values from SF1. +
    6. +
    +
  4. + +
  5. + Optionally set the SF1 columns to NOT NULL. This may have + some performance benefit, but I wouldn't count on it. +
  6. + +
  7. + Fix all the SQL queries to use the schema. +
  8. +
+ + +

Switch from GiST to GIN Indexes

+

+ When the TIGER data is imported via shp2pgsql, a GiST + index is added to the geometry column by means of the + -I flag. This improves the performance of the + population calculations by a (wildly-estimates) order of + magnitude. +

+ +

+ Postgres, however, offers another type of similar index — + the GIN + Index. If performance degrades beyond what is acceptable, it + may be worth evaluating the benefit of a GIN index versus the GiST + one. +

+