X-Git-Url: https://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=doc%2Fproject_overview%2Findex.xhtml;h=3201609ff353831172b13b440e926a270903e302;hb=730af6c2b7cc3b82a45fe8cdff720c13892316cb;hp=051e56782b7203b6967033559ea4ec44a477c8b9;hpb=935a6ead0912829a7e0f153aa7aac7494977e69c;p=dead%2Fcensus-tools.git

diff --git a/doc/project_overview/index.xhtml b/doc/project_overview/index.xhtml
index 051e567..3201609 100644
--- a/doc/project_overview/index.xhtml
+++ b/doc/project_overview/index.xhtml
@@ -269,5 +269,79 @@
     States.
   </p>
 
+  <h2>Possible Optimizations</h2>
+  <p>
+    There are a number of possible optimizations that can be made
+    should performance ever become unacceptable. To date, these have
+    been put off because they cost flexibility and/or development time.
+  </p>
+
+  <h3>De-normalization of TIGER/SF1 Block Data</h3>
+  <p>
+    Currently, the TIGER/Line block data is stored in a separate table
+    from the Summary File 1 block data. The two are combined at query
+    time via <a href="http://en.wikipedia.org/wiki/Join_(SQL)">SQL
+    JOIN</a>s. Since we import the TIGER data first, and use a custom
+    import script for SF1, we could <a
+    href="http://en.wikipedia.org/wiki/Denormalization">de-normalize</a>
+    this design to increase query speed.
+  </p>
+  
+  <p>
+    This would slow down the SF1 import, of course, but the import
+    only needs to be performed once. The procedure would look like the
+    following:
+  </p>
+
+  <ol>
+    <li>
+      Add the SF1 columns to the TIGER table, allowing them to be
+      nullable initially (since they will all be NULL at first).
+    </li>
+
+    <li>
+      For each block in the SF1 import, we would:
+      <ol>
+	<li>Parse the block.</li>
+	<li>
+	  Use that block's blkidfp00 to find the corresponding row in
+	  the TIGER table.
+	</li>
+	<li>
+	  Update the TIGER row with the values from SF1.
+	</li>
+      </ol>
+    </li>
+
+    <li>
+      Optionally set the SF1 columns to NOT NULL. This may have
+      <em>some</em> performance benefit, but I wouldn't count on it.
+    </li>
+
+    <li>
+      Fix all the SQL queries to use the new, de-normalized schema.
+    </li>
+  </ol>
+
+  
+  <h3>Switch from GiST to GIN Indexes</h3>
+  <p>
+    When the TIGER data is imported via <em>shp2pgsql</em>, a <a
+    href="http://www.postgresql.org/docs/8.4/static/textsearch-indexes.html">GiST</a>
+    index is added to the geometry column by means of the
+    <strong>-I</strong> flag. This improves the performance of the
+    population calculations by a (wildly estimated) order of
+    magnitude.
+  </p>
+
+  <p>
+    Postgres, however, offers another type of similar index &mdash;
+    the <a
+    href="http://www.postgresql.org/docs/8.4/static/textsearch-indexes.html">GIN
+    Index</a>. If performance degrades beyond what is acceptable, it
+    may be worth checking whether GIN supports the geometry column
+    and, if it does, benchmarking it against the GiST index.
+  </p>
+  
 </body>
 </html>