should be considered a bug if they are incorrect. The diagrams are
created using the pgModeler <http://www.pgmodeler.com.br/> tool.
+.SH DATABASE SCHEMA COMPROMISES
+
+There are a few places where the database schema isn't exactly how
+we'd like it to be:
+
+.IP \[bu] 2
+\fIearlylineXML.dtd\fR
+
+The database representations for earlylineXML.dtd and
+MLB_earlylineXML.dtd are the same; that is, they share the same
+tables. The two document types represent team names in different
+ways. In order to accommodate both types with one parser, we had to
+make both ways optional, and then merge the two together before
+converting to the database representation.
+
+Unfortunately, when we merge two optional things together, we get
+another optional thing back. There's no way to say that \(dqat least
+one is not optional.\(dq So the team names in the database schema are
+optional as well, even though they should always be present.
+
.SH NULL POLICY
.P
Normally in a database one makes a distinction between fields that
Processed 1 document(s) total.
.fi
.P
-At this point, the database schema matches the old documents, i.e. the
-ones without \fIAStarter\fR and \fIHStarter\fR. If we use a new
+At this point, the database schema matches the old documents, that is,
+the ones without \fIAStarter\fR and \fIHStarter\fR. If we use a new
version of \fBhtsn-import\fR, supporting the new fields, the migration
is handled gracefully:
.P
it as unsupported so that offending documents can be removed. An example
is provided as test/xml/newsxml-multiple-sms.xml.
+.IP \[bu]
+\fIMLB_earlylineXML.dtd\fR
+
+Unlike earlylineXML.dtd, this document type has more than one <game>
+associated with each <date>. Moreover, each <date> has a bunch of
+<note> children that are supposed to be associated with the <game>s,
+but the document structure indicates no explicit relationship. For
+example,
+
+.nf
+<date>
+ <note>...</note>
+ <game>...</game>
+ <game>...</game>
+ <note>...</note>
+ <game>...</game>
+</date>
+.fi
+
+Here the first <note> is inferred to apply to the two <game>s that
+follow it, and the second <note> applies to the single <game> that
+follows it. But this is very fragile to parse. Instead, we use a hack
+to facilitate (un)pickling, and then drop the notes entirely during
+the database conversion.
+
+A similar workaround is implemented for Odds_XML.dtd.
+
.IP \[bu]
\fIOdds_XML.dtd\fR
The \(dqtime zone\(dq is given as \(dqET\(dq, but unfortunately
\(dqET\(dq is not a valid time zone. It stands for \(dqEastern
Time\(dq, which can belong to either of two time zones, EST or EDT,
-based on the time of the year (i.e. whether or not daylight savings
-time is in effect). Since we can't tell from the timestamp, we always
-parse these as EST which is UTC-5. When daylight savings is in effect,
-they will be off by an hour.
+based on the time of the year (that is, whether or not daylight
+saving time is in effect) and one's location (for example, Arizona
+doesn't observe daylight saving time). It's not much more useful to
+be off by one hour than it is to be off by five hours, and since we
+can't determine the true offset from the timestamp, we always parse
+and store these as UTC.
Here's a list of the ones that may cause surprises:
one ignores the UTC time zone, the time can possibly be off by 12
hours (due to the a.m./p.m. issue).
+The game <time> elements can also be empty. Since we store the
+combined game date/time in one field, these games will appear to begin
+at midnight on the day they occur.
+
.IP \[bu]
\fIjfilexml.dtd\fR
They are also stored as UTC.
+.IP \[bu]
+\fIMLB_earlylineXML.dtd\fR
+
+See earlylineXML.dtd.
+
.IP \[bu]
\fIOdds_XML.dtd\fR
.IP \[bu]
jfilexml.dtd
.IP \[bu]
+MLB_earlylineXML.dtd
+.IP \[bu]
newsxml.dtd
.IP \[bu]
Odds_XML.dtd
.IP \[bu]
CBASK_FTPctXML.dtd
.IP \[bu]
+Cbask_Indv_No_Avg_XML.dtd
+.IP \[bu]
Cbask_Indv_Scoring_XML.dtd
.IP \[bu]
+Cbask_Indv_Shooting_XML.dtd
+.IP \[bu]
CBASK_MinutesXML.dtd
.IP \[bu]
Cbask_Polls_XML.dtd
.IP \[bu]
CBASK_ScoringLeadersXML.dtd
.IP \[bu]
+Cbask_Team_Scoring_Rebound_Margin_XML.dtd
+.IP \[bu]
Cbask_Team_ThreePT_Made_XML.dtd
.IP \[bu]
Cbask_Team_ThreePT_PCT_XML.dtd
.IP \[bu]
NFLGiveTakeXML.dtd
.IP \[bu]
+NFLGrassTurfDomeOutsideXML.dtd
+.IP \[bu]
NFLInside20XML.dtd
.IP \[bu]
+NFLInterceptionLeadersXML.dtd
+.IP \[bu]
NFLKickoffsXML.dtd
.IP \[bu]
NFLMondayNightXML.dtd
.IP \[bu]
+NFLPassingLeadersXML.dtd
+.IP \[bu]
NFLPassLeadXML.dtd
.IP \[bu]
NFLQBStartsXML.dtd
.IP \[bu]
+NFLReceivingLeadersXML.dtd
+.IP \[bu]
+NFLRushingLeadersXML.dtd
+.IP \[bu]
NFLSackLeadersXML.dtd
.IP \[bu]
nflstandxml.dtd
.IP \[bu]
+NFLTackleFFLeadersXML.dtd
+.IP \[bu]
NFLTeamRankingsXML.dtd
.IP \[bu]
+NFLTopKickoffReturnXML.dtd
+.IP \[bu]
NFLTopPerformanceXML.dtd
.IP \[bu]
+NFLTopPuntReturnXML.dtd
+.IP \[bu]
NFLTotalYardageXML.dtd
.IP \[bu]
+NFLYardsXML.dtd
+.IP \[bu]
NFL_KickingLeaders_XML.dtd
.IP \[bu]
NFL_NBA_Draft_XML.dtd
.IP \[bu]
+NFL_PuntingLeaders_XML.dtd
+.IP \[bu]
NFL_Roster_XML.dtd
.IP \[bu]
NFL_Team_Stats_XML.dtd