.P
First, we must parse the XML. Each supported document type (see below)
has a full pickle/unpickle implementation (\(dqpickle\(dq is simply a
-synonym for serialize here). That means that we parse the entire
-document into a data structure, and if we pickle (serialize) that data
-structure, we get the exact same XML document tha we started with.
+synonym for \(dqserialize\(dq here). That means that we parse the
+entire document into a data structure, and if we pickle (serialize)
+that data structure, we get the exact same XML document that we started
+with.
.P
This is important for two reasons. First, it serves as a second level
of validation. The first validation is performed by the XML parser,
The XML document types obtained from the feed are uniquely identified
by their DTDs. We currently support documents with the following DTDs:
.IP \[bu] 2
-Heartbeat.dtd
+AutoRacingResultsXML.dtd
.IP \[bu]
-newsxml.dtd
+Auto_Racing_Schedule_XML.dtd
+.IP \[bu]
+Heartbeat.dtd
.IP \[bu]
Injuries_Detail_XML.dtd
.IP \[bu]
injuriesxml.dtd
.IP \[bu]
+jfilexml.dtd
+.IP \[bu]
+newsxml.dtd
+.IP \[bu]
Odds_XML.dtd
.IP \[bu]
+scoresxml.dtd
+.IP \[bu]
weatherxml.dtd
+.IP \[bu]
+GameInfo
+.RS
+.IP \[bu]
+CBASK_Lineup_XML.dtd
+.IP \[bu]
+cbaskpreviewxml.dtd
+.IP \[bu]
+cflpreviewxml.dtd
+.IP \[bu]
+Matchup_NBA_NHL_XML.dtd
+.IP \[bu]
+MLB_Fielding_XML.dtd
+.IP \[bu]
+MLB_Gaming_Matchup_XML.dtd
+.IP \[bu]
+MLB_Lineup_XML.dtd
+.IP \[bu]
+MLB_Matchup_XML.dtd
+.IP \[bu]
+MLS_Preview_XML.dtd
+.IP \[bu]
+mlbpreviewxml.dtd
+.IP \[bu]
+NBA_Gaming_Matchup_XML.dtd
+.IP \[bu]
+NBA_Playoff_Matchup_XML.dtd
+.IP \[bu]
+NBALineupXML.dtd
+.IP \[bu]
+nbapreviewxml.dtd
+.IP \[bu]
+NCAA_FB_Preview_XML.dtd
+.IP \[bu]
+NFL_NCAA_FB_Matchup_XML.dtd
+.IP \[bu]
+nflpreviewxml.dtd
+.IP \[bu]
+nhlpreviewxml.dtd
+.IP \[bu]
+recapxml.dtd
+.IP \[bu]
+WorldBaseballPreviewXML.dtd
+.RE
+.IP \[bu]
+SportInfo
+.RS
+.IP \[bu]
+CBASK_3PPctXML.dtd
+.IP \[bu]
+Cbask_All_Tourn_Teams_XML.dtd
+.IP \[bu]
+CBASK_AssistsXML.dtd
+.IP \[bu]
+Cbask_Awards_XML.dtd
+.IP \[bu]
+CBASK_BlocksXML.dtd
+.IP \[bu]
+Cbask_Conf_Standings_XML.dtd
+.IP \[bu]
+Cbask_DivII_III_Indv_Stats_XML.dtd
+.IP \[bu]
+Cbask_DivII_Team_Stats_XML.dtd
+.IP \[bu]
+Cbask_DivIII_Team_Stats_XML.dtd
+.IP \[bu]
+CBASK_FGPctXML.dtd
+.IP \[bu]
+CBASK_FoulsXML.dtd
+.IP \[bu]
+CBASK_FTPctXML.dtd
+.IP \[bu]
+Cbask_Indv_Scoring_XML.dtd
+.IP \[bu]
+CBASK_MinutesXML.dtd
+.IP \[bu]
+Cbask_Polls_XML.dtd
+.IP \[bu]
+CBASK_ReboundsXML.dtd
+.IP \[bu]
+CBASK_ScoringLeadersXML.dtd
+.IP \[bu]
+Cbask_Team_ThreePT_Made_XML.dtd
+.IP \[bu]
+Cbask_Team_ThreePT_PCT_XML.dtd
+.IP \[bu]
+Cbask_Team_Win_Pct_XML.dtd
+.IP \[bu]
+Cbask_Top_Twenty_Five_XML.dtd
+.IP \[bu]
+CBASK_TopTwentyFiveResult_XML.dtd
+.IP \[bu]
+Cbask_Tourn_Awards_XML.dtd
+.IP \[bu]
+Cbask_Tourn_Champs_XML.dtd
+.IP \[bu]
+Cbask_Tourn_Indiv_XML.dtd
+.IP \[bu]
+Cbask_Tourn_Leaders_XML.dtd
+.IP \[bu]
+Cbask_Tourn_MVP_XML.dtd
+.IP \[bu]
+Cbask_Tourn_Records_XML.dtd
+.IP \[bu]
+LeagueScheduleXML.dtd
+.IP \[bu]
+minorscoresxml.dtd
+.IP \[bu]
+Minor_Baseball_League_Leaders_XML.dtd
+.IP \[bu]
+Minor_Baseball_Standings_XML.dtd
+.IP \[bu]
+Minor_Baseball_Transactions_XML.dtd
+.IP \[bu]
+mlbbattingavgxml.dtd
+.IP \[bu]
+mlbdoublesleadersxml.dtd
+.IP \[bu]
+MLBGamesPlayedXML.dtd
+.IP \[bu]
+MLBGIDPXML.dtd
+.IP \[bu]
+MLBHitByPitchXML.dtd
+.IP \[bu]
+mlbhitsleadersxml.dtd
+.IP \[bu]
+mlbhomerunsxml.dtd
+.IP \[bu]
+MLBHRFreqXML.dtd
+.IP \[bu]
+MLBIntWalksXML.dtd
+.IP \[bu]
+MLBKORateXML.dtd
+.IP \[bu]
+mlbonbasepctxml.dtd
+.IP \[bu]
+MLBOPSXML.dtd
+.IP \[bu]
+MLBPlateAppsXML.dtd
+.IP \[bu]
+mlbrbisxml.dtd
+.IP \[bu]
+mlbrunsleadersxml.dtd
+.IP \[bu]
+MLBSacFliesXML.dtd
+.IP \[bu]
+MLBSacrificesXML.dtd
+.IP \[bu]
+MLBSBSuccessXML.dtd
+.IP \[bu]
+mlbsluggingpctxml.dtd
+.IP \[bu]
+mlbstandxml.dtd
+.IP \[bu]
+mlbstandxml_preseason.dtd
+.IP \[bu]
+mlbstolenbasexml.dtd
+.IP \[bu]
+mlbtotalbasesleadersxml.dtd
+.IP \[bu]
+mlbtriplesleadersxml.dtd
+.IP \[bu]
+MLBWalkRateXML.dtd
+.IP \[bu]
+mlbwalksleadersxml.dtd
+.IP \[bu]
+MLBXtraBaseHitsXML.dtd
+.IP \[bu]
+MLB_Pitching_Appearances_Leaders.dtd
+.IP \[bu]
+MLB_ERA_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Balks_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_CG_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_ER_Allowed_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Hits_Allowed_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Hit_Batters_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_HR_Allowed_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_IP_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Runs_Allowed_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Saves_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Shut_Outs_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Starts_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Strike_Outs_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Walks_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_WHIP_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Wild_Pitches_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_Win_Percentage_Leaders.dtd
+.IP \[bu]
+MLB_Pitching_WL_Leaders.dtd
+.IP \[bu]
+NBA_Team_Stats_XML.dtd
+.IP \[bu]
+NBA3PPctXML.dtd
+.IP \[bu]
+NBAAssistsXML.dtd
+.IP \[bu]
+NBABlocksXML.dtd
+.IP \[bu]
+nbaconfrecxml.dtd
+.IP \[bu]
+nbadaysxml.dtd
+.IP \[bu]
+nbadivisionsxml.dtd
+.IP \[bu]
+NBAFGPctXML.dtd
+.IP \[bu]
+NBAFoulsXML.dtd
+.IP \[bu]
+NBAFTPctXML.dtd
+.IP \[bu]
+NBAMinutesXML.dtd
+.IP \[bu]
+NBAReboundsXML.dtd
+.IP \[bu]
+NBAScorersXML.dtd
+.IP \[bu]
+nbastandxml.dtd
+.IP \[bu]
+NBAStealsXML.dtd
+.IP \[bu]
+nbateamleadersxml.dtd
+.IP \[bu]
+nbatripledoublexml.dtd
+.IP \[bu]
+NBATurnoversXML.dtd
+.IP \[bu]
+NCAA_Conference_Schedule_XML.dtd
+.IP \[bu]
+nflfirstdownxml.dtd
+.IP \[bu]
+NFLFumbleLeaderXML.dtd
+.IP \[bu]
+NFLGiveTakeXML.dtd
+.IP \[bu]
+NFLInside20XML.dtd
+.IP \[bu]
+NFLKickoffsXML.dtd
+.IP \[bu]
+NFLMondayNightXML.dtd
+.IP \[bu]
+NFLPassLeadXML.dtd
+.IP \[bu]
+NFLQBStartsXML.dtd
+.IP \[bu]
+NFLSackLeadersXML.dtd
+.IP \[bu]
+nflstandxml.dtd
+.IP \[bu]
+NFLTeamRankingsXML.dtd
+.IP \[bu]
+NFLTopPerformanceXML.dtd
+.IP \[bu]
+NFLTotalYardageXML.dtd
+.IP \[bu]
+NFL_KickingLeaders_XML.dtd
+.IP \[bu]
+NFL_NBA_Draft_XML.dtd
+.IP \[bu]
+NFL_Roster_XML.dtd
+.IP \[bu]
+NFL_Team_Stats_XML.dtd
+.IP \[bu]
+Transactions_XML.dtd
+.IP \[bu]
+Weekly_Sched_XML.dtd
+.IP \[bu]
+WNBA_Team_Leaders_XML.dtd
+.IP \[bu]
+WNBA3PPctXML.dtd
+.IP \[bu]
+WNBAAssistsXML.dtd
+.IP \[bu]
+WNBABlocksXML.dtd
+.IP \[bu]
+WNBAFGPctXML.dtd
+.IP \[bu]
+WNBAFoulsXML.dtd
+.IP \[bu]
+WNBAFTPctXML.dtd
+.IP \[bu]
+WNBAMinutesXML.dtd
+.IP \[bu]
+WNBAReboundsXML.dtd
+.IP \[bu]
+WNBAScorersXML.dtd
+.IP \[bu]
+wnbastandxml.dtd
+.IP \[bu]
+WNBAStealsXML.dtd
+.IP \[bu]
+WNBATurnoversXML.dtd
+.RE
+.P
+The GameInfo and SportInfo types do not have their own top-level
+tables in the database. Instead, their raw XML is stored in the
+\(dqgame_info\(dq or \(dqsport_info\(dq table, respectively.
.SH DATABASE SCHEMA
.P
-At the top level, we have one table for each of the XML document types
-that we import. For example, the documents corresponding to
-\fInewsxml.dtd\fR will have a table called \(dqnews\(dq. All top-level
-tables contain two important fields, \(dqxml_file_id\(dq and
-\(dqtime_stamp\(dq. The former is unique and prevents us from
-inserting the same data twice. The time stamp on the other hand lets
-us know when the data is old and can be removed. The database schema
-make it possible to delete only the outdated top-level records; all
-transient children should be removed by triggers.
+At the top level (with two notable exceptions), we have one table for
+each of the XML document types that we import. For example, the
+documents corresponding to \fInewsxml.dtd\fR will have a table called
+\(dqnews\(dq. All top-level tables contain two important fields,
+\(dqxml_file_id\(dq and \(dqtime_stamp\(dq. The former is unique and
+prevents us from inserting the same data twice. The time stamp, on the
+other hand, lets us know when the data is old and can be removed. The
+database schema makes it possible to delete only the outdated top-level
+records; all transient children should be removed by triggers.
.P
These top-level tables will often have children. For example, each
news item has zero or more locations associated with it. The child
unique constraint in the top-level table's \(dqxml_file_id\(dq will
prevent duplication in this case anyway.
.P
+The aforementioned exceptions are the \(dqgame_info\(dq and
+\(dqsport_info\(dq tables. These tables contain the raw XML for a
+number of DTDs that are not handled individually. This is partly
+for backwards compatibility with a legacy implementation, but is
+mostly a stopgap due to a lack of resources at the moment. These two
+tables (game_info and sport_info) still possess timestamps that allow
+us to prune old data.
+.P
UML diagrams of the resulting database schema for each XML document
type are provided with the \fBhtsn-import\fR documentation.
+.SH XML SCHEMA ODDITIES
+.P
+There are a number of problems with the XML on the wire. Even if we
+construct the DTDs ourselves, the results are sometimes
+inconsistent. Here we document a few of them.
+
+.IP \[bu] 2
+Odds_XML.dtd
+
+The <Notes> elements here are supposed to be associated with a set of
+<Game> elements, but since the pair
+(<Notes>...</Notes><Game>...</Game>) can appear zero or more times,
+this leads to ambiguity in parsing. We therefore ignore the notes
+entirely (although a hack is employed to facilitate parsing).
+
+.IP \[bu]
+weatherxml.dtd
+
+There appear to be two types of weather documents; the first has
+<listing> contained within <forecast> and the second has <forecast>
+contained within <listing>. While it would be possible to parse both,
+it would greatly complicate things. The first form is more common, so
+that's all we support for now.
+
.SH OPTIONS
.IP \fB\-\-backend\fR,\ \fB\-b\fR
Default: none
.IP \fB\-\-log-level\fR
-How verbose should the logs be? We log notifications at three levels:
-INFO, WARN, and ERROR. Specify the \(dqmost boring\(dq level of
+How verbose should the logs be? We log notifications at four levels:
+DEBUG, INFO, WARN, and ERROR. Specify the \(dqmost boring\(dq level of
notifications you would like to receive (in all-caps); more
-interesting notifications will be logged as well.
+interesting notifications will be logged as well. The debug output is
+extremely verbose and will not be written to syslog even if you try.
Default: INFO