From: Michael Orlitzky
Date: Mon, 2 Jun 2014 20:20:47 +0000 (-0400)
Subject: Add GameInfo DTDs and an explanation to the man page.
X-Git-Tag: 0.0.6~246
X-Git-Url: https://gitweb.michael.orlitzky.com/?a=commitdiff_plain;h=8a744fd3d6f7c9d4f5f4c0e3c2e5128453da779e;p=dead%2Fhtsn-import.git

Add GameInfo DTDs and an explanation to the man page.

Enable GameInfo import in Main.
---
diff --git a/doc/man1/htsn-import.1 b/doc/man1/htsn-import.1
index 49a6927..869bb52 100644
--- a/doc/man1/htsn-import.1
+++ b/doc/man1/htsn-import.1
@@ -56,25 +56,51 @@ Injuries_Detail_XML.dtd
 .IP \[bu]
 injuriesxml.dtd
 .IP \[bu]
+MLB_Gaming_Matchup_XML.dtd (GameInfo)
+.IP \[bu]
+MLB_Lineup_XML.dtd (GameInfo)
+.IP \[bu]
+MLB_Matchup_XML.dtd (GameInfo)
+.IP \[bu]
+MLS_Preview_XML.dtd (GameInfo)
+.IP \[bu]
+mlbpreviewxml.dtd (GameInfo)
+.IP \[bu]
+NBA_Gaming_Matchup_XML.dtd (GameInfo)
+.IP \[bu]
+NBA_Playoff_Matchup_XML.dtd (GameInfo)
+.IP \[bu]
+NBALineupXML.dtd (GameInfo)
+.IP \[bu]
+nbapreviewxml.dtd (GameInfo)
+.IP \[bu]
 newsxml.dtd
 .IP \[bu]
+nhlpreviewxml.dtd (GameInfo)
+.IP \[bu]
 Odds_XML.dtd
 .IP \[bu]
+recapxml.dtd (GameInfo)
+.IP \[bu]
 scoresxml.dtd
 .IP \[bu]
 weatherxml.dtd
+.P
+The GameInfo and SportsInfo types do not have their own top-level
+tables in the database. Instead, their raw XML is stored in either the
+\(dqgame_info\(dq or \(dqsports_info\(dq table respectively.
 .SH DATABASE SCHEMA
 .P
-At the top level, we have one table for each of the XML document types
-that we import. For example, the documents corresponding to
-\fInewsxml.dtd\fR will have a table called \(dqnews\(dq. All top-level
-tables contain two important fields, \(dqxml_file_id\(dq and
-\(dqtime_stamp\(dq. The former is unique and prevents us from
-inserting the same data twice. The time stamp on the other hand lets
-us know when the data is old and can be removed. The database schema
-make it possible to delete only the outdated top-level records; all
-transient children should be removed by triggers.
+At the top level (with two notable exceptions), we have one table for
+each of the XML document types that we import. For example, the
+documents corresponding to \fInewsxml.dtd\fR will have a table called
+\(dqnews\(dq. All top-level tables contain two important fields,
+\(dqxml_file_id\(dq and \(dqtime_stamp\(dq. The former is unique and
+prevents us from inserting the same data twice. The time stamp, on the
+other hand, lets us know when the data is old and can be removed. The
+database schema makes it possible to delete only the outdated top-level
+records; all transient children should be removed by triggers.
 .P
 These top-level tables will often have children. For example, each
 news item has zero or more locations associated with it. The child
@@ -106,6 +132,14 @@ to delete the old games (through an ON DELETE CASCADE, tied to
 unique constraint in the top-level table's \(dqxml_file_id\(dq will
 prevent duplication in this case anyway.
 .P
+The aforementioned exceptions are the \(dqgame_info\(dq and
+\(dqsports_info\(dq tables. These tables contain the raw XML for a
+number of DTDs that are not handled individually. This is partially
+for backwards compatibility with a legacy implementation, but is
+mostly a stopgap due to a lack of resources at the moment. These two
+tables (game_info and sports_info) still possess time stamps that
+allow us to prune old data.
+.P
 UML diagrams of the resulting database schema for each XML document
 type are provided with the \fBhtsn-import\fR documentation.
@@ -115,8 +149,8 @@ There are a number of problems with the XML on the wire. Even if we
 construct the DTDs ourselves, the results are sometimes
 inconsistent. Here we document a few of them.
-.IP \[bu]
-2 Odds_XML.dtd
+.IP \[bu] 2
+Odds_XML.dtd
 The elements here are supposed to be associated with a set of
 elements, but since the pair
diff --git a/src/Main.hs b/src/Main.hs
index 4669b74..9e17bbd 100644
--- a/src/Main.hs
+++ b/src/Main.hs
@@ -48,7 +48,7 @@ import TSN.DbImport ( DbImport(..), ImportResult(..) )
 import qualified TSN.XML.AutoRacingSchedule as AutoRacingSchedule (
   dtd,
   pickle_message )
-import qualified TSN.XML.GameInfo as GameInfo ( dtds )
+import qualified TSN.XML.GameInfo as GameInfo ( dtds, parse_xml )
 import qualified TSN.XML.Heartbeat as Heartbeat ( dtd, verify )
 import qualified TSN.XML.Injuries as Injuries ( dtd, pickle_message )
 import qualified TSN.XML.InjuriesDetail as InjuriesDetail (
@@ -190,7 +190,13 @@ import_file cfg path = do
             let m = unpickleDoc Weather.pickle_message xml
             maybe (return $ ImportFailed errmsg) migrate_and_import m

-        | dtd `elem` GameInfo.dtds = undefined
+        | dtd `elem` GameInfo.dtds = do
+            let either_m = GameInfo.parse_xml dtd xml
+            case either_m of
+              -- This might give us a slightly better error
+              -- message than the default 'errmsg'.
+              Left err -> return $ ImportFailed err
+              Right m -> migrate_and_import m

         | dtd `elem` SportInfo.dtds = undefined
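The new GameInfo branch in `import_file` relies on `GameInfo.parse_xml` returning an `Either`, so that a parse failure can carry its own message instead of the generic `errmsg`. The following is a minimal, self-contained sketch of that dispatch pattern. It is an illustration only: `ImportResult`, `GameInfo`, `gameInfoDtds`, `parseGameInfo`, `storeGameInfo`, and `importDoc` are hypothetical stand-ins rather than the real htsn-import API, and the DTD list is abbreviated.

```haskell
-- Hypothetical stand-ins for the real htsn-import types and
-- functions; only the shape of the dispatch is meant to match.
data ImportResult = ImportFailed String | ImportSucceeded Int
  deriving (Eq, Show)

-- A parsed GameInfo document: the DTD name plus its raw XML.
data GameInfo = GameInfo { gi_dtd :: String, gi_xml :: String }
  deriving (Eq, Show)

-- Abbreviated list of DTDs routed to the GameInfo importer.
gameInfoDtds :: [String]
gameInfoDtds = ["MLB_Lineup_XML.dtd", "nbapreviewxml.dtd", "recapxml.dtd"]

-- Parsing returns Either, so a failure can carry a specific message.
-- Assumption: an empty document is the only failure modeled here.
parseGameInfo :: String -> String -> Either String GameInfo
parseGameInfo dtd xml
  | null xml  = Left ("empty document for " ++ dtd)
  | otherwise = Right (GameInfo dtd xml)

-- Placeholder for the database step: report one stored document.
storeGameInfo :: GameInfo -> IO ImportResult
storeGameInfo _ = return (ImportSucceeded 1)

-- Dispatch on the DTD name, mirroring the new guard in import_file.
importDoc :: String -> String -> IO ImportResult
importDoc dtd xml
  | dtd `elem` gameInfoDtds =
      case parseGameInfo dtd xml of
        -- Pass the parser's own message through instead of a generic one.
        Left err -> return (ImportFailed err)
        Right m  -> storeGameInfo m
  | otherwise = return (ImportFailed ("unsupported DTD: " ++ dtd))
```

With this shape, `importDoc "recapxml.dtd" ""` yields `ImportFailed "empty document for recapxml.dtd"`, which names the DTD that failed rather than reporting a generic error.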
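The man page changes describe two bookkeeping rules: a unique \(dqxml_file_id\(dq prevents importing the same data twice, and a time stamp lets old rows be pruned. The game_info and sports_info tables are said to keep raw XML and time stamps. The in-memory model below sketches both rules for a game_info-style row. It rests on stated assumptions rather than the actual schema: the `Row` type, integer time stamps, the assumption that game_info also carries a unique `xml_file_id` like the other top-level tables, and the names `insertRow` and `pruneOlderThan` are all hypothetical.

```haskell
-- Hypothetical model of a game_info row: raw XML plus bookkeeping.
data Row = Row
  { xml_file_id :: Int      -- assumed unique per imported document
  , time_stamp  :: Integer  -- assumption: seconds since some epoch
  , raw_xml     :: String   -- the document, stored verbatim
  } deriving (Eq, Show)

-- Insert a row unless its xml_file_id is already present, mimicking
-- the unique constraint that prevents importing the same data twice.
insertRow :: Row -> [Row] -> Either String [Row]
insertRow r rows
  | any ((== xml_file_id r) . xml_file_id) rows =
      Left ("duplicate xml_file_id " ++ show (xml_file_id r))
  | otherwise = Right (r : rows)

-- Keep only rows whose time stamp is at or after the cutoff; older
-- rows are treated as outdated and dropped.
pruneOlderThan :: Integer -> [Row] -> [Row]
pruneOlderThan cutoff = filter ((>= cutoff) . time_stamp)
```

Because uniqueness is keyed on `xml_file_id` alone, re-importing the same file is rejected even if its time stamp differs, while pruning depends only on the time stamp.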