X-Git-Url: http://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=doc%2Fman1%2Fhtsn-import.1;h=450ad7a9b9cbac19a0e2d1b71aadf83ff3e66ea6;hb=e46c9e59d0594c28dcb2d3c18fb877298bc1b5dd;hp=8c6f936c480b44cad8879a0eaab43d202b7c99bb;hpb=cf1b07edbb1a3013a3e0d49a070c74e87655e01a;p=dead%2Fhtsn-import.git

diff --git a/doc/man1/htsn-import.1 b/doc/man1/htsn-import.1
index 8c6f936..450ad7a 100644
--- a/doc/man1/htsn-import.1
+++ b/doc/man1/htsn-import.1
@@ -48,29 +48,95 @@ pickle/unpickle everything already, this should be impossible.
 The XML document types obtained from the feed are uniquely identified by
 their DTDs. We currently support documents with the following DTDs:
 .IP \[bu] 2
-Heartbeat.dtd
+AutoRacingResultsXML.dtd
 .IP \[bu]
-newsxml.dtd
+Auto_Racing_Schedule_XML.dtd
+.IP \[bu]
+Heartbeat.dtd
 .IP \[bu]
 Injuries_Detail_XML.dtd
 .IP \[bu]
 injuriesxml.dtd
 .IP \[bu]
+newsxml.dtd
+.IP \[bu]
 Odds_XML.dtd
 .IP \[bu]
+scoresxml.dtd
+.IP \[bu]
 weatherxml.dtd
+.IP \[bu]
+GameInfo
+.RS
+.IP \[bu]
+CBASK_Lineup_XML.dtd
+.IP \[bu]
+cbaskpreviewxml.dtd
+.IP \[bu]
+cflpreviewxml.dtd
+.IP \[bu]
+Matchup_NBA_NHL_XML.dtd
+.IP \[bu]
+MLB_Gaming_Matchup_XML.dtd
+.IP \[bu]
+MLB_Lineup_XML.dtd
+.IP \[bu]
+MLB_Matchup_XML.dtd
+.IP \[bu]
+MLS_Preview_XML.dtd
+.IP \[bu]
+mlbpreviewxml.dtd
+.IP \[bu]
+NBA_Gaming_Matchup_XML.dtd
+.IP \[bu]
+NBA_Playoff_Matchup_XML.dtd
+.IP \[bu]
+NBALineupXML.dtd
+.IP \[bu]
+nbapreviewxml.dtd
+.IP \[bu]
+NCAA_FB_Preview_XML.dtd
+.IP \[bu]
+NFL_NCAA_FB_Matchup_XML.dtd
+.IP \[bu]
+nflpreviewxml.dtd
+.IP \[bu]
+nhlpreviewxml.dtd
+.IP \[bu]
+recapxml.dtd
+.IP \[bu]
+WorldBaseballPreviewXML.dtd
+.RE
+.IP \[bu]
+SportInfo
+.RS
+.IP \[bu]
+CBASK_3PPctXML.dtd
+.IP \[bu]
+Cbask_All_Tourn_Teams_XML.dtd
+.IP \[bu]
+CBASK_AssistsXML.dtd
+.IP \[bu]
+Cbask_Awards_XML.dtd
+.IP \[bu]
+CBASK_BlocksXML.dtd
+.RE
+.P
+The GameInfo and SportInfo types do not have their own top-level
+tables in the database. Instead, their raw XML is stored in either the
+\(dqgame_info\(dq or \(dqsport_info\(dq table respectively.
 .SH DATABASE SCHEMA
 .P
-At the top level, we have one table for each of the XML document types
-that we import. For example, the documents corresponding to
-\fInewsxml.dtd\fR will have a table called \(dqnews\(dq. All top-level
-tables contain two important fields, \(dqxml_file_id\(dq and
-\(dqtime_stamp\(dq. The former is unique and prevents us from
-inserting the same data twice. The time stamp on the other hand lets
-us know when the data is old and can be removed. The database schema
-make it possible to delete only the outdated top-level records; all
-transient children should be removed by triggers.
+At the top level (with two notable exceptions), we have one table for
+each of the XML document types that we import. For example, the
+documents corresponding to \fInewsxml.dtd\fR will have a table called
+\(dqnews\(dq. All top-level tables contain two important fields,
+\(dqxml_file_id\(dq and \(dqtime_stamp\(dq. The former is unique and
+prevents us from inserting the same data twice. The time stamp on the
+other hand lets us know when the data is old and can be removed. The
+database schema make it possible to delete only the outdated top-level
+records; all transient children should be removed by triggers.
 .P
 These top-level tables will often have children. For example, each
 news item has zero or more locations associated with it. The child
@@ -102,6 +168,14 @@ to delete the old games (through an ON DELETE CASCADE, tied to
 unique constraint in the top-level table's \(dqxml_file_id\(dq will
 prevent duplication in this case anyway.
 .P
+The aforementioned exceptions are the \(dqgame_info\(dq and
+\(dqsport_info\(dq tables. These tables contain the raw XML for a
+number of DTDs that are not handled individually. This is partially
+for backwards-compatibility with a legacy implementation, but is
+mostly a stopgap due to a lack of resources at the moment. These two
+tables (game_info and sport_info) still possess timestamps that allow
+us to prune old data.
+.P
 UML diagrams of the resulting database schema for each XML document type
 are provided with the \fBhtsn-import\fR documentation.
@@ -111,8 +185,8 @@ There are a number of problems with the XML on the wire. Even if we
 construct the DTDs ourselves, the results are sometimes inconsistent.
 Here we document a few of them.
-.IP \[bu]
-2 Odds_XML.dtd
+.IP \[bu] 2
+Odds_XML.dtd
 The elements here are supposed to be associated with a set of
 elements, but since the pair
@@ -152,10 +226,11 @@ will not be auto-rotated; use something like logrotate for that.
 Default: none
 .IP \fB\-\-log-level\fR
-How verbose should the logs be? We log notifications at three levels:
-INFO, WARN, and ERROR. Specify the \(dqmost boring\(dq level of
+How verbose should the logs be? We log notifications at four levels:
+DEBUG, INFO, WARN, and ERROR. Specify the \(dqmost boring\(dq level of
 notifications you would like to receive (in all-caps); more
-interesting notifications will be logged as well.
+interesting notifications will be logged as well. The debug output is
+extremely verbose and will not be written to syslog even if you try.
 Default: INFO