X-Git-Url: http://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=doc%2Fman1%2Fhtsn-import.1;h=ef43f1ee8aafc1c4840703cf686426e47bc611e8;hb=d2f5d93b2b68f581d4cb4eabecc556c01762d370;hp=f1edf446f5b20b5a68dddf9722aec2006869c1dd;hpb=a599e73b762cc14239c2dc22be9bec7c1df90548;p=dead%2Fhtsn-import.git

diff --git a/doc/man1/htsn-import.1 b/doc/man1/htsn-import.1
index f1edf44..ef43f1e 100644
--- a/doc/man1/htsn-import.1
+++ b/doc/man1/htsn-import.1
@@ -48,14 +48,16 @@ pickle/unpickle everything already, this should be impossible.
 The XML document types obtained from the feed are uniquely identified
 by their DTDs. We currently support documents with the following DTDs:
 .IP \[bu] 2
-Heartbeat.dtd
+Auto_Racing_Schedule_XML.dtd
 .IP \[bu]
-newsxml.dtd
+Heartbeat.dtd
 .IP \[bu]
 Injuries_Detail_XML.dtd
 .IP \[bu]
 injuriesxml.dtd
 .IP \[bu]
+newsxml.dtd
+.IP \[bu]
 Odds_XML.dtd
 .IP \[bu]
 weatherxml.dtd
@@ -105,6 +107,30 @@ prevent duplication in this case anyway.
 UML diagrams of the resulting database schema for each XML document
 type are provided with the \fBhtsn-import\fR documentation.
+.SH XML Schema Oddities
+.P
+There are a number of problems with the XML on the wire. Even if we
+construct the DTDs ourselves, the results are sometimes
+inconsistent. Here we document a few of them.
+
+.IP \[bu]
+2 Odds_XML.dtd
+
+The elements here are supposed to be associated with a set of
+ elements, but since the pair
+(......) can appear zero or more times,
+this leads to ambiguity in parsing. We therefore ignore the notes
+entirely (although a hack is employed to facilitate parsing).
+
+.IP \[bu]
+weatherxml.dtd
+
+There appear to be two types of weather documents; the first has
+ contained within and the second has
+contained within . While it would be possible to parse both,
+it would greatly complicate things. The first form is more common, so
+that's all we support for now.
+
 .SH OPTIONS
 .IP \fB\-\-backend\fR,\ \fB\-b\fR