From cf1b07edbb1a3013a3e0d49a070c74e87655e01a Mon Sep 17 00:00:00 2001
From: Michael Orlitzky
Date: Tue, 21 Jan 2014 16:27:17 -0500
Subject: [PATCH] Document the weird weather sample.

---
 doc/README.schemagen   |  4 ++++
 doc/man1/htsn-import.1 | 24 ++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/doc/README.schemagen b/doc/README.schemagen
index 8c570d3..d32075b 100644
--- a/doc/README.schemagen
+++ b/doc/README.schemagen
@@ -43,3 +43,7 @@ needed manual tweaking. The final, believed-to-be-correct schemas for
 all supported document types can be found in the "schema" folder in
 the project root. Having the "correct" DTDs available means you
 don't need XML-Schema-learner available to install htsn-import.
+
+As explained in the man page, there is a second type of weatherxml
+document that we don't parse at the moment. An example is provided as
+schemagen/weatherxml/20143655.xml.
diff --git a/doc/man1/htsn-import.1 b/doc/man1/htsn-import.1
index f1edf44..8c6f936 100644
--- a/doc/man1/htsn-import.1
+++ b/doc/man1/htsn-import.1
@@ -105,6 +105,30 @@ prevent duplication in this case anyway.
 UML diagrams of the resulting database schema for each XML document
 type are provided with the \fBhtsn-import\fR documentation.

+.SH XML Schema Oddities
+.P
+There are a number of problems with the XML on the wire. Even if we
+construct the DTDs ourselves, the results are sometimes
+inconsistent. Here we document a few of them.
+
+.IP \[bu]
+2 Odds_XML.dtd
+
+The elements here are supposed to be associated with a set of
+ elements, but since the pair
+(......) can appear zero or more times,
+this leads to ambiguity in parsing. We therefore ignore the notes
+entirely (although a hack is employed to facilitate parsing).
+
+.IP \[bu]
+weatherxml.dtd
+
+There appear to be two types of weather documents; the first has
+ contained within and the second has
+contained within . While it would be possible to parse both,
+it would greatly complicate things. The first form is more common, so
+that's all we support for now.
+
 .SH OPTIONS
 .IP \fB\-\-backend\fR,\ \fB\-b\fR
-- 
2.43.2