X-Git-Url: http://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=doc%2FREADME.schemagen;h=d32075ba0083f9fe3a1bcfa59997bf66e7743ab5;hb=4fe75185430116f2139dd64e97267254eda9d445;hp=30c3b1b885fc287c834803b2f0cba76a3a3d1a43;hpb=77bff3e2abda103471ab881bad81db67003582ff;p=dead%2Fhtsn-import.git

diff --git a/doc/README.schemagen b/doc/README.schemagen
index 30c3b1b..d32075b 100644
--- a/doc/README.schemagen
+++ b/doc/README.schemagen
@@ -9,7 +9,7 @@ construct a database into which to insert the XML. How do we know if
 to know how many times it can appear. So we need some form of
 specification. And reading all of the XML files one at a time to
 count the number of s is impractical. So, we would like to generate
-the DTDs manually.
+the DTDs automatically.
 
 The process should go something like,
 
@@ -37,3 +37,13 @@ To generate them, run `make schema` at the project root.
 
 XML-Schema-learner will be invoked on each subfolder of "schemagen"
 and will output the corresponding DTDs to the "schemagen" folder.
+
+Most of the production schemas are generated this way; however, a few
+needed manual tweaking. The final, believed-to-be-correct schemas for
+all supported document types can be found in the "schema" folder in
+the project root. Having the "correct" DTDs available means you
+don't need XML-Schema-learner available to install htsn-import.
+
+As explained in the man page, there is a second type of weatherxml
+document that we don't parse at the moment. An example is provided as
+schemagen/weatherxml/20143655.xml.