+In order to parse XML, you need to know the structure of your
+documents. Usually this is given in the form of a DTD or schema. The
+Sports Network does provide DTDs for their XML, but they're wrong! So,
+what can we do?
+
+The easiest option would be to guess and pray. But we need to
+construct a database into which to insert the XML. How do we know if
+<game> should be a column, or if it should have its own table? We need
+to know how many times it can appear. So we need some form of
+specification. Reading all of the XML files one at a time to count
+the number of <game>s is impractical, so we would like to generate
+the DTDs ourselves.
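+
+As a rough illustration of the question a specification has to
+answer, the sketch below counts the largest number of times each
+child element appears under a single parent across a few samples. It
+is not part of htsn-import; the function name and the <schedule> and
+<id> elements are made up for the example.
+
```python
import xml.etree.ElementTree as ET
from collections import Counter


def max_child_counts(xml_documents):
    """Map (parent_tag, child_tag) to the largest number of times the
    child appears under one parent element, over all documents."""
    maxima = {}
    for text in xml_documents:
        root = ET.fromstring(text)
        # root.iter() visits the root and every descendant element.
        for parent in root.iter():
            counts = Counter(child.tag for child in parent)
            for child_tag, n in counts.items():
                key = (parent.tag, child_tag)
                maxima[key] = max(maxima.get(key, 0), n)
    return maxima


# Hypothetical samples: <schedule> and <id> are invented element names.
samples = [
    "<schedule><game><id>1</id></game><game><id>2</id></game></schedule>",
    "<schedule><game><id>3</id></game></schedule>",
]
counts = max_child_counts(samples)
```
+
+Here <game> occurs up to twice under <schedule>, which suggests its
+own table, while <id> occurs at most once under <game>, which
+suggests a column. The answer is only as good as the samples fed in.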
+
+The process should go something like this:
+
+ 1. Generate a DTD from the first foo.xml file we see. Call it
+ foo.dtd.
+
+ 2. Validate future foo documents against foo.dtd. If they all
+ validate, great. If one fails, add it to the corpus and update
+ foo.dtd so that both the original and the new foo.xml validate.
+
+ 3. Repeat until no more failures occur. This can never be perfect:
+ tomorrow we could get a foo.xml that's wildly different from what
+ we've seen in the past. But it's the best we can hope for under
+ the circumstances.
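+
+The loop in steps 1 through 3 can be sketched as follows. This is an
+illustration only: grow_corpus, infer_dtd, and validates are
+hypothetical names, and the toy stand-ins treat a "DTD" as nothing
+more than the set of element names seen so far. In practice the
+inference and validation would be done by real tools.
+
```python
import xml.etree.ElementTree as ET


def grow_corpus(incoming, infer_dtd, validates):
    """Keep only the documents that forced a change to the DTD.

    infer_dtd(corpus) and validates(doc, dtd) are caller-supplied
    (hypothetical) hooks standing in for the real tools.
    """
    corpus = []
    dtd = None
    for doc in incoming:
        # Step 1: the first document always seeds the corpus.
        # Step 2: later documents join it only if they fail to validate.
        if dtd is None or not validates(doc, dtd):
            corpus.append(doc)
            dtd = infer_dtd(corpus)
    return corpus, dtd


# Toy stand-ins: a "DTD" here is just the set of known element names.
def toy_infer(corpus):
    return {el.tag for text in corpus for el in ET.fromstring(text).iter()}


def toy_validates(doc, dtd):
    return all(el.tag in dtd for el in ET.fromstring(doc).iter())


docs = ["<foo><a/></foo>", "<foo><a/><a/></foo>", "<foo><b/></foo>"]
corpus, dtd = grow_corpus(docs, toy_infer, toy_validates)
```
+
+With these stand-ins the second document validates and is skipped,
+while the third introduces <b> and is added to the corpus. The sketch
+assumes that a regenerated DTD still accepts every document that
+validated earlier.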
+
+Enter XML-Schema-learner. This tool can infer a DTD from a set of
+sample XML files. The top-level "schemagen" folder (in this project)
+contains a number of subfolders -- one for each type of document that
+we want to parse. Contained therein are XML samples for that
+particular document type. These were hand-picked one at a time
+according to the procedure above, and the complete set of XML is what
+we use to generate the DTDs used by htsn-import.
+
+To generate them, run `make schema` at the project root.
+XML-Schema-learner will be invoked on each subfolder of "schemagen"
+and will output the corresponding DTDs to the "schemagen" folder.