From: Michael Orlitzky
Date: Mon, 7 Jul 2014 13:47:02 +0000 (-0400)
Subject: Add an overview to the development README.
X-Git-Tag: 0.0.6~2
X-Git-Url: http://gitweb.michael.orlitzky.com/?a=commitdiff_plain;h=241c0449e26e8207f0e82426db9f1bf709ab5fb6;p=dead%2Fhtsn-import.git

Add an overview to the development README.
---
diff --git a/doc/README.development b/doc/README.development
index e58b418..61a5e59 100644
--- a/doc/README.development
+++ b/doc/README.development
@@ -1,3 +1,62 @@
+== Overview ==
+
+The "main" function accepts a list of XML files on the command line,
+and goes through them one at a time. A minimal parse attempt is made
+to determine the DTD of the file, and then one big case statement
+decides what to do with it based on that DTD name. Each DTD name has
+an associated module in src/TSN/XML which can do a few things:
+
+  1. List the DTDs for which it's responsible
+
+  2. Parse a top-level element into an XmlTree
+
+  3. In rare cases (Weather, News) detect specific malformed documents
+
+Most of the XML modules are similar. The big idea is that every object
+(for example, a team) has both a database type and an XML type. When
+those two types differ, we need to be able to convert between
+them. So, for example, if the XML representation of a team differs
+from the database representation, we might define:
+
+> data Team = ...
+> data TeamXml = ...
+
+But if you're lucky, the database/XML representations will be the
+same, and you'd only need to define "Team"!
+
+The most common situation where the representations differ is when
+there exists a parent/child relationship. In the XML representation,
+you will have e.g. the Team contained within a Game:
+
+> data GameXml = GameXml { xml_game_id :: Int, xml_team :: TeamXml }
+
+But in the database representation--which looks a lot like a schema
+specification--there's no mention of the team at all.
+
+> data Game = Game { game_id :: Int }
+
+That's because the database representation of the Team will have a
+foreign key to a Game instead:
+
+> data Team = Team { games_id :: DefaultKey Game, ... }
+
+Most of the XML modules are devoted to converting back and forth
+between these two types. The XML modules are also responsible for
+"unpickling" the XML document, which essentially parses it into a
+bunch of Haskell data types (the FooXml representations).
+
+Furthermore, each top-level message element in the XML modules knows
+how to insert itself into the database. The "Message" type is always a
+member of the "DbImport" class, and that class defines two methods:
+dbmigrate, to run the migrations, and dbimport, which actually says
+how to import the thing.
+
+Each XML file is handed off to the appropriate XML module, which then
+runs its migrations and tries to import the XML into the database. The
+results are reported and collected into a list so that later the
+processed files may be removed.
+
+
 == Pickle Failures ==
 
 Our schemas are "best guesses" based on what we've seen on the