X-Git-Url: http://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=htsn-import.cabal;h=99df37f27359f157257a0b3c22b3969e4ac06e53;hb=4cdcdbe593c30f6434a25896951a1a4dfcc2b1ca;hp=cbec4e6646dd62ed6afe411b31870bd37dee022a;hpb=988f693ce7f1abb6566e75d539ac312b627c31d5;p=dead%2Fhtsn-import.git
diff --git a/htsn-import.cabal b/htsn-import.cabal
index cbec4e6..99df37f 100644
--- a/htsn-import.cabal
+++ b/htsn-import.cabal
@@ -7,10 +7,56 @@ category: Utils
 license: GPL-3
 license-file: doc/LICENSE
 build-type: Simple
+extra-source-files:
+  doc/htsnrc-import.example
+  doc/man1/htsn-import.1
+  makefile
+  test/xml/*.xml
+  schema/*.dtd
+  schemagen/*/*.xml
 synopsis:
   Import XML files from The Sports Network into an RDBMS.
 description:
-  Import XML files from The Sports Network into an RDBMS.
+  /Usage/:
+  .
+  @
+  htsn-import [OPTIONS] [FILES]
+  @
+  .
+  The Sports Network offers an XML feed
+  containing various sports news and statistics. Our sister program
+  /htsn/ is capable of retrieving the feed and saving the individual
+  XML documents contained therein. But what to do with them?
+  .
+  The purpose of /htsn-import/ is to take these XML documents and
+  get them into something we can use, a relational database management
+  system (RDBMS), loosely known as a SQL database. The structure of
+  a relational database is, well, relational, and the feed XML is not. So
+  there is some work to do before the data can be inserted.
+  .
+  First, we must parse the XML. Each supported document type (see below)
+  has a full pickle/unpickle implementation (\"pickle\" is simply a
+  synonym for serialize here). That means that we parse the entire
+  document into a data structure, and if we pickle (serialize) that data
+  structure, we get the exact same XML document that we started with.
+  .
+  This is important for two reasons. First, it serves as a second level
+  of validation. The first validation is performed by the XML parser,
+  but if that succeeds and unpickling fails, we know that something is
+  fishy. Second, we don't ever want to be surprised by some new element
+  or attribute showing up in the XML. The fact that we can unpickle the
+  whole thing now means that we won't be surprised in the future.
+  .
+  The aforementioned feature is especially important because we
+  automatically migrate the database schema every time we import a
+  document. If you attempt to import a \"newsxml.dtd\" document, all
+  database objects relating to the news will be created if they do not
+  exist. We don't want the schema to change out from under us without
+  warning, so it's important that no XML be parsed that would result in
+  a different schema than we had previously. Since we can
+  pickle/unpickle everything already, this should be impossible.
+  .
+  Examples and usage documentation are available in the man page.
 
 executable htsn-import
   build-depends:
@@ -26,6 +72,7 @@ executable htsn-import
     groundhog-postgresql == 0.4.*,
     groundhog-sqlite == 0.4.*,
     groundhog-th == 0.4.*,
+    MissingH == 1.2.*,
     old-locale == 1.0.*,
     tasty == 0.7.*,
     tasty-hunit == 0.4.*,
@@ -39,6 +86,24 @@ executable htsn-import
   hs-source-dirs:
     src/
 
+  other-modules:
+    Backend
+    CommandLine
+    Configuration
+    ConnectionString
+    ExitCodes
+    OptionalConfiguration
+    TSN.Codegen
+    TSN.DbImport
+    TSN.Picklers
+    TSN.XmlImport
+    TSN.XML.Heartbeat
+    TSN.XML.Injuries
+    TSN.XML.InjuriesDetail
+    TSN.XML.News
+    TSN.XML.Odds
+    Xml
+
   ghc-options:
     -Wall
     -fwarn-hi-shadowing
@@ -79,6 +144,7 @@ test-suite testsuite
     groundhog-postgresql == 0.4.*,
     groundhog-sqlite == 0.4.*,
     groundhog-th == 0.4.*,
+    MissingH == 1.2.*,
     old-locale == 1.0.*,
     tasty == 0.7.*,
     tasty-hunit == 0.4.*,
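The round-trip check described in the new description field (unpickle a document, pickle the result, and insist on getting the identical XML back) can be sketched in plain Haskell using only `base`. This is a toy illustration under stated assumptions: the `Node` tree, the `PU` record, the `Story` type, and `validate` are all hypothetical stand-ins, not the types or functions htsn-import actually uses; a real implementation would rely on an XML library's pickler combinators.

```haskell
import Text.Read (readMaybe)

-- Hypothetical minimal XML tree (not htsn-import's real representation):
-- an element with a name, attributes and children, or a text node.
data Node
  = Elem String [(String, String)] [Node]
  | Text String
  deriving (Eq, Show)

-- Hypothetical pickler: a pair of (intended) inverse functions for one type.
data PU a = PU
  { pickle   :: a -> Node
  , unpickle :: Node -> Maybe a
  }

-- Hypothetical document type standing in for a real TSN feed document.
data Story = Story { storyId :: Int, title :: String }
  deriving (Eq, Show)

puStory :: PU Story
puStory = PU p u
  where
    p (Story i t) = Elem "story" [("id", show i)] [Elem "title" [] [Text t]]
    -- Deliberately strict: any unknown attribute or child fails to match.
    u (Elem "story" [("id", i)] [Elem "title" [] [Text t]]) =
      Story <$> readMaybe i <*> pure t
    u _ = Nothing

-- Accept a document only if unpickling succeeds AND re-pickling the
-- result reproduces the input exactly, so nothing was silently dropped.
validate :: PU a -> Node -> Maybe a
validate pu doc = do
  x <- unpickle pu doc
  if pickle pu x == doc then Just x else Nothing

main :: IO ()
main = do
  let good   = Elem "story" [("id", "42")] [Elem "title" [] [Text "Hello"]]
      extra  = Elem "story" [("id", "42")]
                 [Elem "title" [] [Text "Hello"], Elem "byline" [] []]
      padded = Elem "story" [("id", "042")] [Elem "title" [] [Text "Hello"]]
  print (validate puStory good)    -- Just (Story {storyId = 42, title = "Hello"})
  print (validate puStory extra)   -- Nothing: unknown <byline> child
  print (validate puStory padded)  -- Nothing: "042" re-pickles as "42"
```

The `padded` case shows why the round trip is a stronger check than parsing alone: `"042"` reads as a valid `Int`, but pickling the result yields `"42"`, so the document is rejected instead of being imported with part of its content quietly lost.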