4 author: Michael Orlitzky
5 maintainer: Michael Orlitzky <michael@orlitzky.com>
8 license-file: doc/LICENSE
12 doc/htsn-importrc.example
13 doc/man1/htsn-import.1
21 schemagen/Heartbeat/*.xml
22 schemagen/injuriesxml/*.xml
23 schemagen/Injuries_Detail_XML/*.xml
24 schemagen/newsxml/*.xml
25 schemagen/Odds_XML/*.xml
26 schemagen/weatherxml/*.xml
28 Import XML files from The Sports Network into an RDBMS.
33 htsn-import [OPTIONS] [FILES]
36 The Sports Network <http://www.sportsnetwork.com/> offers an XML feed
37 containing various sports news and statistics. Our sister program
38 /htsn/ is capable of retrieving the feed and saving the individual
39 XML documents contained therein. But what to do with them?
41 The purpose of /htsn-import/ is to take these XML documents and
42 get them into something we can use, a relational database management
43 system (RDBMS), loosely known as a SQL database. The structure of a
44 relational database is, well, relational, and the feed XML is not. So
45 there is some work to do before the data can be inserted.
47 First, we must parse the XML. Each supported document type (see below)
48 has a full pickle/unpickle implementation ("pickle" is simply a
49 synonym for serialize here). That means that we parse the entire
50 document into a data structure, and if we pickle (serialize) that data
51 structure, we get the exact same XML document that we started with.
53 This is important for two reasons. First, it serves as a second level
54 of validation. The first validation is performed by the XML parser,
55 but if that succeeds and unpickling fails, we know that something is
56 fishy. Second, we don't ever want to be surprised by some new element
57 or attribute showing up in the XML. The fact that we can unpickle the
58 whole thing now means that we won't be surprised in the future.
60 The aforementioned feature is especially important because we
61 automatically migrate the database schema every time we import a
62 document. If you attempt to import a "newsxml.dtd" document, all
63 database objects relating to the news will be created if they do not
64 exist. We don't want the schema to change out from under us without
65 warning, so it's important that no XML be parsed that would result in
66 a different schema than we had previously. Since we can
67 pickle/unpickle everything already, this should be impossible.
69 Examples and usage documentation are available in the man page.
71 executable htsn-import
75 configurator == 0.2.*,
82 groundhog-postgresql == 0.4.*,
83 groundhog-sqlite == 0.4.*,
84 groundhog-th == 0.4.*,
90 transformers == 0.3.*,
105 OptionalConfiguration
113 TSN.XML.InjuriesDetail
122 -fwarn-missing-signatures
123 -fwarn-name-shadowing
127 -fwarn-incomplete-record-updates
128 -fwarn-monomorphism-restriction
129 -fwarn-unused-do-bind
140 -- The following unbreak profiling with template haskell. We have
141 -- to build the program twice; once without profile and again with
148 type: exitcode-stdio-1.0
149 hs-source-dirs: src test
150 main-is: TestSuite.hs
154 configurator == 0.2.*,
158 htsn-common == 0.0.1,
161 groundhog-postgresql == 0.4.*,
162 groundhog-sqlite == 0.4.*,
163 groundhog-th == 0.4.*,
167 tasty-hunit == 0.4.*,
169 transformers == 0.3.*,
172 -- It's not entirely clear to me why I have to reproduce all of this.
176 -fwarn-missing-signatures
177 -fwarn-name-shadowing
181 -fwarn-incomplete-record-updates
182 -fwarn-monomorphism-restriction
183 -fwarn-unused-do-bind
192 type: exitcode-stdio-1.0
197 -- Additional test dependencies.
200 -- It's not entirely clear to me why I have to reproduce all of this.
204 -fwarn-missing-signatures
205 -fwarn-name-shadowing
209 -fwarn-incomplete-record-updates
210 -fwarn-monomorphism-restriction
211 -fwarn-unused-do-bind
219 source-repository head
221 location: http://michael.orlitzky.com/git/htsn-import.git