4 author: Michael Orlitzky
5 maintainer: Michael Orlitzky <michael@orlitzky.com>
8 license-file: doc/LICENSE
12 doc/htsn-importrc.example
13 doc/man1/htsn-import.1
15 doc/README.development
20 schemagen/AutoRacingResultsXML/*.xml
21 schemagen/Auto_Racing_Schedule_XML/*.xml
22 schemagen/Heartbeat/*.xml
23 schemagen/injuriesxml/*.xml
24 schemagen/Injuries_Detail_XML/*.xml
25 schemagen/newsxml/*.xml
26 schemagen/Odds_XML/*.xml
27 schemagen/scoresxml/*.xml
28 schemagen/weatherxml/*.xml
33 Import XML files from The Sports Network into an RDBMS.
38 htsn-import [OPTIONS] [FILES]
41 The Sports Network <http://www.sportsnetwork.com/> offers an XML feed
42 containing various sports news and statistics. Our sister program
43 /htsn/ is capable of retrieving the feed and saving the individual
44 XML documents contained therein. But what to do with them?
46 The purpose of /htsn-import/ is to take these XML documents and
47 get them into something we can use, a relational database management
48 system (RDBMS), loosely known as a SQL database. The structure of a
49 relational database is, well, relational, and the feed XML is not. So
50 there is some work to do before the data can be inserted.
52 First, we must parse the XML. Each supported document type (see below)
53 has a full pickle/unpickle implementation (\"pickle\" is simply a
54 synonym for serialize here). That means that we parse the entire
55 document into a data structure, and if we pickle (serialize) that data
56 structure, we get the exact same XML document that we started with.
58 This is important for two reasons. First, it serves as a second level
59 of validation. The first validation is performed by the XML parser,
60 but if that succeeds and unpickling fails, we know that something is
61 fishy. Second, we don't ever want to be surprised by some new element
62 or attribute showing up in the XML. The fact that we can unpickle the
63 whole thing now means that we won't be surprised in the future.
65 The aforementioned feature is especially important because we
66 automatically migrate the database schema every time we import a
67 document. If you attempt to import a \"newsxml.dtd\" document, all
68 database objects relating to the news will be created if they do not
69 exist. We don't want the schema to change out from under us without
70 warning, so it's important that no XML be parsed that would result in
71 a different schema than we had previously. Since we can
72 pickle/unpickle everything already, this should be impossible.
74 Examples and usage documentation are available in the man page.
76 executable htsn-import
87 groundhog-postgresql >= 0.5,
88 groundhog-sqlite >= 0.5,
111 OptionalConfiguration
119 TSN.XML.AutoRacingResults
120 TSN.XML.AutoRacingSchedule
124 TSN.XML.InjuriesDetail
136 -fwarn-missing-signatures
137 -fwarn-name-shadowing
141 -fwarn-incomplete-record-updates
142 -fwarn-monomorphism-restriction
143 -fwarn-unused-do-bind
150 -- The following unbreak profiling with template haskell. We have
151 -- to build the program twice; once without profile and again with
158 type: exitcode-stdio-1.0
159 hs-source-dirs: src test
160 main-is: TestSuite.hs
168 htsn-common >= 0.0.1,
171 groundhog-postgresql >= 0.5,
172 groundhog-sqlite >= 0.5,
183 -- It's not entirely clear to me why I have to reproduce all of this.
187 -fwarn-missing-signatures
188 -fwarn-name-shadowing
192 -fwarn-incomplete-record-updates
193 -fwarn-monomorphism-restriction
194 -fwarn-unused-do-bind
199 type: exitcode-stdio-1.0
204 -- Additional test dependencies.
207 -- It's not entirely clear to me why I have to reproduce all of this.
211 -fwarn-missing-signatures
212 -fwarn-name-shadowing
216 -fwarn-incomplete-record-updates
217 -fwarn-monomorphism-restriction
218 -fwarn-unused-do-bind
226 -- These won't work without shelltestrunner installed in your
227 -- $PATH. Maybe there is some way to tell Cabal that.
228 test-suite shelltests
229 type: exitcode-stdio-1.0
231 main-is: ShellTests.hs
240 htsn-common >= 0.0.1,
243 groundhog-postgresql >= 0.5,
244 groundhog-sqlite >= 0.5,
258 source-repository head
260 location: http://michael.orlitzky.com/git/htsn-import.git