4 author: Michael Orlitzky
5 maintainer: Michael Orlitzky <michael@orlitzky.com>
8 license-file: doc/LICENSE
12 doc/htsn-importrc.example
13 doc/man1/htsn-import.1
17 schemagen/Heartbeat/*.xml
18 schemagen/injuriesxml/*.xml
19 schemagen/Injuries_Detail_XML/*.xml
20 schemagen/newsxml/*.xml
21 schemagen/Odds_XML/*.xml
22 schemagen/weatherxml/*.xml
24 Import XML files from The Sports Network into an RDBMS.
29 htsn-import [OPTIONS] [FILES]
32 The Sports Network <http://www.sportsnetwork.com/> offers an XML feed
33 containing various sports news and statistics. Our sister program
34 /htsn/ is capable of retrieving the feed and saving the individual
35 XML documents contained therein. But what to do with them?
37 The purpose of /htsn-import/ is to take these XML documents and
38 get them into something we can use, a relational database management
39 system (RDBMS), loosely known as a SQL database. The structure of
40 a relational database is, well, relational, and the feed XML is not. So
41 there is some work to do before the data can be inserted.
43 First, we must parse the XML. Each supported document type (see below)
44 has a full pickle/unpickle implementation (\"pickle\" is simply a
45 synonym for serialize here). That means that we parse the entire
46 document into a data structure, and if we pickle (serialize) that data
47 structure, we get the exact same XML document that we started with.
49 This is important for two reasons. First, it serves as a second level
50 of validation. The first validation is performed by the XML parser,
51 but if that succeeds and unpickling fails, we know that something is
52 fishy. Second, we don't ever want to be surprised by some new element
53 or attribute showing up in the XML. Because we can already unpickle the
54 whole thing, anything new should make unpickling fail rather than go unnoticed.
56 Complete unpickling is especially important because we
57 automatically migrate the database schema every time we import a
58 document. If you attempt to import a \"newsxml.dtd\" document, all
59 database objects relating to the news will be created if they do not
60 exist. We don't want the schema to change out from under us without
61 warning, so it's important that no XML be parsed that would result in
62 a different schema than we had previously. Since we can
63 pickle/unpickle everything already, this should be impossible.
65 Examples and usage documentation are available in the man page.
67 executable htsn-import
71 configurator == 0.2.*,
78 groundhog-postgresql == 0.4.*,
79 groundhog-sqlite == 0.4.*,
80 groundhog-th == 0.4.*,
86 transformers == 0.3.*,
101 OptionalConfiguration
108 TSN.XML.InjuriesDetail
116 -fwarn-missing-signatures
117 -fwarn-name-shadowing
121 -fwarn-incomplete-record-updates
122 -fwarn-monomorphism-restriction
123 -fwarn-unused-do-bind
134 -- The following unbreak profiling with Template Haskell. We have
135 -- to build the program twice: once without profile and again with
142 type: exitcode-stdio-1.0
143 hs-source-dirs: src test
144 main-is: TestSuite.hs
148 configurator == 0.2.*,
152 htsn-common == 0.0.1,
155 groundhog-postgresql == 0.4.*,
156 groundhog-sqlite == 0.4.*,
157 groundhog-th == 0.4.*,
161 tasty-hunit == 0.4.*,
163 transformers == 0.3.*,
166 -- It's not entirely clear to me why I have to reproduce all of this.
170 -fwarn-missing-signatures
171 -fwarn-name-shadowing
175 -fwarn-incomplete-record-updates
176 -fwarn-monomorphism-restriction
177 -fwarn-unused-do-bind
186 type: exitcode-stdio-1.0
191 -- Additional test dependencies.
194 -- It's not entirely clear to me why I have to reproduce all of this.
198 -fwarn-missing-signatures
199 -fwarn-name-shadowing
203 -fwarn-incomplete-record-updates
204 -fwarn-monomorphism-restriction
205 -fwarn-unused-do-bind
213 source-repository head
215 location: http://michael.orlitzky.com/git/htsn-import.git