author: Michael Orlitzky
maintainer: Michael Orlitzky <michael@orlitzky.com>
license-file: doc/LICENSE
doc/htsnrc-import.example
doc/man1/htsn-import.1
Import XML files from The Sports Network into an RDBMS.
htsn-import [OPTIONS] [FILES]
The Sports Network <http://www.sportsnetwork.com/> offers an XML feed
containing various sports news and statistics. Our sister program
/htsn/ is capable of retrieving the feed and saving the individual
XML documents contained therein. But what to do with them?
The purpose of /htsn-import/ is to take these XML documents and
get them into something we can use: a relational database management
system (RDBMS), loosely known as a SQL database. The structure of a
relational database is, well, relational, and the feed XML is not. So
there is some work to do before the data can be inserted.
First, we must parse the XML. Each supported document type (see below)
has a full pickle/unpickle implementation (\"pickle\" is simply a
synonym for serialize here). That means that we parse the entire
document into a data structure, and if we pickle (serialize) that data
structure, we get the exact same XML document that we started with.
This is important for two reasons. First, it serves as a second level
of validation. The first validation is performed by the XML parser,
but if that succeeds and unpickling fails, we know that something is
fishy. Second, we don't ever want to be surprised by some new element
or attribute showing up in the XML. The fact that we can unpickle the
whole thing now means that we won't be surprised in the future.
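To make the round-trip idea concrete, here is a minimal, dependency-free Haskell sketch. It is not how htsn-import itself is written: the Xml tree type, the Injury record, and the injury and description element names are all hypothetical, and a real implementation would more likely use an XML library with pickler combinators. The unpickler matches the document exactly, so an unexpected attribute or child element makes it fail, and roundTrips checks that pickling the parsed value gives back the original document.

```haskell
-- A minimal, dependency-free sketch of the pickle/unpickle round trip.
-- The Xml type, the Injury record, and the element names are all
-- hypothetical; they are not taken from the TSN feed or htsn-import.

-- | A tiny XML tree: an element with a name, attributes, and children,
--   or a text node.
data Xml = Elem String [(String, String)] [Xml]
         | Text String
  deriving (Eq, Show)

-- | Hypothetical record parsed from a hypothetical injury element.
data Injury = Injury { team :: String, description :: String }
  deriving (Eq, Show)

-- | Pickle (serialize) an Injury back into XML.
pickleInjury :: Injury -> Xml
pickleInjury (Injury t d) =
  Elem "injury" [("team", t)] [Elem "description" [] [Text d]]

-- | Unpickle (deserialize) XML into an Injury. The pattern is exact:
--   any extra attribute, extra child, or unknown element makes the
--   match fail, so surprises in the feed are reported (as a Left)
--   rather than silently dropped.
unpickleInjury :: Xml -> Either String Injury
unpickleInjury (Elem "injury" [("team", t)] [Elem "description" [] [Text d]]) =
  Right (Injury t d)
unpickleInjury other =
  Left ("unexpected XML: " ++ show other)

-- | The round-trip property: unpickling a document and pickling the
--   result must give back exactly the document we started with.
roundTrips :: Xml -> Bool
roundTrips doc = fmap pickleInjury (unpickleInjury doc) == Right doc
```

Loaded into GHCi, a well-formed injury element round-trips, while the same element with one extra attribute is rejected by unpickleInjury, so roundTrips returns False for it.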
The aforementioned feature is especially important because we
automatically migrate the database schema every time we import a
document. If you attempt to import a \"newsxml.dtd\" document, all
database objects relating to the news will be created if they do not
exist. We don't want the schema to change out from under us without
warning, so it's important that no XML be parsed that would result in
a different schema than we had previously. Since we can
pickle/unpickle everything already, this should be impossible.
Examples and usage documentation are available in the man page.
executable htsn-import
configurator == 0.2.*,
groundhog-postgresql == 0.4.*,
groundhog-sqlite == 0.4.*,
groundhog-th == 0.4.*,
transformers == 0.3.*,
TSN.XML.InjuriesDetail
-fwarn-missing-signatures
-fwarn-name-shadowing
-fwarn-incomplete-record-updates
-fwarn-monomorphism-restriction
-fwarn-unused-do-bind
-- The following unbreak profiling with Template Haskell. We have
-- to build the program twice; once without profiling and again with
type: exitcode-stdio-1.0
hs-source-dirs: src test
main-is: TestSuite.hs
configurator == 0.2.*,
htsn-common == 0.0.1,
groundhog-postgresql == 0.4.*,
groundhog-sqlite == 0.4.*,
groundhog-th == 0.4.*,
tasty-hunit == 0.4.*,
transformers == 0.3.*,
-- It's not entirely clear to me why I have to reproduce all of this.
-fwarn-missing-signatures
-fwarn-name-shadowing
-fwarn-incomplete-record-updates
-fwarn-monomorphism-restriction
-fwarn-unused-do-bind
type: exitcode-stdio-1.0
-- Additional test dependencies.
-- It's not entirely clear to me why I have to reproduce all of this.
-fwarn-missing-signatures
-fwarn-name-shadowing
-fwarn-incomplete-record-updates
-fwarn-monomorphism-restriction
-fwarn-unused-do-bind
source-repository head
location: http://michael.orlitzky.com/git/htsn-import.git