name: htsn-import
version: 0.0.1
cabal-version: >= 1.8
author: Michael Orlitzky
maintainer: Michael Orlitzky <michael@orlitzky.com>
category: Utils
license: GPL-3
license-file: doc/LICENSE
build-type: Simple
extra-source-files:
  doc/dbschema/*.png
  doc/htsn-importrc.example
  doc/man1/htsn-import.1
  makefile
  test/xml/*.xml
  test/xml/*.dtd
  schema/*.dtd
  schemagen/Heartbeat/*.xml
  schemagen/injuriesxml/*.xml
  schemagen/Injuries_Detail_XML/*.xml
  schemagen/newsxml/*.xml
  schemagen/Odds_XML/*.xml
  schemagen/weatherxml/*.xml
synopsis:
  Import XML files from The Sports Network into an RDBMS.
description:
  /Usage/:
  .
  @
  htsn-import [OPTIONS] [FILES]
  @
  .
  The Sports Network <http://www.sportsnetwork.com/> offers an XML feed
  containing various sports news and statistics. Our sister program
  /htsn/ is capable of retrieving the feed and saving the individual
  XML documents contained therein. But what to do with them?
  .
  The purpose of /htsn-import/ is to take these XML documents and
  get them into something we can use: a relational database management
  system (RDBMS), loosely known as a SQL database. The structure of
  a relational database is, well, relational, and the feed XML is not.
  So there is some work to do before the data can be inserted.
  .
  First, we must parse the XML. Each supported document type (see below)
  has a full pickle/unpickle implementation (\"pickle\" is simply a
  synonym for serialize here). That means that we parse the entire
  document into a data structure, and if we pickle (serialize) that data
  structure, we get the exact same XML document that we started with.
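  .
  As a minimal sketch of that round trip (an illustration only, not
  code from this package; the @Note@ type and the @pickleNote@ and
  @roundTrips@ names are made up for the example), an HXT pickler for
  a tiny document might look like the following. It checks the
  value-level direction, pickle and then unpickle; the document-level
  check described above would compare XML trees instead.
  .
  > module Main where
  >
  > import Text.XML.HXT.Core
  >
  > -- Hypothetical record; not one of the TSN document types.
  > data Note = Note { noteId :: Int, noteBody :: String }
  >   deriving (Eq, Show)
  >
  > -- A single pickler describes both directions, value to XML and back.
  > pickleNote :: PU Note
  > pickleNote =
  >   xpElem "note" $
  >     xpWrap (uncurry Note, \n -> (noteId n, noteBody n)) $
  >       xpPair (xpElem "id" xpInt) (xpElem "body" xpText)
  >
  > -- True when pickling and then unpickling gives the same value back.
  > roundTrips :: Note -> Bool
  > roundTrips n =
  >   unpickleDoc pickleNote (pickleDoc pickleNote n) == Just n
  >
  > main :: IO ()
  > main = print (roundTrips (Note 1 "hello"))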
  .
  This is important for two reasons. First, it serves as a second level
  of validation. The first validation is performed by the XML parser,
  but if that succeeds and unpickling fails, we know that something is
  fishy. Second, we don't ever want to be surprised by some new element
  or attribute showing up in the XML. The fact that we can unpickle the
  whole thing now means that we won't be surprised in the future.
  .
  The aforementioned feature is especially important because we
  automatically migrate the database schema every time we import a
  document. If you attempt to import a \"newsxml.dtd\" document, all
  database objects relating to the news will be created if they do not
  exist. We don't want the schema to change out from under us without
  warning, so it's important that no XML be parsed that would result in
  a different schema than we had previously. Since we can
  pickle/unpickle everything already, this should be impossible.
  .
  Examples and usage documentation are available in the man page.

executable htsn-import
  build-depends:
    base == 4.*,
    cmdargs >= 0.10.6,
    configurator == 0.2.*,
    directory == 1.2.*,
    filepath == 1.3.*,
    hslogger == 1.2.*,
    htsn-common == 0.0.1,
    hxt == 9.3.*,
    groundhog == 0.4.*,
    groundhog-postgresql == 0.4.*,
    groundhog-sqlite == 0.4.*,
    groundhog-th == 0.4.*,
    MissingH == 1.2.*,
    old-locale == 1.0.*,
    tasty == 0.7.*,
    tasty-hunit == 0.4.*,
    time == 1.4.*,
    transformers == 0.3.*,
    tuple == 0.2.*

  main-is:
    Main.hs

  hs-source-dirs:
    src/

  other-modules:
    Backend
    CommandLine
    Configuration
    ConnectionString
    ExitCodes
    OptionalConfiguration
    TSN.Codegen
    TSN.Database
    TSN.DbImport
    TSN.Picklers
    TSN.XmlImport
    TSN.XML.Heartbeat
    TSN.XML.Injuries
    TSN.XML.InjuriesDetail
    TSN.XML.News
    TSN.XML.Odds
    TSN.XML.Weather
    Xml

  ghc-options:
    -Wall
    -fwarn-hi-shadowing
    -fwarn-missing-signatures
    -fwarn-name-shadowing
    -fwarn-orphans
    -fwarn-type-defaults
    -fwarn-tabs
    -fwarn-incomplete-record-updates
    -fwarn-monomorphism-restriction
    -fwarn-unused-do-bind
    -rtsopts
    -threaded
    -optc-O3
    -optc-march=native
    -O2

  ghc-prof-options:
    -prof
    -fprof-auto
    -fprof-cafs
    -- The following unbreak profiling with template haskell. We have
    -- to build the program twice; once without profile and again with
    -- these flags.
    -hisuf hi_p
    -osuf o_p


test-suite testsuite
  type: exitcode-stdio-1.0
  hs-source-dirs: src test
  main-is: TestSuite.hs
  build-depends:
    base == 4.*,
    cmdargs >= 0.10.6,
    configurator == 0.2.*,
    directory == 1.2.*,
    filepath == 1.3.*,
    hslogger == 1.2.*,
    htsn-common == 0.0.1,
    hxt == 9.3.*,
    groundhog == 0.4.*,
    groundhog-postgresql == 0.4.*,
    groundhog-sqlite == 0.4.*,
    groundhog-th == 0.4.*,
    MissingH == 1.2.*,
    old-locale == 1.0.*,
    tasty == 0.7.*,
    tasty-hunit == 0.4.*,
    time == 1.4.*,
    transformers == 0.3.*,
    tuple == 0.2.*

  -- It's not entirely clear to me why I have to reproduce all of this.
  ghc-options:
    -Wall
    -fwarn-hi-shadowing
    -fwarn-missing-signatures
    -fwarn-name-shadowing
    -fwarn-orphans
    -fwarn-type-defaults
    -fwarn-tabs
    -fwarn-incomplete-record-updates
    -fwarn-monomorphism-restriction
    -fwarn-unused-do-bind
    -rtsopts
    -threaded
    -optc-O3
    -optc-march=native
    -O2


test-suite doctests
  type: exitcode-stdio-1.0
  hs-source-dirs: test
  main-is: Doctests.hs
  build-depends:
    base == 4.*,
    -- Additional test dependencies.
    doctest == 0.9.*

  -- It's not entirely clear to me why I have to reproduce all of this.
  ghc-options:
    -Wall
    -fwarn-hi-shadowing
    -fwarn-missing-signatures
    -fwarn-name-shadowing
    -fwarn-orphans
    -fwarn-type-defaults
    -fwarn-tabs
    -fwarn-incomplete-record-updates
    -fwarn-monomorphism-restriction
    -fwarn-unused-do-bind
    -rtsopts
    -threaded
    -optc-O3
    -optc-march=native
    -O2


source-repository head
  type: git
  location: http://michael.orlitzky.com/git/htsn-import.git
  branch: master