.TH htsn-import 1

.SH NAME
htsn-import \- Import XML files from The Sports Network into an RDBMS.

.SH SYNOPSIS

\fBhtsn-import\fR [OPTIONS] [FILES]

.SH DESCRIPTION

.SH DATABASE SCHEMA
.P
At the top level, we have one table for each of the XML document types
that we import. For example, the documents corresponding to
\fInewsxml.dtd\fR will have a table called \(dqnews\(dq.
.P
These top-level tables will often have children. For example, each
news item has zero or more locations associated with it. The child
table will be named <parent>_<children>, which in this case
corresponds to \(dqnews_locations\(dq.
.P
To relate the two, a third table exists with the name <parent
table>__<child table>. Note the two underscores. This prevents
ambiguity when the child table itself contains underscores. As long as
we never go more than one level down, this system should suffice. The
table joining \(dqnews\(dq with \(dqnews_locations\(dq is thus called
\(dqnews__news_locations\(dq.
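.P
The naming rules above can be sketched as a few small helpers. This is
only an illustration written in Python, not code taken from
htsn-import; the function names are hypothetical, and the sketch
assumes that no individual table name contains two consecutive
underscores.

```python
def child_table(parent: str, children: str) -> str:
    """Name of a child table: <parent>_<children>."""
    return f"{parent}_{children}"


def join_table(parent_table: str, child_table_name: str) -> str:
    """Name of the join table: <parent table>__<child table>."""
    return f"{parent_table}__{child_table_name}"


def split_join_table(name: str) -> tuple[str, str]:
    """Recover (parent table, child table) from a join table name.

    With a single underscore, "news_news_locations" could be read as
    "news" + "news_locations" or as "news_news" + "locations". Splitting
    on the first double underscore removes that ambiguity.
    """
    parent, child = name.split("__", 1)
    return parent, child


locations = child_table("news", "locations")
joined = join_table("news", locations)

print(locations)                 # news_locations
print(joined)                    # news__news_locations
print(split_join_table(joined))  # ('news', 'news_locations')
```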
.P
Wherever possible, children are kept unique to prevent pointless
duplication. This slows down inserts but speeds up reads (which we
assume are much more frequent). The feed currently transmits XML far
too slowly for the slower inserts to cause problems.
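.P
One way to keep children unique is a uniqueness constraint on the
child table, with the join table linking each parent to the shared
child row. The following is a minimal sketch under stated assumptions,
not the schema that htsn-import actually creates: it uses an in-memory
SQLite database through Python's sqlite3 module so that it is
self-contained, and the column names (title, city, country) and the
add_news helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE news (
        id    INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    -- Hypothetical columns; the UNIQUE constraint keeps children unique.
    CREATE TABLE news_locations (
        id      INTEGER PRIMARY KEY,
        city    TEXT NOT NULL,
        country TEXT NOT NULL,
        UNIQUE (city, country)
    );
    CREATE TABLE news__news_locations (
        news_id           INTEGER NOT NULL REFERENCES news (id),
        news_locations_id INTEGER NOT NULL REFERENCES news_locations (id),
        PRIMARY KEY (news_id, news_locations_id)
    );
""")


def add_news(conn, title, locations):
    """Insert one news item and link it to (possibly existing) locations."""
    news_id = conn.execute(
        "INSERT INTO news (title) VALUES (?)", (title,)
    ).lastrowid
    for city, country in locations:
        # The extra work on insert: skip the row if it already exists...
        conn.execute(
            "INSERT OR IGNORE INTO news_locations (city, country) VALUES (?, ?)",
            (city, country),
        )
        # ...then look up the id of the single shared row.
        (location_id,) = conn.execute(
            "SELECT id FROM news_locations WHERE city = ? AND country = ?",
            (city, country),
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO news__news_locations VALUES (?, ?)",
            (news_id, location_id),
        )
    return news_id


add_news(conn, "First item", [("Boston", "USA")])
add_news(conn, "Second item", [("Boston", "USA"), ("Toronto", "Canada")])

location_count = conn.execute("SELECT COUNT(*) FROM news_locations").fetchone()[0]
link_count = conn.execute("SELECT COUNT(*) FROM news__news_locations").fetchone()[0]
print(location_count)  # 2: "Boston" is stored once and shared
print(link_count)      # 3: one link per (news item, location) pair
```

Each insert pays for an extra lookup, but a reader only has to join
through \(dqnews__news_locations\(dq to find every news item that
shares a location.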