X-Git-Url: https://gitweb.michael.orlitzky.com/?a=blobdiff_plain;f=doc%2Fman1%2Fhtsn-import.1;h=7a215b142c420e2931045a895daa1d3ebba974c6;hb=5e06d6a189fd5bc1cbc67a349bbee5e168d3bf24;hp=352eb2b6ab66048f400f97521b580f082c75794e;hpb=775620685401dce01bbd2d715450ab101a8ec56a;p=dead%2Fhtsn-import.git diff --git a/doc/man1/htsn-import.1 b/doc/man1/htsn-import.1 index 352eb2b..7a215b1 100644 --- a/doc/man1/htsn-import.1 +++ b/doc/man1/htsn-import.1 @@ -106,6 +106,29 @@ type are provided with the \fBhtsn-import\fR documentation, in the should be considered a bug if they are incorrect. The diagrams are created using the pgModeler tool. +.SH NULL POLICY +.P +Normally in a database one makes a distinction between fields that +simply don't exist, and those fields that are +\(dqempty\(dq. Translating from XML, there is a natural way to +determine which one should be used: if an element is present in the +XML document but its contents are empty, then an empty string should +be inserted into the corresponding field. If on the other hand the +element is missing entirely, the corresponding database entry should +be NULL to indicate that fact. +.P +This sounds well and good, but the XML must be consistent for the +database consumer to make any sense of what he sees. The feed XML uses +optional and blank elements interchangeably, and without any +discernable pattern. To propagate this pattern into the database would +only cause confusion. +.P +As a result, a policy was adopted: both optional elements and elements +whose contents can be empty will be considered nullable in the +database. If the element is missing, the corresponding field is +NULL. Likewise if the content is simply missing. That means there +should never be a (completely) empty string in a database column. + .SH XML SCHEMA GENERATION .P In order to parse XML, you need to know the structure of your @@ -264,6 +287,28 @@ it would greatly complicate things. The first form is more common, so that's all we support for now. An example is provided as schemagen/weatherxml/20143655.xml. +.SH DEPLOYMENT +.P +When deploying for the first time, the target database will most +likely be empty. The schema will be migrated when a new document type +is seen, but this has a downside: it can be months before every +supported document type has been seen once. This can make it difficult +to test the database permissions. +.P +Since all of the test XML documents have old timestamps, one easy +workaround is the following: simply import all of the test XML +documents, and then delete them using whatever script is used to prune +old entries. This will force the migration of the schema, after which +you can set and test the database permissions. +.P +Something as simple as, +.P +.nf +.I $ find ./test/xml -iname '*.xml' | xargs htsn-import -c foo.sqlite +.fi +.P +should do it. + .SH OPTIONS .IP \fB\-\-backend\fR,\ \fB\-b\fR