Close followers of the struct plugin's development on GitHub will have noticed that a lot has happened over the last few months.
This post is meant to give some insight into what we did and what has changed.
For those not in the know: the struct plugin enhances the free-text nature of DokuWiki by adding structured data that can be aggregated, filtered and combined. Check the official documentation to learn more.
The initial impulse for this work was a customer request. They wanted to add struct-managed data to pages, not as a single set of metadata but as a series of data entries.
The customer uses DokuWiki to document lab experiments. An experiment has a page with attached struct metadata. In addition, the experiment's actual tests are run multiple times with slightly different inputs and results.
The traditional approach would have required a new page for each test, even though there is no actual wiki content to put on it. All that's needed is a list of struct-managed data points.
The solution is a new kind of struct data: “serial data”. It behaves much like the “lookup data” (which we have now renamed to “global data”) in that it is a simple table with an input form at the end. Of course the table is defined by a struct schema.
The difference from “global data” (née “lookup data”) is that all the data entered is bound to the page the “serial data” is entered on.
This allows each experiment to have its own series of tests. This could, of course, also have been achieved with a simple table in wiki syntax. The advantage of “serial data” is that it is managed by struct and can thus be used in aggregations. And since it’s tied to a page, it can also be merged with that page’s “page data”. This makes it easy to say, for example, “list all tests that had a positive outcome, along with their experiment’s name and researcher”:
---- struct table ----
cols: %title%, experiment.researcher, test.dosage, test.result
schema: experiment, test
filter: test.result = positive
----
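To make the merging of page data and serial data more concrete, here is a minimal Python sketch using an in-memory SQLite database. The table names (`pages`, `experiment`, `test`), their columns and the page IDs are simplified, hypothetical stand-ins chosen for illustration; struct's real tables look different. The point is only that serial rows carry their page's pid, so they can be joined with that page's data.

```python
import sqlite3

# Hypothetical, simplified tables; struct's real storage layout differs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pages (pid TEXT PRIMARY KEY, title TEXT);                -- stands in for %title%
    CREATE TABLE experiment (pid TEXT PRIMARY KEY, researcher TEXT);      -- page data
    CREATE TABLE test (pid TEXT, rid INTEGER, dosage TEXT, result TEXT);  -- serial data
    """
)
conn.executemany("INSERT INTO pages VALUES (?, ?)",
                 [("lab:exp1", "Experiment 1"), ("lab:exp2", "Experiment 2")])
conn.executemany("INSERT INTO experiment VALUES (?, ?)",
                 [("lab:exp1", "Alice"), ("lab:exp2", "Bob")])
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?)", [
    ("lab:exp1", 1, "5mg", "positive"),
    ("lab:exp1", 2, "10mg", "negative"),
    ("lab:exp2", 1, "5mg", "positive"),
])


def positive_tests(conn):
    """Join each serial 'test' row with its page's title and 'experiment' data via pid."""
    return conn.execute(
        """
        SELECT p.title, e.researcher, t.dosage, t.result
        FROM test AS t
        JOIN experiment AS e ON e.pid = t.pid
        JOIN pages AS p ON p.pid = t.pid
        WHERE t.result = 'positive'
        ORDER BY t.pid, t.rid
        """
    ).fetchall()


for row in positive_tests(conn):
    print(row)
# ('Experiment 1', 'Alice', '5mg', 'positive')
# ('Experiment 2', 'Bob', '5mg', 'positive')
```

The shared pid is what lets a single aggregation combine both kinds of data without duplicating the experiment's fields on every test row.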
To implement the new “serial data” we did a complete overhaul of how data is stored and how schemas are handled. You will notice this when creating a new schema.
Schema creation used to ask about the intended use of the schema beforehand, giving you the choice between “page schema” and “lookup schema”.
This distinction has been removed. Schemas no longer make any assumption about how they will be used later. After all, they simply define the fields that hold the data; the only difference is whether that data is attached to a page or not.
This means you can now use the same schema in different ways. In most cases you will use a schema in only one way, but there are exceptions.
For example, you could use the same schema for both “page data” and “global data” and then reference this schema in a “lookup” field. This is useful when you have a few “special” items that have detailed pages and many “non-special” items that you only want to list.
Say you want to reference rooms. Your meeting rooms are just generic rooms with a name and location, so they go into the “global data” list. Your laboratories, however, have detailed pages with equipment descriptions etc., so they get proper pages with associated “page data”.
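Here is a minimal Python sketch of the rooms example. It assumes a single hypothetical `data_room` table holding every row of one `room` schema: meeting rooms stored as global data (empty pid) and a laboratory stored as page data on a hypothetical page `labs:chemistry`. The table name, layout and page ID are simplified assumptions, not struct's actual storage; the sketch only shows how one schema can serve both uses, so a lookup dropdown could list all rooms while still knowing which of them have a detail page.

```python
import sqlite3

# Hypothetical single table for a "room" schema; struct's real layout differs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_room (pid TEXT NOT NULL, rid INTEGER, name TEXT, location TEXT)"
)
conn.executemany(
    "INSERT INTO data_room VALUES (?, ?, ?, ?)",
    [
        ("", 1, "Meeting Room A", "Building 1"),                 # global data: no page
        ("", 2, "Meeting Room B", "Building 2"),                 # global data: no page
        ("labs:chemistry", None, "Chemistry Lab", "Building 3"),  # page data on a detail page
    ],
)


def lookup_options(conn):
    """Return (name, detail_page) pairs; detail_page is None for global-only rooms."""
    return [
        (name, pid or None)  # an empty pid means the row is not bound to a page
        for name, pid in conn.execute("SELECT name, pid FROM data_room ORDER BY name")
    ]


print(lookup_options(conn))
# [('Chemistry Lab', 'labs:chemistry'), ('Meeting Room A', None), ('Meeting Room B', None)]
```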
As hinted above, we renamed a few terms to avoid some confusion we noticed while working on this.
First of all, schemas no longer have a type. A schema is a schema, regardless of how it is used. The ways of using it, however, still differ, and there are currently three:
- page data - for each field in the schema, one value (or multiple values for multi-value fields) is stored and associated with the page itself. Changing this data creates a new page revision, and historic data can be viewed in the page history.
- global data - this is what we used to call “lookup schema”. We renamed it to avoid confusion with the “lookup” field type. As the name implies, this data is not associated with any page but is available globally. It is stored in rows, each row holding the values for the schema’s fields. This data is not versioned.
This kind of data storage is often used to define the values that fill the dropdown of a Lookup field.
- serial data - this is the new type introduced above. As with global data, data is stored in rows, each row holding the values for the schema’s fields. However, all the data is associated with a page, which makes it possible to reuse the same schema on multiple pages. Again, this data is not versioned.
Data Storage Internals
If you’re interested in how data is stored internally now, here’s a quick look. We use a composite primary key consisting of a page identifier (pid) and a row identifier (rid).
The combination of both values determines how the data is handled:
- page data has a non-empty pid and a rid of NULL
- global data has an empty pid and a rid > 0
- serial data has a non-empty pid and a rid > 0
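The three rules above can be restated as a small Python function. This is a hypothetical helper that only mirrors the pid/rid rules described in this post; it is not code from the struct plugin, which is written in PHP.

```python
from typing import Optional


def data_kind(pid: str, rid: Optional[int]) -> str:
    """Classify a stored row by its (pid, rid) key, following the rules above.

    Hypothetical helper for illustration only; struct itself is written in PHP.
    """
    has_page = bool(pid)                      # empty pid: not bound to any page
    has_row = rid is not None and rid > 0     # positive rid: one row of a series/list
    if has_page and rid is None:
        return "page data"
    if not has_page and has_row:
        return "global data"
    if has_page and has_row:
        return "serial data"
    # Any other combination (e.g. empty pid with NULL rid) is not a valid key.
    raise ValueError(f"invalid key: pid={pid!r}, rid={rid!r}")


print(data_kind("lab:exp1", None))  # page data
print(data_kind("", 3))             # global data
print(data_kind("lab:exp1", 2))     # serial data
```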
How data gets stored depends on how it is submitted: page data is stored when the page is edited, while the other types have their own special syntax for entering data.
Updating from previous versions
All changes described above have already been published. If you’re already using struct, you need to be careful with the update.
You need to update the sqlite plugin first! The sqlite plugin takes care of migrating any existing data to the new format. It needs to be up-to-date before updating the struct plugin, otherwise you will end up with an inconsistent state. Check issue 499 if that happens to you.
The sqlite plugin and the struct plugin itself can simply be updated through DokuWiki’s plugin manager as usual.
Some plugins that build on struct had issues with the new internal storage, but all issues we’re aware of should have been fixed by now. In any case, we recommend making a backup of struct’s sqlite database before upgrading. Simply make a copy of the