Today was the kick-off of the hackfest at the PKP 2011 conference. Not many people turned up, but I had the chance to spend some quality (coding) time with PKP developers and to have a sort of personal code sprint on a side project: developing a plugin to integrate a Named Entity Recognition (NER) web service into an OJS installation (see here and there for a more theoretical background).
At the end of the day what I got done was:
- set up a local instance of OJS (version 2.3.6) using MAMP;
- give a quick try to the OJS Voyeur plugin, which unfortunately for me works only with versions <= 2.2.x;
- create the bare bones of the plugin, whose code is up here (for my personal record rather than for others’ use, at least at this early stage);
- write a PHP class to query a web service (that I’m developing) to extract citations of ancient works from (plain) texts;
- come up with two possible scenarios for further implementation of the plugin, to happen possibly earlier than next year’s PKP hackfest.
1. Client-side centric
The first scenario is rather heavy on the client side. The code is packaged as an OJS plugin and works essentially as follows:
- a PHP class acts as a proxy (or client) for a 3rd party NER web service;
- the data received via the Ajax call are passed on to the web service via XML-RPC;
- the response is returned by the web service in JSON or XML format…
- … and then processed again by the JS script, ideally using a compiled template based on jQuery’s template capability. Finally, the extracted citations are displayed as a summary box alongside the article.
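To make the client-side flow a bit more concrete, here is a minimal sketch of the JS part. It is only an illustration under several assumptions: the response shape (`{ citations: [{ text, ref }] }`), the `ner-summary` class name, the `proxyUrl` parameter and the function names are all hypothetical, and a plain template literal stands in for jQuery's template capability.

```javascript
// Assumed (hypothetical) response shape: { citations: [{ text, ref }] }.
// None of these field names come from the actual web service.

// Escape a value before placing it inside HTML markup.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the parsed response into the markup of the summary box.
// A template literal is used here in place of a compiled jQuery template.
function renderSummaryBox(response) {
  const citations = (response && response.citations) || [];
  if (citations.length === 0) {
    return '<div class="ner-summary"><p>No citations found.</p></div>';
  }
  const items = citations
    .map((c) => `<li><q>${escapeHtml(c.text)}</q> (${escapeHtml(c.ref)})</li>`)
    .join("");
  return `<div class="ner-summary"><ul>${items}</ul></div>`;
}

// Send the article text to the PHP proxy and resolve to the summary-box markup.
// `proxyUrl` is a placeholder; `fetchImpl` can be swapped for a stub in tests.
async function summarizeArticle(proxyUrl, articleText, fetchImpl = globalThis.fetch) {
  const reply = await fetchImpl(proxyUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: articleText }),
  });
  if (!reply.ok) {
    throw new Error(`Proxy request failed with status ${reply.status}`);
  }
  return renderSummaryBox(await reply.json());
}
```

Keeping `renderSummaryBox` free of any network or DOM access means the template step can be tried out on its own, before the proxy and the web service are wired together.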
2. Server-side centric
In the second scenario I envisaged, by contrast, most of the processing happens on the server side:
- before being displayed, the article is processed to extract <p> elements;
- the main plugin class (plugin.php) takes care of sending the input to and receiving a response from the NER service;
- the response is then run through a template (template.tpl) by exploiting OJS’s templating functionalities;
- the formatted summary box is injected into the HTML which is now ready to be displayed to the user.
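The four steps above can be read as one small pipeline. The real plugin would do this in PHP with OJS's own templates; the sketch below uses JavaScript purely to show the data flow. Everything in it is an assumption: the function names, the `ner-summary` class, the idea that the NER service returns a plain list of citation strings, and the `queryNer` callback that stands in for the actual call to the service.

```javascript
// Step 1: pull the text of every <p> element out of the article HTML.
// A regex is enough for a sketch; real code should use a proper HTML parser.
function extractParagraphs(html) {
  const paragraphs = [];
  const pattern = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Drop nested inline tags and surrounding whitespace.
    paragraphs.push(match[1].replace(/<[^>]+>/g, "").trim());
  }
  return paragraphs;
}

// Escape citation text before placing it inside markup.
function escapeText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Step 3: format the (assumed) list of citation strings as a summary box.
function renderBox(citations) {
  const items = citations.map((c) => `<li>${escapeText(c)}</li>`).join("");
  return `<div class="ner-summary"><ul>${items}</ul></div>`;
}

// Step 4: inject the box just before </body>, or append it if there is none.
function injectBox(html, box) {
  const index = html.lastIndexOf("</body>");
  return index === -1 ? html + box : html.slice(0, index) + box + html.slice(index);
}

// Steps 1 to 4 chained together. `queryNer` (step 2) is a hypothetical
// callback that takes plain text and returns an array of citation strings.
function processArticle(html, queryNer) {
  const paragraphs = extractParagraphs(html);
  const citations = queryNer(paragraphs.join("\n"));
  return injectBox(html, renderBox(citations));
}
```

Passing the service call in as a callback keeps the pipeline testable with a stub, for example `processArticle(html, () => ["Hom. Il. 1.1"])`.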
All in all, I think that I came up with (1) mainly because my PHP is rather rusty at the moment ;). Therefore, although I’m quite reluctant to admit it, I might decide to go for (2). However, one good argument for the former is the case where the user should be able to decide, for each paper, whether to enable this feature or not.