pkp2011 – Computers for the Classics

Today was the kick-off of the hackfest at the PKP 2011 conference. Not many people turned up, but I had the chance to spend some quality (coding) time with PKP developers and to hold a sort of personal code sprint on a side project, namely developing a plugin that integrates a Named Entity Recognition (NER) web service into an OJS installation (see here and there for more theoretical background).

By the end of the day I had managed to:

set up a local instance of OJS (version 2.3.6) using MAMP;
try out the OJS Voyeur plugin, which unfortunately for me only works with versions <= 2.2.x;
create the bare bones of the plugin, whose code is up here (for my own record rather than for others' use, at least at this early stage);
write a PHP class that queries a web service (which I'm developing) to extract citations of ancient works from plain texts;
come up with two possible scenarios for the further implementation of the plugin, which will hopefully happen before next year's PKP hackfest 😉
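
The PHP class itself is not shown in this post. As a rough idea of what such a client involves, here is a minimal sketch, written in Python only for brevity (the plugin itself is PHP/JavaScript). It assumes the service speaks XML-RPC, as in the first scenario below; the remote method name extract_citations and the endpoint URL are hypothetical.

```python
import xmlrpc.client


class CitationServiceClient:
    """Hypothetical client for a citation-extracting NER web service.

    The endpoint URL and the remote method name are assumptions made for
    illustration; the real service's API is not described in the post.
    """

    def __init__(self, endpoint: str, method: str = "extract_citations"):
        self.endpoint = endpoint
        self.method = method  # hypothetical remote method name

    def build_request(self, paragraphs: list[str]) -> str:
        # Serialise the paragraphs as the body of an XML-RPC <methodCall>.
        return xmlrpc.client.dumps((paragraphs,), methodname=self.method)

    @staticmethod
    def parse_response(body: str):
        # loads() returns (params, methodname); a response carries one param.
        # A <fault> response makes loads() raise xmlrpc.client.Fault.
        params, _ = xmlrpc.client.loads(body)
        return params[0]

    def query(self, paragraphs: list[str]):
        # Performs the real network call through the standard-library proxy.
        proxy = xmlrpc.client.ServerProxy(self.endpoint)
        return getattr(proxy, self.method)(paragraphs)
```

Only build_request and parse_response can be exercised offline; query needs a running service at the given endpoint.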

The purpose of this post is to comment briefly on these two scenarios for the plugin.

1. Client-side centric

The first scenario is rather heavy on the client side. The plugin is packaged as an OJS plugin and works essentially as follows:

after an article is loaded for viewing, a JavaScript script (grab.js) collects all the <p> elements of the HTML article and sends them via AJAX to a PHP page (proxy.php);
a PHP class acts as a proxy (or client) for a third-party NER web service;
the data received via the AJAX call are passed on to the web service via XML-RPC;
the web service returns its response in JSON or XML format…
… which is then processed back in the JS script, ideally with a compiled template based on jQuery's templating capability. Finally, the extracted citations are displayed in a summary box alongside the article.
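
To make the proxy.php step more concrete, here is a minimal Python sketch of what the proxy has to do: decode the paragraphs sent by grab.js, forward them to the NER service, and hand a single JSON shape back to the browser whether the service answered in JSON or XML. The payload formats, the function names and the injected call_service callable are all assumptions for illustration, not the service's actual API.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable


def normalise_response(body: str, content_type: str) -> list[dict]:
    """Turn a JSON or XML service response into a list of citation dicts.

    Both payload shapes are hypothetical:
      JSON: {"citations": [{"ref": "...", "text": "..."}]}
      XML:  <citations><citation ref="...">text</citation></citations>
    """
    if "json" in content_type:
        return json.loads(body)["citations"]
    root = ET.fromstring(body)
    return [{"ref": c.get("ref"), "text": c.text or ""}
            for c in root.iter("citation")]


def proxy(ajax_body: str,
          call_service: Callable[[list[str]], tuple[str, str]]) -> str:
    """Sketch of the proxy role: AJAX request in, NER call, JSON out."""
    paragraphs = json.loads(ajax_body)             # <p> texts sent by grab.js
    body, content_type = call_service(paragraphs)  # e.g. an XML-RPC call (hypothetical)
    citations = normalise_response(body, content_type)
    return json.dumps({"citations": citations})    # consumed by the JS template
```

Normalising on the proxy side would mean the jQuery template only ever has to deal with one format, whichever one the service happens to return.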

2. Server-side centric

In the second scenario I envisaged, by contrast, most of the processing happens on the server side:

before being displayed, the article is processed to extract its <p> elements;
the main plugin class (plugin.php) takes care of sending the input to, and receiving a response from, the NER service;
the response is then run through a template (template.tpl) using OJS's templating functionality;
the formatted summary box is injected into the HTML, which is then ready to be displayed to the user.
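
The server-side pipeline can be sketched in a similar way. In this minimal Python version, the standard library's HTMLParser pulls out the text of the <p> elements, string.Template stands in for OJS's own templating, and the summary box is injected just before </body>. The extract callable (the NER call), the markup of the box and the injection point are hypothetical choices.

```python
from html import escape
from html.parser import HTMLParser
from string import Template


class ParagraphExtractor(HTMLParser):
    """Collect the text content of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_p = False  # True while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:  # text of nested inline tags (<em>, <i>...) is kept too
            self.paragraphs[-1] += data


# Hypothetical markup for the summary box.
BOX = Template('<div class="citation-summary"><h3>Cited passages</h3>'
               '<ul>$items</ul></div>')


def render_summary(citations: list[str]) -> str:
    items = "".join(f"<li>{escape(c)}</li>" for c in citations)
    return BOX.substitute(items=items)


def inject_summary(article_html: str, extract) -> str:
    parser = ParagraphExtractor()
    parser.feed(article_html)
    parser.close()
    citations = extract(parser.paragraphs)  # hypothetical NER service call
    box = render_summary(citations)
    marker = "</body>"
    if marker in article_html:
        return article_html.replace(marker, box + marker, 1)
    return article_html + box  # fragment without <body>: append at the end
```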

All in all, I think I came up with (1) mainly because my PHP is rather rusty at the moment ;). So, although I'm quite reluctant to admit it, I might decide to go for (2). However, a good reason to opt for the former would be a setup in which the user can decide, paper by paper, whether or not to enable this feature.