PUMA publications matching software
The ESRF is developing the PUMA (publication and user experiment metadata analyser) publication matching software within the STREAMLINE H2020 project to extract data about research carried out at the ESRF.
In the words of Harald Reichert, coordinator of STREAMLINE, “We are sitting on a mountain of data that hasn’t been explored. PUMA gives us the opportunity to discover trends within this data and develop our science programme based on real data”. The data that PUMA can access includes information about the instruments and beamlines, techniques, scientific areas, authors, member countries, publications (>40,000), public proposals submitted (>46,000) and accepted (>20,000), and industrial proposals (>1,300). PUMA provides a quantitative view of the data, not only enabling conclusions about the science currently being carried out at the ESRF, but also helping to identify trends. It facilitates reporting for the ESRF management and supports strategic planning for the facility. Figure 1 presents an overview of how PUMA integrates into the ESRF user management and library reporting systems.
Figure 1. How PUMA fits into the existing ESRF architecture.
Initially created at the Institut Laue-Langevin within the European project FILL2030, PUMA transitioned to the ESRF in 2019 and has been enhanced for ESRF needs by STREAMLINE. Some of the added features benefit both institutes.
What is the impact of PUMA?
Within the ESRF, PUMA provides valuable insight and immediate reporting on metadata that would be impossible to exploit manually. It is a tool to support strategic planning based on the evolution of the beamlines, scientific fields, and topical activities.
In a wider context, PUMA could be used by other large-scale research infrastructures in Europe offering a public user programme. The same set of metadata as used for the ESRF would be the basis of the reporting – based on shared KPIs – for other facilities. This would be a powerful and immediate reporting tool for consortia such as LEAPS (League of European Accelerator-based Photon Sources) or LENS (League of advanced European Neutron Sources).
What can PUMA do?
The Advanced Search module of the PUMA web application makes it possible to search through the documents (publications and proposals) that are indexed by PUMA. This search engine has recently been enhanced to allow an advanced search by beamline or instrument, person, institution, keywords, accepted and refused proposals, and the full text of publications or proposals.
The example in Figure 2 shows proposals related to Van Gogh or Picasso paintings, retrieved with the advanced search query “(“van gogh” | Picasso) painting”. The statistics graphs on the right show at a glance that most of this research was performed at beamline ID21.
Figure 2. PUMA advanced search query.
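To make the query in Figure 2 concrete, here is a minimal Python sketch of one plausible reading of it: the quoted phrase “van gogh” or the word “Picasso” must appear, together with the word “painting”. This is an assumption about the query semantics, not a description of how PUMA's search engine is implemented; the function name and the sample titles are hypothetical, and a real search engine would typically also handle word variants such as “paintings”.

```python
import re


def matches_query(text: str) -> bool:
    """Hypothetical matcher for the query: ("van gogh" | Picasso) painting."""
    lowered = text.lower()
    # Split the text into lowercase word tokens for exact word matching.
    words = set(re.findall(r"[a-z0-9]+", lowered))
    # A quoted phrase must appear as consecutive words.
    has_phrase = re.search(r"\bvan gogh\b", lowered) is not None
    # "|" is read as OR; the space before "painting" is read as AND.
    has_artist = has_phrase or "picasso" in words
    return has_artist and "painting" in words


# Invented sample titles, for illustration only.
titles = [
    "Pigment degradation in a Van Gogh painting",
    "XRF mapping of a painting by Picasso",
    "Picasso sketches on paper",
    "Strain in thin films",
]
hits = [title for title in titles if matches_query(title)]
```

With these sample titles, only the first two satisfy the query: the third mentions Picasso but not a painting, and the fourth mentions neither artist.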
The Instrument Dashboard allows the grouping of beamlines or instruments by name and by time. For example, when an instrument is modified within the facility, e.g. a beamline is relocated to a different source position, PUMA can visualise the history of the beamline or instrument and combine data from both time periods. PUMA automatically retrieves information on citation, h-index, and journal impact factor statistics and provides advanced graphics such as the evolution of citations for an instrument over time. PUMA relies on the Web of Science (webofscience.com) as a citation source. The PUMA development team is also evaluating OpenCitations (opencitations.net) as an alternative, open source of citations. Figure 3 presents a view of the instrument dashboard for beamline ID01.
Figure 3. PUMA instrument dashboard.
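As an illustration of the two ideas described for the Instrument Dashboard, the Python sketch below merges publication records from two periods of one instrument and computes an h-index from the combined citation counts. It uses the standard definition of the h-index (the largest h such that h publications each have at least h citations). The beamline names, the grouping table, and the citation numbers are invented for this example, and the code does not reflect PUMA's actual data model or implementation.

```python
from collections import defaultdict

# Hypothetical mapping: historical beamline names -> one logical instrument.
INSTRUMENT_GROUPS = {"BL-OLD": "BL", "BL-NEW": "BL"}


def group_citations(records, groups):
    """Collect citation counts per logical instrument.

    records: iterable of (beamline_name, year, citation_count) tuples.
    Names missing from `groups` are kept under their own name.
    """
    grouped = defaultdict(list)
    for name, _year, citations in records:
        grouped[groups.get(name, name)].append(citations)
    return dict(grouped)


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


# Invented records covering the periods before and after a relocation.
records = [
    ("BL-OLD", 2012, 10),
    ("BL-OLD", 2014, 8),
    ("BL-NEW", 2019, 5),
    ("BL-NEW", 2020, 4),
    ("BL-NEW", 2021, 3),
    ("ID-OTHER", 2020, 1),
]
grouped = group_citations(records, INSTRUMENT_GROUPS)
combined_h = h_index(grouped["BL"])
```

Here the two historical names are reported as a single instrument, so the h-index reflects the full publication history rather than either period alone.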
PUMA can create Collection Reports that group instrument proposals and/or publications. Collections can be created for the preparation of reports (journals, instruments, citations, keywords, etc.). For example, the detailed institution report is particularly useful to visualise the evolution of scientific communities. Figure 4 shows collection statistics for the “ID01 proposals” collection. Users can select a country or city to display the institutions that were involved in any publication or proposal.
Figure 4. PUMA collection reports screen showing collection statistics on proposals for beamline ID01.
Outlook
There are more steps to be taken to ensure the success and sustainability of this project. These include opening PUMA to all staff, as its use is currently restricted to the developers of the project and a select few high-level testers in the management of the ESRF. Work is ongoing to automate the tracking of publications from ESRF experiments and authors, to support librarians with tracking and validating activities for the ESRF library database. Interest in the software is also growing at other research infrastructures, in particular within the European light source community. Discussions are underway to identify an opportunity to develop the software for a larger community of user facilities.