PIDfest 26

No new PID needed - WikidataIDs as PIDs for academic publishers

Have you ever encountered publisher names in your data and planned to normalize them? If so, you will likely face the challenge of dealing with multiple names for a single publisher. Many publishers use distinct brands or imprints, and in some instances these brands were once independent publishers that were later acquired, so you also need to know when the merger happened. To complicate matters further, individual brands or imprints are sometimes sold, which changes the publisher they belong to.

Current PIDs for organizations, such as ROR or the GND, do not support this type of connection over time. So what is the best PID for a publisher?

In the session, you will learn how we solved the problem in the case of academic publishers for the Open Access Monitor Austria and why we didn’t see the need for a new PID.


Keeping up with the rapid pace of change in the publishing industry is a significant challenge. Constant mergers and takeovers, combined with a lack of reliable metadata, make it difficult to track developments effectively.

Three years ago, a working group within the national project Austrian Transition to Open Access Two (AT2OA², 2021–2024) recognized this gap and decided to address it by standardizing publisher data and assigning unique identifiers to publishers.

For the “Open Access Monitor Austria (OAMA)”, the standardization of publisher data quickly became a crucial part of the project’s work. It was essential not only to capture a static snapshot of the data but also to track and compare the evolution of a specific publisher and to accurately identify and map its imprints, so that the whole output of that publisher can be identified.

Current PIDs for organizations, such as ROR or the GND, do not support semantic connections over time. So what is the best way to use a PID for a publisher in our case?

After closely monitoring the broader PID landscape, we made the decision not to introduce a new PID that could encompass these connections. Instead, we opted to leverage Wikidata as an existing open infrastructure. Within Wikidata, we discovered that it is feasible to precisely represent these aspects, such as parent-child relationships, and establish timed connections between them. Consequently, we chose to utilize Wikidata to facilitate a collaborative working environment.
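To make the idea of timed parent-child connections concrete, here is a minimal sketch in Python. It is not the project's actual data model, which is documented in the WikiProject listed under Links. It only assumes that each ownership statement carries an optional start and end date, much as Wikidata statements can carry "start time" (P580) and "end time" (P582) qualifiers. All identifiers (`Q_IMPRINT_X`, `Q_PUBLISHER_A`, and so on) and all dates are hypothetical placeholders, not real Wikidata IDs.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ParentLink:
    """One time-qualified 'child belongs to parent' statement (illustrative only)."""
    child: str                    # placeholder ID of the imprint or brand
    parent: str                   # placeholder ID of the owning publisher
    start: Optional[date] = None  # None means the start date is unknown
    end: Optional[date] = None    # None means the link is still valid


# Hypothetical example data: an imprint sold in 2015, whose new owner
# joined a larger group in 2018.
LINKS = [
    ParentLink("Q_IMPRINT_X", "Q_PUBLISHER_A", date(2000, 1, 1), date(2015, 6, 30)),
    ParentLink("Q_IMPRINT_X", "Q_PUBLISHER_B", date(2015, 7, 1), None),
    ParentLink("Q_PUBLISHER_B", "Q_GROUP_C", date(2018, 1, 1), None),
]


def parent_at(entity: str, when: date, links=LINKS) -> Optional[str]:
    """Return the parent of `entity` that was valid on `when`, if any."""
    for link in links:
        if link.child != entity:
            continue
        started = link.start is None or link.start <= when
        not_ended = link.end is None or when <= link.end
        if started and not_ended:
            return link.parent
    return None


def top_publisher_at(entity: str, when: date, links=LINKS) -> str:
    """Follow parent links valid on `when` until no further parent exists."""
    seen = {entity}  # guards against accidental cycles in the data
    current = entity
    while True:
        parent = parent_at(current, when, links)
        if parent is None or parent in seen:
            return current
        seen.add(parent)
        current = parent
```

With this structure, an article published under the imprint in 2010 is attributed to `Q_PUBLISHER_A`, while the same imprint in 2020 resolves to `Q_GROUP_C`, which is the kind of comparison over time that a static identifier alone cannot express.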

We commenced our work on Wikidata in March 2023 and documented our data model, queries, and other relevant information in a new Wikidata project titled ‘Academic Publishers’. We updated and expanded the publisher data on Wikidata. The foundation of our work was a list of “super publishers” that we obtained from the ISSN Center, which has also published it. We incorporated the data into the Austrian Datahub by using the Wikidata ID as the PID for the publisher in the article data provided by various institutions. This approach enabled us to establish links between different brands.
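As a rough illustration of what using an identifier in article data can look like, the following Python sketch maps publisher name variants to one ID and counts articles per ID. It is an assumption-laden toy, not the Austrian Datahub's implementation: the lookup table, the `publisher` field name, and the placeholder ID `Q_EXAMPLE` are all hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical lookup from normalized name variants to a placeholder ID.
# In practice such a table would be derived from curated publisher data.
NAME_TO_ID = {
    "example press": "Q_EXAMPLE",
    "example pr.": "Q_EXAMPLE",
}


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace."""
    return " ".join(name.casefold().split())


def publisher_id(name: str) -> Optional[str]:
    """Return the ID for a publisher name, or None if the name is unknown."""
    return NAME_TO_ID.get(normalize(name))


def count_by_publisher(records) -> Counter:
    """Count article records per publisher ID.

    Unknown names are counted under None so they can be reviewed later.
    """
    return Counter(publisher_id(r["publisher"]) for r in records)
```

Keeping unmatched names under `None` rather than dropping them is a deliberate choice in this sketch: it makes gaps in the lookup table visible instead of silently shrinking a publisher's output.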

In the meantime, the AT2OA² project has ended, and the Open Access Monitor is in regular operation as part of the Austrian Library Consortium (KEMÖ), hosted by the OBVSG.

Now that we are using Wikidata IDs as PIDs in our data, we would like to raise some questions:

  • How can we build a community around publisher information to ensure the data’s ongoing accuracy?
  • Is such a system trustworthy enough to serve as PIDs?
  • Do we need to create new PIDs for new issues, or can our approach be applied to other use cases?
  • How can we improve our system’s interlinking with other PID systems? Wikidata supports outgoing links, but is this sufficient?

Links
ISSN Network. (2023). List of multinational publishers of serial publications established by the ISSN Network (September_2023) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.8321076

http://oamonitor.obvsg.at

https://www.at2oa.at/

https://www.wikidata.org/wiki/Wikidata:WikiProject_Academic_Publisher

Patrick Danowski

Patrick Danowski is the manager of the library at the Institute of Science and Technology Austria (ISTA). He studied computer science and completed a traineeship for academic librarians at the ZLB, combined with a master’s programme at Humboldt University (MA LIS). He is also active in IFLA.