
### Open Science for the Public Good
---

Kathleen Fitzpatrick // @kfitz@hcommons.social // kfitz@msu.edu
http://presentations.kfitz.info/nsf250930.html
This work has been supported by the U.S. National Science Foundation under Grants No. OAC-2226271 and OAC-2531819.
Note: I'm project director of Knowledge Commons, an open-access, community-governed, nonprofit network on which knowledge creators across the disciplines and around the world can deposit and share their work, build new collaborations, and create a vibrant digital presence for themselves, their teams, and their projects.
Note: Knowledge Commons began as a project of the Modern Language Association, the largest scholarly society in the humanities. In 2012 I was serving as the association's first Director of Scholarly Communication, and was working to think about how the digital transformation was changing not only the ways that scholars communicated with one another but also the reasons they joined professional societies. In the past, getting access to the society's journal or attending the annual meeting was focal, but those drivers were becoming less important. We posited, however, that the ability to *participate* in the ongoing conversations sponsored by the society would be a draw, and so in 2013 we launched **MLA Commons**, the first node on what would become the Knowledge Commons network.
https://hcommons.org
Note: Knowledge Commons thus has its roots in the humanities, which have been historically underserved in the research infrastructure space; when we first began work on the project, not only did few research communication platforms include the humanities in their fields of interest, but those that did too often lumped all arts and humanities fields together in a single bucket (while maintaining infinite distinctions among the various subfields of physics and chemistry), with the result that even those scholars who wanted to use these platforms to make their work openly available to the world had trouble getting traction, because their community of practice could not find them and coalesce around the shared work.
- We began developing the Commons with the sense that, in encouraging knowledge creators of all kinds to do more of their work in open, collaborative ways, the most significant problem we needed to solve was social rather than technical: we needed to encourage not just individual scholars but scholarly *communities* to join us, to find and engage with one another in building a digital commons. In growing the Commons we first reached out to the fields adjacent to modern languages, launching the interdisciplinary **Humanities Commons** in 2016.
- In 2018, recognizing that the Commons was becoming larger than a consortium of scholarly societies with tiny budgets could manage, we migrated the network to Michigan State University, where the project has been overseen by the lab I established there, with significant support from an Infrastructure and Capacity Building Challenge Grant from the **National Endowment for the Humanities** and a Change Capital grant from the Mellon **Foundation**, among a range of other funders and donors. That funding allowed us to build a development team able to take on some significant technical debt that had begun to accrue, as well as a community engagement team that could help drive Commons adoption.
- At MSU, however, we began to realize, first, that focusing our fundraising efforts on driving participation among scholarly societies was unlikely to allow us to reach sustainability, due both to their budget constraints and to their assumptions about what their members needed -- instead, appealing to *institutions* to join and support the network was likely a better path forward. We also saw that driving such institutional participation would require us to serve the campus as a whole, rather than a small subset of the campus that was too often underfunded.
- Our first step toward that broadened interdisciplinarity began in 2021, when we were approached by a group of STEM education researchers at MSU who were seeking ways to build connections and communication across their often siloed fields. They needed a platform on which they could collaborate to develop the principles and practices through which their community might be encouraged to share their work -- and their data -- with one another, and with the world. The STEM Ed+ group approached the Commons to begin this collaboration based on our articulation of the **values** that bring us to Open Science as well as our focus on building community, which come together in our development of a **community governance** structure to ensure that the platform develops with the needs of its institutional and individual members front and center.
- We began our work with the STEM Ed+ team by talking with STEM education researchers both at MSU and elsewhere in order to find out more about their needs, as well as about the values that guided their work. In the process, we heard about several key issues that they face:
- the obligation to make certain that the benefits of community-engaged research accrue not just to university-based researchers but to the communities with which they work
- the desire to share the results of research as openly as possible while ensuring that often vulnerable communities remain in control of the data gathered about them
- the need to develop mechanisms for sharing work and establishing new collaborations across often siloed communities of practice
- Through these early discussions, we began to sketch out the principles behind **STEM Ed+ Commons**, a network through which STEM education researchers can learn more about the **FAIR and CARE principles** behind Open Science and deliberate together about the means of putting them into practice.
- The **FAIR principles** for data stewardship form the heart of the FAIROS program: ensuring that the products of research are made findable, accessible, interoperable, and reusable, such that the entire scientific community as well as the general public can benefit from them.
- In addition to FAIR, however, we have embraced the **CARE principles** developed by the Global Indigenous Data Alliance. These principles encourage open data movements to consider the communities involved in data gathering and sharing by providing for:
- **Collective Benefit** -- or building data systems whose design allows communities to derive benefit from the data gathered about them
- **Authority to Control** -- which empowers communities to control their own data and determine its appropriate use
- **Responsibility** -- ensuring that researchers are accountable to the communities they work with
- **Ethics** -- making certain that the rights and well-being of people involved in scientific research are of primary concern.
- Our STEM education colleagues sought to bring the FAIR and CARE principles to bear in developing the open systems supporting their work as part of a larger goal: broadening participation in science and transforming the research enterprise through the embrace of an "ours, not mine" view of access, expertise, resources, and power, thus instantiating deep collaboration as a norm. These goals fully aligned with those espoused by the Knowledge Commons team, allowing us to begin thinking about how our platform might be enhanced to serve their needs.
- Those necessary enhancements encouraged us to take a hard look at our existing repository, which we had built in Fedora; while it served the humanities community well enough by allowing for the deposit and sharing of a wide range of document types, a more robust, interoperable repository infrastructure was necessary if we were to make the platform appealing and useful across all disciplines.
- With generous support from the inaugural round of NSF FAIROS RCN grants in 2022, we built STEM Ed+ Commons, and in the process rebuilt our repository in **InvenioRDM**, the platform developed by CERN as an abstracted, self-hosted version of the software supporting Zenodo. InvenioRDM is an open-source platform allowing for the development of turn-key research data management repositories, and boasts a robust community of developers working across a wide range of institutions and research organizations around the world.
- Our work was led by our repository developer, Ian Scott, who connected with and learned from the Invenio community, and who is now an active contributor to that community's ongoing work. We did some significant development on top of Invenio in order to create a more user-friendly metadata gathering and deposit flow, as well as to connect the repository to the rest of the Knowledge Commons network.
- The result is **KCWorks**, a next-generation repository hosting a wide range of research outputs from scholars across the disciplines and around the world. KCWorks boasts several key features, including:
- a user-friendly deposit flow that encourages users to provide key metadata about their work by breaking up the process into bite-size stages that describe the purposes of the information being requested;
- the ability to deposit up to 100 files as a single project, with a very generous 500GB size limit per record that can be overridden on request;
- robust use of persistent identifiers, including ORCIDs for contributors, DOIs for objects, and ROR IDs (Research Organization Registry) for institutions;
- deposit versioning, with versioned DOIs registered by DataCite;
- more than 70 contributor roles, extending far beyond the CRediT contributor role taxonomy to embrace and acknowledge research participation at every level;
- a wide range of user-selectable licenses applicable to data, documents, and software;
- granular access restrictions at both the record- and the file-level;
- a powerful viewer allowing many files to be read or streamed within the item record;
- COUNTER-compliant item-level analytics for record access and downloads, including both item- and version-level statistics;
- and machine-generated citations in a range of formats.
- KCWorks also provides both individual and institutional users the ability to create collections, gathering deposits in highly customized ways -- by institution, by subfield, by research group, by publication, or as personalized user-defined playlists.
- KCWorks makes use of the **FAST taxonomy**, or Faceted Application of Subject Terminology, which was developed by OCLC based on the Library of Congress Subject Headings, allowing depositors to select topical, geographic, and chronological headings independently of one another. We also allow for robust user-defined keyword creation.
- KCWorks is highly interoperable thanks to its strong **REST API** that connects with all repository operations and its built-in **OAI-PMH** server allowing the repository's metadata to be readily consumed. Upon deposit, record information is pushed both to the user's Knowledge Commons profile and to their ORCID profile, increasing research discoverability.
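
As a rough illustration of what consuming the repository's metadata over OAI-PMH can look like, the sketch below builds a standard `ListRecords` request and pulls identifiers and titles out of a Dublin Core response. The base URL and the sample response are hypothetical placeholders, not the actual KCWorks endpoint or data; only the `verb`, `metadataPrefix`, and `from` parameters and the XML namespaces come from the OAI-PMH standard itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint for illustration only; substitute the real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai2d"

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core standards.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build a ListRecords request URL, optionally limited to records changed since from_date."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_titles(xml_text):
    """Return (identifier, title) pairs from an OAI-PMH ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=OAI_NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=OAI_NS)
        results.append((identifier, title))
    return results


# A made-up response, shaped like what an OAI-PMH server returns for oai_dc.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:abc12-de345</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Deposit</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url(BASE_URL, from_date="2025-01-01"))
    print(parse_titles(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` that OAI-PMH servers return when a result set spans multiple pages; that step is left out here to keep the sketch short.
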
- We're extremely proud of KCWorks, which today contains over **35,000 deposits in 96 languages** using a wide range of character sets. We were particularly delighted back in January of this year to announce that KCWorks had been selected as the officially designated public access repository of the National Endowment for the Humanities. (Alas, changes at the NEH led to that contract being terminated for convenience in April.)
- But there's much more work ahead of us, both for the network as a whole and for our STEM Ed+ Commons users in particular.
- In the process of our focus groups and user interviews, we heard a lot of frustration about conventional journal-based publishing processes and the bottlenecks they create for researchers in sharing their work.
- especially when those users work in and with communities outside the academy
- open access journals help researchers who might not be affiliated with research universities or have access to research libraries read the latest work in their fields, but the business models that commercial publishers have created around open access -- whether involving APCs or so-called "transformative agreements" -- often prevent many researchers from contributing to that body of knowledge
- moreover, the standard structures of peer review, while a crucial part of determining the body of accepted knowledge, result in a serious bottleneck, as the growing number of publications needing review has utterly swamped the available reviewer labor pool, producing at times inordinate delays before publication -- and even more, traditionally anonymized processes keep peer review a conversation between editors and reviewers, preventing review from serving as a collaborative process that might best support the work's development
- and, finally, commercial control of research dissemination has enabled those same companies to hoover up enormously important data about research, now packaged into Current Research Information Systems that further extract value from research institutions through expensive licenses, with a dearth of open source alternatives
- In our new project, funded in the most recent round of FAIROS grants, we propose to remove these bottlenecks in research dissemination by building a workflow that disentangles its various phases, producing instead a Publish-Review-Curate-Assess model that will allow scientific communication to proceed more collaboratively, more ethically, and more fluidly.
- In this model, KCWorks will become the primary locus of **publication**, as researchers deposit what are still somewhat anachronistically referred to as "pre-prints" alongside data and other research outputs.
- We will then connect KCWorks to a range of **review** and curation tools and communities, using the communication protocol developed by the Confederation of Open Access Repositories, [COAR Notify](https://coar-notify.net). This will allow researchers to request peer review from communities such as Peer Community In and PreReview, and through platforms like Pilcrow, which supports collaborative community review. COAR Notify will also allow researchers to submit deposited work for consideration by editors of a range of journals and other collections, and will allow those editors to suggest submission of work they find interesting. These publications can then be **curated** into overlay publications that provide endorsement for the work based on the results of editorial and/or peer review.
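
To make the idea of a review request a bit more concrete: COAR Notify messages are JSON-LD payloads built on the Activity Streams 2.0 vocabulary and delivered to a service's inbox. The sketch below assembles one that asks a review service to consider a deposit. The field layout follows a reading of the published "Request Review" pattern and should be checked against the current specification; every URL, the DOI, and the function name are hypothetical, and this is not a description of how KCWorks will implement the protocol.

```python
import json
import uuid


def build_review_request(record_url, doi_url, actor_id, actor_name, origin, target):
    """Assemble a COAR Notify style 'request review' notification as a plain dict.

    origin and target are dicts describing the sending repository and the
    receiving review service, each with an id, an inbox URL, and a type.
    """
    return {
        "@context": [
            "https://www.w3.org/ns/activitystreams",
            "https://coar-notify.net",
        ],
        "id": f"urn:uuid:{uuid.uuid4()}",
        # An Activity Streams "Offer", qualified as a request for review.
        "type": ["Offer", "coar-notify:ReviewAction"],
        "actor": {"id": actor_id, "name": actor_name, "type": "Person"},
        "object": {
            "id": record_url,
            "ietf:cite-as": doi_url,
            "type": ["Page", "sorg:AboutPage"],
        },
        "origin": origin,
        "target": target,
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    notification = build_review_request(
        record_url="https://repository.example.org/records/abc12-de345",
        doi_url="https://doi.org/10.5555/example.12345",
        actor_id="https://orcid.org/0000-0002-1825-0097",
        actor_name="Example Researcher",
        origin={
            "id": "https://repository.example.org",
            "inbox": "https://repository.example.org/inbox",
            "type": "Service",
        },
        target={
            "id": "https://review-service.example.org",
            "inbox": "https://review-service.example.org/inbox",
            "type": "Service",
        },
    )
    print(json.dumps(notification, indent=2))
```

The repository would send this JSON to the target's inbox with an HTTP POST, and the review service would answer later with its own notification, for instance one announcing a completed review.
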
- Both submission for peer review and submission for journal consideration will likely result in changes designed to improve the work, and subsequent versions can then be added to the deposit, each receiving a versioned DOI, with the top-level DOI always pointing to the most current version. In this way, we can finally reach the point at which scientific communication can let go of the notion of the "version of record," and instead emphasize the "record of versions," allowing the process of development to become a visible part of the work.
- Additionally, the labor involved in both peer review and editorial work can be properly credited alongside that of researchers and authors, creating both a more ethical recognition of the key roles that review and editing play in the production of research products and a mechanism by which those contributions might be taken into account as part of a scientist's record of "work."
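
The "record of versions" idea can be sketched as a small data model: every version keeps its own DOI, while the top-level DOI resolves to whichever version is newest. The class names, DOIs, and labels below are hypothetical and exist only to illustrate the resolution rule; they are not taken from the KCWorks or InvenioRDM code.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    doi: str    # DOI registered for this specific version
    label: str  # human-readable label such as "v1"
    note: str   # what changed, e.g. "revised after community review"


@dataclass
class Deposit:
    top_level_doi: str  # DOI that always points at the newest version
    versions: list = field(default_factory=list)

    def add_version(self, doi, note):
        """Append a new version; earlier versions stay resolvable."""
        version = Version(doi=doi, label=f"v{len(self.versions) + 1}", note=note)
        self.versions.append(version)
        return version

    def resolve(self, doi):
        """Return the version a DOI points to."""
        if doi == self.top_level_doi:
            if not self.versions:
                raise LookupError("deposit has no versions yet")
            return self.versions[-1]
        for version in self.versions:
            if version.doi == doi:
                return version
        raise LookupError(f"unknown DOI: {doi}")


if __name__ == "__main__":
    # Placeholder DOIs for illustration.
    deposit = Deposit(top_level_doi="10.5555/example.100")
    deposit.add_version("10.5555/example.101", "initial preprint")
    deposit.add_version("10.5555/example.102", "revised after community review")
    print(deposit.resolve("10.5555/example.100").label)  # newest version
    print(deposit.resolve("10.5555/example.101").label)  # a specific earlier version
```

Because nothing is overwritten, a citation to an earlier version keeps pointing at exactly the text that was cited, while readers who follow the top-level DOI always land on the current state of the work.
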
- That takes us through **the Publish-Review-Curate model**, which has been explored in a range of open science communities, including [PLOS Biology](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3000116), [ASAPbio](https://asapbio.org/understanding-the-publish-review-curate-prc-model-of-scholarly-communication/), [MetaROR](https://cms.metaror.org/publish-review-curate/), and [eLife](https://elifesciences.org/inside-elife/dc24a9cd/open-science-what-is-publish-review-curate). Publish-Review-Curate has been implemented as the research dissemination model for [Open Research Europe](https://open-research-europe.ec.europa.eu).
- However, we are adding a fourth stage to this workflow, **Assess**, which recognizes the need to close the gap between the spaces online where researchers communicate their work and the spaces where they are able to track and report on its impact.
- Currently, KCWorks provides analytics on item-level views and downloads, with counts provided both for a specific version of a deposit and in aggregate across all versions. Our statistics are **COUNTER**-compliant, and we anonymize all visitor-level information. We also filter out robot requests from our counts, including only human requests and human-initiated machine requests.
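
The counting logic described above can be sketched in a few lines: drop robot traffic, then tally the remaining requests both per version and across all versions of a record, keeping no visitor-level detail in the output. The event format and the short list of robot markers are assumptions made for illustration; a COUNTER-compliant implementation relies on the maintained COUNTER robots list and applies further rules, such as double-click filtering, that are not shown here.

```python
from collections import Counter

# Illustrative stand-in for a maintained robots list (assumption, not the real list).
ROBOT_MARKERS = ("bot", "crawler", "spider")


def is_robot(user_agent):
    """Crude robot check based on substrings in the user-agent string."""
    ua = user_agent.lower()
    return any(marker in ua for marker in ROBOT_MARKERS)


def count_views(events):
    """Return (per_version, per_record) view counts with robot requests excluded.

    Each event is a dict with 'record', 'version', and 'user_agent' keys.
    Only aggregate counts are returned, so no visitor-level data leaves this function.
    """
    per_version = Counter()
    per_record = Counter()
    for event in events:
        if is_robot(event["user_agent"]):
            continue
        per_version[(event["record"], event["version"])] += 1
        per_record[event["record"]] += 1
    return per_version, per_record


if __name__ == "__main__":
    # Made-up events for illustration.
    events = [
        {"record": "abc12", "version": 1, "user_agent": "Mozilla/5.0 (X11; Linux x86_64)"},
        {"record": "abc12", "version": 2, "user_agent": "Mozilla/5.0 (Macintosh)"},
        {"record": "abc12", "version": 2, "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
        # A human-initiated machine request, which is counted.
        {"record": "abc12", "version": 2, "user_agent": "curl/8.0"},
    ]
    per_version, per_record = count_views(events)
    print(dict(per_version))
    print(dict(per_record))
```
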
- We want, however, to provide depositors with as much useful information about the impact of their work as possible, without violating visitor privacy. We will thus be releasing in the next few weeks a new analytics dashboard for both users and groups, providing aggregated information about deposits. On this dashboard we'll be able to show which deposits in a given group have obtained the most traction, how their use has developed over time, and where view and download requests originate at the country level; we'll also show information about work in a collection by item type, by license, and more.
- In addition to these machine-generated analytics, however, we want to think about other means of demonstrating the impact of work in the repository. This might include the kinds of usage information provided by Altmetrics, which tracks links to work from social media and other referrers across the internet, but it also might include more bespoke indicators: comments left in a linked discussion, instructor-reported use of materials in course syllabi, and more.
- We intend to work with the **HumetricsHSS** team to explore the kinds of values-enacted, humane metrics that might most benefit researchers as they report on their work, and to design means of gathering and reporting on those metrics.
- Alongside this project, however, we are also working on two other crucial issues: **trust** and **sustainability**. It's not necessary for me to tell you that we are living in a moment in which trust in science is more challenged than it has been in recent memory. Some of these challenges are coming, as the joke has it, from inside the house: the ongoing reproducibility crisis, coupled with evidence of varying kinds of researcher malpractice, has created understandable concerns about the integrity of scientific work. Some of these challenges derive from the world around us, however, as misunderstandings of the motivations of scientists and ideological conflicts surrounding inconvenient research combine to produce widespread dismissals of the knowledge produced through scientific research. And worse: there are growing concerns world-wide that politicians might interfere with scientific research or censor its results in highly damaging ways.
- Researchers need to be confident that the work that they make available through our platform will remain accessible over the long term, without interference that could come in the form of technological decay, or in the form of bad actors seeking to alter the scientific record. Our platform includes a range of automated checks for both file and data integrity, but we also want to find ways to ensure that our servers and the work they share are protected from human interference, in order to provide a solid basis for trust in the scientific record.
- We also need to ensure that we can afford to continue to make Knowledge Commons openly and freely available to researchers and interested members of the public. To that end we have recently launched KCWorks for Institutions, a hosted service that provides colleges and universities with a white-label portal for KCWorks, through which their communities can deposit and share their research, with an institutional dashboard tracking its impact, while the work remains discoverable through the repository as a whole. We're working with a pilot institution to migrate their collection out of a commercial system this fall, and hope to onboard several more institutions in the months ahead. As we're fond of saying, we're a better value -- definitely less expensive for institutions than our commercial competitors -- but we also offer better values, including our commitment to community governance.
- But it's a hard moment to try to implement a new sustainability model relying on investment from institutions of higher education. And so I'm going to end here with a word of thanks for the NSF's ongoing support of this project; without this federal investment in FAIR Open Science, our project would quite literally not exist.
- Thank you for your time; I'll look forward to hearing your questions.
## thank you
---
Kathleen Fitzpatrick // @kfitz@hcommons.social // kfitz@msu.edu
Note: Many thanks.