This is the fourth in a series of blog posts in which CCC shares with the scholarly community our analysis, undertaken with Media Growth Strategies, of metadata management across each stage of the research lifecycle, with the aim of sparking dialogue and driving action. Our first blog in this series looked at the metadata challenges that researchers and other stakeholders face during the idea development and proposal preparation stages. Our second blog examined the interactions between researchers and funders during the proposal submission stage, and our third blog focused on the challenges stakeholders, particularly researchers, face during the research and authoring stage.
In this blog, we will focus on the publication and preservation stages. The most pressing challenge surfaces in the publication stage when the researcher is ready to submit their article to a journal. As we continue to hear from the community, when the burden of data entry is put on the researcher, major issues can arise, such as mistaken OA funding eligibility and funding compliance issues. Let’s focus on this specific challenge in the publication stage as well as explore some of the cross-stakeholder challenges faced post-publication during article and data preservation.
Researchers opt out of OA due to affordability concerns and lack of understanding of their funding opportunities.
When submitting an article to a journal through the publisher’s submission system, much of the bibliographic, author, and funding data can be extracted from the manuscript, which is a major efficiency. However, gaps in the data may remain due to overlooked submission guidelines or data pulled from outdated author profiles. In these cases, data collection or confirmation is left up to the submitting researcher, who may not have access to the requested data or may not understand the value of providing it within an OA context. This over-reliance on authors leads to a lack of consistent, reliable data when checking for OA funding entitlements, including but not limited to granular, accurate organizational affiliation identifiers for a manuscript. Error-prone capture of the corresponding author’s affiliation, combined with incomplete funding details, can cause a paper to miss out on OA funding upon acceptance, so that authors who are otherwise eligible for pre-paid deals or discounts end up paying one-off APCs. It can also result in authors opting out of OA publication because they don’t believe they have the funds to go down that path. As a result, OA initiatives driven by institutions and funders may lack uptake.
Institutional deals require granular metadata to accurately determine affiliation information for funding eligibility.
For both institutions that fund OA publication and the publishers with whom they have agreements, a lack of standardized article metadata at submission has implications across editorial, OA, and production systems that can lead to costly, manual work. For example, OA funding decisions cannot be based on abbreviations or free-form data, especially when certain entities within an institutional organization (e.g., hospitals) are excluded from a deal. If the data manually entered by the author uses an abbreviation or nickname rather than a standardized name or PID, this can interfere with matching to the correct institution ID. In addition, using email addresses for affiliation identification can impact funding entitlements, especially if the email account is old, the researcher has multiple affiliations, or a personal email account is used. When complete data is unavailable, institutions and publishers must manually find papers that should have matched to an agreement and collaborate on a resolution.
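To make the matching problem concrete, here is a minimal sketch in Python of how an eligibility check might behave when it relies on an institution identifier instead of free-form text. Everything in it is an assumption made for illustration: the `Agreement` structure, the `check_eligibility` function, the status labels, and the placeholder IDs such as `inst:0001` are hypothetical and do not describe any particular publisher or institutional system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Agreement:
    """A hypothetical OA agreement keyed by institution identifier."""
    institution_id: str
    # Units inside the institution that the deal does not cover (e.g., a hospital).
    excluded_unit_ids: frozenset = frozenset()


def check_eligibility(affiliation_id: Optional[str], agreements: list) -> str:
    """Classify a submission by the corresponding author's affiliation ID.

    Returns one of four illustrative labels:
      "needs_review" - no identifier was captured, only free text
      "excluded"     - the affiliation is a unit carved out of a deal
      "eligible"     - the affiliation matches a covered institution
      "not_covered"  - an identifier is present but matches no agreement
    """
    if not affiliation_id:
        # Abbreviations, nicknames, or email domains cannot be matched
        # reliably, so the record falls back to manual review.
        return "needs_review"
    for agreement in agreements:
        if affiliation_id in agreement.excluded_unit_ids:
            return "excluded"
        if affiliation_id == agreement.institution_id:
            return "eligible"
    return "not_covered"


# Placeholder data: one agreement that excludes one affiliated unit.
agreements = [Agreement("inst:0001", frozenset({"inst:0001-hospital"}))]

print(check_eligibility("inst:0001", agreements))
print(check_eligibility("inst:0001-hospital", agreements))
print(check_eligibility(None, agreements))
```

The point of the sketch is the first branch: whenever the submission carries only free-form text, the check cannot decide automatically, which is where the manual reconciliation described above begins.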
For funders, having proper funder/grant affiliation information is essential to production workflows to ensure compliance. There is currently a limited ability to systematically provide correct and up-to-date information on funding requirements to help authors comply with different mandates.
Publishers manually update publication records for completeness, incurring high costs.
In our research, we found that as the publisher prepares the article for publication, they are sometimes manually entering PIDs before registering DOIs to produce a more complete publication record that enables long-term preservation and discoverability of the scholarly work. This is a laborious practice with high economic and opportunity costs that could be reduced with earlier, automated PID assertion or validation.
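As one illustration of what automated PID validation can look like, the Python sketch below checks the format and check digit of an ORCID iD. ORCID is our choice of example here, not a PID singled out by the study, and the function names are hypothetical. The check digit follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. Passing this check only shows that an iD is well formed; it does not confirm that the iD belongs to the author in question.

```python
import re

# Four groups of four characters; the last character may be the check digit X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is a well-formed ORCID iD with a correct check digit."""
    candidate = orcid.strip().upper()
    if not ORCID_PATTERN.fullmatch(candidate):
        return False
    digits = candidate.replace("-", "")
    return orcid_check_digit(digits[:15]) == digits[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # well-formed iD with a correct check digit
print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with the last digit mistyped
```

A check like this could run at submission time, so that a mistyped identifier is caught while the author is still present to correct it instead of during production.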
Guide to Metadata Management Across the Research Lifecycle
Interested in seeing the full research report? A key artifact CCC developed by leveraging the data and insights we gained from this study is the State of Scholarly Metadata interactive report. This report guides you through metadata management, highlighting the challenges, related impacts, and key decision points.
In our next blog post, the last in our series, we discuss the metadata challenges faced by researchers, institutions, funders, and publishers during the Reuse & Measurement stage. To learn more, visit The State of Scholarly Metadata where we also invite you to provide your input through the feedback function.