Journal of Data and Information Science Feed: Sciendo RSS Feed for Journal of Data and Information Science
evolution of metal organic frameworks: A scientometric approach with human-in-the-loop<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This paper reports on a scientometric analysis, bolstered by human-in-the-loop input from domain experts, that examines the field of metal-organic frameworks (MOFs) research. Scientometric analyses reveal the intellectual landscape of a field. The study engaged MOF scientists in the design and review of our research workflow. MOF materials are an essential component in next-generation renewable energy storage and biomedical technologies. The research approach demonstrates how engaging experts, via human-in-the-loop processes, can help develop a comprehensive view of a field’s research trends, influential works, and specialized topics.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>A scientometric analysis was conducted, integrating natural language processing (NLP), topic modeling, and network analysis methods. The analytical approach was enhanced through a human-in-the-loop iterative process involving MOF research scientists at selected intervals. MOF researcher feedback was incorporated into our method. The data sample included 65,209 MOF research articles. Python 3 and the software tool <italic>VOSviewer</italic> were used to perform the analysis.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The findings demonstrate the value of including domain experts in research workflows, refinement, and interpretation of results. At each stage of the analysis, the MOF researchers contributed to interpreting the results and to refining the method to keep our focus on MOF research. This study identified influential works and their themes. 
Our findings also underscore four main MOF research directions and applications.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>This study is limited by the sample (articles identified and referenced by the Cambridge Structural Database) that informed our analysis.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Our findings contribute to addressing the current gap in fully mapping out the comprehensive landscape of MOF research. Additionally, the results will help domain scientists target future research directions.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>To the best of our knowledge, the number of publications collected for analysis exceeds those of previous studies. This enabled us to explore a more extensive body of MOF research compared to previous studies. Another contribution of our work is the iterative engagement of domain scientists, who brought in-depth, expert interpretation to the data analysis, helping hone the study.</p> </sec> </abstract>ARTICLEtrue academic institutions based on the productivity, impact, and quality of institutional scholars<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The quantitative rankings of over 55,000 institutions and their institutional programs are based on the individual rankings of approximately 30 million scholars determined by their productivity, impact, and quality.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The institutional ranking process developed here considers all institutions in all countries and regions, thereby including those that are established, as well as those that are emerging in scholarly prowess. Rankings of individual scholars worldwide are first generated using the recently introduced, fully indexed ScholarGPS database. 
The rankings of individual scholars are extended here to determine the lifetime and last-five-year Top 20 rankings of academic institutions over all Fields of scholarly endeavor, in 14 individual Fields, in 177 Disciplines, and in approximately 350,000 unique Specialties. Rankings associated with five specific Fields (Medicine, Engineering &amp; Computer Science, Life Sciences, Physical Sciences &amp; Mathematics, and Social Sciences), and in two Disciplines (Chemistry, and Electrical &amp; Computer Engineering) are presented as examples, and changes in the rankings over time are discussed.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>For the Fields considered here, the Top 20 institutional rankings in Medicine have undergone the least change (lifetime versus last five years), while the rankings in Engineering &amp; Computer Science have exhibited significant change. The evolution of institutional rankings over time is largely attributed to the recent emergence of Chinese academic institutions, although this emergence is shown to be highly Field- and Discipline-dependent.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The ScholarGPS database used here ranks institutions in the categories of: (i) all Fields, (ii) in 14 individual Fields, (iii) in 177 Disciplines, and (iv) in approximately 350,000 unique Specialties. 
A comprehensive investigation covering all categories is not practical.</p> </sec> <sec> <title style='display:none'>Practical implementations</title> <p>Existing rankings of academic institutions have: (i) often been restricted to pre-selected institutions, clouding the potential discovery of scholarly activity in emerging institutions and countries; (ii) considered only broad areas of research, limiting the ability of university leadership to act on the assessments in a concrete manner, or in contrast; (iii) have considered only a narrow area of research for comparison, diminishing the broader applicability and impact of the assessment. In general, existing institutional rankings depend on which institutions are included in the ranking process, which areas of research are considered, the breadth (or granularity) of the research areas of interest, and the methodologies used to define and quantify research performance. In contrast, the methods presented here can provide important data over a broad range of granularity to allow responsible individuals to gauge the performance of any institution from the Overall (all Fields) level, to the level of the Specialty. 
The methods may also assist in identifying the root causes of shifts in institution rankings, and how these shifts vary across hundreds of thousands of Fields, Disciplines, and Specialties of scholarly endeavor.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study provides the first ranking of all academic institutions worldwide over Fields, Disciplines, and Specialties based on a unique methodology that quantifies the productivity, impact, and quality of individual scholars.</p> </sec> </abstract>ARTICLEtrue direct and indirect impact on technology and policy of transformative research via ego citation network<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The dissemination of academic knowledge to nonacademic audiences partly relies on the transition of subsequent citing papers. This study aims to investigate the direct and indirect impact on technology and policy originating from transformative research, based on the ego citation network.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Key Nobel Prize-winning publications (NPs) in the fields of gene engineering and astrophysics are regarded as a proxy for transformative research. 
In this contribution, we introduce a network-structural indicator of citing patents to measure the technological impact of a target article and use policy citations as a preliminary tool for policy impact.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The results show that the impact of NPs on technology and policy is higher than that of their subsequent citation generations in gene engineering but not in astrophysics.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The selection of Nobel Prizes is not balanced and the database used in this study, <italic>Dimensions</italic>, suffers from incompleteness and inaccuracy of citation links.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Our findings provide useful clues to better understand the characteristics of transformative research in technological and policy impact.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study proposes a new framework to explore the direct and indirect impact on technology and policy originating from transformative research.</p> </sec> </abstract>ARTICLEtrue Unique citing documents Journal Impact Factor (Uniq-JIF) as a supplement for the standard Journal Impact Factor
LLM-assisted writing in scientific communication: Are we there yet?<abstract> <title style='display:none'>Abstract</title> <p>Large Language Models (LLMs), exemplified by ChatGPT, have significantly reshaped text generation, particularly in the realm of writing assistance. While ethical considerations underscore the importance of transparently acknowledging LLM use, especially in scientific communication, genuine acknowledgment remains infrequent. A potential avenue to encourage accurate acknowledgment of LLM-assisted writing involves employing automated detectors. 
Our evaluation of four cutting-edge LLM-generated text detectors reveals their suboptimal performance compared to a simple ad-hoc detector designed to identify abrupt writing style changes around the time of LLM proliferation. We contend that the development of specialized detectors exclusively dedicated to LLM-assisted writing detection is necessary. Such detectors could play a crucial role in fostering more authentic recognition of LLM involvement in scientific communication, addressing the current challenges in acknowledgment practices.</p> </abstract>ARTICLEtrue quantitative study of disruptive technology policy texts: An example of China’s artificial intelligence policy<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The transformative impact of disruptive technologies on the restructuring of the times has attracted widespread global attention. This study aims to analyze the characteristics and shortcomings of China’s artificial intelligence (AI) disruptive technology policy, and to put forward suggestions for optimizing China’s AI disruptive technology policy.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Develop a three-dimensional analytical framework for “policy tools-policy actors-policy themes” and apply policy tools, social network analysis, and LDA topic model to conduct a comprehensive analysis of the utilization of policy tools, cooperative relationships among policy actors, and the trends in policy theme settings within China’s innovative AI technology policy.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>We find that the collaborative relationship among the policy actors of AI disruptive technology in China is insufficiently close. Marginal subjects exhibit low participation in the cooperation network and overly rely on central subjects, forming a “center-periphery” network structure. 
Policy tool usage is predominantly focused on supply and environmental types, with a severe inadequacy in demand-side policy tool utilization. Policy themes are diverse, encompassing topics such as “Intelligent Services” “Talent Cultivation” “Information Security” and “Technological Innovation”, which will remain focal points. Under the themes of “Intelligent Services” and “Intelligent Governance”, policy tool usage is relatively balanced, with close collaboration among policy entities. However, the theme of “AI Theoretical System” lacks a comprehensive understanding of tool usage and necessitates enhanced cooperation with other policy entities.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The data sources and experimental scope are subject to certain limitations, potentially introducing biases and imperfections into the research results, necessitating further validation and refinement.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>The study introduces a three-dimensional analysis framework for disruptive technology policy texts, which is significant for formulating and enhancing disruptive technology policies.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study utilizes text mining and content analysis techniques to quantitatively analyze disruptive technology policy texts. It systematically evaluates China’s AI policies quantitatively, focusing on policy tools, policy actors, policy themes. 
The study uncovers the characteristics and deficiencies of current AI policies, offering recommendations for formulating and enhancing disruptive technology policies.</p> </sec> </abstract>ARTICLEtrue authorship: Analyzing contributions in and the challenges of appropriate attribution<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This study aims to evaluate the accuracy of authorship attributions in scientific publications, focusing on the fairness and precision of individual contributions within academic works.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p id="P">The study analyzes 81,823 publications from the journal <italic>PLOS ONE</italic>, covering the period from January 2018 to June 2023. It examines the authorship attributions within these publications to try and determine the prevalence of inappropriate authorship. It also investigates the demographic and professional profiles of affected authors, exploring trends and potential factors contributing to inaccuracies in authorship.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Surprisingly, 9.14% of articles feature at least one author with inappropriate authorship, affecting over 14,000 individuals (2.56% of the sample). Inappropriate authorship is more concentrated in Asia, Africa, and specific European countries like Italy. Established researchers with significant publication records and those affiliated with companies or nonprofits show higher instances of potential monetary authorship.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>Our findings are based on contributions as declared by the authors, which implies a degree of trust in their transparency. However, this reliance on self-reporting may introduce biases or inaccuracies into the dataset. 
Further research could employ additional verification methods to enhance the reliability of the findings.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>These findings have significant implications for journal publishers, highlighting the necessity for robust control mechanisms to ensure the integrity of authorship attributions. Moreover, researchers must exercise discernment in determining when to acknowledge a contributor and when to include them in the author list. Addressing these issues is crucial for maintaining the credibility and fairness of academic publications.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study contributes to an understanding of critical issues within academic authorship, shedding light on the prevalence and impact of inappropriate authorship attributions. By calling for a nuanced approach to ensure accurate credit is given where it is due, the study underscores the importance of upholding ethical standards in scholarly publishing.</p> </sec> </abstract>ARTICLEtrue evaluation of seven multi-label classification methods on real-world patent and publication datasets<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Many science, technology and innovation (STI) resources are attached with several different labels. To assign automatically the resulting labels to an interested instance, many approaches with good performance on the benchmark datasets have been proposed for multilabel classification task in the literature. Furthermore, several open-source tools implementing these approaches have also been developed. However, the characteristics of real-world multilabel patent and publication datasets are not completely in line with those of benchmark ones. 
Therefore, the main purpose of this paper is to evaluate comprehensively seven multi-label classification methods on real-world datasets.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Three real-world datasets (Biological-Sciences, Health-Sciences, and USPTO) from the SciGraph and USPTO databases are constructed. Seven multi-label classification methods with tuned parameters (dependency-LDA, ML<italic>k</italic>NN, LabelPowerset, RA<italic>k</italic>EL, TextCNN, TextRNN, and TextRCNN) are comprehensively compared on these three real-world datasets. To evaluate the performance, the study adopts three classification-based metrics: Macro-F1, Micro-F1, and Hamming Loss.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The TextCNN and TextRCNN models show obvious superiority on small-scale datasets with a more complex hierarchical structure of labels and a more balanced document-label distribution in terms of Macro-F1, Micro-F1 and Hamming Loss. The ML<italic>k</italic>NN method works better on the larger-scale dataset with a more unbalanced document-label distribution.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>Three real-world datasets differ in the following aspects: statement, data quality, and purposes. Additionally, open-source tools designed for multi-label classification also have intrinsic differences in their approaches for data processing and feature selection, which in turn impacts the performance of a multi-label classification approach. 
In the near future, we will enhance experimental precision and reinforce the validity of conclusions by employing more rigorous control over variables through introducing expanded parameter settings.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>The observed Macro F1 and Micro F1 scores on real-world datasets typically fall short of those achieved on benchmark datasets, underscoring the complexity of real-world multi-label classification tasks. Approaches leveraging deep learning techniques offer promising solutions by accommodating the hierarchical relationships and interdependencies among labels. With ongoing enhancements in deep learning algorithms and large-scale models, it is expected that the efficacy of multi-label classification tasks will be significantly improved, reaching a level of practical utility in the foreseeable future.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>(1) Seven multi-label classification methods are comprehensively compared on three real-world datasets. (2) The TextCNN and TextRCNN models perform better on small-scale datasets with more complex hierarchical structure of labels and more balanced document-label distribution. (3) The ML<italic>k</italic>NN method works better on the larger-scale dataset with more unbalanced document-label distribution.</p> </sec> </abstract>ARTICLEtrue ChatGPT evaluate research quality?<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Assess whether ChatGPT 4.0 is accurate enough to perform research evaluations on journal articles to automate this time-consuming task.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Test the extent to which ChatGPT-4 can assess the quality of journal articles using a case study of the published scoring guidelines of the UK Research Excellence Framework (REF) 2021 to create a research evaluation ChatGPT. 
This was applied to 51 of my own articles and compared against my own quality judgements.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>ChatGPT-4 can produce plausible document summaries and quality evaluation rationales that match the REF criteria. Its overall scores have weak correlations with my self-evaluation scores of the same documents (averaging r=0.281 over 15 iterations, with 8 being statistically significantly different from 0). In contrast, the average scores from the 15 iterations produced a statistically significant positive correlation of 0.509. Thus, averaging scores from multiple ChatGPT-4 rounds seems more effective than individual scores. The positive correlation may be due to ChatGPT being able to extract the author’s significance, rigour, and originality claims from inside each paper. If my weakest articles are removed, then the correlation with average scores (r=0.200) falls below statistical significance, suggesting that ChatGPT struggles to make fine-grained evaluations.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The data is self-evaluations of a convenience sample of articles from one academic in one field.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Overall, ChatGPT does not yet seem to be accurate enough to be trusted for any formal or informal research quality evaluation tasks. 
Research evaluators, including journal editors, should therefore take steps to control its use.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This is the first published attempt at post-publication expert review accuracy testing for ChatGPT.</p> </sec> </abstract>ARTICLEtrue comparative study on characteristics of retracted publications across different open access levels<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Recently, global science has shown an increasing open trend, however, the characteristics of research integrity of open access (OA) publications have rarely been studied. The aim of this study is to compare the characteristics of retracted articles across different OA levels and discover whether OA level influences the characteristics of retracted articles.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The research conducted an analysis of 6,005 retracted publications between 2001 and 2020 from the Web of Science and Retraction Watch databases. These publications were categorized based on their OA levels, including Gold OA, Green OA, and non-OA. The study explored retraction rates, time lags and reasons within these categories.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The findings of this research revealed distinct patterns in retraction rates among different OA levels. Publications with Gold OA demonstrated the highest retraction rate, followed by Green OA and non-OA. A comparison of retraction reasons between Gold OA and non-OA categories indicated similar proportions, while Green OA exhibited a higher proportion due to falsification and manipulation issues, along with a lower occurrence of plagiarism and authorship issues. The retraction time lag was shortest for Gold OA, followed by non-OA, and longest for Green OA. 
The prolonged retraction time for Green OA could be attributed to an atypical distribution of retraction reasons.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>There is no exploration of a wider range of OA levels, such as Hybrid OA and Bronze OA.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>The outcomes of this study suggest the need for increased attention to research integrity within OA publications. The occurrences of falsification, manipulation, and ethical concerns within Green OA publications warrant attention from the scientific community.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study contributes to the understanding of research integrity in the realm of OA publications, shedding light on retraction patterns and reasons across different OA levels.</p> </sec> </abstract>ARTICLEtrue an integrated platform of retracted papers and concerned papers<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The notable increase in retracted papers has attracted considerable attention from diverse stakeholders. Various sources now offer information related to research integrity, including concerns voiced on social media, disclosed lists of paper mills, and retraction notices accessible through journal websites. However, despite the availability of such resources, there is no unified platform to consolidate this information, which hinders efficient searching and cross-referencing. Thus, it is imperative to develop a comprehensive platform for retracted papers and related concerns. 
This article aims to introduce “Amend,” a platform designed to integrate information on research integrity from diverse sources.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The Amend platform consolidates concerns and lists of problematic articles sourced from social media platforms (e.g., PubPeer, For Better Science), retraction notices from journal websites, and citation databases (e.g., Web of Science, CrossRef). Moreover, Amend includes investigation and punishment announcements released by administrative agencies (e.g., NSFC, MOE, MOST, CAS). Each related paper is marked and can be traced back to its information source via a provided link. Furthermore, the Amend database incorporates various attributes of retracted articles, including citation topics, funding details, open access status, and more. The reasons for retraction are identified and classified as either academic misconduct or honest errors, with detailed subcategories provided for further clarity.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Within the Amend platform, a total of 32,515 retracted papers indexed in SCI, SSCI, and ESCI between 1980 and 2023 were identified. Of these, 26,620 (81.87%) were associated with academic misconduct. The retraction rate stands at 6.64 per 10,000 articles. Notably, the retraction rate for non-gold open access articles significantly differs from that for gold open access articles, with this disparity progressively widening over the years. 
Furthermore, the reasons for retractions have shifted from traditional individual behaviors like falsification, fabrication, plagiarism, and duplication to more organized large-scale fraudulent practices, including Paper Mills, Fake Peer-review, and Artificial Intelligence Generated Content (AIGC).</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The Amend platform may not fully capture all retracted and concerning papers, thereby impacting its comprehensiveness. Additionally, inaccuracies in retraction notices may lead to errors in tagged reasons.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Amend provides an integrated platform for stakeholders to enhance monitoring, analysis, and research on academic misconduct issues. Ultimately, the Amend database can contribute to upholding scientific integrity.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study introduces a globally integrated platform for retracted and concerning papers, along with a preliminary analysis of the evolutionary trends in retracted papers.</p> </sec> </abstract>ARTICLEtrue roles of research data infrastructure in research paradigm evolution<abstract> <title style='display:none'>Abstract</title> <p>Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. 
The four new missions of research data infrastructures are: (1) as a pioneer, to transcend the disciplinary border and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster the high-end academic communication; (4) as a coordinator, to balance scientific openness with ethics needs.</p> </abstract>ARTICLEtrue laws of funding for scientific citations: how citations change in funded and unfunded research between basic and applied sciences<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The goal of this study is to analyze the relationship between funded and unfunded papers and their citations in both basic and applied sciences.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>A power law model analyzes the relationship between research funding and citations of papers using 831,337 documents recorded in the Web of Science database.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The original results reveal general characteristics of the diffusion of science in research fields: a) Funded articles receive higher citations compared to unfunded papers in journals; b) Funded articles exhibit a super-linear growth in citations, surpassing the increase seen in unfunded articles. This finding reveals a higher diffusion of scientific knowledge in funded articles. Moreover, c) funded articles in both basic and applied sciences demonstrate a similar expected change in citations, equivalent to about 1.23%, when the number of funded papers increases by 1% in journals. 
This result suggests, for the first time, that the funding effect of scientific research is an invariant driver, irrespective of the nature of the basic or applied sciences.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This evidence suggests empirical laws of funding for scientific citations that explain the importance of robust funding mechanisms for achieving impactful research outcomes in science and society. These findings also highlight that funding for scientific research is a critical driving force in supporting citations and the dissemination of scientific knowledge in recorded documents in both basic and applied sciences.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This comprehensive result provides a holistic view of the relationship between funding and citation performance in science to guide policymakers and R&amp;D managers in setting science policies that direct funding to research, promoting scientific development and a higher diffusion of results for the progress of human society.</p> </sec> </abstract>ARTICLEtrue explorative study on document type assignment of review articles in Web of Science, Scopus and journals’ websites<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Accurately assigning the document type of review articles in citation index databases like Web of Science (WoS) and Scopus is important. This study aims to investigate the document type assignment of review articles in Web of Science, Scopus and publishers’ websites on a large scale.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>27,616 papers from 160 journals from 10 review journal series indexed in SCI are analyzed. 
The document types of these papers, as labeled on journals’ websites and as assigned by WoS and Scopus, are retrieved and compared to determine the assignment accuracy and identify possible reasons for incorrect assignment. For the document types labeled on the websites, we further differentiate between explicit and implicit reviews based on whether the website directly indicates that the paper is a review.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Overall, WoS and Scopus performed similarly, with an average precision of about 99% and recall of about 80%. However, there were some differences between WoS and Scopus across different journal series and within the same journal series. The assignment accuracy of WoS and Scopus dropped significantly for implicit reviews, especially for Scopus.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The document types we used as the gold standard were based on the journal websites’ labeling, which was not manually validated one by one. We only studied the labeling performance for review articles published during 2017-2018 in review journals. 
Whether this conclusion can be extended to review articles published in non-review journals and to the current situation is not yet clear.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This study provides a reference for the accuracy of document type assignment of review articles in WoS and Scopus, and the identified patterns in the assignment of implicit reviews may help improve labeling on websites, WoS and Scopus.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study investigated the assignment accuracy of the document type of reviews and identified some patterns of wrong assignments.</p> </sec> </abstract>ARTICLEtrue funding and citations in papers of Nobel Laureates in Physics, Chemistry and Medicine, 2019-2020<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The goal of this study is a comparative analysis of the relation between funding (a main driver for scientific research) and citations in papers of Nobel Laureates in physics, chemistry and medicine over 2019-2020 and the same relation in these research fields as a whole.</p> </sec> <sec> <title style='display:none'>Design/Methodology/Approach</title> <p>This study utilizes a power law model to explore the relationship between research funding and citations of related papers. The study analyzes 3,539 recorded documents by Nobel Laureates in physics, chemistry and medicine and a broader dataset of 183,016 documents related to the fields of physics, medicine, and chemistry recorded in the Web of Science database.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Results reveal that in chemistry and medicine, funded research published in papers of Nobel Laureates has more citations than unfunded studies published in articles; conversely, for Nobel Laureates in physics, the high citations are for unfunded studies published in papers. 
Instead, when overall data of publications and citations in physics, chemistry and medicine are analyzed, all papers based on funded research show more citations than unfunded ones.</p> </sec> <sec> <title style='display:none'>Originality/Value</title> <p>Results clarify the driving role of research funding for science diffusion, which is systematized in general properties: a) articles concerning funded research receive more citations than unfunded studies published in papers of physics, chemistry and medicine sciences, generating a high Matthew effect (a higher growth of citations with the increase in the number of papers); b) research funding increases the citations of articles in fields oriented to applied research (e.g., chemistry and medicine) more than in fields oriented towards basic research (e.g., physics).</p> </sec> <sec> <title style='display:none'>Practical Implications</title> <p>The results here explain some characteristics of scientific development and diffusion, highlighting the critical role of research funding in fostering citations and the expansion of scientific knowledge. This finding can support the decision-making of policymakers and R&amp;D managers to improve the effectiveness of allocating financial resources in science policies to generate a higher positive scientific and societal impact.</p> </sec> </abstract>ARTICLEtrue new evolutional model for institutional field knowledge flow network<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This paper aims to address the limitations in existing research on the evolution of knowledge flow networks by proposing a meso-level institutional field knowledge flow network evolution model (IKM). 
The purpose is to simulate the construction process of a knowledge flow network using knowledge organizations as units and to investigate its effectiveness in replicating institutional field knowledge flow networks.</p> </sec> <sec> <title style='display:none'>Design/Methodology/Approach</title> <p>The IKM model enhances the preferential attachment and growth observed in scale-free BA networks, while incorporating three adjustment parameters to simulate the selection of connection targets and the types of nodes involved in the network evolution process. The PageRank algorithm is used to calculate the significance of nodes within the knowledge flow network. To compare its performance, the BA and DMS models are also employed for simulating the network. Pearson coefficient analysis is conducted on the simulated networks generated by the IKM, BA and DMS models, as well as on the actual network.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The research findings demonstrate that the IKM model outperforms the BA and DMS models in replicating the institutional field knowledge flow network. It provides comprehensive insights into the evolution mechanism of knowledge flow networks in the scientific research realm. The model also exhibits potential applicability to other knowledge networks that involve knowledge organizations as node units.</p> </sec> <sec> <title style='display:none'>Research Limitations</title> <p>This study has some limitations. Firstly, it primarily focuses on the evolution of knowledge flow networks within the field of physics, neglecting other fields. Additionally, the analysis is based on a specific set of data, which may limit the generalizability of the findings. 
Future research could address these limitations by exploring knowledge flow networks in diverse fields and utilizing broader datasets.</p> </sec> <sec> <title style='display:none'>Practical Implications</title> <p>The proposed IKM model offers practical implications for the construction and analysis of knowledge flow networks within institutions. It provides a valuable tool for understanding and managing knowledge exchange between knowledge organizations. The model can aid in optimizing knowledge flow and enhancing collaboration within organizations.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This research highlights the significance of meso-level studies in understanding knowledge organization and its impact on knowledge flow networks. The IKM model demonstrates its effectiveness in replicating institutional field knowledge flow networks and offers practical implications for knowledge management in institutions. Moreover, the model has the potential to be applied to other knowledge networks that are formed by knowledge organizations as node units.</p> </sec> </abstract>ARTICLEtrue structure of cross-disciplinary impact of global disciplines: A perspective of the Hierarchy of Science<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Interdisciplinary fields have become the driving force of modern science and a significant source of scientific innovation. However, there is still a paucity of analysis about the essential characteristics of disciplines’ cross-disciplinary impact.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>In this study, we define the cross-disciplinary impact of a discipline as its impact on other disciplines, and refer to a three-dimensional framework of variety-balance-disparity to characterize the structure of cross-disciplinary impact. 
The variety of cross-disciplinary impact of the discipline was defined as the proportion of the high cross-disciplinary impact publications, and the balance and disparity of cross-disciplinary impact were measured as well. To demonstrate the cross-disciplinary impact of the disciplines in science, we chose Microsoft Academic Graph (MAG) as the data source, and investigated the relationship between disciplines’ cross-disciplinary impact and their positions in the Hierarchy of Science (HOS).</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Analytical results show that there is a significant correlation between the ranking of cross-disciplinary impact and the HOS structure, and that a discipline exerts a greater cross-disciplinary impact on its neighboring disciplines. Several bibliometric features that measure the hardness of a discipline, including the number of references, the number of cited disciplines, the citation distribution, and the Price index, have a significant positive effect on the variety of cross-disciplinary impact. The number of references, the number of cited disciplines, and the citation distribution have significant positive and negative effects on balance and disparity, respectively. It is concluded that the less hard the discipline, the greater the cross-disciplinary impact, the higher the balance, and the lower the disparity of cross-disciplinary impact.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>In the empirical analysis of HOS, we only included five broad disciplines. 
This study also has some biases caused by the data source and applied regression models.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This study contributes to the formulation of discipline-specific policies and promotes the growth of interdisciplinary research, as well as offering fresh insights for predicting the cross-disciplinary impact of disciplines.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study provides a new perspective to properly understand the mechanisms of cross-disciplinary impact and disciplinary integration.</p> </sec> </abstract>ARTICLEtrue Lorenz majorization and frequencies of distances in an undirected network<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>To contribute to the study of networks and graphs.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>We apply standard mathematical thinking.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>We show that the distance distribution in an undirected network Lorenz majorizes the one of a chain. 
As a consequence, the average and median distances in any such network are smaller than or equal to those of a chain.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>We restricted our investigations to undirected, unweighted networks.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>We are convinced that these results are useful in the study of small worlds and the so-called six degrees of separation property.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>To the best of our knowledge, our research contains new network results, especially those related to frequencies of distances.</p> </sec> </abstract>ARTICLEtrue comparison of model choice strategies for logistic regression<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The purpose of this study is to develop and compare model choice strategies in the context of logistic regression. Model choice means the choice of the covariates to be included in the model.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The choice of method depends on how much the users emphasize sensitivity relative to specificity. It also depends on the sample size. For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. 
Thus, more simulations are needed.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This is a systematic comparison of model choice algorithms and heuristics in the context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.</p> </sec> </abstract>ARTICLEtrue Triple Helix of innovation as a double game involving domestic and foreign actors<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The collaboration relationships between innovation actors at a geographic level may be considered as grouping two separate layers, the domestic and the foreign. At the level of each layer, the relationships and the actors involved constitute a Triple Helix game. The paper distinguishes three levels of analysis: the global level, grouping together all actors; the domestic level, grouping together domestic actors; and the foreign level, covering only actors from partner countries.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Bibliographic records from the Web of Science for South Korea and West Africa, broken down by innovation actor and distinguishing domestic from international collaboration, are analyzed with game theory. 
The core, the Shapley value, and the nucleolus are computed at the three levels to measure the synergy between actors.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The synergy operates more in South Korea than in West Africa; the government is more present in West Africa than in South Korea; domestic actors create more synergy in South Korea, but foreign actors create more in West Africa; South Korea can consume all the foreign synergy, which is not the case for West Africa.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>Research data are limited to publication records; techniques and methods used may be extended to other research outputs.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>West African governments should increase their investment in science, technology, and innovation to benefit more from the synergy their innovation actors contribute at the foreign level. However, the results of the current study may not be sufficient to prove that greater investment will yield benefits from foreign synergies.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This paper uses game theory to assess innovation systems by computing the contribution of foreign actors to knowledge production at an area level. It proposes an indicator to this end.</p> </sec> </abstract>ARTICLEtrue
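The Shapley value mentioned in the Triple Helix abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the player labels `U`, `I`, `G` (university, industry, government) and every coalition value in `v` are hypothetical numbers chosen only for illustration, and the function name `shapley_values` is an assumption. The sketch averages each player's marginal contribution over all orderings of the players, which is the standard definition of the Shapley value.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, v):
    """Return the Shapley value of each player.

    players: sequence of player labels.
    v: dict mapping frozenset(coalition) -> coalition value (characteristic function).
    """
    totals = {p: 0.0 for p in players}
    # Walk through every ordering in which the players could join the coalition.
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the current coalition.
            totals[p] += v[with_p] - v[coalition]
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: totals[p] / n_orders for p in players}


# Hypothetical three-actor game; the values below are illustrative only.
players = ("U", "I", "G")
v = {
    frozenset(): 0.0,
    frozenset({"U"}): 10.0,
    frozenset({"I"}): 6.0,
    frozenset({"G"}): 4.0,
    frozenset({"U", "I"}): 20.0,
    frozenset({"U", "G"}): 16.0,
    frozenset({"I", "G"}): 12.0,
    frozenset({"U", "I", "G"}): 30.0,
}

if __name__ == "__main__":
    for player, value in shapley_values(players, v).items():
        print(player, round(value, 3))
```

By construction the values sum to the worth of the grand coalition, so each actor's share can be read as its contribution to the total output. Enumerating all orderings is only practical for small games such as a three-actor Triple Helix.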