rss_2.0Journal of Data and Information Science FeedSciendo RSS Feed for Journal of Data and Information Sciencehttps://sciendo.com/journal/JDIShttps://www.sciendo.comJournal of Data and Information Science Feedhttps://sciendo-parsed.s3.eu-central-1.amazonaws.com/64720edf215d2f6c89dba612/cover-image.jpghttps://sciendo.com/journal/JDIS140216A comparative study on characteristics of retracted publications across different open access levelshttps://sciendo.com/article/10.2478/jdis-2024-0010<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Recently, global science has shown an increasing trend toward openness; however, the research integrity characteristics of open access (OA) publications have rarely been studied. The aim of this study is to compare the characteristics of retracted articles across different OA levels and discover whether OA level influences the characteristics of retracted articles.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>This study analyzed 6,005 retracted publications between 2001 and 2020 from the Web of Science and Retraction Watch databases. These publications were categorized based on their OA levels, including Gold OA, Green OA, and non-OA. The study explored retraction rates, time lags and reasons within these categories.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The findings revealed distinct patterns in retraction rates among different OA levels. Publications with Gold OA demonstrated the highest retraction rate, followed by Green OA and non-OA. A comparison of retraction reasons between Gold OA and non-OA categories indicated similar proportions, while Green OA exhibited a higher proportion due to falsification and manipulation issues, along with a lower occurrence of plagiarism and authorship issues. 
The retraction time lag was shortest for Gold OA, followed by non-OA, and longest for Green OA. The prolonged retraction time for Green OA could be attributed to an atypical distribution of retraction reasons.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>There is no exploration of a wider range of OA levels, such as Hybrid OA and Bronze OA.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>The outcomes of this study suggest the need for increased attention to research integrity within the OA publications. The occurrences of falsification, manipulation, and ethical concerns within Green OA publications warrant attention from the scientific community.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study contributes to the understanding of research integrity in the realm of OA publications, shedding light on retraction patterns and reasons across different OA levels.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00102024-03-29T00:00:00.000+00:00Amend: an integrated platform of retracted papers and concerned papershttps://sciendo.com/article/10.2478/jdis-2024-0012<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The notable increase in retraction papers has attracted considerable attention from diverse stakeholders. Various sources are now offering information related to research integrity, including concerns voiced on social media, disclosed lists of paper mills, and retraction notices accessible through journal websites. However, despite the availability of such resources, there remains a lack of a unified platform to consolidate this information, thereby hindering efficient searching and cross-referencing. Thus, it is imperative to develop a comprehensive platform for retracted papers and related concerns. 
This article aims to introduce “Amend,” a platform designed to integrate information on research integrity from diverse sources.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The Amend platform consolidates concerns and lists of problematic articles sourced from social media platforms (e.g., PubPeer, For Better Science), retraction notices from journal websites, and citation databases (e.g., Web of Science, CrossRef). Moreover, Amend includes investigation and punishment announcements released by administrative agencies (e.g., NSFC, MOE, MOST, CAS). Each related paper is marked and can be traced back to its information source via a provided link. Furthermore, the Amend database incorporates various attributes of retracted articles, including citation topics, funding details, open access status, and more. The reasons for retraction are identified and classified as either academic misconduct or honest errors, with detailed subcategories provided for further clarity.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Within the Amend platform, a total of 32,515 retracted papers indexed in SCI, SSCI, and ESCI between 1980 and 2023 were identified. Of these, 26,620 (81.87%) were associated with academic misconduct. The retraction rate stands at 6.64 per 10,000 articles. Notably, the retraction rate for non-gold open access articles significantly differs from that for gold open access articles, with this disparity progressively widening over the years. 
Furthermore, the reasons for retractions have shifted from traditional individual behaviors like falsification, fabrication, plagiarism, and duplication to more organized large-scale fraudulent practices, including Paper Mills, Fake Peer-review, and Artificial Intelligence Generated Content (AIGC).</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The Amend platform may not fully capture all retracted and concerning papers, thereby impacting its comprehensiveness. Additionally, inaccuracies in retraction notices may lead to errors in tagged reasons.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Amend provides an integrated platform for stakeholders to enhance monitoring, analysis, and research on academic misconduct issues. Ultimately, the Amend database can contribute to upholding scientific integrity.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study introduces a globally integrated platform for retracted and concerning papers, along with a preliminary analysis of the evolutionary trends in retracted papers.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00122024-03-21T00:00:00.000+00:00New roles of research data infrastructure in research paradigm evolutionhttps://sciendo.com/article/10.2478/jdis-2024-0011<abstract> <title style='display:none'>Abstract</title> <p>Research data infrastructures form the cornerstone in both cyber and physical spaces, driving the progression of the data-intensive scientific research paradigm. This opinion paper presents an overview of global research data infrastructure, drawing insights from national roadmaps and strategic documents related to research data infrastructure. It emphasizes the pivotal role of research data infrastructures by delineating four new missions aimed at positioning them at the core of the current scientific research and communication ecosystem. 
The four new missions of research data infrastructures are: (1) as a pioneer, to transcend the disciplinary border and address complex, cutting-edge scientific and social challenges with problem- and data-oriented insights; (2) as an architect, to establish a digital, intelligent, flexible research and knowledge services environment; (3) as a platform, to foster the high-end academic communication; (4) as a coordinator, to balance scientific openness with ethics needs.</p> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00112024-03-05T00:00:00.000+00:00General laws of funding for scientific citations: how citations change in funded and unfunded research between basic and applied scienceshttps://sciendo.com/article/10.2478/jdis-2024-0005<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The goal of this study is to analyze the relationship between funded and unfunded papers and their citations in both basic and applied sciences.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>A power law model analyzes the relationship between research funding and citations of papers using 831,337 documents recorded in the Web of Science database.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The original results reveal general characteristics of the diffusion of science in research fields: a) Funded articles receive higher citations compared to unfunded papers in journals; b) Funded articles exhibit a super-linear growth in citations, surpassing the increase seen in unfunded articles. This finding reveals a higher diffusion of scientific knowledge in funded articles. Moreover, c) funded articles in both basic and applied sciences demonstrate a similar expected change in citations, equivalent to about 1.23%, when the number of funded papers increases by 1% in journals. 
This result suggests, for the first time, that the funding effect of scientific research is an invariant driver, irrespective of the nature of the basic or applied sciences.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This evidence suggests empirical laws of funding for scientific citations that explain the importance of robust funding mechanisms for achieving impactful research outcomes in science and society. These findings also highlight that funding for scientific research is a critical driving force in supporting citations and the dissemination of scientific knowledge in recorded documents in both basic and applied sciences.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This comprehensive result provides a holistic view of the relationship between funding and citation performance in science to guide policymakers and R&amp;D managers in designing science policies that direct funding to research, promoting scientific development and a wider diffusion of results for the progress of human society.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00052024-02-26T00:00:00.000+00:00An explorative study on document type assignment of review articles in Web of Science, Scopus and journals’ websiteshttps://sciendo.com/article/10.2478/jdis-2024-0003<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Accurately assigning the document type of review articles in citation index databases like Web of Science (WoS) and Scopus is important. This study aims to investigate the document type assignment of review articles in Web of Science, Scopus and publishers’ websites on a large scale.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>27,616 papers from 160 journals from 10 review journal series indexed in SCI are analyzed. 
The document types of these papers, as labeled on journals’ websites and as assigned by WoS and Scopus, are retrieved and compared to determine the assignment accuracy and identify possible reasons for incorrect assignment. For the document types labeled on the websites, we further differentiate explicit reviews from implicit reviews based on whether the website directly indicates that the paper is a review.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Overall, WoS and Scopus performed similarly, with an average precision of about 99% and recall of about 80%. However, there were some differences between WoS and Scopus across different journal series and within the same journal series. The assignment accuracy of WoS and Scopus for implicit reviews dropped significantly, especially for Scopus.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The document types we used as the gold standard were based on the journal websites’ labeling, which was not manually validated one by one. We only studied the labeling performance for review articles published during 2017-2018 in review journals. 
Whether this conclusion extends to review articles published in non-review journals, or to the most current situation, remains unclear.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This study provides a reference for the accuracy of document type assignment of review articles in WoS and Scopus, and the identified pattern for assigning implicit reviews may help improve labeling on websites, WoS and Scopus.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study investigated the accuracy of document type assignment for reviews and identified some patterns of wrong assignments.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00032024-02-06T00:00:00.000+00:00Research funding and citations in papers of Nobel Laureates in Physics, Chemistry and Medicine, 2019-2020https://sciendo.com/article/10.2478/jdis-2024-0006<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The goal of this study is a comparative analysis of the relation between funding (a main driver for scientific research) and citations in papers of Nobel Laureates in physics, chemistry and medicine over 2019-2020 and the same relation in these research fields as a whole.</p> </sec> <sec> <title style='display:none'>Design/Methodology/Approach</title> <p>This study utilizes a power law model to explore the relationship between research funding and citations of related papers. 
The study analyzes 3,539 recorded documents by Nobel Laureates in physics, chemistry and medicine and a broader dataset of 183,016 documents related to the fields of physics, medicine, and chemistry recorded in the Web of Science database.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Results reveal that in chemistry and medicine, funded research published in papers of Nobel Laureates receives more citations than unfunded studies; conversely, in physics, the highly cited papers of Nobel Laureates report unfunded studies. By contrast, when overall data of publications and citations in physics, chemistry and medicine are analyzed, all papers based on funded research show higher citations than unfunded ones.</p> </sec> <sec> <title style='display:none'>Originality/Value</title> <p>Results clarify the driving role of research funding for science diffusion, systematized in two general properties: a) articles concerning funded research receive more citations than unfunded studies published in papers of physics, chemistry and medicine sciences, generating a high Matthew effect (a higher growth of citations with the increase in the number of papers); b) research funding increases the citations of articles in fields oriented to applied research (e.g., chemistry and medicine) more than in fields oriented towards basic research (e.g., physics).</p> </sec> <sec> <title style='display:none'>Practical Implications</title> <p>The results explain some characteristics of scientific development and diffusion, highlighting the critical role of research funding in fostering citations and the expansion of scientific knowledge. 
This finding can support decision-making by policymakers and R&amp;D managers to improve the effectiveness of allocating financial resources in science policies to generate a higher positive scientific and societal impact.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00062024-02-06T00:00:00.000+00:00A new evolutional model for institutional field knowledge flow networkhttps://sciendo.com/article/10.2478/jdis-2024-0009<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This paper aims to address the limitations in existing research on the evolution of knowledge flow networks by proposing a meso-level institutional field knowledge flow network evolution model (IKM). The purpose is to simulate the construction process of a knowledge flow network using knowledge organizations as units and to investigate its effectiveness in replicating institutional field knowledge flow networks.</p> </sec> <sec> <title style='display:none'>Design/Methodology/Approach</title> <p>The IKM model enhances the preferential attachment and growth observed in scale-free BA networks, while incorporating three adjustment parameters to simulate the selection of connection targets and the types of nodes involved in the network evolution process. The PageRank algorithm is used to calculate the significance of nodes within the knowledge flow network. To compare its performance, the BA and DMS models are also employed for simulating the network. Pearson coefficient analysis is conducted on the simulated networks generated by the IKM, BA and DMS models, as well as on the actual network.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The research findings demonstrate that the IKM model outperforms the BA and DMS models in replicating the institutional field knowledge flow network. 
It provides comprehensive insights into the evolution mechanism of knowledge flow networks in the scientific research realm. The model also exhibits potential applicability to other knowledge networks that involve knowledge organizations as node units.</p> </sec> <sec> <title style='display:none'>Research Limitations</title> <p>This study has some limitations. Firstly, it primarily focuses on the evolution of knowledge flow networks within the field of physics, neglecting other fields. Additionally, the analysis is based on a specific set of data, which may limit the generalizability of the findings. Future research could address these limitations by exploring knowledge flow networks in diverse fields and utilizing broader datasets.</p> </sec> <sec> <title style='display:none'>Practical Implications</title> <p>The proposed IKM model offers practical implications for the construction and analysis of knowledge flow networks within institutions. It provides a valuable tool for understanding and managing knowledge exchange between knowledge organizations. The model can aid in optimizing knowledge flow and enhancing collaboration within organizations.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This research highlights the significance of meso-level studies in understanding knowledge organization and its impact on knowledge flow networks. The IKM model demonstrates its effectiveness in replicating institutional field knowledge flow networks and offers practical implications for knowledge management in institutions. 
Moreover, the model has the potential to be applied to other knowledge networks that are formed by knowledge organizations as node units.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00092024-02-06T00:00:00.000+00:00Characterizing structure of cross-disciplinary impact of global disciplines: A perspective of the Hierarchy of Sciencehttps://sciendo.com/article/10.2478/jdis-2024-0008<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Interdisciplinary fields have become the driving force of modern science and a significant source of scientific innovation. However, there is still a paucity of analysis about the essential characteristics of disciplines’ cross-disciplinary impact.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>In this study, we define the cross-disciplinary impact of a discipline as its impact on other disciplines, and refer to a three-dimensional framework of variety-balance-disparity to characterize the structure of cross-disciplinary impact. The variety of cross-disciplinary impact of a discipline was defined as the proportion of high cross-disciplinary impact publications, and the balance and disparity of cross-disciplinary impact were measured as well. To demonstrate the cross-disciplinary impact of the disciplines in science, we chose Microsoft Academic Graph (MAG) as the data source, and investigated the relationship between disciplines’ cross-disciplinary impact and their positions in the Hierarchy of Science (HOS).</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Analytical results show that there is a significant correlation between the ranking of cross-disciplinary impact and the HOS structure, and that a discipline exerts a greater cross-disciplinary impact on its neighboring disciplines. 
Several bibliometric features that measure the hardness of a discipline, including the number of references, the number of cited disciplines, the citation distribution, and the Price index have a significant positive effect on the variety of cross-disciplinary impact. The number of references, the number of cited disciplines, and the citation distribution have significant positive and negative effects on balance and disparity, respectively. It is concluded that the less hard the discipline, the greater the cross-disciplinary impact, the higher balance and the lower disparity of cross-disciplinary impact.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>In the empirical analysis of HOS, we only included five broad disciplines. This study also has some biases caused by the data source and applied regression models.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This study contributes to the formulation of discipline-specific policies and promotes the growth of interdisciplinary research, as well as offering fresh insights for predicting the cross-disciplinary impact of disciplines.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This study provides a new perspective to properly understand the mechanisms of cross-disciplinary impact and disciplinary integration.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00082024-02-06T00:00:00.000+00:00Extended Lorenz majorization and frequencies of distances in an undirected networkhttps://sciendo.com/article/10.2478/jdis-2024-0007<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>To contribute to the study of networks and graphs.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>We apply standard mathematical thinking.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>We show that the distance distribution 
in an undirected network Lorenz majorizes the one of a chain. As a consequence, the average and median distances in any such network are smaller than or equal to those of a chain.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>We restricted our investigations to undirected, unweighted networks.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>We are convinced that these results are useful in the study of small worlds and the so-called six degrees of separation property.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>To the best of our knowledge our research contains new network results, especially those related to frequencies of distances.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00072024-02-06T00:00:00.000+00:00A comparison of model choice strategies for logistic regressionhttps://sciendo.com/article/10.2478/jdis-2024-0001<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The purpose of this study is to develop and compare model choice strategies in context of logistic regression. Model choice means the choice of the covariates to be included in the model.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The study is based on Monte Carlo simulations. The methods are compared in terms of three measures of accuracy: specificity and two kinds of sensitivity. A loss function combining sensitivity and specificity is introduced and used for a final comparison.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The choice of method depends on how much the users emphasize sensitivity against specificity. It also depends on the sample size. 
For a typical logistic regression setting with a moderate sample size and a small to moderate effect size, either BIC, BICc or Lasso seems to be optimal.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>Numerical simulations cannot cover the whole range of data-generating processes occurring with real-world data. Thus, more simulations are needed.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Researchers can refer to these results if they believe that their data-generating process is somewhat similar to some of the scenarios presented in this paper. Alternatively, they could run their own simulations and calculate the loss function.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This is a systematic comparison of model choice algorithms and heuristics in context of logistic regression. The distinction between two types of sensitivity and a comparison based on a loss function are methodological novelties.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00012024-02-06T00:00:00.000+00:00The Triple Helix of innovation as a double game involving domestic and foreign actorshttps://sciendo.com/article/10.2478/jdis-2024-0004<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The collaboration relationships between innovation actors at a geographic level may be considered as grouping two separate layers, the domestic and the foreign. At the level of each layer, the relationships and the actors involved constitute a Triple Helix game. 
The paper distinguishes three levels of analysis: the global level, grouping all actors; the domestic level, grouping domestic actors; and the foreign level, covering only actors from partner countries.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Bibliographic records from the Web of Science for South Korea and West Africa, broken down by innovation actor and distinguishing domestic from international collaboration, are analyzed with game theory. The core, the Shapley value, and the nucleolus are computed at the three levels to measure the synergy between actors.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The synergy operates more in South Korea than in West Africa; the government is more present in West Africa than in South Korea; domestic actors create more synergy in South Korea, but foreign actors create more in West Africa; South Korea can consume all the foreign synergy, which is not the case for West Africa.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>Research data are limited to publication records; techniques and methods used may be extended to other research outputs.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>West African governments should increase their investment in science, technology, and innovation to benefit more from the synergy their innovation actors contribute at the foreign level. However, the results of the current study may not be sufficient to prove that greater investment will yield benefits from foreign synergies.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This paper uses game theory to assess innovation systems by computing the contribution of foreign actors to knowledge production at an area level. 
It proposes an indicator to this end.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00042024-02-06T00:00:00.000+00:00Mapping the geography of editors-in-chiefhttps://sciendo.com/article/10.2478/jdis-2024-0002<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This study aims to explore the geography of editors-in-chief to demonstrate which countries exercise the highest-level decision-making in scholarly communication. In addition, the study seeks to investigate the potential relationships between the origin and nationality of academic publishers and the geography of editors-in-chief.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The analysis involves 11,915 journals listed in Web of Science’s Social Sciences Citation Index (SSCI) and Science Citation Index Expanded (SCIE). These journals employ 15,795 scholars as editors-in-chief. The geographical locations of the institutions the editors-in-chief are affiliated with were identified; then, the data were aggregated at the country level.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The results show that most editors-in-chief are located in countries of the Anglosphere, primarily the United States and the United Kingdom. In addition, most academic publishers and professional organizations that publish academic journals were found to be based in the United States and the United Kingdom, where most editors-in-chief are also based.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The analysis involves journals indexed in the Web of Science’s SCIE/SSCI databases, which are demonstrably biased toward the English language. 
Furthermore, the study only takes a snapshot of the geography of editors-in-chief for the year 2022, but it does not investigate trends.</p> </sec> <sec> <title style='display:none'>Research implications</title> <p>The study maps the highest-level decision-making in scholarly communication.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>The study explores and maps the geography of editors-in-chief by using a massive dataset.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2024-00022024-02-06T00:00:00.000+00:00The Nobel Prize winners will be among these candidateshttps://sciendo.com/article/10.2478/jdis-2023-0023ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00232023-11-30T00:00:00.000+00:00Dimensionality reduction model based on integer planning for the analysis of key indicators affecting life expectancyhttps://sciendo.com/article/10.2478/jdis-2023-0025<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>Exploring a dimensionality reduction model that can adeptly eliminate outliers and select the appropriate number of clusters is of profound theoretical and practical importance. Additionally, the interpretability of these models presents a persistent challenge.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>This paper proposes two innovative dimensionality reduction models based on integer programming (DRMBIP). These models assess compactness through the correlation of each indicator with its class center, while separation is evaluated by the correlation between different class centers. 
In contrast to DRMBIP-p, DRMBIP-v treats the threshold parameter as a variable, aiming to optimally balance compactness and separation.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Using data from the Global Health Observatory (GHO), this study investigates 141 indicators that influence life expectancy. The findings reveal that DRMBIP-p effectively reduces the dimensionality of data, ensuring compactness. It also maintains compatibility with other models. Additionally, DRMBIP-v finds the optimal result, showing exceptional separation. Visualization of the results reveals that all classes have a high compactness.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>DRMBIP-p requires the correlation threshold parameter as input, which plays a pivotal role in the effectiveness of the final dimensionality reduction results. In DRMBIP-v, turning the threshold parameter into a variable potentially emphasizes either separation or compactness. This necessitates an artificial adjustment to the overflow component within the objective function.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>The DRMBIP presented in this paper is adept at uncovering the primary geometric structures within high-dimensional indicators. Validated on life expectancy data, the model demonstrates potential to assist data miners with the reduction of data dimensions.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>To our knowledge, this is the first time that integer programming has been used to build a dimensionality reduction model with indicator filtering. 
It not only has applications in life expectancy, but also has obvious advantages in data mining work that requires precise class centers.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00252023-11-30T00:00:00.000+00:00Text duplication of papers in four medical related fieldshttps://sciendo.com/article/10.2478/jdis-2023-0024<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>To reveal the typical features of text duplication in papers from four medical fields: basic medicine, health management, pharmacology and pharmacy, and public health and preventive medicine. To analyze the reasons for duplication and provide suggestions for the management of medical academic misconduct.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>In total, 2,469 representative Chinese journal papers were included in our research, which were submitted by researchers in 2020 and 2021. A plagiarism check was carried out using the Academic Misconduct Literature Check System (AMLC). We generated a corrected similarity index based on the AMLC general similarity index for further analysis. We compared the similarity indices of papers in four medical fields and revealed their trends over time; differences in similarity index between review and research articles were also analyzed according to the different fields. Further analysis of 143 papers suspected of plagiarism was also performed from the perspective of sections containing duplication and according to the field of research.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Papers in the field of pharmacology and pharmacy had the highest similarity index (8.67 ± 5.92%), which was significantly higher than that in other fields, except health management. The similarity index of review articles (9.77 ± 10.28%) was significantly higher than that of research articles (7.41 ± 6.26%). 
In total, 143 papers were suspected of plagiarism (5.80%) with similarity indices ≥ 15%; most were papers on health management (78, 54.55%), followed by public health and preventive medicine (38, 26.58%); 90.21% of the 143 papers had duplication in multiple sections, while only 9.79% had duplication in a single section. The distribution of sections with duplication varied among different fields; papers in pharmacology and pharmacy were more likely to have duplication in the data/methods and introduction/background sections, whereas papers in health management were more likely to contain duplication in the introduction/background or results/discussion sections. Different structures for papers in different fields may have caused these differences.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>There were three limitations to our research. Firstly, we observed that a small number of papers had been checked previously. It is unknown who conducted the plagiarism check as this can be included in other evaluations, such as applications for science and technology projects or awards. If the authors carried out the check, text with high similarity indices may have been excluded before submission, meaning the similarity index in our research may have been lower than the original value. Secondly, there were only four medical fields included in our research. Additional analysis on a wider scale is required in the future. Thirdly, only a general similarity index was calculated in our study; other similarity indices were not tested.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>A comprehensive analysis of similarity indices in four medical fields was performed. 
We made several recommendations for the supervision of medical academic misconduct and the formation of criteria for defining suspected plagiarism for medical papers, as well as for the improved accuracy of text duplication checks.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>We quantified the differences between the AMLC general similarity index and the corrected index, described the situation around text duplication and plagiarism in papers from four medical fields, and revealed differences in similarity indices between different article types. We also revealed differences in the sections containing duplication for papers with suspected plagiarism among different fields.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00242023-10-13T00:00:00.000+00:00Research misconduct in hospitals is spreading: A bibliometric analysis of retracted papers from Chinese university-affiliated hospitalshttps://sciendo.com/article/10.2478/jdis-2023-0022<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The number of retracted papers from Chinese university-affiliated hospitals is increasing, which has raised much concern. The aim of this study is to analyze the retracted papers from university-affiliated hospitals in mainland China from 2000 to 2021.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>Data for 1,031 retracted papers were identified from the Web of Science Core Collection database. Information on the hospitals involved was obtained from their official websites. We analyzed the chronological changes, journal distribution, discipline distribution and retraction reasons for the retracted papers. 
The grade and geographic locations of the hospitals involved were explored as well.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>We found a rapid increase in the number of retracted papers, while the retraction time interval is decreasing. The main reasons for retraction are plagiarism/self-plagiarism (n=255), invalid data/images/conclusions (n=212), fake peer review (n=175) and honest error (n=163). The retracted papers are concentrated in oncology (n=320), pharmacology &amp; pharmacy (n=198) and research &amp; experimental medicine (n=166). About 43.8% of the retracted papers were from hospitals affiliated with prestigious universities.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>This study fails to differentiate between retractions due to honest error and retractions due to research misconduct. We believe that there is a fundamental difference between honest error retractions and misconduct retractions. Another limitation is that authors of the retracted papers have not been analyzed in this study.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>This study provides a reference for addressing research misconduct in Chinese university-affiliated hospitals. 
We recommend that universities and hospitals educate all their staff about the basic norms of research integrity, punish authors of papers retracted for scientific misconduct, and reform the unreasonable evaluation system.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>Based on the analysis of retracted papers, this study further analyzes the characteristics of the institutions behind retracted papers, which may deepen the research on retracted papers and provide a new perspective to understand the retraction phenomenon.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00222023-09-22T00:00:00.000+00:00The need to develop tailored tools for improving the quality of thematic bibliometric analyses: Evidence from papers published in Sustainability and Scientometricshttps://sciendo.com/article/10.2478/jdis-2023-0021<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>The aim of this article is to explore up to seven parameters related to the methodological quality and reproducibility of thematic bibliometric research published in the two most productive journals in bibliometrics, Sustainability (a journal outside the discipline) and Scientometrics, the flagship journal in the field.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>The study identifies the need for developing tailored tools for improving the quality of thematic bibliometric analyses, and presents a framework that can guide the development of such tools. A total of 508 papers from the 2019-2021 period are analysed, 77% published in Sustainability and 23% in Scientometrics.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>An average of 2.6 shortcomings per paper was found for the whole sample, with an almost identical number of flaws in both journals. 
Sustainability has more flaws than Scientometrics in four of the seven parameters studied, while Scientometrics has more shortcomings in the remaining three variables.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The first limitation of this work is that it is a study of two scientific journals, so the results cannot be directly extrapolated to the set of thematic bibliometric analyses published in journals from all fields.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>We propose the adoption of protocols, guidelines, and other similar tools, adapted to bibliometric practice, which could increase the thoroughness, transparency, and reproducibility of this type of research.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>These results show considerable room for improvement in terms of the adequate use and breakdown of methodological procedures in thematic bibliometric research, both in journals in the Information Science area and journals outside the discipline.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00212023-09-22T00:00:00.000+00:00The notion of dominant terminology in bibliometric researchhttps://sciendo.com/article/10.2478/jdis-2023-0020<abstract> <title style='display:none'>Abstract</title> <p>In this opinion paper, we introduce the expressions of <italic>dominant terminology</italic> and <italic>dominant term</italic> in the quantitative studies of science in analogy to the notion of dominant design in product development and innovation.</p> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00202023-09-13T00:00:00.000+00:00Editorial board publication strategy and acceptance rates in Turkish national journalshttps://sciendo.com/article/10.2478/jdis-2023-0019<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This study takes advantage of newly released journal 
metrics to investigate whether local journals with more qualified boards have lower acceptance rates, based on data from 219 Turkish national journals and 2,367 editorial board members.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>This study argues that journal editors can signal their scholarly quality by publishing in reputable journals. Conversely, editors publishing inside articles in affiliated national journals would send negative signals. The research predicts that high (low) quality editorial boards will conduct more (less) selective evaluation and their journals will have lower (higher) acceptance rates. Based on the publication strategy of editors, four measures of board quality are defined: number of board inside publications per editor (INSIDER), number of board Social Sciences Citation Index publications per editor (SSCI), inside-to-SSCI article ratio (ISRA), and board citations per editor (CITATION). Predictions are tested by correlation and regression analysis.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>Low-quality board proxies (INSIDER, ISRA) are positively, and high-quality board proxies (SSCI, CITATION) are negatively, associated with acceptance rates. Further, we find that receiving a larger number of submissions, greater representation of women on boards, and Web of Science and Scopus (WOSS) coverage are associated with lower acceptance rates. Acceptance rates for journals range from 12% to 91%, with an average of 54% and a median of 53%. Law journals have a significantly higher average acceptance rate (68%) than other journals, while WOSS journals have the lowest (43%). 
Findings indicate some of the highest acceptance rates in Social Sciences literature, including competitive Business and Economics journals that traditionally have low acceptance rates.</p> </sec> <sec> <title style='display:none'>Limitations</title> <p>Research relies on local context to define publication strategy of editors. Findings may not be generalizable to mainstream journals and core science countries where emphasis on research quality is stronger and editorial selection is based on scientific merit.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>Results offer useful insights into editorial management of national journals and allow us to make sense of local editorial practices. The importance of scientific merit for selection to national journal editorial boards is particularly highlighted for sound editorial evaluation of submitted manuscripts.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>This is the first attempt to document a significant relation between acceptance rates and editorial board publication behavior.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00192023-08-25T00:00:00.000+00:00Differences between journal and conference in computer science: a bibliometric view based on Bayesian networkhttps://sciendo.com/article/10.2478/jdis-2023-0017<abstract> <title style='display:none'>Abstract</title> <sec> <title style='display:none'>Purpose</title> <p>This paper aims to investigate the differences between conference papers and journal papers in the field of computer science based on Bayesian network.</p> </sec> <sec> <title style='display:none'>Design/methodology/approach</title> <p>This paper investigated the differences between conference papers and journal papers in the field of computer science based on Bayesian network, a knowledge-representative framework that can model relationships among all variables in the network. 
We defined the variables required for Bayesian network modeling, calculated the values of each variable based on the Aminer dataset (a literature data set in the field of computer science), learned the Bayesian network and derived some findings based on network inference.</p> </sec> <sec> <title style='display:none'>Findings</title> <p>The study found that conferences are more attractive to senior scholars, the academic impact of conference papers is slightly higher than that of journal papers, and it is uncertain whether conference papers are more innovative than journal papers.</p> </sec> <sec> <title style='display:none'>Research limitations</title> <p>The study was limited to the field of computer science and employed the Aminer dataset as the sample. Further studies involving more diverse datasets and different fields could provide a more complete picture of the matter.</p> </sec> <sec> <title style='display:none'>Practical implications</title> <p>By demonstrating that Bayesian networks can effectively analyze issues in Scientometrics, the study offers valuable insights that may enhance researchers’ understanding of the differences between journals and conferences in computer science.</p> </sec> <sec> <title style='display:none'>Originality/value</title> <p>Academic conferences play a crucial role in facilitating scholarly exchange and knowledge dissemination within the field of computer science. Several studies have been conducted to examine the distinctions between conference papers and journal papers in terms of various factors, such as authors, citations, h-index and others. Those studies were carried out from different (independent) perspectives, lacking a systematic examination of the connections and interactions between multiple perspectives. This paper addresses this gap through Bayesian network modeling.</p> </sec> </abstract>ARTICLEtruehttps://sciendo.com/article/10.2478/jdis-2023-00172023-08-25T00:00:00.000+00:00en-us-1