What Does Information Science Offer for Data Science Research?: A Review of Data and Information Ethics Literature

Abstract

This paper reviews literature pertaining to the development of data science as a discipline, current issues with data bias and ethics, and the role that the discipline of information science may play in addressing these concerns. Information science research and researchers have much to offer data science, owing to their background as transdisciplinary scholars who apply human-centered and social-behavioral perspectives to issues within natural science disciplines. Information science researchers have already contributed to a humanistic approach to data ethics within the literature, and an emphasis on data science within information schools all but ensures that this literature will continue to grow in the coming decades. This review article serves as a reference for the history, current progress, and potential future directions of data ethics research within the corpus of information science literature.</p> </abstract>
ARTICLE 2022-09-08T00:00:00.000+00:00 Implications of Publication Requirements for the Research Output of Ukrainian Academics in Scopus in 1999–2019<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>This article explores the implications of publication requirements for the research output of Ukrainian academics in Scopus in 1999–2019. As such, it contributes to the existing body of knowledge on the quantitative and qualitative effects of research evaluation policies.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>Three metrics were chosen to analyse the implications of publication requirements for the quality of research output: publications in predatory journals, publications in local journals, and publications per SNIP quartile from a disciplinary perspective.</p> </sec> <sec><title style='display:none'>Findings</title> <p>The results show, firstly, that publications by Ukrainian authors in predatory journals rose to 1% in 2019. 
Secondly, the share of publications in local journals reached a peak of 47.3% in 2015 and fell to 31.8% in 2019. Thirdly, although the total number of publications has risen dramatically since 2011, the share of Q3+Q4 publications has exceeded that of Q1+Q2. To summarise, the findings highlight that research evaluation policies need to contain not only quantitative but also qualitative criteria.</p> </sec> <sec><title style='display:none'>Research limitation</title> <p>The study does not explore in detail the effects of particular types of publication requirements.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>The findings have practical implications for policymakers and university managers who aim to develop research evaluation policies.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>This paper provides insights into the effects of publication requirements on the research output of Ukrainian academics in Scopus.</p> </sec> </abstract>
ARTICLE 2022-08-12T00:00:00.000+00:00 A Use Case of Patent Classification Using Deep Learning with Transfer Learning<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>Patent classification is one of the areas of Intellectual Property Analytics (IPA) and a growing use case, since the number of patent applications has been increasing worldwide. We propose using machine learning algorithms to classify Portuguese patents and evaluate the performance of transfer learning methodologies on this task.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>We applied three different approaches in this paper. First, we used a dataset made available by INPI to explore traditional machine learning algorithms and ensemble methods. After preprocessing the data with TF-IDF, FastText and Doc2Vec, the models were evaluated by 5-fold cross-validation. 
In a second approach, we used two different neural network architectures: a Convolutional Neural Network (CNN) and a bidirectional Long Short-Term Memory network (BiLSTM). Finally, in the third approach, we used pre-trained BERT, DistilBERT, and ULMFiT models.</p> </sec> <sec><title style='display:none'>Findings</title> <p>BERTimbau, a BERT model pre-trained on a large Portuguese corpus, presented the best results for the task, although its performance was only 4% higher than that of a LinearSVC model using TF-IDF feature engineering.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>The dataset was highly imbalanced, as is usual for patent applications, so the classes with the fewest samples were expected to show the worst performance. This occurred in some cases, especially for classes with fewer than 60 training samples.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>Patent classification is challenging because of the hierarchical classification system, the context overlap, and the underrepresentation of some classes. However, the final model achieved acceptable performance given the size of the dataset and the complexity of the task. 
This model can support decision-making and reduce processing time by proposing a category at the second level of the IPC, which is one of the critical phases of the patent granting process.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>To our knowledge, the proposed models have not previously been implemented for Portuguese patent classification.</p> </sec> </abstract>
ARTICLE 2022-08-12T00:00:00.000+00:00 A Morphology-Driven Method for Measuring Technology Complementarity: Empirical Study Involving Alzheimer's Disease<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>Measuring the exact technology complementarity between different institutions is necessary to obtain complementary technology resources for R&amp;D cooperation.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>This study constructs a morphology-driven method for measuring technology complementarity, taking the medical field as an example. First, we calculate semantic similarities between pairs of subjects (S) and between pairs of action-objects (AO) based on the Metathesaurus, and form clusters of S and AO from the resulting semantic similarity matrix. Second, we identify key technology issues and methods based on the clusters of S and AO. Third, a multidimensional technology morphology matrix is constructed using morphology analysis, and the matrix is filled with subject-action-object (SAO) structures according to the corresponding key technology issues and methods of different institutions. 
Finally, the technology morphology matrix is used to measure the SAO-based technology complementarity between different institutions.</p> </sec> <sec><title style='display:none'>Findings</title> <p>The improved SAO-based technology complementarity method serves as a supplementary and more refined framework to the traditional IPC-based method.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>In future studies, we will reprocess and identify the SAO structures that were not included in the technology morphology matrix, and find other methods to characterize key technical issues and methods. Furthermore, we will compare the proposed method with the traditional and most widely used complementarity measurement methods based on industry chains and industry codes.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>This study takes the medical field as an example. The morphology-driven method for measuring technology complementarity can be transferred and applied to any given field.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>From the perspective of complementary technology resources, this study develops and tests a more accurate morphology-driven method for technology complementarity measurement.</p> </sec> </abstract>
ARTICLE 2022-08-12T00:00:00.000+00:00 Convergence of Impact Measures and Impact Bundles<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title><p>A new point of view in the study of impact is introduced.</p></sec> <sec><title style='display:none'>Design/methodology/approach</title><p>Using fundamental theorems of real analysis, we study the convergence of well-known impact measures.</p></sec> <sec><title style='display:none'>Findings</title><p>We show that pointwise convergence is maintained by all well-known impact bundles (such as the h-, g-, and R-bundles) and that the μ-bundle even maintains uniform convergence. 
Based on these results, a classification of impact bundles is given.</p></sec> <sec><title style='display:none'>Research limitations</title><p>As with all impact studies, it is impossible to study all measures in depth.</p></sec> <sec><title style='display:none'>Practical implications</title><p>We propose including convergence properties in the study of impact measures.</p></sec> <sec><title style='display:none'>Originality/value</title><p>This article is the first to present a classification of impact bundles based on their convergence properties.</p></sec> </abstract>
ARTICLE 2022-07-16T00:00:00.000+00:00 Learning Context-based Embeddings for Knowledge Graph Completion<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>Due to the incomplete nature of knowledge graphs (KGs), predicting missing links between entities is an important task. Many previous approaches are static, which poses a notable problem: all meanings of a polysemous entity share one embedding vector. This study proposes a polysemous embedding approach, named KG embedding under relational contexts (ContE for short), for missing link prediction.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>ContE models and infers different relationship patterns by considering the context of a relationship, which is implicit in its local neighborhood. The forward and backward impacts of the relationship in ContE are mapped to two different embedding vectors, which represent the contextual information of the relationship. 
Then, according to the position of the entity, the entity's polysemous representation is obtained by adding its static embedding vector to the corresponding context vector of the relationship.</p> </sec> <sec><title style='display:none'>Findings</title> <p>ContE is fully expressive; that is, given any ground truth over the triples, there exist embedding assignments to entities and relations that can precisely separate the true triples from the false ones. ContE is capable of modeling four connectivity patterns: symmetry, antisymmetry, inversion and composition.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>In practice, ContE requires a grid search to find the best parameters, which is time-consuming. In some cases, it requires longer entity vectors than some other models to achieve better performance.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>ContE is a bilinear model, which is quite simple and could be applied to large-scale KGs. By considering the contexts of relations, ContE can distinguish the exact meaning of an entity in different triples, so that when performing compositional reasoning it can infer the connectivity patterns of relations and achieve good performance on link prediction tasks.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>ContE considers the contexts of entities in terms of their positions in triples and the relationships they link to. It decomposes a relation vector into two vectors, namely a forward impact vector and a backward impact vector, in order to capture the relational contexts. ContE has the same low computational complexity as TransE. Therefore, it provides a new approach to contextualized knowledge graph embedding.</p> </sec> </abstract>
ARTICLE 2022-04-25T00:00:00.000+00:00 Fighting Against Academic Misconduct: What Can Scientometricians Do?
Contribution of the Open Access Modality to the Impact of Hybrid Journals Controlling by Field and Time Effects<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title><p>Researchers are more likely to read and cite papers to which they have access than those that they cannot obtain. Thus, the objective of this work is to analyze the contribution of the Open Access (OA) modality to the impact of hybrid journals.</p></sec> <sec><title style='display:none'>Design/methodology/approach</title><p>The “research articles” published in 2017 in 200 hybrid journals across four subject areas, and the citations these articles received in the Scopus database during 2017–2020, were analyzed. The hybrid OA papers were compared with the paywalled ones. The journals were randomly selected from those whose share of OA papers exceeded a minimum value. More than 60 thousand research articles were analyzed in the sample, of which 24% were published under the OA modality.</p></sec> <sec><title style='display:none'>Findings</title><p>At the journal level, we find that citations per article in the two hybrid modalities (OA and paywalled) are strongly correlated. However, there is no correlation between OA prevalence and citations per article. There is an OA citation advantage in 80% of the hybrid journals. Moreover, the OA citation advantage is consistent across fields and persists over time. We obtain an average OA citation advantage of 50%, and an advantage higher than 37% in half of the hybrid journals. Finally, the OA citation advantage is higher in the Humanities than in Science and Social Science.</p></sec> <sec><title style='display:none'>Research limitations</title><p>Some of the citation advantage is likely due to greater access, which allows more people to read and hence cite articles they otherwise would not. However, causation is difficult to establish, and there are many possible biases. Several factors can affect the observed differences in citation rates, and funder mandates may be one of them. 
Funders are likely to have OA requirements, and well-funded studies are more likely to receive more citations than poorly funded ones. Another factor discussed is the selection bias postulate, which suggests that authors choose only their most impactful studies to be open access.</p></sec> <sec><title style='display:none'>Practical implications</title><p>For hybrid journals, the open access modality is positive in the sense that it provides a greater number of potential readers. This in turn translates into more citations and an improvement in the journal’s position in impact factor rankings. For researchers, it is also positive because it increases the potential number of readers and citations received.</p></sec> <sec><title style='display:none'>Originality/value</title><p>Our study refines previous results by comparing documents that are more similar to each other. Although it does not examine the cause of the observed citation advantage, we find that the advantage exists in a very large sample.</p></sec> </abstract>
ARTICLE 2022-04-25T00:00:00.000+00:00 I Don’t Peer-Review for Non-Open Journals, and Neither Should You
I’m Nervous about Sharing This Secret with You: YouTube Influencers Generate Strong Parasocial Interactions by Discussing Personal Issues<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title><p>Performers may generate loyalty partly by eliciting illusory personal connections with their audience (parasocial relationships, PSRs) and individual illusory exchanges (parasocial interactions, PSIs). On social media, semi-PSIs are real but imbalanced exchanges with audiences, including through comments on influencers’ videos, and strong semi-PSIs are those that occur within PSRs. 
This article introduces and assesses an automatic method to detect videos with strong PSI potential.</p></sec> <sec><title style='display:none'>Design/methodology/approach</title><p>Strong semi-PSIs were hypothesized to occur when commenters used a variant of the pronoun “you”, typically addressing the influencer. Comments on the videos of UK female influencer channels were used to test whether the proportion of comments using a “you” pronoun could serve as an automated indicator of strong PSI potential, and to identify factors associated with the strong PSI potential of influencer videos. The videos with the highest and lowest strong PSI potential for 117 influencers were classified through content analysis for strong PSI potential and for evidence of factors that might elicit PSIs.</p></sec> <sec><title style='display:none'>Findings</title><p>The “you” pronoun proportion was effective at indicating a video’s strong PSI potential, making it the first automated method to detect any type of PSI. Gazing at the camera, head-and-shoulders framing, discussing personal issues, and focusing on the influencer were associated with higher strong PSI potential for influencer videos. 
New social media factors identified include requesting feedback and discussing the channel itself.</p></sec> <sec><title style='display:none'>Research limitations</title><p>Only one country, genre and social media platform were analysed.</p></sec> <sec><title style='display:none'>Practical implications</title><p>The method can be used to automatically detect YouTube videos with strong PSI potential, helping influencers to monitor their performance.</p></sec> <sec><title style='display:none'>Originality/value</title><p>This is the first automatic method to detect any aspect of PSI or PSR.</p></sec> </abstract>
ARTICLE 2022-04-25T00:00:00.000+00:00 Extracting and Measuring Uncertain Biomedical Knowledge from Scientific Statements<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title><p>Given the information overload of scientific literature, there is an increasing need to make the biomedical knowledge buried in free text computable. This study aimed to develop a novel approach to extracting and measuring uncertain biomedical knowledge from scientific statements.</p></sec> <sec><title style='display:none'>Design/methodology/approach</title><p>Taking cardiovascular research publications in China as a sample, we extracted subject–predicate–object triples (SPO triples) as knowledge units and unknown/hedging/conflicting uncertainties as the knowledge context. We introduced information entropy (IE) as a potential metric to quantify the uncertainty of the epistemic status of scientific knowledge, represented at the level of subject–object pairs (SO pairs).</p></sec> <sec><title style='display:none'>Findings</title><p>The results indicated extraordinary growth in cardiovascular publications in China but only modest growth in novel SPO triples. After evaluating the uncertainty of biomedical knowledge with IE, we identified the top 10 SO pairs with the highest IE, which indicated pluralism of epistemic status. 
Visual presentation of the SO pairs overlaid with uncertainty provided a comprehensive overview of clusters of biomedical knowledge and contending topics in cardiovascular research.</p></sec> <sec><title style='display:none'>Research limitations</title><p>The current methods did not distinguish the specificity and probabilities of uncertainty cue words. The number of sentences surrounding a given triple may also influence the value of IE.</p></sec> <sec><title style='display:none'>Practical implications</title><p>Our approach identified major areas of uncertain knowledge, such as diagnostic biomarkers, genetic polymorphism and co-existing risk factors related to cardiovascular diseases in China. We suggest that these areas be prioritized: new hypotheses need to be verified, while disputes, conflicts, and contradictions need to be settled.</p></sec> <sec><title style='display:none'>Originality/value</title><p>We provide a novel approach that combines natural language processing and computational linguistics with informetric methods to extract and measure uncertain knowledge from scientific statements.</p></sec> </abstract>
ARTICLE 2022-04-25T00:00:00.000+00:00 Bibliometrics Is Valuable Science. Why Do Some Journals Seem to Oppose It?
Three-Step Workflow: A Pragmatic Approach to Allocating Academic Hospitals’ Affiliations for Bibliometric Purposes<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>A key question when ranking universities is whether or not to allocate the publication output of affiliated hospitals to universities. 
This paper presents a method for classifying the varying degrees of interdependency between academic hospitals and universities in the context of the Leiden Ranking.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>Hospital nomenclatures denoting some form of collaboration with a university vary worldwide and do not correspond to universally standard definitions. Thus, rather than seeking a normative definition of academic hospitals, we propose a three-step workflow that aligns the university-hospital relationship with one of three general models: full integration of the hospital and the medical faculty into a single organization; health science centres in which hospitals and medical faculty remain separate entities albeit within the same governance structure; and structures in which universities and hospitals are separate entities that collaborate with one another. This classification system provides a standard through which publications that mention affiliations with academic hospitals can be better allocated.</p> </sec> <sec><title style='display:none'>Findings</title> <p>In the paper, we illustrate how the three-step workflow effectively translates the three models above into two types of instrumental relationships for the assignment of publications: “associate” and “component”. When a hospital and a medical faculty are fully integrated, or when a hospital is part of a health science centre, the relationship is classified as component. When a hospital follows the model of collaboration and support, the relationship is classified as associate. 
Compiling data according to these standards allows a more uniform comparison of educational and research systems worldwide.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>The workflow is resource intensive, depends heavily on the information provided by universities and hospitals, and is more challenging for languages that use non-Latin characters. Further, applying the workflow demands careful evaluation of different types of input, which can result in ambiguity and makes the workflow difficult to automate.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>Determining the type of affiliation an academic hospital has with a university can have a substantial impact on universities’ publication counts. This workflow can also aid in analysing collaborations between the two types of organizations.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>The three-step workflow is a unique way to establish the type of relationship an academic hospital has with a university while accounting for national and regional differences in nomenclature.</p> </sec> </abstract>
ARTICLE 2022-02-03T00:00:00.000+00:00 Academic Collaborator Recommendation Based on Attributed Network Embedding<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>Based on real-world academic data, this study aims to use network embedding technology to mine academic relationships and to investigate the effectiveness of the proposed embedding model on academic collaborator recommendation tasks.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>We propose an academic collaborator recommendation model based on attributed network embedding (ACR-ANE), which produces enhanced scholar embeddings and takes full advantage of the topological structure of the network and multi-type scholar attributes. 
Non-local neighbors are defined for scholars to capture strong relationships among them. A deep auto-encoder is adopted to encode the academic collaboration network structure and scholar attributes into a low-dimensional representation space.</p> </sec> <sec><title style='display:none'>Findings</title> <p>1. The proposed non-local neighbors describe real-world relationships among scholars better than first-order neighbors do. 2. It is important to consider the structure of the academic collaboration network and scholar attributes simultaneously when recommending collaborators for scholars.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>The designed method works for static networks and does not take network dynamics into account.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>The designed model embeds the academic collaboration network structure and scholar attributes, and can be used to recommend potential collaborators to scholars.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>Experiments on two real-world scholarly datasets, AMiner and APS, show that our proposed method performs better than other baselines.</p> </sec> </abstract>
ARTICLE 2022-02-03T00:00:00.000+00:00 Progress and Knowledge Transfer from Science to Technology in the Research Frontier of CRISPR Based on the LDA Model<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>This study explores the underlying research topics regarding CRISPR based on the LDA model and identifies trends in knowledge transfer from science to technology in this area over the last 10 years.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>We collected publications on CRISPR from 2011 to 2020 from the Web of Science and traced all the patents citing them; in total, 15,904 articles and 18,985 patents were downloaded and analyzed. The LDA model was applied to identify underlying research topics in the related research. In addition, several indicators were introduced to measure knowledge transfer from the research topics of scientific publications to the IPC-4 classes of patents.</p> </sec> <sec><title style='display:none'>Findings</title> <p>The emerging research topics on CRISPR were identified and their evolution over time displayed. Furthermore, a big picture of knowledge transition from research topics to technological classes of patents was presented. We found that for all topics on CRISPR, the average first transition year, the ratio of articles cited by patents, and the NPR transition rate are 1.08, 15.57%, and 1.19, respectively, indicating a much shorter and more intensive transition than in general fields. Moreover, the transition patterns differ among research topics.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>Our research is limited to publications retrieved from the Web of Science and their citing patents indexed in … A limitation inherent in LDA analysis lies in the manual interpretation and labeling of “topics”.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>Our study provides useful references for policymakers on allocating scientific resources and regulating financial budgets to face challenges related to the transformative technology of CRISPR.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>Here, the LDA model is applied for the first time to topic identification in transformative research, as exemplified by CRISPR. Additionally, the dataset of all citing patents in this area helps to provide a full picture for detecting knowledge transition between S&amp;T.</p> </sec> </abstract>
ARTICLE 2022-02-03T00:00:00.000+00:00 Does Success Breed Success?
A Study on the Correlation between Impact Factor and Quantity in Chinese Academic Journals<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>This paper studies the relationship between the impact factor (IF) and the number of journal papers in the Chinese publishing system.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>In this study, the method proposed by <xref ref-type="bibr" rid="j_jdis-2021-0031_ref_008">Huang (2016)</xref> is used to analyze the data of Chinese journals.</p> </sec> <sec><title style='display:none'>Findings</title> <p>Based on the analysis, we find the following. (1) The average impact factor (AIF) of journals in all disciplines maintained a growth trend from 2007 to 2017. Whether before or after removing outlier journals that may garner publication fees, the IF and its growth rate in most social science disciplines are larger than those in most natural science disciplines, and from 2007 to 2017 the number of journal papers in social science disciplines decreased while that in natural science disciplines increased. (2) The removal of outlier journals has a greater impact on the relationship between the IF and the number of journal papers in some disciplines, such as Geosciences, because there may be journals that publish many papers to garner publication fees. (3) The success-breeds-success (SBS) principle applies to Chinese journals in natural science disciplines but not to Chinese journals in social science disciplines, and the relationship is the reverse of the SBS principle in Economics and in Education &amp; Educational Research. (4) Interviews and surveys suggest that the difference in this relationship between Chinese natural science and social science disciplines may be due to the influence of the international publishing system. 
Chinese natural science journals are losing their academic power, while Chinese social science journals, which are less influenced by the international publishing system, are in fierce competition.</p> </sec> <sec><title style='display:none'>Research limitation</title> <p>More implications could be found through long-term tracking and comparison of the international and Chinese publishing systems.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>We suggest that researchers from different countries study natural science and social science journals in their own languages and observe the influence of the international publishing system.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>This paper presents an overview of the relationship between IF and the number of journal papers in the Chinese publishing system from 2007 to 2017, provides insights into the relationship in different disciplines, and points out the similarities and differences between the Chinese and international publishing systems.</p> </sec> </abstract>
ARTICLE 2021-08-18T00:00:00.000+00:00 Substantiality: A Construct Indicating Research Excellence to Measure University Research Performance<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>The adequacy of the research performance of universities or research institutes has often been evaluated and understood along two axes: “quantity” (i.e. size or volume) and “quality” (i.e. what we define here as a measure of excellence that is considered theoretically independent of size or volume, such as clarity in diamond grading). 
The purpose of this article, however, is to introduce a third construct of research performance, named “substantiality” (“ATSUMI” in Japanese), and to demonstrate its importance in evaluating and understanding research universities.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>We take a two-step approach to demonstrate the effectiveness of the proposed construct by showing that (1) some characteristics of research universities are not well captured by indicators based on the conventional constructs (“quantity” and “quality”), and (2) “substantiality” indicators can capture them. Furthermore, by using simple statistical analysis to suggest that “substantiality” indicators appear to be linked to reputation in university reputation rankings, we reveal additional benefits of the construct.</p> </sec> <sec><title style='display:none'>Findings</title> <p>We propose a new construct named “substantiality” for measuring research performance. We show that indicators based on “substantiality” can capture important characteristics of research institutes. “Substantiality” indicators demonstrate “predictive power” for research reputation.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>The concept of “substantiality” originated from the game of igo (Go); therefore, how readily the concept is accepted is culturally dependent. In other words, while it is easily accepted by people from Japan and other East Asian countries and regions, researchers from other cultural regions might find it difficult to accept.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>There is no simple solution to the challenge of evaluating the research performance of research universities. It is vital to combine different types of indicators to understand the excellence of research institutes. 
Substantiality indicators could be part of such a combination of indicators.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>The authors propose a new construct named substantiality for measuring research performance. They show that indicators based on this construct can capture the important characteristics of research institutes.</p> </sec> </abstract>ARTICLE2021-07-25T00:00:00.000+00:00A Topic Detection Method Based on Word-attention Networks<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>We propose a method that represents scientific papers as complex networks by combining neural network and complex network approaches.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>Its novelty lies in representing a paper as a word branch, which carries the sequential structure of words in sentences. The branches are generated by the attention mechanism in deep learning models. We connected these branches at the positions of their common words to generate networks, called word-attention networks, and then detected their communities, which we define as topics.</p> </sec> <sec><title style='display:none'>Findings</title> <p>The detected topics can carry the sequential structure of words in sentences, represent the intra- and inter-sentential dependencies among words, and reveal, through network indices, the roles that words play in them.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>The parameter settings of our method may depend on the data at hand. 
Thus, human experience is needed to find proper settings.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>We applied our method to papers published in PNAS, using the discipline designations provided by the authors as gold-standard labels of the papers’ topics.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>This empirical study shows that the proposed method outperforms Latent Dirichlet Allocation (LDA) and is more stable.</p> </sec> </abstract>ARTICLE2021-08-18T00:00:00.000+00:00New Indicators of the Technological Impact of Scientific Production<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>Building upon pioneering work by Francis Narin and others, a new methodological approach to assessing the technological impact of scientific research is presented.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>It is based on the analysis of citations made in patent families included in the PATSTAT database to scientific papers indexed in Scopus.</p> </sec> <sec><title style='display:none'>Findings</title> <p>An advanced citation matching procedure is applied to the data in order to construct two indicators of technological impact. On the citing (patent) side, the country/region in which protection is sought and a patent family's propensity to cite scientific papers are taken into account. On the cited (paper) side, a relative citation rate is defined for patent citations to papers, analogous to the paper-to-paper citation rate in classical bibliometrics.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>The results are limited by the available data, in our case Scopus and PATSTAT, and especially by the lack of standardization of references in patents. 
This required a matching procedure that is neither trivial nor exact.</p> </sec> <sec><title style='display:none'>Practical implications</title> <p>Results at the country/region, document type, and publication age levels are presented. The country/region-level results in particular reveal features that have remained hidden in analyses of straight counts. Especially notable is that the rankings of some Asian countries/regions move upwards when the proposed normalized indicator of technological impact is applied instead of straight counts of patent citations to those countries/regions’ published papers.</p> </sec> <sec><title style='display:none'>Originality/value</title> <p>In our opinion, the indicators proposed in the current paper are unparalleled in sophistication in the scientific literature and provide a solid basis for assessing the technological impact of scientific research in countries/regions and institutions.</p> </sec> </abstract>ARTICLE2021-06-24T00:00:00.000+00:00The Scientometric Measurement of Interdisciplinarity and Diversity in the Research Portfolios of Chinese Universities<abstract> <title style='display:none'>Abstract</title> <sec><title style='display:none'>Purpose</title> <p>Interdisciplinarity is a hot topic in science and technology policy. However, the concept of interdisciplinarity is both abstract and complex, and therefore difficult to measure using a single indicator. Many metrics for measuring the diversity and interdisciplinarity of articles, journals, and fields have been proposed in the literature. In this article, we ask whether institutions can be ranked in terms of their (inter-)disciplinary diversity.</p> </sec> <sec><title style='display:none'>Design/methodology/approach</title> <p>We developed a software application (interd_vb.exe) that outputs the values of relevant diversity indicators for any document set or network structure. 
The software is freely available online. The indicators it computes include the advanced diversity indicators Rao-Stirling (<italic>RS</italic>) diversity and <italic>DIV*</italic>, as well as standard measures of diversity such as the Gini coefficient, Shannon entropy, and the Simpson Index. As an empirical demonstration of how the application works, we compared the research portfolios of 42 “Double First-Class” Chinese universities across Web of Science Subject Categories (WCs).</p> </sec> <sec><title style='display:none'>Findings</title> <p>The empirical results suggest that <italic>DIV*</italic> provides results more in line with intuitive impressions than <italic>RS</italic>, particularly when the results are based on sample-dependent disparity measures. Furthermore, the diversity scores are more consistent when based on a global disparity matrix than on a local map.</p> </sec> <sec><title style='display:none'>Research limitations</title> <p>“Interdisciplinarity” can be operationalized as bibliographic coupling among (sets of) documents with references to disciplines. At the institutional level, however, diversity may also indicate comprehensiveness. Unlike impact (e.g. citation), diversity and interdisciplinarity are context-specific and therefore provide a second dimension to evaluation.</p> </sec> <sec><title style='display:none'>Policy or practical implications</title> <p>Operationalization and quantification require analysts to make their choices and options explicit. Although the equations used to calculate diversity are often mathematically transparent, specifying them in computer code helps analysts make their decisions more precise. Diversity is not necessarily a goal of universities, but a high diversity score may inform potential policies concerning interdisciplinarity at the university level.

Originality/value

This article introduces to the public domain a non-commercial online application that allows researchers and policy analysts to measure “diversity” and “interdisciplinarity” for any document set or network structure (e.g. a network of co-authors), using as comprehensive a range of indicators as possible. As far as we know, no such professional computing tool for evaluating data sets with diversity indicators has previously been made available online.
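The standard diversity indicators named in this abstract have compact definitions. As a rough illustration only, the following Python sketch computes four of them for the distribution of a document set over categories (for example, a university's papers per Web of Science Subject Category). It is not the code of interd_vb.exe, and several choices are assumptions:

- the function names are hypothetical;
- the Simpson Index is read as the Simpson-type diversity 1 − Σp_i²;
- disparity is taken as 1 minus the cosine similarity of category profile vectors;
- DIV* is omitted because its definition is not given here.

```python
import math


def proportions(counts: list[float]) -> list[float]:
    """Turn raw counts per category (e.g. papers per subject category) into shares p_i."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def shannon_entropy(counts: list[float]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i); empty categories contribute nothing."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)


def simpson_diversity(counts: list[float]) -> float:
    """Simpson-type diversity 1 - sum(p_i^2): chance two random papers differ in category (assumed reading)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def gini_coefficient(counts: list[float]) -> float:
    """Gini coefficient of the counts: 0 for an even spread, (n-1)/n when all output is in one category."""
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one category with a positive count")
    # Rank-weighted form using 1-based ranks on the ascending-sorted values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def cosine_disparity(profiles: list[list[float]]) -> list[list[float]]:
    """Assumed disparity d_ij = 1 - cos(v_i, v_j) between non-zero category profile vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in profiles]
    n = len(profiles)
    d = [[0.0] * n for _ in range(n)]  # d_ii stays 0: a category is not distant from itself
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(profiles[i], profiles[j]))
                d[i][j] = 1.0 - dot / (norms[i] * norms[j])
    return d


def rao_stirling(counts: list[float], disparity: list[list[float]]) -> float:
    """Rao-Stirling diversity: sum over ordered pairs i != j of p_i * p_j * d_ij."""
    p = proportions(counts)
    n = len(p)
    return sum(p[i] * p[j] * disparity[i][j]
               for i in range(n) for j in range(n) if i != j)


if __name__ == "__main__":
    counts = [10, 10]               # hypothetical: equal output in two categories
    d = [[0.0, 1.0], [1.0, 0.0]]    # the two categories treated as maximally dissimilar
    print(round(shannon_entropy(counts), 4))  # 0.6931 (= ln 2)
    print(simpson_diversity(counts))          # 0.5
    print(gini_coefficient(counts))           # 0.0
    print(rao_stirling(counts, d))            # 0.5
```

If every off-diagonal disparity equals 1, Rao-Stirling diversity reduces to 1 − Σp_i². Whatever distinguishes it from the simpler measures therefore comes entirely from the disparity matrix. This is why choosing between a global disparity matrix and a local map can change the resulting scores.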