How real is the quantitative turn? Investigating statistics as the new normal in linguistics<abstract> <title style='display:none'>Abstract</title> <p>Statistical approaches in linguistics seem to have gained in importance in recent times, especially in the field of Corpus Linguistics. In particular, the last ten years have seen an upsurge in linguists dedicating themselves to statistical methods and the improvement of statistical knowledge. This has repeatedly been described as 'the quantitative turn' in linguistics. In the present paper, we assess how real this quantitative turn actually is and whether statistics can be considered the 'new normal' in (corpus) linguistics. To this end, we have analyzed the contributions to six high-impact journals (<italic>Corpora, Corpus Linguistics and Linguistic Theory, ICAME Journal, English World-Wide, Journal of English Linguistics</italic>, and <italic>Language Variation and Change</italic>) for a period of eleven years (January 2011 until December 2021). Our results suggest that, indeed, statistical methods seem to be on the rise in linguistic studies. However, their frequency varies strongly between the journals, and, in general, we have identified some room for improvement in the use of advanced statistical methods, in particular the discussion of true prediction.</p> </abstract> Agnieszka Leńko-Szymańska and Sandra Götz (eds.). . <p>For instance, many studies have discussed the differences in how often certain elements are used (i) in corpus data from native speakers vs. 
corpus data from learners from different L1 backgrounds, (ii) in corpora representing different inner- and outer-circle varieties, or (iii) by speakers in corpora representing people of different gender or sexual identities.</p> <p>This paper will make the admittedly bold claim that any such study is in fact by definition unable to 'prove' what is often its main point, namely that the distributional differences found are in fact due to the hypothesized explanatory variable(s) of L1, VARIETY, or, e.g., GENDER, even when the distributional differences are significant and come with a decent effect size. To substantiate this claim, I will first discuss some terminology from the family of methods known as multi-level modeling, namely the distinction between level-1, level-2, ... level-<italic>n</italic> variables and its relevance for many corpus studies. Second, I will demonstrate how studies using only the above kinds of variables cannot distinguish the effect of their favored predictors from the effect of local/contextual level-1 variables. 
Third, in discussing this, I will exemplify how such effects need to be explored quantitatively instead.</p> </abstract> Grammatical variation in English as a lingua franca: Multivariate analysis of modal verbs of obligation and necessity in the VOICE corpus<abstract> <title style='display:none'>Abstract</title> <p>The modal verbs of necessity and obligation, a testing ground of grammatical change, have been shown to exhibit change and variation in world Englishes. Previous studies have primarily concentrated on English as a native language (ENL) and English as a second language (ESL) varieties. The present study extends this line of research and explores variation in modal verbs of necessity and obligation in English used as a lingua franca (ELF). Descriptive statistics indicate that ELF resembles American English and also shares similarities with ESL varieties. In addition, ELF exhibits divergence from both ENL and ESL varieties that arises in multilingual interactions. The multivariate analysis of this study employs mixed-effects logistic regression on the use of <italic>must</italic> and <italic>have to</italic>. Integrating social and linguistic factors, this analysis exploits metadata gathered from the VOICE corpus, which has thus far been underused. The results of the inferential statistics indicate that the same sociolinguistic factors that influence the variation in ENL and ESL varieties also shape ELF grammar. These findings not only bring ELF closer to other English varieties but also demonstrate the advantage of studying ELF from a variationist sociolinguistic perspective.</p> </abstract> Mapping shared lexical bundles onto rhetorical moves in nursing research articles: A comparative study of paradigmatic variation<abstract> <title style='display:none'>Abstract</title> <p>Previous studies have identified frequent lexical bundles associated with qualitative, quantitative, and mixed methods research paradigms. 
These paradigmatic investigations of lexical bundles conducted thus far seem to have two limitations. One is that they have primarily concentrated on distinctive lexical bundles, without much analysis of the shared bundles in qualitative, quantitative, and mixed methods research paradigms. Another shortcoming is that they tend not to explore in which contexts lexical bundles are likely to occur. These two problems deserve attention, as shared bundles are also frequently used to facilitate fluent linguistic production and analysing lexical bundles in their surrounding contexts can help reveal their specific textual meanings. To address these two limitations, this study seeks to link shared lexical bundles with rhetorical moves based on a corpus consisting of qualitative, quantitative, and mixed methods nursing research articles. The findings of this study show that in certain move-steps, shared lexical bundles have distinctive discourse functions in mixed methods research. Meanwhile, the findings also show that there are move-steps where shared lexical bundles have similar discourse functions in two or three research paradigms. Revealing shared lexical bundles' discourse functions in specific contexts may enable learners to know where to use the bundles in a text.</p> </abstract> Semantic prosody, semantic transfer and semantic change<abstract> <title style='display:none'>Abstract</title> <p>This article investigates semantic prosody in a diachronic perspective. Although prosodies have been shown to change over time, there is no consensus regarding the source of such changes. The present study explores this further through a corpus study of the development of the lemmas <sc>fabric</sc>, <sc>fabricate</sc> and <sc>fabrication</sc> from the late 15th century to the late 20th century, drawing on material from Early English Books Online, the Corpus of Late Modern English Texts and the British National Corpus. 
The results of the study show that prosodic changes coincide with the emergence of new senses and indicate that these processes are related to and possibly caused by semantic transfer induced by persistent prosodies over time.</p> </abstract> A comparative corpus-based investigation of results sections of research articles in Applied Linguistics and Physics<abstract> <title style='display:none'>Abstract</title> <p>The present study sought to identify the generic structures of the results sections of scientific research articles (RAs) in Applied Linguistics and Physics. Following a manual search approach, a total of 200 RAs in the fields of Applied Linguistics and Physics were randomly selected from different prestigious journals and analyzed. In addition to offering a tentative template for the rhetorical organization of results sections, the findings revealed shared and non-shared rhetorical units as well as obligatory and optional steps in the results sections (RSs) of research articles in the two disciplines. The findings also indicated that RA writers organize the contents of the RSs around certain rhetorical resources (i.e., M1, M2, M3, M4, and M5) to present key experimental and factual analytical results of their studies. The findings further suggested the existence of a common core of rhetorical resources in writing RSs across the disciplines, although a set of certain steps plays an essential part in distinguishing the textual features of each discipline and in depicting how the RSs of each discipline are developed. 
The findings generated from the study can offer a number of important pedagogical implications for teaching EAP and ESP courses, especially for Applied Linguistics and Physics teachers and students.</p> </abstract> Semantic differences between English clippings and their source words: A corpus-based study<abstract> <title style='display:none'>Abstract</title> <p>This paper uses corpus data and methods of distributional semantics in order to study English clippings such as <italic>dorm</italic> (&lt; <italic>dormitory</italic>), <italic>memo</italic> (&lt; <italic>memorandum</italic>), or <italic>quake</italic> (&lt; <italic>earthquake</italic>). We investigate whether systematic meaning differences between clippings and their source words can be detected. The analysis is based on a sample of 50 English clippings. Each of the clippings is represented by a concordance of 100 examples in context that were gathered from the Corpus of Contemporary American English. We compare clippings and their source words both at the aggregate level and in terms of comparisons between individual clippings and their source words. The data show that clippings tend to be used in contexts that represent involved text production, which aligns with the idea that clipped words signal familiarity with their referents. It is further observed that individual clippings and their source words partly diverge in their distributional profiles, reflecting both overlap and differences with regard to their meanings. We interpret these findings against the theoretical background of Construction Grammar and specifically the Principle of No Synonymy.</p> </abstract> TV series as disseminators of emerging vocabulary: Non-codified expressions in the TV Corpus<abstract> <title style='display:none'>Abstract</title> <p>This study presents a method for identifying words that appear in corpus data earlier than their first date of attestation in dictionaries. 
We demonstrate the application of this method based on a large diachronic corpus, the TV Corpus, and the <italic>Oxford English Dictionary</italic> (OED). Combining automatic extraction of candidate terms from the TV Corpus with comprehensive manual analysis and verification, the method identifies 32 words that were used in TV series before their first attestation in the OED. We present a detailed discussion of these words, analysing their distribution across decades and genres of the TV Corpus, their origins, semantic domains and word-formation processes. We also present extracts with their first uses in the TV Corpus and analyse how the words were presented to the large and anonymous mass audience. Our study shows that the method we present is suitable for identifying early attestations of words in large corpora, even though in the case of the TV Corpus, a great deal of manual analysis and verification is needed. In addition, we argue that TV series and other types of fictional texts are an important resource for studying the coinage and spread of terms, due to their function and the fact that they address a mass audience.</p> </abstract> From servant to yours: The simplification of leavetaking formulae in 18th-century Scottish and Irish English letters<abstract> <title style='display:none'>Abstract</title> <p>The study in hand investigates the impact of social status on the use and change of pragmatic formulae in historical varieties of English. The study asks which leavetaking formulae are used between writers of equal social status in varieties of English in the later 18th century. Working on a corpus of letters compiled from two subsets of letters each from 18th-century Scottish and Irish English, the study illustrates pragmatic change on the basis of the investigation of leavetakings involving the <italic>servant</italic> formula. 
By doing so, the study also helps to widen the hitherto predominating narrow focus on mainly English English.</p> <p>The study shows that the use of formulae is situationally dependent. It suggests that pragmatic change takes place amongst writers of equal social status in the private domain, which then leads to the use of such formulae in the public domain and to the use between writers of different status groups.</p> </abstract> CoCELD: A new tool for analysing recent changes in English legal discourse<abstract> <title style='display:none'>Abstract</title> <p>Legal discourse is widely assumed to be resistant to change, and indeed legislative documents are extremely conservative with fixed and formulaic structures. However, recent research has shown that changes can be observed in the lexico-grammatical features of some legal documents when examined diachronically, particularly since the emergence in the 1970s of the Plain Language Movement, which sought to draw attention to the unnecessary complexity of official language, including legal discourse. Despite the crucial changes in legal language in recent years, research in that direction is scarce to date, particularly on the British English variety, probably due, in part, to the shortage of specialised corpora that allow this kind of study. In order to bridge this gap, we have embarked on the compilation of the <italic>Corpus of Contemporary English Legal Decisions, 1950–2021</italic> (CoCELD), a corpus of British judicial decisions produced between 1950 and 2021. In this paper we present the structure and characteristics of CoCELD, as well as the methodology used for its compilation. The new corpus, which was released in February 2022, contains sample texts of roughly 2,500 words for each year from 1950 to 2021, which adds up to more than 730,000 words. The corpus contains files in raw text and with POS-annotation, and is freely available for the research community under signed consent. 
With CoCELD we hope to contribute a new, useful resource for linguists with an interest in legal language, from both a synchronic and a diachronic perspective.</p> </abstract> Compiling a corpus of South Asian online Englishes: A report, some reflections and a pilot study<abstract> <title style='display:none'>Abstract</title> <p>In this research article we introduce the <bold>S</bold>outh <bold>A</bold>sian <bold>On</bold>line <bold>E</bold>nglishes (SAOnE) corpus representing four South Asian countries, i.e. Bangladesh, India, Pakistan, and Sri Lanka, and two native English-speaking countries, i.e. the UK and the USA. We have used semi-automatic and manual methods to collect data from three internet registers, i.e. newspaper comments, web forums and tweets, and a collection of internet sub-registers which we label as blogs and websites. Additionally, we have collected text messages using online freelance hiring platforms from each of the South Asian countries mentioned above. Each register category in the corpus consists of approximately 1 million words per country, except text messages, which contain around 500,000 words per country and only include the four South Asian countries. We have verified the origin of website and blog links, of authors of tweets, and, where possible, of commenters and web forum users to make sure that only local content of each country is included. The corpus features some indigenous language content, which is tagged.</p> <p>In addition to the description of this dataset, we also present a pilot study analysing three discourse particles, namely <italic>na</italic>, <italic>neh</italic>, and <italic>yaar</italic>. The discourse particles <italic>na</italic> and <italic>yaar</italic> are native to Hindi/Urdu, while <italic>neh</italic> is based on a Sinhala negation marker. Our analysis indicates that <italic>na</italic> and <italic>neh</italic> have similarities in terms of their position in the clause/utterance. 
However, <italic>neh</italic> is confined to Sri Lanka while the Hindi/Urdu based discourse particles are also used in our Twitter data from Sri Lanka and Bangladesh. The use of these discourse particles in Bangladeshi tweets shows the influence of Indian culture through Bollywood celebrities. Of the Hindi/Urdu discourse particles <italic>yaar</italic> and <italic>na</italic>, <italic>yaar</italic> is preferred in Pakistan while <italic>na</italic> is preferred in India; additionally, <italic>yaar</italic> is used at the start of the clause more often in our Pakistani data. Lastly, we discuss the implications of the pilot study, the advantages of the type of data used for the pilot study, and future research directions.</p> </abstract> McEnery and Vaclav Brezina. . Cambridge: Cambridge University Press, 2022. 313 pp. ISBN 978-1-1071-1062-5 and evaluation in contemporary American English: A corpus study based on pronominal and nominal expressions with male and female reference<abstract> <title style='display:none'>Abstract</title> <p>This study of contemporary American English examines how males and females are evaluated in terms of their personality, physical appearance, societal importance, etc. across various registers. In this study, <italic>evaluation</italic> is defined as an expression of a speaker or writer’s attitude toward, viewpoint on, or feelings about a male or female referent, which generally carries a positive or a negative meaning. The evaluative tokens analyzed in the study include noun phrases (e.g., <italic>a real jerk</italic>) and adjectival modification (e.g., <italic>congenial</italic>) co-occurring with gender-specific nominal expressions (e.g., <italic>boy</italic>, <italic>lady</italic>) or pronominal expressions (e.g., <italic>he</italic>, <italic>she</italic>). 
The findings imply a distinct gender patterning in the evaluation: whereas males are evaluated in terms of their skills, abilities, acuities and importance in society, females are typically assessed in terms of their looks and appearance. Males occupy considerably more evaluative space than females, particularly in the Newspaper register. The preponderance of the evaluation of males even in twenty-first-century American English is surprising, considering changes in gender role attitudes in U.S. society in recent decades.</p> </abstract> science in urgent times: CoViD-19 and its impact on scientific writing<abstract> <title style='display:none'>Abstract</title> <p>The urgent need for new knowledge as a result of the CoViD-19 pandemic has led to a significant increase in the amount of scientific writing on the topic. Various analyses of this phenomenon from different approaches have appeared thus far (Horbach 2020; Torres-Salinas 2020). However, less attention has been paid to the impact of this situation on the language of these studies, that is, to whether the continued emergency affects authors’ conscious or unconscious linguistic choices, and if so, how. This article compares texts on CoViD with texts written during the previous MERS emergency and its aftermath to determine whether texts on CoViD present particular linguistic features reflective of this situation of urgency. Results suggest that texts on CoViD do indeed exhibit particular linguistic features, and that these point to a preference for conveying immediate knowledge and a departure from rhetorical practices common in scientific writing.</p> </abstract> Bernaisch (ed.). . Cambridge: Cambridge University Press, 2021. xv, 235 pp. ISBN: 978-1-108-48254-7