Abstract ID: 640
Part of General Paper Session
Authors: Wagner, Susanne
Submitted by: Wagner, Susanne (Chemnitz University of Technology, Germany)
The role of frequency effects in language is well known. For any unbiased, ‘representative’ corpus of a certain size, a very small number of words will occur very frequently. Certain parts of speech (mostly function words) will usually occur at the top of frequency lists. Most of the words in any corpus, however, will only occur once or twice. This observation, known as Zipf’s Law, has played a role in many studies (cf. Bybee & Hopper 2001; Bybee 2002; Krug 1998, 2001; Kortmann 1997; Nycz 2011a,b).
However, the role of lexical constraints on language variation has generally been neglected in the past. Only recently have a number of scholars begun to show that certain (irregular) patterns of variation are due to the ‘odd’ behaviour of a small number of (often high-frequency) lexical items rather than a systematic constraint on e.g. verb semantics (e.g. Childs 2008; King 2008; Tagliamonte 2008; Torres Cacoullos & Walker 2008, 2009; Van Herk 2008; Walker 2008).
In a forthcoming study on null subjects, Erker & Guy investigate for the first time the effect of type-token ratios of certain verbs on null subject realisation in Spanish. Frequency effects are the guiding principle of their analysis. Verbs are divided into frequent forms (total token frequency at least 1% of all [coded] verb tokens in the corpus) and infrequent forms (frequency < 1%). By using log10 frequency, Erker & Guy can compensate for the Zipfian distribution, which helps to visualize certain trends in the data.
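The frequency split described above can be illustrated with a minimal sketch. It is not Erker & Guy’s own procedure: the function name `classify_verbs`, the toy verb list, and the choice to take log10 of the raw token count are assumptions made purely for illustration.

```python
import math
from collections import Counter

def classify_verbs(verb_tokens, threshold=0.01):
    """Map each verb to (token count, log10 of count, frequency class).

    A verb counts as 'frequent' if it accounts for at least `threshold`
    (1% by default) of all coded verb tokens, otherwise as 'infrequent'.
    """
    counts = Counter(verb_tokens)
    total = sum(counts.values())
    result = {}
    for verb, n in counts.items():
        label = "frequent" if n / total >= threshold else "infrequent"
        result[verb] = (n, math.log10(n), label)
    return result

# Toy data, invented for illustration only (151 tokens in total).
toy_tokens = ["know"] * 100 + ["think"] * 50 + ["ponder"] * 1
classes = classify_verbs(toy_tokens)
```

In this toy sample, "ponder" makes up less than 1% of the tokens and falls into the infrequent class, while the log10 transform compresses the gap between counts of 100 and 1 to a difference of 2.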
This paper adopts Erker & Guy’s methodology and extends it to include the role of frequency as a speaker effect (above- and below-average users of null subjects). In a corpus of sociolinguistic interviews comprising some 8,400 tokens of first person subjects, both null and overt, tokens are divided into a binary category of frequent and infrequent verbs, modelled on Erker & Guy’s study. In the next step, five different statistical models (regression analysis performed with Rbrul) are compared: the full model without the separation of high- and low-frequency verbs or speaker separations, and models including a) only the high- and low-frequency verbs respectively and b) only above- or below-average null subject users.
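How the data might be partitioned for the five models can be sketched as follows. This is a hypothetical illustration, not the study’s actual code: the field names (`speaker`, `freq_class`, `null`) are invented, and defining an above-average user as a speaker whose null subject rate exceeds the overall rate across all tokens is an assumption. The regression itself (run in Rbrul in the study) is not shown.

```python
from collections import defaultdict

def build_subsets(tokens):
    """Split coded tokens into the five data sets to be modelled.

    `tokens` is a list of dicts with the (hypothetical) keys
    'speaker', 'freq_class' ('frequent' / 'infrequent') and 'null' (bool).
    """
    # Assumption: "average" = overall null subject rate across all tokens.
    overall_rate = sum(t["null"] for t in tokens) / len(tokens)

    by_speaker = defaultdict(list)
    for t in tokens:
        by_speaker[t["speaker"]].append(t["null"])
    above = {s for s, v in by_speaker.items() if sum(v) / len(v) > overall_rate}

    return {
        "full": list(tokens),
        "high_freq": [t for t in tokens if t["freq_class"] == "frequent"],
        "low_freq": [t for t in tokens if t["freq_class"] == "infrequent"],
        "above_avg": [t for t in tokens if t["speaker"] in above],
        "below_avg": [t for t in tokens if t["speaker"] not in above],
    }

# Toy data, invented for illustration only.
toy = [
    {"speaker": "A", "freq_class": "frequent", "null": True},
    {"speaker": "A", "freq_class": "infrequent", "null": True},
    {"speaker": "B", "freq_class": "frequent", "null": False},
    {"speaker": "B", "freq_class": "frequent", "null": False},
]
subsets = build_subsets(toy)
```

Each of the five subsets would then be submitted to the same regression specification, so that constraint rankings can be compared across models.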
The results suggest that Erker & Guy’s main finding also holds for data from English: frequency has a spectacular effect on null subject occurrence. First of all, this concerns the constraint ranking: most of the factor groups are only significant in either the high- or the low-frequency sub-model. While there is no factor-group-internal re-ranking of constraints, the variability between individual factors increases or decreases dramatically. Second, frequency also affects non-linguistic categories: there are major differences in the models comparing speaker effects (above- and below-average null subject users). Overall, the findings underline the need to include frequency effects in any study that is based on logarithmically distributed data.