Programme: accepted abstracts

Abstract ID: 887

Part of Session 177: Field methods in multicultural megacities

Sampling São Paulo Portuguese

Authors: Mendes, Ronald Beline; Oushiro, Livia
Submitted by: Oushiro, Livia (Universidade de São Paulo, Brazil)

The city of São Paulo, home to some 11 million people, is the largest city in South America and the seventh largest in the world. Its population is highly diverse in geographical origin, socioeconomic class, and cultural background. Yet to date very few studies describe the linguistic correlates of this sociodemographic complexity.

This paper reports on the objectives, challenges, and methods of Project SP2010 (Mendes, 2011), currently being carried out by the Grupo de Estudos e Pesquisa em Sociolinguística (Sociolinguistic Studies and Research Group) at the University of São Paulo. Its primary goal is to collect a representative sample of Paulistano speech and to make both recordings and transcriptions available online, in order to foster sociolinguistic research on this variety of Portuguese.

The first question to be addressed is how to build a manageable sample that is representative of São Paulo’s complex sociodemographic makeup. In the current stage, the first of the project and two years long, the sample is stratified by speakers’ sex/gender (two categories), age (three groups), and level of education (two levels). This yields 12 social profiles, each filled by five speakers, for a total of 60 recordings. In order to cover the wide geographical range of the city, each of the five interviewees in a cell resides in a different zone (north, south, east, west, center). In a second stage, we will extend the corpus to encompass three other social parameters that seem to be relevant for sociolinguistic stratification in São Paulo: social class, the number of generations the family has been resident in the city, and the regional background of immigrants.
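The arithmetic of this design can be checked with a short sketch. The category labels below are placeholders, since the abstract gives only the number of levels for each factor, not their names or cut-off points; only the zone names come from the text.

```python
from itertools import product

# Placeholder labels (assumptions): the abstract states how many levels each
# factor has, but not what the levels are called or where the cut-offs lie.
SEXES = ["female", "male"]
AGE_GROUPS = ["age_group_1", "age_group_2", "age_group_3"]
EDUCATION = ["education_level_1", "education_level_2"]
# Zone names are taken from the abstract.
ZONES = ["north", "south", "east", "west", "center"]


def build_sample_grid():
    """Return one slot per planned recording: a social profile plus a zone."""
    profiles = product(SEXES, AGE_GROUPS, EDUCATION)
    return [
        {"sex": sex, "age": age, "education": edu, "zone": zone}
        for (sex, age, edu) in profiles
        for zone in ZONES
    ]


grid = build_sample_grid()
n_profiles = len({(g["sex"], g["age"], g["education"]) for g in grid})
print(n_profiles)  # 2 * 3 * 2 = 12 social profiles
print(len(grid))   # 12 profiles * 5 zones = 60 recordings
```

Treating the zone as a fifth slot within each cell, rather than as a stratification factor of its own, mirrors the description above: the five speakers in a profile are spread over the five zones, so the sample stays at 60 recordings.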

A second question is how to contact and schedule interviews with such a varied spectrum of the population, particularly in a busy urban environment where informants may be reluctant to agree to an hour-long interview. To address this difficulty, we opt for the “friend of a friend” method (Milroy 2004). Although it implies that the sample is not completely random, this method has proven effective both in facilitating contact with new informants and in yielding more natural speech data.

Finally, a third question is how to make the sample available in a form that facilitates automated data handling in software such as R (Baayen, 2008; Hornik, 2011). Recordings will be transcribed in ELAN (Hellwig et al., 2011), following simple but strict norms in order to enable, for instance, automatic token identification and extraction into a spreadsheet file (Oushiro, 2012).
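As a rough illustration of what automatic token identification and extraction could look like, the sketch below scans transcribed utterances for a pattern and writes one spreadsheet row per match. Everything in it is an assumption made for the example: the input format (speaker, start time, text), the orthographic pattern for an "r" at the end of a syllable or word, the column names, and the sample utterances. It is written in Python for illustration and does not reproduce the project's transcription norms or its R scripts.

```python
import csv
import io
import re

# Hypothetical pattern: an orthographic "r" followed by a consonant letter or
# by a word boundary. A stand-in only, not the project's actual coding scheme.
CODA_R = re.compile(r"r(?=[bcdfghjklmnpqstvxz]|\b)", re.IGNORECASE)
COLUMNS = ["speaker", "start_ms", "token", "context"]


def extract_tokens(annotations):
    """Yield one row per word that matches the pattern, with its utterance."""
    for speaker, start_ms, text in annotations:
        for word in re.findall(r"\w+", text):
            if CODA_R.search(word):
                yield {
                    "speaker": speaker,
                    "start_ms": start_ms,
                    "token": word,
                    "context": text,
                }


def to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet or R can read."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented utterances, standing in for annotations exported from ELAN.
sample = [
    ("S1", 0, "a porta está aberta"),
    ("S1", 2100, "eu gosto de falar"),
]
rows = list(extract_tokens(sample))
print(to_csv(rows))
```

Keeping the full utterance next to each token means that contextual predictors can be coded later from the same file, which is one reason strict transcription norms matter for this kind of extraction.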

References:

Baayen, H. (2008) Analysing linguistic data: a practical introduction to statistics. Cambridge: Cambridge University Press.

Hellwig, B., M. Tacchetti & A. Somasundaram (2011) ELAN – Linguistic Annotator. Version 4.1.0. Available at http://www.lat-mpi.eu/tools/elan/manual/.

Hornik, K. (2011) R FAQ. Available at: .

Mendes, R. (2011) SP2010 – Building a sample of Paulistano speech. FAPESP Research Project (Grant number 2011/09278-6).

Milroy, L. (2004) Social networks. In: Chambers, J.K., P. Trudgill & N. Schilling-Estes (eds.) The Handbook of Language Variation and Change. Oxford: Blackwell.

Oushiro, L. (2012) Analyzing (-r) with R. Paper presented at the GSCP 2012 International Conference.

Sociolinguistics Symposium 19
