05/12/2016 by lizziepinard

Chatting in the academy: exploring spoken English for academic purposes (Mike McCarthy)

Another addition to my collection of write-ups based on the talks recorded by IATEFL Online and stored on the website for everybody to access. What a wonderful resource! This one is by Michael McCarthy, and, as you would expect, is based on corpora and vocabulary – this time in the context of academic spoken English…

MM starts by saying that academic English is easier to study in its written form and much more challenging in its spoken forms. His main point is that there is no one single thing that we can call Spoken Academic English. His talk will draw on information from corpora and show how it can be used in materials. He is going to use a corpus of lectures, seminars, supervisions and tutorials from the humanities and the sciences, the ACAD, and a sub-corpus, the Michigan Corpus of Academic Spoken English (MICASE). He is also going to use the sub-corpus of social and intimate conversations from the CANCODE corpus. This is the data that MM used.

Corpora are widely known and accepted in our profession, so MM didn’t need to introduce what they are and why we use them. He looked at a frequency list of words, the simplest job you can do with a corpus. You can also produce keyword lists, which tell you more than just whether something is frequent: they tell you whether, compared with a reference corpus, it is statistically significantly frequent or, the opposite, statistically significantly infrequent. We can also look at chunks and clusters, the way words occur together repeatedly. We rarely go beyond 5 or 6 words, due to the architecture of the human mind, and chunks are most common at the 2-4 word size. Dispersion, i.e. how consistently a word is used across the texts in the corpus, is another thing to consider: it tells us whether a particular text or genre is skewing the data.
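The two simplest jobs mentioned here, a frequency list and a count of recurring 2- to 4-word clusters, can be sketched in a few lines of Python. This is only a minimal illustration, not the software used in the talk: it assumes a naive regular-expression tokenizer (a hypothetical choice; real corpus tools handle transcription conventions and hesitation markers far more carefully), and the toy sentence is invented.

```python
import re
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase alphabetic words,
    # keeping internal apostrophes so "it's" or "y'know" stay one token.
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def frequency_list(tokens, top=10):
    # Raw frequency list: the most common word forms with their counts.
    return Counter(tokens).most_common(top)

def clusters(tokens, n):
    # Count every contiguous n-word sequence (a "chunk" or "cluster").
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

if __name__ == "__main__":
    toy = "You know, you can see that, you know, it's what I mean."
    tokens = tokenize(toy)
    print(frequency_list(tokens, 3))
    for n in (2, 3, 4):
        print(n, clusters(tokens, n).most_common(2))
```

Even on this toy input, "you know" surfaces as the top two-word cluster, which is the kind of pattern a corpus tool reports at scale.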

In the spoken ACAD, the top 50 of the frequency list contains lots of the usual conversation markers and lots of informality: you, I, yeah, er, erm. There are lots of familiar discourse markers, such as right and ok, and response tokens, i.e. the words or sounds used to react. The most frequent two-word pragmatic marker in ordinary social conversation is you know (66% of the occurrences of the word ‘know’ are in the form ‘you know’/‘y’know’), and the picture looks the same in the academic corpus. This, however, is not the whole picture. We have something like everyday conversation, but when we go into the keyword analysis, things become a bit more interesting. The top 20 keywords in spoken academic data are:

[Slide screenshot: the top 20 keywords in spoken academic data]

Now, a lot of familiar conversational items are present, but so are some that, if your friends used them with you in everyday conversation over a cuppa, would make you lose the will to live. So there are words here that don’t have the ring of informal conversation, not least the preposition ‘within’, which is right up there in the top 20. We will come back to ‘which’, ‘terms’ and ‘sense’ later.

Keywords tell us more than just what is frequent: they give us a fuller, more nuanced picture of how words are functioning in the data. We can find some interesting differences between conversational and academic spoken English. For example, in the academic spoken English corpus, the discourse marker “ok” ranks higher in the keyword list than in a straight frequency list:

[Slide screenshot: “ok” in the frequency list vs. the keyword list, academic spoken English]

MM found that 95% of the instances of “ok” in the data are either response tokens, e.g. “that’s ok”, “well, ok”, or discourse markers signalling phases, e.g. “ok, let’s go on to look at <something else>”. They are used overwhelmingly by the lecturers or tutors. MM had a PhD student with an annoying habit: after exchanging pleasantries, the student would say “ok, now I want to talk about…” and then, once they had, would say “right, ok…”. MM felt it should have been HIM saying those phrases. Students very rarely respond with “right, ok”. So in academic speaking we are looking at a different set of discourse roles from those in conversational English; that is what the corpus is showing us. The roles are directly related to the language. Some items that are present in the frequency list disappear from the keyword list, i.e. they fall too far down the list for MM to be prepared to go through and find them, e.g. well, mm, er, you. This negative result says that these words do not distinguish academic speaking from any other kind of speaking. However, some of the language is particular to the roles and contexts of the academic set-up.
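The talk does not say which keyness statistic lay behind the keyword lists, so the following is only a sketch of one widely used option, Dunning's log-likelihood (G2), comparing a word's frequency in a study corpus (say, spoken academic data) with its frequency in a reference corpus (say, social conversation). The counts in the usage example are invented. A large positive score marks a word as unusually frequent, a large negative score as unusually infrequent, and a score near zero means the word does not distinguish the two corpora, which is why very common items such as well or er can drop out of a keyword list altogether.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Signed log-likelihood (G2) keyness score for one word.

    Positive: relatively more frequent in the study corpus.
    Negative: relatively less frequent. Near zero: not distinctive.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    def term(observed, expected):
        # By convention 0 * ln(0) is treated as 0.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    g2 = 2 * (term(freq_study, expected_study) + term(freq_ref, expected_ref))
    overused = freq_study / size_study >= freq_ref / size_ref
    return g2 if overused else -g2

# |score| above about 3.84 corresponds to p < 0.05 (chi-square, 1 degree of freedom).
CRITICAL_P05 = 3.84

if __name__ == "__main__":
    # Invented counts: 20 occurrences per 1,000 tokens vs. 5 per 1,000 tokens.
    score = log_likelihood(20, 1000, 5, 1000)
    print(round(score, 2), abs(score) > CRITICAL_P05)
```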

MM says it takes a long time and a lot of hard work to actually interpret what the computer is trying to tell you. The computer is dispassionate: it has no goals, prejudices, aims or lesson plans. It just offers bits of statistical evidence.

[Slide screenshot: ranked list of frequent chunks in the spoken academic data]

What struck MM when he looked at this list is that, on the conversational side, at least 3 and possibly more of these items are remarkably vague. People are surprised that so many vague category markers rise to the top of spoken academic discourse, but they shouldn’t be: the student is being nurtured into a community of practice, and in any community with shared values, perspectives and opinions, you don’t need to spell them out. You can simply say “x, y, etc.” or “x, y and things like that”. The presence of these vague category markers is crucial: not only do you have to hear and understand them, but you also have to be able to decode their scope and know what the lecturer means when they use them. Vague category marking is something academic speech shares with everyday conversation, but in academic speech the scope lies within the academic field.

At no. 18 is “in terms of the”. This is not surprising, because in academia we are always defining things in terms of something else, locating pieces of knowledge within other existing/known knowledge: the discipline as a whole or a particular aspect of it. The chunk is much more widely spread in academic spoken discourse than in conversation:

[Slide screenshot: “in terms of the” in academic vs. conversational spoken data]

MM goes on to look at the consistency, or spread, of items across the data, looking for things that occur in a great number of texts. In social conversational data, the dispersion of I and you is consistently high. The picture in academic spoken English is different:

[Slide screenshot: dispersion of I and you in academic spoken English]

The pronouns are reversed: you is more frequent than I. This brings us back to the fundamental business of roles: most of academic discourse is about telling “you” how to do things and how to become part of the community of practice. Think back to chunks such as “you can see”: this is a transmissive you. However, we do notice that there is quite a bit of I in the academic spoken data; it’s not remote. I is generally used by lecturers and tutors. But if we look across events, there is great variation. Even between two science lectures there is a difference: one contains a personal anecdote, and so more use of “I” (more, even, than the informal guest speaker), while the other does not:

[Slide screenshot: use of I across different lectures]
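Dispersion can be measured in several ways, and the talk does not specify which measure was used. The sketch below shows one of the simplest, range (the proportion of texts in which a word occurs at all), together with per-text rates per 1,000 words, which expose the kind of event-to-event variation seen with I. The three "lectures" are invented toy token lists, not ACAD data.

```python
def dispersion_range(texts, word):
    # Proportion of texts (each a list of tokens) containing the word at least once.
    if not texts:
        raise ValueError("need at least one text")
    return sum(1 for tokens in texts if word in tokens) / len(texts)

def rates_per_thousand(texts, word):
    # Normalised frequency of the word in each text, per 1,000 tokens,
    # so that texts of different lengths can be compared.
    return [1000 * tokens.count(word) / len(tokens) for tokens in texts]

if __name__ == "__main__":
    lectures = [
        "so you can see that you get a curve here".split(),
        "when i was a student i got this wrong so you".split(),
        "you take the value and you plot it".split(),
    ]
    print(dispersion_range(lectures, "you"))  # "you" occurs in every text
    print(dispersion_range(lectures, "i"))    # "i" is concentrated in one text
    print([round(r) for r in rates_per_thousand(lectures, "i")])
```

A word with high range is spread evenly across events; a word with low range but a high rate in one text is the signature of a single text (here, the anecdote-heavy lecture) skewing the totals.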

Here is a summary of the tendencies MM has covered:

[Slide screenshot: summary of the tendencies covered]

The remainder of the talk is an advertisement for Cambridge University Press’s Academic Vocabulary in Use book, which draws on what has been learnt from the data, trying to capture the mix of chatty conversational items and items that are very peculiar to academic discourse. The best way to get a grip on spoken academic discourse is to understand the discourse roles of the tutors and the students, which influence how each of them speaks, i.e. differently. They will use certain keywords and chunks, but the labels used for speech events (e.g. lecture, seminar, supervision) are a very imperfect guide to what will be found in them.

This was a fascinating talk, one I’m glad I’ve finally caught up on! I always find it interesting to see how corpora are used and what is discovered in the process. Nice to see the “in terms of” chunk in there: it reminds me of my first year at the Sheffield University International Summer School, where during the induction Jenifer Spence (author of EAP Essentials and leader of the theory side of our induction programme that year) spent a fair bit of time hammering the importance of “in terms of” into us. We were always to be asking, and encouraging students to ask or consider, “in terms of what?” in relation to whatever it was that they were writing or saying! I had never considered how odd it would sound in an informal chat, though, as in MM’s example: “How was your holiday?” “Well, in terms of the accommodation…” Not really! Unless you felt like being particularly pompous, I suppose…

14/04/2015 by lizziepinard

IATEFL 2015 Bringing corpus research into the language classroom – Jane Templeton

Corpus time! This talk is by my M.A. colleague of yore, Jane Templeton, also known as corpus guru! 🙂

We start with a small thought experiment:

A class of students don’t know how to use a dictionary. They are reading. One of them asks you the meaning of a word. There is a dictionary next to him. What do you do?

a. Tell him to look it up
b. Look it up yourself and tell him
c. Show him how to look it up

The answer was C, which led us to the following questions:

Why is C the best option from the student’s point of view, compared to the others?
Would you expect him to be instantly proficient in dictionary use?
What would you advise if he couldn’t find a word in the dictionary?

Jane explained that her talk is based on some assumptions from Timmis (2015):

Corpus research is potentially useful for learners. It provides information about frequency and behaviour, and frequent language is often useful.

However, this potential is not being fully exploited. We will look at ways of exploiting it that overcome some of the limitations.

At first, Jane wasn’t quite sure what to do with corpus research (CR) or which techniques to use with students. The two main objections she encountered at work were that 1) it’s too difficult for the students and 2) data-driven learning doesn’t work.

She set out to disprove this. In actual fact, the opposite happened: CR is difficult for students. Research requires technical expertise, knowledge and time that most teachers don’t have, never mind students. But this isn’t the kind of corpus research we need students to do.

Data-driven learning (DDL) should work (see Timmis, 2015 again): it enables more authentic language use, rich input and inductive learning, and it promotes and practises the skill of noticing, which is very important. But as of 2009, it hadn’t been shown to be more effective as a language presentation method than traditional methods, though it had been shown to be effective as a reference tool.

For learning to take place, students need to be engaged, either by the language (it is relevant to them, they want to use it, they need to use it) or by the task, if it’s a task they can engage with and might replicate outside the classroom.

So why didn’t DDL work for Jane? The teacher selects the language, so it might not be relevant; the teacher researches it, filters the results, and creates questions and practice activities. The task might not be engaging. Concordance lines do not occur naturally (except in texts about concordances and corpora!), so concordance-line tasks are not authentic for students. So it depends on whether that particular student at that particular time likes that kind of activity: some do, some don’t.

Even if we can move more students into the green zone, some of them will always be left in the negative zone. It is also very time-consuming for the teacher. So, all in all, it tends to fall by the wayside.

Show vs Tell

Jane talked about the importance, generally, of showing rather than telling students information.

Jane then showed us how she used www.wordandphrase.info to solve a problem she met in class: finding verb collocates for weakness, opportunity and threat (from a SWOT analysis) to express obtaining benefit. Type in the word, click on search, then click on the word when it appears in the box, and choose from the list of verbs.

E.g. overcome weakness; combat threats; counter threats; etc.

This was the first time she used this site as a reference tool with students. The next time it came up was with a different group of students, with whom she was mind-mapping globalisation. They needed verb collocates with threats.

Back to wordandphrase.info to discover…

One of the threats POSED BY globalisation TO local businesses.

How is this useful?

You/the students can use it to answer questions such as these:

What verb can I use with this noun to express this meaning?
Is the noun the subject or the object of the verb?
Is it the direct object?
If so, is the verb used in the passive?
Is the noun the indirect object?
If so, what preposition is used between the noun and verb?

…and so on.

Benefits:

Quick, easy, no preparation required (just used in the classroom in response to queries), authentic task (they can use it outside the classroom) and it’s relevant as it’s based on language that comes up in class.

These are the kinds of errors you can address:

Errors relating to structure, collocation, formality/register etc.

If you are interested, email Jane and she will send you a link to the wiki she is launching for students to help them use wordandphrase.info independently. (I will link to it from this post once I have the link!)

Jane also showed us AntConc, in which you can do a frequency search and look for content words. You can also discover collocations around key content words, and you can use it to check errors. As a student, you can compare your own text to an authentic text and look at differences in the way language is used. These differences can be stylistic: e.g. Bangladesh was used 4 times in an authentic text vs. 25 times in a student text.
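AntConc is a standalone desktop program, so the following is not its code or API; it is just a minimal Python sketch of what a concordancer's keyword-in-context (KWIC) display does, using the phrase from the wordandphrase.info search earlier in this post plus an invented second clause. The function name and the context width are illustrative choices.

```python
def kwic(tokens, node, width=4):
    """Return keyword-in-context lines for every occurrence of `node`.

    Each line shows up to `width` tokens of left and right context, with the
    left context right-aligned so the node words line up in one column.
    """
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>30} [{token}] {right}")
    return lines

if __name__ == "__main__":
    text = ("one of the threats posed by globalisation to local businesses "
            "is that the threats posed by competition increase")
    for line in kwic(text.split(), "posed", width=3):
        print(line)
```

Lining the node word up in a column is what makes patterns such as "threats posed by" jump out when scanning many lines.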

The aim is to help students be able to do this themselves in the future, in their academic writing.

Advice:

Jane left us some advice to bear in mind as we set off to try these tools with students:

Try it!
Don’t be scared.
Try the activities on the wiki that Jane has made (for access/the link email her at the address provided below), think about how you could use it with your students.
Don’t worry if things don’t work, it happens.
Don’t feel you have to know everything. It’s ok. You and your students can learn together.
Give students scaffolding.
Enjoy it!

(And remember the tools are just as useful for us teachers as for students…!)

To find out more/get the link to the wiki, contact Jane on j.templeton@leeds.ac.uk

A very useful, interesting talk and I look forward to seeing the wiki in the near future when it is launched! 🙂

References:

Timmis, I. (2015). Corpus Linguistics for ELT: Research and Practice. Routledge Corpus Linguistics Guides. Routledge.

04/04/2014 by lizziepinard

Plenary: Michael Hoey – The implications of a corpus linguistic theory for learning the English language (and the Chinese language too)

According to the introductory speaker, Michael’s talk draws its data from corpus linguistics…what a shock… 🙂

What Michael wants to do today is look at two old approaches to language teaching and learning, and bring a new perspective to bear on them. Both approaches have had a great number of adherents and critics, and both are still very much alive.

The Lexical Approach

He starts by showing us a slide of key works within Lewis’s Lexical Approach, which is now just over 20 years old, saying that work on it is still going on: it is an old approach, but very much alive. According to Lewis, the successful language learner is someone who can recognise, understand and produce lexical chunks. Learning the grammar and slotting other words in doesn’t work in a world where language doesn’t work like that. Rather than learning vocabulary in lists and focusing on grammatical structures, learners should focus on the actual words that can be used to communicate. When someone learns vocab in context, they pick up grammar naturally, but it doesn’t work vice versa: you don’t pick up much useful vocab when learning grammar separately.

He cites his own experience of learning Chinese: he has learnt to talk about his father’s tea-drinking habits (his father is actually dead) but can’t ask for a beer. Learning grammar and then pieces of vocabulary like this is not useful, whereas if you learn vocab relevant to your needs, the grammar comes along.

The lexical approach has been criticised for ignoring how language is learnt, that there is no theoretical underpinning and that it trivialises the role of grammar. There is also the question of whether it is limited to the Indo-European languages.

Stephen Krashen’s Monitor Model

Another old approach, but very much alive, also known as the Input Hypothesis. Michael thinks it should be called the input theory but will refer to it as the Monitor Model. It is now 30 years old. He showed us some key works related to this as well. According to this model, comprehensible input is the key element needed for language learning to take place. The input needs to be slightly above the learners’ level, and acquisition is a subconscious process. Michael illustrates this with his own experience of learning Chinese.

It has been criticised for ignoring how language is learnt, having no linguistic underpinning and trivialising grammar and the role of the teacher.

Michael’s 3 arguments for today:

1. These two approaches are entirely compatible with psycholinguistic evidence.
2. Both of them are supported by at least one carefully worked-out linguistic theory: his! (Does that put them on shaky ground?! he wonders 🙂)
3. The characteristics that both approaches assign to the language learning process are equally true of non-Indo-European languages.

How do we learn language?

Michael is interested in psychologists doing language work rather than linguists pretending to do psychology work. Psycholinguists have identified two phenomena: semantic priming and repetition priming.

Semantic priming – informants shown an image or word (the prime) and then shown a second word/image (known as the target word). Speed of recognition is measured. Some primes slow the recognition of target word, while others speed it up.

E.g. if “wing” is followed by “director”, “wing” won’t alter the recognition speed. If it is followed by “dog”, recognition will slow down infinitesimally. But if it is followed by “swan”, recognition will be speeded up slightly. It’s about linguistic knowledge rather than world knowledge.

It is old and uncontroversial work. What does it mean for language learning?

It gives us proof that words are closely linked in a listener’s mind. Words that are closely linked can be recognised more quickly together, so this encourages fluency. It doesn’t fit at all with the notion of grammatical frames with words slotted in; that notion is not supported. It does, though, fit very well with the Lexical Approach, which it fully supports.

Repetition priming

E.g. if a listener hears the word scarlet followed by onion, and then a few days later hears the word scarlet again, this will speed up the process of recognising onion. So your brain remembers the co-occurrence, and that speeds up recognition.

Michael shows us some works related to this and says it, too, is uncontroversial.

Repetition priming explains collocation. If a listener or reader encounters words in combination, they are stored as such and recognition is speeded up. When we encounter words in combination, we link them in our minds without any conscious learning. This doesn’t fit well with the grammatical framework notion, but it does fit with Krashen’s distinction between acquisition and learning.

So that takes care of number 1 for today’s talk.

Now for number 2 – are these supported by another theory?

<slide>

Collocation has got to be central to any account of how language functions. The lexical priming theory is a psycholinguistic theory based on corpus linguistic evidence. It claims that whenever we encounter a word, we subconsciously note the words it occurs with: its collocations.

<He shows us lots of combinations with “hard”> All of these are part of collocation.
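Lexical priming is a psychological claim, but the corpus evidence behind collocation starts from simple co-occurrence counts. A minimal sketch, assuming a span of a few words either side of the node word (a common but arbitrary choice) and an invented toy text, counts the collocates of "hard". Real studies go further and apply an association or significance measure rather than relying on raw counts.

```python
from collections import Counter

def collocates(tokens, node, span=4):
    # Count words occurring within `span` tokens to the left or right of each
    # occurrence of `node` (the node itself at that position is excluded).
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

if __name__ == "__main__":
    toy = ("it was hard work and very hard to find "
           "she worked hard it was a hard time").split()
    print(collocates(toy, "hard", span=2).most_common(5))
```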

Once a priming is created, it is subject to further priming. “ears” collocates with “eyes”, and the pair in turn becomes primed to collocate with “act as”, as in “act as someone’s eyes and ears”. So collocations can themselves collocate with something else.

Michael quotes Hill on how it is the density of unknown collocations that causes difficulty for learners. But the lexical priming theory goes beyond collocation. It is also about semantic association.

E.g. ears collocates with eyes, but also co-occurs with other parts of the body, a semantic set. This is semantic association: “ears” likes to collocate with parts of the body. 23% of all instances of “ears” in Hoey’s corpus collocate with a part of the body.
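A figure like the 23% is a proportion of node occurrences. One way such a share could be computed is sketched below, assuming a hand-made semantic set (the short body-part list is hypothetical) and a window of 4 words either side; Hoey's actual procedure is not described in the talk and may well differ.

```python
def share_with_semantic_set(tokens, node, semantic_set, span=4):
    # Fraction of occurrences of `node` that have at least one word from
    # `semantic_set` within `span` tokens on either side.
    hits = total = 0
    for i, token in enumerate(tokens):
        if token == node:
            total += 1
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            if any(word in semantic_set for word in window):
                hits += 1
    return hits / total if total else 0.0

if __name__ == "__main__":
    body_parts = {"eyes", "nose", "mouth", "head", "hands"}  # hypothetical set
    toy = "cover your ears with your hands . music to my ears".split()
    print(share_with_semantic_set(toy, "ears", body_parts))
```

In the toy text, one of the two occurrences of "ears" has a body-part word nearby, so the share is 0.5.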

“consequences” tend to be negative – grim, bleak

“results” tend to be positive – great

We also note the grammatical patterns that a word or combination of words tend to take – the colligation.

“consequence” tends to be indefinite e.g. “a consequence of”; “result” tends to be definite e.g. “the result of..”
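A colligational tendency like this can be checked with a very simple count. The sketch below only looks at the word immediately before the node, a deliberate simplification (it would miss cases such as "a serious consequence"), and tallies which kind of article precedes it; the toy text is invented.

```python
from collections import Counter

ARTICLES = {"a": "indefinite", "an": "indefinite", "the": "definite"}

def preceding_articles(tokens, node):
    # Tally whether each occurrence of `node` is immediately preceded by an
    # indefinite article, the definite article, or something else.
    tally = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            previous = tokens[i - 1] if i > 0 else ""
            tally[ARTICLES.get(previous, "other")] += 1
    return tally

if __name__ == "__main__":
    toy = ("as a consequence of this the result of the vote was "
           "a consequence nobody wanted and the result was clear").split()
    print(preceding_articles(toy, "consequence"))
    print(preceding_articles(toy, "result"))
```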

We are primed for this by recurrent occurrences of these combinations.

We also subconsciously note textual features. This goes beyond the oft-discussed features we have considered so far. We are primed for other things when we encounter language. We recognise whether a word is typically cohesive or not: we know, when reading or listening, whether a word or phrase is likely to turn up again, and so whether we need to hold on to it or can forget it.

Michael shows us an extract from The Guardian and points out that president occurs three times in the course of three sentences. This is not an accident. In Michael’s study, of 66 independent occurrences of “president”, 76% contributed to the cohesion of the text, so we are primed to expect that. In the case of “president”, we also notice how it is likely to be cohesive, e.g. by simple repetition, by pronouns or by a name (co-reference).

Back to the text: “frankly”, however, was responsible for only 5 instances of cohesion out of 50, and some of those were a stretch. So with “frankly”, we are primed to expect it to avoid cohesion.

Simply by reading a text, you get repeated experience of the words that recur within it.

We also are primed for semantic relationships within a text. Every lexical item – word, phrase – may be positively or negatively primed to associate with various items. Texts prime our vocabulary for us and our vocabulary primes us with regard to the organisation of the text. We are also primed to notice position e.g. is it typically used at the beginning of a sentence or the end of a sentence. E.g. “it was announced yesterday” is typically found at the end of the first sentence in a newspaper article. So very precise positioning.

Michael tells us a lot about what we know about “According to” in relation to newspaper texts in terms of collocation, colligation, pragmatic association, semantic association etc. Very interesting.

This all wholly supports Michael Lewis’s view of the centrality of lexis. Everything he was saying in the Lexical Approach book is backed up by psycholinguistic and corpus linguistic research. It must be true: if you want to argue it’s not, you have to argue against ALL the sets of research. All the textual features are also central to Stephen Krashen’s claim. You couldn’t expect to teach or learn consciously all the textual properties of every word in a text. Yet the fact that we reproduce these features when we write and speak means we learn them from frequent encounters. So he is right that we need to be exposed to naturally occurring data.

Like me with Italian, Michael can’t speak but can read Chinese! The textual features of lexis can be acquired.

Time for number 3: does the Lexical Approach apply to Chinese, i.e. is it applicable beyond Indo-European languages?

Michael contrasts English and Chinese and then looks at the Lexical Priming claims in terms of Chinese. He shows us collocations of “hao” in Chinese. He then points out that “houhui” is associated with negative colligation, and shows us that it has a semantic association with unhappy action taken and a pragmatic association with making a suggestion (a third of instances in the corpus), but only in its negative forms.

As for the significance of this, it is sensible to build on the common ground between languages rather than on the differences.

Michael has shown us how both Krashen’s and Lewis’s theories have been falsely criticised, and are in fact safe to use. The two researchers have come up with very compatible positions.

hoeymp@liv.ac.uk

Lizzie Pinard

Reflections of an English Language Teacher

corpus linguistics