A Test of Coding Procedures for Lexical Data with Tupí-Guaraní and Chapacuran Languages
Natalia Chousou-Polydouri,∗ Joshua Birchall,† Sérgio Meira,† Zachary O’Hagan,‡ and Lev Michael‡
∗ Laboratoire Dynamique du Langage
† Museu Paraense Emílio Goeldi
‡ University of California, Berkeley



Abstract—Recent phylogenetic studies in historical linguistics have focused on lexical data. However, the way that such data are coded into characters for phylogenetic analysis has been approached in different ways, without investigating how coding methods may affect the results. In this paper, we compare three different coding methods for lexical data (multistate meaning-based characters, binary root-meaning characters, and binary cognate characters) in a Bayesian framework, using data from the Tupí-Guaraní and Chapacuran language families as case studies. We show that, contrary to prior expectations, different coding methods can have a significant impact on the topology of the resulting trees.

Keywords—Bayesian phylogenetic inference, cognate coding, historical linguistics, South American indigenous languages

I. INTRODUCTION

South America, long considered the ethnographically and linguistically “least known continent” [1], has in recent decades experienced a surge of descriptive and documentary linguistic research [2], [3]. The classification of the languages of this region, and especially those of Amazonia, has, in contrast, advanced little in the last 50 years [4], [5]. However, the increasing availability of lexical data on South American languages, as well as recent successes in applying computational phylogenetic techniques to data of this type, offers us the opportunity to push forward our understanding of genealogical relationships in the region with new datasets and tools [6]–[8].

While it is accepted that lexical data from natural languages carry phylogenetic signal, the study of lexical evolution per se has largely been neglected by historical linguistics (with the exception of lexicostatistics), as the evolution of other domains of language, such as phonology and morphology, is considered more informative for subgrouping and less susceptible to borrowing. In contrast, computational phylogenetic studies in recent years have focused primarily on lexical evolution, due to the ease with which relatively short wordlists can be analyzed with a variety of established phylogenetic methods.

A critical aspect of these methods, and a way in which they differ, is the manner in which phylogenetic characters are generated from lexical data. The differing nature of these characters ultimately reflects different understandings of the phylogenetic notion of homology [9] in the context of lexical evolution. However, there has been little discussion of the implications of different coding methods and what the underlying assumptions of each are regarding how the lexicon evolves. At the same time, there has been little work to evaluate if and how different coding methods affect resulting classifications, with two exceptions: a parsimony-based empirical test on Indo-European by Rexová and colleagues [10] and an analytical investigation by Pagel and Meade based on a maximum likelihood framework [11]. While Rexová and colleagues find topological differences when using different coding methods, Pagel and Meade predict no impact on topology, although differences in branch lengths and support values are expected.

In this paper, we briefly describe and discuss three major lexical coding methods and we compare their results in a Bayesian inference framework, using data from the Tupí-Guaraní and Chapacuran language families as case studies.

II. DATA

We test the different coding methods on lexical datasets for two South American language families: a Tupí-Guaraní dataset of 33 languages for a 547-meaning wordlist [7], and a Chapacuran dataset of 11 languages for a 126-meaning wordlist [8]. Each dataset includes data for every language for which adequate lexical data are available.

III. METHODS

A. Coding procedures

We compare three coding procedures based on different types of characters: 1) multistate meaning-based characters; 2) binary root-meaning characters; and 3) binary cognate characters. The first two coding methods are based on a comparative lexical dataset collected using a wordlist, while the third necessitates the broader collection of lexical data including close synonyms.

A typical comparative lexical dataset based on a wordlist yields inherently multistate characters. Each meaning of the wordlist is a character. All languages that exhibit cognate forms for a given meaning are given the same character state value. In other words, each character is equivalent to the question “For meaning X, what root (or roots) express X?” and the coding method essentially tracks lexical replacement. We refer to this scheme as ‘multistate meaning-based’ coding. Surprisingly, this coding method has been very rarely used [10]. Among its advantages are the ease of data collection and its applicability in instances of little available lexical data. One potential problem of multistate meaning-based coding is that it
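
The multistate meaning-based scheme described in Section III-A can be illustrated with a minimal sketch. It assumes that cognate judgments are already available as one label per language and meaning; the language names, root labels, the `code_multistate` function, and the use of "?" for missing data are hypothetical choices made for illustration and are not taken from the paper or its datasets. Meanings expressed by more than one root in a language are not handled here.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: for each meaning, the cognate-set label attested in
# each language (None = no data). Names and labels are placeholders only.
COGNATE_SETS: Dict[str, Dict[str, Optional[str]]] = {
    "water": {"LangA": "root1", "LangB": "root1", "LangC": "root2"},
    "fire": {"LangA": "root3", "LangB": None, "LangC": "root3"},
}


def code_multistate(
    cognate_sets: Dict[str, Dict[str, Optional[str]]]
) -> Dict[str, List[str]]:
    """Build one multistate character per meaning.

    Languages whose forms for a meaning belong to the same cognate set
    receive the same state value; missing data is coded as "?".
    Characters are ordered alphabetically by meaning.
    """
    languages = sorted(
        {lang for by_lang in cognate_sets.values() for lang in by_lang}
    )
    matrix: Dict[str, List[str]] = {lang: [] for lang in languages}

    for meaning in sorted(cognate_sets):
        by_lang = cognate_sets[meaning]
        state_of: Dict[str, str] = {}  # cognate-set label -> state symbol
        for lang in languages:
            root = by_lang.get(lang)
            if root is None:
                matrix[lang].append("?")
                continue
            if root not in state_of:
                # States are numbered in order of first appearance.
                state_of[root] = str(len(state_of))
            matrix[lang].append(state_of[root])
    return matrix


if __name__ == "__main__":
    for lang, states in code_multistate(COGNATE_SETS).items():
        print(lang, " ".join(states))
```

Each column of the resulting matrix answers the question “For meaning X, what root expresses X?”, so a change of state along a branch corresponds to a lexical replacement for that meaning.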
