This comparative vocabulary of Uto-Aztecan languages is a work in progress, not a finished product.  The size of Uto-Aztecan (UA) and the regular emergence of new materials guarantee that any comprehensive comparative effort is but a new horizon for viewing the next.  Yet many a linguist’s life work finds its final resting place in files or landfill due to (1) lack of time to finish it, despite the potential value to future researchers; (2) uncertainty about certain details, perhaps 3%, though the other 97% would have benefited all else studying the matter; (3) and/or not relishing the prospect that condemnations of the 3% may seem louder than commendations of the 97%. So let the latest from three decades of doing UA be made available lest it be lost to landfill should I exit without warning.  Publishing, despite its pretense of completion, is as often only the latest draft of endless endeavor. The original hope of finishing such an undertaking before one’s own undertaking gives way to time’s reminder that no one gets everything right the first time, or even the last time in mortal exertions the magnitude of a language family, and our assumptions about when the last time might be are regularly erroneous, as we hardly get glimpses of our hourglasses.  The tragic unpredictable passing of our mentor Wick Miller in May 1994 is an example.

Wick Miller was an example in several ways: he was open, cordial, and encouraging.  He was not critical—perhaps a tad animated at times, but generally friendly as a team-player in our cooperative progress in knowledge of UA.  As founder of the Friends of Uto-Aztecan, he was a friend of UA and of all Uto-Aztecanists.  Wick’s 1988 computerized database of potential cognate sets exemplifies his openness.  He knew it was a compilation of rough draft brainstorming in need of sorting, revision, etcetera, but he shared it openly—opening himself to an egoless vulnerability for the sake of progress, because he was more interested in our progress in knowledge than in his being right all the time.  In that spirit is this work offered.  Errors, loose ends, and uncertainties are certain, but some UA matters may remain unresolved even if one could spend three lifetimes on them, for many more than that have already been devoted to UA.

In the reconstructions I do not deal with vowel length, only vowel quality and consonants.  Figuring out PUA vowel length may fill another lifetime, but not mine.  Reduced consonant clusters and compensatory lengthening may underlie some long vowels in UA (CVCCV > CVVCV) and thus cast doubt on vowel length until the medial clusters are clarified.  That and changing stress patterns, causing vowel lengthening with stress, or shortening or syncope without stress, in the various branches and languages through the layers of time, make the puzzle of PUA vowel length quite unappealing to me, if not presently impractical.  This work also continues Miller’s (1967, 1988) tradition of including sets found in only one branch.  Because some scholars reject Northern Uto-Aztecan (NUA) and others reject Southern Uto-Aztecan (SUA) (page 3), sets found in two branches may well be from PUA.  One-branch sets are also worth listing, since a reflex from another branch often soon appears, though they can hardly be considered from PUA until such support surfaces.  A few loans are listed that entered UA early enough to be found in multiple branches; as Miller (1988, 1) notes, “loans are of as much historical interest as inherited forms.”

While others may have suggested it was so, Edward Sapir (1913-1914) was the first to sufficiently apply the comparative method to establish Uto-Aztecan as a viable language family, after Buschmann, Kroeber, Whorf, and others helped lay the foundations for Uto-Aztecan studies by identifying the three previously accepted branches: Shoshonean (NUA), Sonoran, and Aztecan.  A five-letter surname that looms as large as Sapir’s in UA contribution needs no further abbreviation, so sets from Sapir’s founding work (1913-1914) are cited as Sapir.  A half century later, Voegelin, Voegelin, and Hale (1962) produced 170 cognate sets that further established the sound correspondences and phonology of UA.  Not long afterwards, Wick Miller (1967) published Uto-Aztecan Cognate Sets, containing 514 cognate sets.  That was the last published work attempting to deal with all the cognate sets of UA.  Miller continued working in UA and his last update (1988) of some 1185 potential cognate sets is herein abbreviated M88.  Kenneth Hill (2006) has done much good work in sorting and revising M88, combining some sets, redistributing others, adding new reflexes to existing sets, and adding cognate sets of his own discovery, totaling more than 1200 sets.  Hill’s revision of M88 is herein abbreviated KH/M06.  Ronald Langacker (1976b, 1977a) and Jason Haugen (2008) have authored excellent books dealing with UA grammar.  Alexis Manaster Ramer (AMR) has also been a prolific contributor to UA studies by means of more articles than are easily retrievable.  His works and those of Dakin, Campbell, Canger, Casad, Estrada Fernandez, Fowler, Heath, Jane Hill, Langacker, Lionnet, Munro, Shaul, Seiler, Steele, the Voegelins, José Luis Zamarron, and many others (works both published and unpublished, like Kaufman’s 1981 draft manuscript Comparative Uto-Aztecan Phonology) constitute a corpus somewhat daunting for mere mortals to master.
Besides the usual cognate collections, Kenneth Hill’s Serrano Dictionary (in progress) includes many comparative notes on other Takic languages, Tübatülabal, Hopi, and occasionally Numic languages (i.e., most of NUA), such that, for sets with a Serrano reflex, it is another valuable comparative resource for NUA, here cited as KH.NUA.

Branch cognate collections are abbreviated as the initial(s) of author surname(s) dot branch.  The cognate collections with abbreviations are here listed in chronological order:

Many students of Uto-Aztecan have suggested a primary split between Northern Uto-Aztecan and Southern Uto-Aztecan (Heath 1977:27; Heath 1978:222; Langacker 1977:5; Langacker 1978:197, 269; Fowler 1983:234; Cortina-Borja and Valiñas 1989), though a few reject Northern Uto-Aztecan (NUA) and Manaster Ramer (p.c.) rejects Southern Uto-Aztecan (SUA).  Jane Hill (2001, 2009) also discusses evidence for NUA vs. a lack of such for SUA.  NUA does exhibit phonological innovations, such as *-L- > n, *-c- > -y- (Manaster Ramer 1992b) and morphological innovations (Heath 1977, 1978), while SUA may exhibit a slightly closer lexical unity. (See discussions in Miller 1983, Goddard 1996, Cortina-Borja and Valiñas 1989.)  In any case, until a comprehensive morphological study sheds additional light on the matter, opposing the objectors of either half of UA may be premature.  Nevertheless, NUA has traditionally consisted of Numic, Takic, and two single-language branches: Tübatülabal and Hopi.  For SUA let us consider Tepiman, Taracahitan, Corachol, and Aztecan.

Numic (Num) is the largest branch of NUA in area and number of languages.  Numic is further divided into three subbranches: Western Numic spread from Southern California northward roughly straddling the California-Nevada border; Central Numic northeastward from California through southern and eastern Nevada, northwestern Utah, into Idaho, Wyoming, and onto the plains; Southern Numic spread eastward following the Colorado and San Juan River systems through northern Arizona, southern and eastern Utah, and into most of mountainous western Colorado.  Western Numic includes Mono (Mn) and Northern Paiute (NP).  To Central Numic belong Tumpisha Shoshoni (TSh), Shoshoni (Sh), and Comanche (Cm). Southern Numic includes Kawaiisu (Kw), Chemehuevi (Ch), Southern Paiute (SP), Uintah Ute or Northern Ute (NU), White Mesa Ute (WMU), and Colorado Ute (CU).  The term Colorado Ute is used here instead of Southern Ute, because the term Southern Ute implies a unity of non-northern dialects that may be misleading, since White Mesa Ute retains many lexical and phonological features more closely tied to Southern Paiute and other languages further west, features not in Colorado Ute.  As seen in the tabulations above, the Num languages show a high correlation within each branch of Num (76-88), and lesser correlations between the Num languages of different branches (49-62).  Lamb (1958) and others have explained the Num languages’ spread from the NUA homeland in Southern California out into the Great Basin.  The data show the inner-most language of each branch to be more closely related to the outer-most languages of the same branch than to the closer neighboring Num languages of different branches.  This pattern shows considerable diversity in Southern California between languages of differing branches only a few miles away vs. closer ties to those of the same branch perhaps 1,000 miles away.  
For example, TSh in Southern California is linguistically much closer to Sh (87) in Wyoming and Cm (79) on the plains, all three of Central Numic (CNum), than TSh is to either nearby Mn (59), of Western Numic (WNum) and also in Southern California, or to nearby Kw (54), of Southern Numic (SNum) and also in Southern California.  This greater diversity in the geographically limited Numic/NUA homeland speaks convincingly for a three-way Numic split in Southern California before spreading north, northeast, and eastward into the Great Basin.
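As a rough illustration only, the TSh comparison above can be laid out in a few lines of Python.  The four figures and the branch memberships are those quoted in the preceding paragraph; the names BRANCH, TSH_SCORES, and split_by_branch are hypothetical and exist only for this sketch, which does nothing more than separate same-branch from other-branch figures.

```python
# Branch membership of the five languages named in the example above.
BRANCH = {"TSh": "CNum", "Sh": "CNum", "Cm": "CNum", "Mn": "WNum", "Kw": "SNum"}

# The four TSh figures quoted in the text (TSh paired with each language).
TSH_SCORES = {"Sh": 87, "Cm": 79, "Mn": 59, "Kw": 54}


def split_by_branch(focus, scores, branch):
    """Split a language's pairwise figures into same-branch and other-branch groups."""
    same = {lang: n for lang, n in scores.items() if branch[lang] == branch[focus]}
    other = {lang: n for lang, n in scores.items() if branch[lang] != branch[focus]}
    return same, other


same, other = split_by_branch("TSh", TSH_SCORES, BRANCH)

# Even the weakest same-branch figure exceeds the strongest other-branch figure.
print("same branch:", same)
print("other branches:", other)
print("gap:", min(same.values()) - max(other.values()))
```

In this small sample the distant Central Numic relatives (Sh, Cm) both outrank the nearby neighbors of other branches (Mn, Kw), which is the pattern the paragraph describes.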

Takic (Tak) has traditionally included the UA languages of Southern California, less Tübatülabal (Tb) and Numic languages.  Within Tak is a tighter Cupan group of Luiseño (Ls), Cahuilla (Ca), and Cupeño (Cp), though the numbers above show Sr as close to Ca as Ls is to Ca.  Serrano (Sr), Gabrielino (Gb), Kitanemuk (Ktn) and other now extinct languages together with Cupan constitute the Tak branch.  Tak shows a much greater diversity than Numic.  The numbers between the Tak pairs range from 35 to 50 (except for Ca-Cp 65) vs. Numic’s numbers (49-88).  Matters relating to that diversity have periodically caused the validity or unity of the Tak branch to be questioned.  Californian (Alexis Manaster Ramer 1992a; Kenneth Hill 1998) has been a contemplated union of Tb with Tak.  Numbers as low as 34 between Gb and Cp, and 35 between Sr and Ls approximate several other 34s between Tak and non-Takic languages (Wr, Tr, Eu, Tb, Wc).  Those inter-Tak numbers are no larger than the 35 through 40 that Tb shares with four Tak languages (Gb, Sr, Ca, Cp).  Thus, the union of Tb and Tak into a Californian branch of NUA is viable and reasonable in view of the above data, and the question of the previous Tak unity remains a valid one.  Nevertheless, the author finds value in the Tak tradition, in that Tb’s separation from Tak finds some support (see discussion under Tb), though hardly overwhelming.  Kenneth Hill (2010, 1) also notes that Tb, like the Numic languages and unlike the Tak languages, lacks initial ŋ and allows ŋ only after vowels, and he sees the similarity of Tb’s lenited absolutive suffix (*-t > -l) to that of the Cupan languages as likely coincidental.

Tübatülabal’s (Tb) numbers with Num range from 35 to 42, with Tak they range from 34 to 40, and the Tb-Hp number is 38.  The differences are so slight and the ranges so overlapping that Tb appears to be about equidistant to all other branches of NUA; thus, Tb seems to hold an especially central place in NUA.  In fact, while Tb may be equidistant to the other NUA branches, considering matters from the other directions, we see that Num is closer to Tb (35-42) than Num is to Tak (21-31) or to Hp (22-33), and that Hp is closer to Tb (38) than Hp to Tak (26-31) or Hp to Num (22-33).  Furthermore, Cortina-Borja and Valiñas (1989, 235) see Tb to be slightly more closely associated with Hp and Num than with Tak.  Thus, it may be useful to retain Tb as a NUA branch for now.  Nevertheless, Tb and Hp both hold quite central positions, not only in NUA, but in UA generally: the Tb and Hp numbers with SUA branches are higher than other NUA languages with SUA languages, though Ca and Sr are not far off.

Hopi (Hp) is a puebloan language presently spoken in northern Arizona.  The Hopi hold a unique position in UA—unique as a single-language branch of NUA and unique as the only known UA tribe to participate in the Ancient Pueblo (Anasazi) tradition, along with three other language families (Kiowa-Tanoan, Keresan, and Zuni).  Some measures put Hp closer to Tak (Cortina-Borja and Valiñas 1989, 228), while the numbers above show the closest Hp correlate to be Tb (38).  Interestingly, however, Hp’s next highest numbers are shared with Yq (36), Eu (35), LP (35), and My (34), all of SUA, after which several low 30’s (30-33) are shared with some Tak and Numic languages, but also with some other Tepiman and Taracahitan languages.  This fairly equal distancing with so many SUA and NUA languages further confirms Hp’s unique place in UA.

Southern Uto-Aztecan (SUA) consists of Tepiman (Tep), Taracahitan (TrC), Corachol (CrC), and Aztecan (Azt), mostly from Arizona to Mexico City.  Miller (1984) includes Tep, TrC, and CrC in Sonoran; however, Tep and CrC in many respects differ more from TrC phonologically and grammatically than any two NUA branches.  In contrast to earlier leanings toward a UA homeland in NUA areas, hints of greater diversity in SUA areas surface regularly, bringing Manaster Ramer, Jane Hill, and myself to deem SUA areas as more likely prospects for the UA homeland.  One such hint is the close proximity of all UA reflexes for PUA *kw in the heart of SUA.  Within miles of each other are Tep b, Cahitan bw, Tbr kw, and Tr w/b/ko (Stubbs 1995), while all of NUA reflects a rather unanimous kw.

Tepiman (Tep) is so unique phonologically (*kw > b, *c > s, *s > h, *y > d, *w > g) among UA languages that it may merit distinction from Taracahitan strictly on phonological grounds and grammar, regardless of word counts.  Yet even word counts show a tight Tep entity with numbers from 73-85 between Tep languages, while 34-49 are the numbers between other Sonoran languages and the Tep languages, about the same as between NUA branches.  That fact and the unique Tep phonology both recommend a separate Tep branch, here represented by Tohono O’odham (TO) in Arizona and Nevome (Nv) of Upper Pima, and for Lower Pima/Pima Bajo (LP) are included Pima de Yepachec (PYp) and Pima de Yécora (PYc).  The Tepehuan languages included are Northern Tepehuan (NT) and Southeastern Tepehuan (ST) in western Mexico.
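The five Tepiman correspondences listed above can be sketched, very schematically, as a single-pass substitution in Python.  This is only a toy under stated assumptions: the function name apply_tepiman and the input strings are hypothetical illustrations rather than forms taken from the cognate sets, and the sketch ignores vowels, conditioning environments, and every other change.  Its one real point is that the changes must apply simultaneously, since applying *c > s before *s > h in sequence would wrongly turn *c into h.

```python
import re

# The five correspondences quoted in the text: *kw > b, *c > s, *s > h, *y > d, *w > g.
TEPIMAN = {"kw": "b", "c": "s", "s": "h", "y": "d", "w": "g"}

# Longest keys first, so the digraph "kw" is matched before plain "w".
_PATTERN = re.compile("|".join(sorted(TEPIMAN, key=len, reverse=True)))


def apply_tepiman(form):
    """Apply all five correspondences in one left-to-right pass over the form."""
    return _PATTERN.sub(lambda m: TEPIMAN[m.group(0)], form)


# Toy inputs only (hypothetical strings, not data from the sets).
print(apply_tepiman("kwasi"))  # bahi
print(apply_tepiman("caya"))   # sada
print(apply_tepiman("suwa"))   # huga
```

Because each segment is rewritten exactly once, an original c surfaces as s and does not go on to become h, which keeps the two sibilant changes distinct.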

The Taracahitan (TrC) languages are the core of Sonoran, i.e., Miller’s Sonoran minus Tepiman and Corachol.  The TrC languages (in northwest Mexico) include Eudeve (Eu), Opata (Op) or Tewima/Tegwima (Shaul, p.c.), Tarahumara (Tr), Guarijio (Wr), Tubar (Tbr), Yaqui (Yq), Arizona Yaqui (AYq), and Mayo (My).   Sonoran has been stated to be a  mesh of languages in its overlap and intertwining phonological and lexical complexities (Miller 1984).  Yq, AYq, and My form a tight Cahitan (Cah) branch within TrC, exhibiting 93 shared lexica between Yq and My.  Eu and Op are a closely related pair; and the Tr dialects and the Wr dialects form another fairly closely related subbranch of TrC.  These three subbranches diverge in their reflexes of PUA *kw, but each subbranch is consistent within itself: PUA *kw > Cahitan bw; Eu/Op *b; Tr/Wr *w.

Tubar is a unique language in UA.  While its geographic locale is in the center of Taracahitan, with which it is often classified, two factors make its proper classification enigmatic, if not hazardous: one, the lexical data are limited; two, the limited data, obtained shortly before extinction, show numerous loans and influences upon this small language surrounded by other larger UA languages.  It is apparent that Tbr is in part a product of phonological influences from Tep and lexical loans from TrC, yet it is a kw-language, isolated geographically from the only other kw-languages of SUA: i.e., the Corachol and Aztecan branches.  Classification by word counts may be misleading, due to lexical influences upon the small Tbr-speaking population surrounded by larger numbers of Tep (NT) and TrC (Tr, Wr, My, Yq) speakers.  Phonological influences from neighboring Tep languages upon Tbr include some *s > h, some *w > g, and initial *p > w (Stubbs 2000b).  So Tbr’s lexical position may be more due to loans and meshing movements than to genetic position.  Yet I hesitate to call it a single-language branch, because I do not think it is, not like Hopi is.  It is indeed an enigma, and lack of data may ever keep it so; but as long as it must be an enigma, it may as well be an enigma in TrC, where the lexical count would place it.

Corachol (CrC) is a viable grouping, not only because Cora (Cr) and Huichol (Wc) show a closer lexical relationship to each other (58) than to any other UA languages, but phonologically they form a pair and align better with Aztecan in many ways than with the old Sonoran grouping.  They share an innovation with Aztecan of *p > h/ø and a retention of *kw, neither of which is prevalent in Tep or TrC.

The Aztecan (Azt) branch consists of the many dialects related to Classical Nahuatl.  Cortina-Borja and Valiñas (1989) include nine in their classification study.  An observation of interest is that Azt yields numbers of 30-40 with other SUA languages, but only teens to 20 with NUA languages, except Tb, Hp, and Ca, with whiches (couldn’t resist) the Aztecan numbers are 23-26.

The 2600-plus sets of this book are intended to facilitate and stimulate more comparative research, serving as the next plateau or the latest in new and expanded beginnings.  Adding to and refining this body of data will be an ongoing process by the author and whoever would like to join the cooperative effort.  Thus, other viable cognate sets, reflexes to existing sets, corrections of errors, enlightening discussion, and feedback are welcome and will be credited to the contributor in future editions.  They should be sent to (Brian’s email) or by snail mail to

Brian Stubbs
College of Eastern Utah-San Juan Campus, 639 West 100 South
Blanding, Utah  84511
