indigenous.engineering research takes place on ohlone land | ᎣᏪᏅᏒ / home

what does it mean to be Cherokee online?

Over 100 years since the Dawes Act land grabs, theft of Indigenous resources continues through misappropriated identity. Non-Natives frequently dismiss these “family myths” as harmless, claiming they do not affect Natives in any material way.

I wanted to see the data.

Content Note: This post contains references to anti-Indigenous slurs & other racism.


On February 8th, 1887, US President Grover Cleveland signed the Dawes Allotment Act into law. Championed by Senator Henry Dawes as a way to “rid the nation of tribalism through the virtues of private property, allotting land parcels to Indian heads of family”, the Dawes Act was one of the most destructive policies towards tribes ever to come from the federal government.

With the stated intent of “civilizing” tribes by forcing them into a system of private property ownership, the Dawes Act broke up tribal lands into individual allotments that would make traditional collectivist systems nearly impossible to maintain. In addition, the act triggered a massive land-grab by settlers resulting in the dispossession of millions of acres of Native land.

While the land theft would have been devastating enough, tribal lifeways were severely damaged by the Removal and then devastated by Dawes, who believed–and legislated–that traditional Indigenous systems must be destroyed at all costs to “save” Natives from themselves. The destruction was not a by-product of well-meaning legislation but rather both the end & the means for a government whose top priority has historically been dispossession of Native land & dissolution of Native sovereignty.

Cherokees, like all the tribes, lost massive swaths of land through US legislation of Cherokee identity: Cherokees who had survived the Removal were now forced to choose a tribe to identify with, and according to Dawes, they could only choose one–although many claimed descent from multiple tribes. A person with both Cherokee and Creek ancestry, for example, could only register under one tribal identity–and would thus lose a part of their inheritance. These “unclaimed” lands were then sold off to white settlers.

Believing he knew better than Natives how they should govern, organize, and identify themselves, Dawes & the contemporary system of governance he represented stripped Indigenous communities of their self-determination in the name of increasing individual independence, never grasping–or perhaps never caring to notice–the irony.

One hundred and thirty-three years later, the Dawes Act still impacts Native life in myriad ways–even up to who does or does not have access to the internet. And one hundred and thirty-three years later, tribal sovereignty is still under attack through both direct action by elected officials and the misappropriation of Native identities. While they might seem different on the surface, these actions parallel the Dawes Act in their theft of tribal identity as a means to redistribute resources to white settlers.

Due to the frequency with which Cherokee descent is falsely claimed by white settlers, arguably no one is more familiar with the misappropriation of Native identity than the three Cherokee tribes. For example, fake “Cherokee” groups with white ancestry were awarded under false pretenses–i.e. stole–hundreds of millions of dollars in federal contracts earmarked for minority contractors. While in such cases it’s possible to put a literal price tag on stolen Native identities, there are less directly measurable–but still very relevant–social costs as well.

In the sense that they strip Indigenous peoples of their agency, shifting racial identities are modern reenactments of the Dawes Act played out in the social sphere–once again, an attempt at removing tribes’ ability to self-identify, to name & know their own–core requirements for human self-determination.

So in February 2020, the 133rd anniversary of the Dawes Act, I wanted to find out: What does it mean to be Cherokee online?

Democratic Debates & Cherokee Mentions

One of the most publicly visible cases of misappropriated Native identity is that of Senator Elizabeth Warren, whose claim of Cherokee ancestry as a Harvard “woman of color” law professor became a central issue in her 2020 presidential bid.

Like all the candidates, Warren’s online visibility increases significantly during & immediately after the Democratic Debates.

Visualizing Hijacked Identity: ‘Cherokee’ Word Clouds

Word clouds are a visualization tool used to model the importance of particular words & ideas within a corpus of text data. Words that appear more frequently in a text are represented as more “important” in the word cloud by larger size, bolder text, etc. The underlying assumption of a word cloud as a visualization tool is that the more frequently a word or phrase appears in the corpus, the more closely connected it is to the text.
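The counting behind a word cloud can be sketched in a few lines of Python. This is an illustration rather than the code used for this analysis: the function name `word_frequencies`, the short stop-word set, and the sample tweets are all assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative stop-word set (an assumption; real pipelines use longer lists).
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "in", "is", "for", "on"}

def word_frequencies(texts, top_n=100):
    """Return the top_n most frequent words across texts, lowercased."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of letters, digits, and underscores (Unicode-aware).
        words = re.findall(r"\w+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up sample tweets, for illustration only.
sample = [
    "Cherokee Nation sends seeds to the seed vault",
    "Heirloom seeds from the Cherokee Nation",
]
print(word_frequencies(sample, top_n=3))
```

A word cloud renderer would then scale each word's size by its count, so the words returned first here would be drawn largest.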

I created an instantaneous word cloud from tweets containing the word “Cherokee” immediately following the February 7th, 2020 Democratic debate (in which Senator Warren participated), around 10:30 pm EST. The most frequent vocabulary words found in the tweets show that up until shortly after the debate the biggest news around the word “Cherokee” was the tribe’s donation of several culturally vital varieties of seeds to the Svalbard Global Seed Vault, as reported by The Guardian in a story dated February 7th, 12:27 EST. As the Cherokee Nation is only the second Indigenous tribe to contribute to the vault, and the seeds contributed predate European arrival in North America, it makes sense that this news would dominate:

Cherokee word cloud, 02/07 10:30 pm EST

Hours Later: ‘Cherokee’ Becomes Inseparable from the Democratic Debate

However, shortly after 10:30 am on February 8th, just twelve hours later, a word cloud created using the same methodology demonstrates how the debate had worked its way into Cherokee mentions: alongside words like “seeds”, “culturally” and “preserve”, hashtags related to the presidential election are prominent in the dataset. Diluting mentions of Cherokee heritage, we can see the terms Bernie2016 and Bernie2020, referencing 2020 Democratic presidential candidate Bernie Sanders.

This word cloud was generated from tweets containing the word “Cherokee”. There was no other reference or marker for politics, any political candidate, or the Democratic debate. The ability to query tweets containing “Cherokee” and in real time see tweets directly linked to the 2020 elections is clear evidence of the deeply entrenched nature of Warren’s misrepresentations regarding her ethnicity:

Cherokee word cloud, 02/08 10:30 am EST

Vocabularies Evolve, but ‘Cherokee’ Remains Tied to Warren

I generated two more word clouds at later time intervals using “Cherokee” and nothing else as the search term. The vocabularies are demonstrably different in each word cloud:

Cherokee word cloud, 02/08 1:30 pm EST

Cherokee word cloud, 02/08 3:00 pm EST

The differences illustrate several key points: first, that new tweets are being generated to draw from & that this conversation is culturally significant enough to sustain a fresh draw of at least one thousand tweets every few hours; second, that while the conversation continues to evolve (evidenced by the shifting vocabulary focus in each word cloud), ties to the Democratic debate are inextricable.
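One way to make the shift between snapshots concrete is to compare the top words of two word clouds as sets. The sketch below assumes each snapshot has already been reduced to a list of its top words; the function name `vocabulary_shift` and the word lists are hypothetical, with “vault” and “debate” added purely for illustration.

```python
def vocabulary_shift(earlier, later):
    """Compare two collections of top words from successive snapshots."""
    earlier, later = set(earlier), set(later)
    return {
        "dropped": sorted(earlier - later),  # words that fell out of the top list
        "added": sorted(later - earlier),    # words that newly entered it
        "shared": sorted(earlier & later),   # words present in both snapshots
    }

# Hypothetical top-word lists, not the actual data from the post.
before = ["seeds", "culturally", "preserve", "vault"]
after = ["seeds", "preserve", "debate", "bernie2020"]

shift = vocabulary_shift(before, after)
print(shift["added"])
print(shift["shared"])
```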

Interestingly, the tweets do not even seem to directly refer to Warren herself, but rather other candidates in the debate–presumably vis-à-vis their performance relative to Warren. This is further evidence that Warren’s cultural misappropriation is in fact cultural theft: the name of a group of living human beings to whom she bears no relation or community ties is now taken for granted as a shorthand for her political brand such that it can even reference her opponents.

It should be noted that on February 8th, 2020, the 133rd anniversary of the signing of the Dawes Act, Cherokee identity was again demonstrably hijacked by a white politician–this time not through direct legislation, but rather words & actions in the social sphere.

The cost of cultural appropriation is sometimes difficult to directly measure. However, in this case the data shows that while the Cherokee Nation was making history preserving North American cultural & ecological heritage, Senator Warren’s dishonesty unfortunately took center stage.

Expanding the Search: a Month of ‘Cherokee’ Tweets

To get more tweets & a more representative sample, I also searched using an expanded 1-month date range that included both the February 7th and the January 14th Democratic Party debates.

This search, covering a roughly 30 day period, returned over 38,000 tweets containing the word “Cherokee”. It is important to note that there is no guarantee that all these tweets represent direct references to one of the three Cherokee tribes.

That is in fact the point here–that a name referring to three federally recognized tribes is very often not associated with the tribes at all.

Two of the most prominent themes in this dataset are the phrase “beat Trump”, certainly a reference to the Democratic presidential primaries via Elizabeth Warren, and “Jeep Grand”–referring to the automotive manufacturer Jeep & the name of a popular model of car, the Jeep Grand Cherokee:

Cherokee word cloud, January-February 2020

In fact, the top vocabulary word from over a month’s worth of tweets containing the word “Cherokee” is “Jeep”. The second most frequent word from the corpus is “Warren”.

In nearly forty thousand tweets from over the course of a month, Cherokee tribal voices take a proverbial backseat to discussions of non-Native politicians & cars.

Terms That Appear in Connection to ‘Cherokee’

Anecdotal evidence suggests that certain terms appear quite often with spurious claims of Cherokee heritage. To get a feel for how often, I chose a few common concepts that Natives often see represented along with false claims of Native heritage. “Grandma” and “grandmother”, for example, often appear in claims of Native ancestry (frequently without reference to the name of the alleged Indigenous ancestor).

Other frequent references include ‘cheekbones’, as prominent cheekbones are often thought to indicate Native ancestry; ‘princess’, a reference to pernicious Cherokee princess myths; ‘Indian blood’, an offensive phrase often used to both claim & distance oneself from Native ancestry; and sexualized terms such as ‘sexy’, & the offensive slur ‘sq-aw’.

In total, I tested all of the following terms: ‘warren’, ‘grandmother’, ‘grandma’, ‘indian blood’, ‘not offended’, ‘cheekbones’, ‘sq-aw’, ‘princess’, ‘sexy’

While there are certainly synonyms for terms such as ‘sexy’ etc, I did not explore these in detail. The frequency with which this term co-occurs with ‘Cherokee’ certainly suggests that if anything, sexualized references to Cherokee identity are underrepresented in the dataset.

A quick test revealed that every one of the tested terms appears in conjunction with the word “Cherokee” fairly frequently on Twitter–each term easily returned 100 tweets.
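The kind of co-occurrence check described here can be sketched as a simple filter over tweet text. This is a minimal illustration, not the query code used for the post: the function name `cooccurrence_counts` and the sample tweets are assumptions, and the matching is plain case-insensitive substring matching.

```python
def cooccurrence_counts(tweets, anchor, terms):
    """Count tweets that contain the anchor word and each term (case-insensitive)."""
    anchor = anchor.lower()
    counts = {term: 0 for term in terms}
    for tweet in tweets:
        text = tweet.lower()
        if anchor not in text:
            continue  # skip tweets that never mention the anchor word
        for term in terms:
            if term.lower() in text:
                counts[term] += 1
    return counts

# Made-up sample tweets, for illustration only.
tweets = [
    "My grandmother was a Cherokee princess",
    "New Jeep Cherokee in the driveway",
    "Grandma always said we were part Cherokee",
    "Nothing to see here",
]
terms = ["grandmother", "grandma", "princess"]
print(cooccurrence_counts(tweets, "Cherokee", terms))
```

Substring matching keeps the processing minimal, at the cost of occasionally matching a term inside a longer word.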

Beyond Toy Datasets: Up to 10,000 Tweets per Term

Since it was possible (& easy) to find 100 tweets per tested term, I increased the potential dataset size by orders of magnitude in the hope that a large enough dataset would allow a glimpse into what society means & understands using the Cherokee name online.

Over 23,000 Tweets Containing Stereotypical or Offensive Themes

In total, I was able to quickly find 23,375 tweets containing at least one of the terms tested.

These numbers represent the results of direct queries containing the name of the Cherokee tribe combined with at least one search term. Many (but not all) of these twenty-three thousand tweets can be assumed to refer directly to Native issues.

While more complex & specific text processing could further refine this dataset, I wanted to keep the results as replicable and broadly applicable as possible, so I chose to keep the processing to a minimum.

Even taken purely as correlations, the numbers are striking and difficult to ignore.

Warren’s False Claims & ‘Cherokee Grandmother’ Blood Myths

When testing terms, I set each term’s maximum results to 10,000 tweets. The only term appearing in conjunction with the word “Cherokee” anywhere remotely near this frequency was “Warren”, which easily returned a corpus ten thousand rows long. Most of these tweets are very recent; because the Twitter API pulls more recent tweets first, it can be inferred that many more such tweets referencing Warren’s claim of Cherokee heritage exist.
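The effect of a per-term cap can be sketched without any real API client. In the illustration below, `fake_search` is a hypothetical stand-in that simply yields dummy tweets, and the availability figures (25,000 and 300) are invented for the example; only the 10,000 cap comes from the text.

```python
from itertools import islice

MAX_RESULTS = 10_000  # per-term cap described above

def collect(results, limit=MAX_RESULTS):
    """Take at most `limit` items from an iterable of results."""
    return list(islice(results, limit))

def fake_search(term, available):
    """Hypothetical stand-in for an API client: yields `available` dummy tweets."""
    for i in range(available):
        yield f"tweet {i} mentioning Cherokee and {term}"

# Invented availability figures, for illustration only.
capped = collect(fake_search("warren", 25_000))
small = collect(fake_search("cheekbones", 300))
print(len(capped), len(small))
```

A corpus whose length equals the cap only establishes a lower bound: at least that many matching tweets exist, and possibly many more.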

The only other terms to approach this frequency of use in conjunction with “Cherokee” are the words “grandmother” and “grandma”, synonyms that resulted in 5,557 and 2,988 tweets respectively, for a total of 8,545. These tweets can reasonably be inferred to refer to the same blood myths that Warren still has yet to refute.

Sexualization & Blood Myths

Some of the most common offending themes appear in the thousands: the sexualized & derogatory term ‘sexy’ appears in tweets containing ‘Cherokee’ 1,148 times, while tweets containing ‘princess’, many of which could be reasonably inferred to refer to common blood myths, number 2,089.

Although less numerous, the deeply offensive ‘sq-aw’ (146 tweets) and another phrase referencing blood myths, ‘indian blood’ (404 tweets) both appear in the corpus.

An Appropriation Paradox: False Claims of Native Identity as Social Cover for Anti-Indigenous Racism

Interestingly, there are hundreds of instances of the phrase “not offended” co-occurring with “Cherokee”. This matches anecdotal evidence of non-Natives using false claims of Native identity to excuse anti-Indigenous racism.

The frequency of the phrase “not offended” within ‘Cherokee’ query corpora constitutes what might be termed a paradox of appropriation: a social phenomenon where false claims of identity within a particular group are weaponized as a means of excusing offenses perpetrated by colonizing entities against said group. For Natives on social media, this pattern is one that is deeply familiar.

To see what themes these tweet authors were using their purported Cherokee heritage to excuse, I created a dataframe consisting solely of tweets from the “Cherokee”/“not offended” query. Using the same minimal text processing, I created a word cloud & took the top 100 vocabulary words:

Cherokee word cloud, 38,000 tweets, January-February 2020

Themes in the “not offended” corpus included frequent references to controversial sports mascots, such as ‘Braves’, ‘Washington’ & ‘r-dskins’, ‘indian’, ‘mascot’, ‘tomahawk’ & ‘chop’. Additionally, Cherokee heritage was used to excuse inappropriate references to a kidnapped & murdered Indigenous teenager (search term ‘Pocahontas’, aka Matoaka), sexualized so-called “Pocahottie” outfits (term: ‘costume’), and blood myths (‘great grandma’ & ‘16th’, a reference to blood quantum). The corpus also contains the word ‘snowflake’, contemporary slang referring to a person or group of people the author considers to be too easily offended.

Overall, the data backs up Natives’ anecdotal experiences online: when the word “Cherokee” appears with the phrase “not offended”, it frequently represents an invocation of ostensible Native heritage to excuse actions that have been shown to be both offensive, and materially harmful, to Native communities.

Stereotypes & Terms of Offense Likely Underrepresented in the Corpora

These queries should be considered non-exhaustive; obviously there are a number of synonyms for many of these terms. The only pair of synonyms I searched for, “grandmother” and “grandma”, each returned thousands of tweets (5,557 and 2,988 respectively), and even this query makes assumptions such as standard spelling. More targeted queries including synonyms & common alternate spellings would almost certainly return more results.

Given the scope of their inclusion within Cherokee references, it is reasonable to infer that these offensive concepts are both highly prominent on social media and underrepresented in the corpora.

Trying More Positive Terms

In order to get a feel for the broad range of sentiment around Cherokee themes, I decided to test terms more closely and/or positively associated with the Cherokee tribes.

I wanted to see more than the stereotypical things people say online regarding Cherokee heritage & citizenship–I wanted to explore the positive associations people might have with the three Cherokee tribes.

To do this, I started by testing two terms related to Cherokee tribal citizenship & sovereignty–the word “citizen” and the word “sovereignty” itself–as well as “enrollment”, a reference to the process by which one becomes a Cherokee citizen. I chose these themes because anecdotally Cherokee sovereignty over citizenship is often ignored or outright questioned by non-Natives online. Presumably a search for these terms would reveal data on public sentiment around what it means to be Cherokee.

In addition I searched for terms that could describe Cherokee citizens who are members of educated & well-respected professional groups, such as ‘doctor’, ‘lawyer’, ‘engineer’, ‘professor’, ‘journalist’, and ‘CEO’.

Because a major contemporary stereotype against Natives in general is the assumption of a monolithic culture frozen in time, I reasoned that testing for modern, respectable occupations & terms related to professional and technical fields might provide a more accurate look at public perception of Cherokee scholars and professionals.

Of course there are Cherokee citizens representing the three tribes in each of these fields; I wanted to see how closely public perception mirrors this reality online.

At 6,639 rows of tweets, the dataset of positive terms, including those related to Cherokee tribal sovereignty, citizenship, and professional occupations, is less than one third the size of the corpus of stereotype-themed tweets.

For each tweet containing both the word “Cherokee” and a reference to tribal autonomy or professional occupations, there were more than 3 (roughly 3.5) containing a reference to at least one harmful stereotype or anti-sovereignty theme.
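The comparison between the two corpora is simple arithmetic on the totals reported above (23,375 stereotype-themed tweets and 6,639 positive-term tweets). The variable names below are only illustrative.

```python
stereotype_tweets = 23_375  # tweets matching at least one stereotype term
positive_tweets = 6_639     # tweets matching sovereignty or profession terms

ratio = stereotype_tweets / positive_tweets
share = positive_tweets / stereotype_tweets

print(f"{ratio:.2f} stereotype-themed tweets per positive-term tweet")
print(f"positive corpus is {share:.1%} the size of the stereotype corpus")
```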

Exploring More Authentically Shaped Identifiers: Corpora for ᏣᎳᎩ & Tsalagi

To get a more authentic point of comparison for Cherokee attitudes I decided to test terms that many Cherokee people use to refer to themselves. The terms I tested as substitutes for the word “Cherokee” were the Syllabary characters ‘ᏣᎳᎩ’ as well as their latinized transliteration, ‘Tsalagi’.

The Cherokee Syllabary is recognizable and renderable across the internet and even in software development environments such as the Jupyter Notebook I use, as well as in the Python and Markdown code used to create this analysis. It is critically important to note here that support for Cherokee Syllabary characters such as ᏣᎳᎩ is a direct result of the concerted efforts of Cherokee tribes, technologists & engineers.

These facts give the absence of Cherokee engineers & technology in the online conversation a particular irony: Cherokee innovation & technology power basic-to-advanced components of the most familiar technologies today (such as wireless internet & smartphones), yet the vocabulary around these terms remains dissociated from the idea of Natives in tech. It is not unreasonable to infer that while Cherokee innovation powers the internet, the words “Cherokee” and “innovation” are not strongly associated for non-Natives.

Testing ᏣᎳᎩ Tweets

The Cherokee Syllabary is displayed as Unicode characters, which require some special handling. Unfortunately, Unicode characters in file names did not work in my Linux environment (Ubuntu, the operating system on which this data work was performed); most Linux file systems accept UTF-8 file names, but individual tools and locale settings can still reject or mangle them. Because of this, I needed a second pipeline that takes this into account.
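One possible workaround, shown here only as a sketch and not as the second pipeline actually used, is to derive an ASCII-only file name from the query by writing each character as its hexadecimal code point. The helper names `to_ascii_name` and `from_ascii_name`, the naming scheme, and the `.csv` extension are assumptions.

```python
def to_ascii_name(query):
    """Encode each character as its hex code point, e.g. 'u13e3', joined by '-'."""
    return "-".join(f"u{ord(ch):04x}" for ch in query)

def from_ascii_name(name):
    """Invert to_ascii_name, recovering the original query string."""
    if not name:
        return ""
    return "".join(chr(int(part[1:], 16)) for part in name.split("-"))

# The Syllabary word for "Cherokee" (TSA, LA, GI), written with escapes
# so this snippet itself stays ASCII-only.
query = "\u13e3\u13b3\u13a9"
file_name = to_ascii_name(query) + ".csv"
print(file_name)
```

Because the mapping is reversible, the original search term can be recovered from the file name when the data is loaded again.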

Exclusive of retweets, I found 1,980 tweets containing the word ‘ᏣᎳᎩ’ in the Cherokee syllabary. Many of these tweets appear to refer to language learning, and technical aspects of using the Cherokee syllabary online.

Testing Tsalagi Tweets

I was able to obtain over 20,000 tweets containing the word “Tsalagi”. There are more available; the query was only limited by the scope of this project.

Tweets containing the word “Tsalagi” offer a striking difference in content & tone from any corpus created with the “Cherokee” query. The primary ideas in the Tsalagi corpus center around family & lifeways: important words are “love”, “know”, “think”, “women/woman”, “baby”, “mother”, and “child”.

Rather than a reference to a particular politician, hashtag, or mythology, the most important word in the corpus is simply “people”:

Tsalagi word cloud, 22,000 tweets

Creating a Corpus Around Cherokee Citizenship

To find more tweets authentically reflecting the realities of everyday Cherokee lives, I wanted to find a query that would reflect Cherokee sovereignty. I chose to use the tandem search terms “Cherokee” and “citizen”, a reference to tribal citizenship in one of the three Cherokee tribes.

I named this dataset ᏣᎳᎩ ᎡᎲᎢ, for ‘Cherokee citizen’, using the syllabary for the variable name in honor of the Cherokee engineers, technologists, & other leaders who made possible this Indigenous language’s inclusion in state-of-the-art software engineering. There are few better metaphors for Native ingenuity & resilience than the ability to use this ancient natural (human) language to communicate with a sophisticated, modern, & dynamically interpreted programming language. The ᏣᎳᎩ_ᎡᎲᎢ dataset represents a unity between Indigenous & modern that has always existed.
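Python 3 accepts Unicode letters in identifiers, which is what makes a Syllabary variable name possible. The short sketch below checks this with `str.isidentifier()` and then binds a value to the name; the placeholder list stands in for the real dataset, which is not reproduced here.

```python
name = "ᏣᎳᎩ_ᎡᎲᎢ"

# str.isidentifier() reports whether a string is a valid Python identifier.
print(name.isidentifier())

# Placeholder contents (an assumption); the real variable held the tweet dataset.
ᏣᎳᎩ_ᎡᎲᎢ = ["placeholder tweet text"]
print(len(ᏣᎳᎩ_ᎡᎲᎢ))
```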

ᏣᎳᎩ ᎡᎲᎢ (Cherokee Citizen) Tweets

Although Senator Warren’s claims still appear referenced in this corpus, the major themes of this corpus are much more closely aligned with tribal sovereignty. The words “tribal” and “nation” are prominent, as well as direct references to sovereign tribal nations such as the Cherokee Nation and the Eastern Band of Cherokee Indians:

ᏣᎳᎩ ᎡᎲᎢ (Cherokee Citizen) word cloud

It is clear from the corpus vocabulary that the ideas expressed in tweets containing the words “Cherokee” and “citizen” are centered around sovereignty more than any other subject. Cherokee citizenship could thus reasonably be inferred to be an important marker for tweets that reference authentic Cherokee political concerns.

This becomes especially apparent in comparison to tweets containing simply the word “Cherokee”, which are demonstrably correlated to a number of social themes–from politics to cars–with little to no relation to the three Cherokee tribes.

One More Try for Authentic Cherokee Mentions: After the February 25th Debate

Checking references containing “Cherokee” after the February 25th Democratic debate showed that any interference from Warren’s mentions was eclipsed by an open letter to the Senator from a group of concerned Indigenous signatories regarding her claims of Cherokee heritage, published on Wednesday, February 26th. The letter, signed by over two hundred Cherokee and other Native leaders, laid out several priorities for Senator Warren in setting a public example regarding her ancestry claims.

The Warren campaign quickly released a 12-page response containing multiple references to Natives who approve of the senator, as well as her own policy initiatives, in addition to an apology. In this letter, Warren continued to insist that she received no professional benefit whatsoever from her claims of Native ancestry, despite her having been touted as Harvard’s first “womxn of color” law professor. Crucially, Warren failed to admit (as requested) that her claims of Cherokee & Delaware Native heritage were false.

Although this word cloud was generated within the typical 24 hour post-debate time window, this particular dataset shows the impact of Cherokee citizens’ voices in its most important concepts: while Warren’s response & apology still occupy a prominent place in the corpus, words associated with tribal sovereignty such as “citizens” & “Nation” feature prominently–and powerfully–within the data:

Cherokee word cloud, February 26th 2020

When the voices of Cherokee citizens are amplified, tribal sovereignty emerges as the theme.

More than 100 years after the Dawes Act attempted to strip tribes of their self-determination and their land, the fight for tribes’ right to govern themselves continues. Cherokee identity remains key to Cherokee self-determination.

Where self-identified “Cherokee” authors can be found espousing everything from disproven family mythologies to blatant racism in the form of excusing anti-Indigenous actions on behalf of Natives, tribal citizens paint a completely different picture of Cherokee political interests. While Cherokee people & allies find various ways to identify themselves and connect online, tribal citizenship remains an especially powerful predictor of alignment with Cherokee tribal interests.

The data shows that, as in the real world, Cherokee identity and tribal sovereignty remain inextricably linked online.

All code used in this project is available here.