Mastodon and the history of Computer People. Christian Brickhouse, 2022-11-06. https://christianbrickhouse.com/weblog/2022/11/06/mastodons-place-in-internet-history
<p>With many users looking for alternatives to Twitter, Mastodon has grown in popularity, but for many people the decentralized social network can be confusing, lonely, and unattractive. This has led to a growing sense that Mastodon is only for Computer People™ or that these federated social networks have flaws which proponents refuse to acknowledge or fix. While Mastodon and other federated social networks like PeerTube and Pixelfed certainly have their issues, I argue that the underlying complaint is one of ideology and culture.</p>
<p>What seem like design flaws to outsiders are often deliberate design decisions made by communities after decades of experimentation. The point of this post is not to convince you to join a federated social network, and the point is not to explain how federated social networks operate. Instead, the goal of this post is to explain the cultural history of federated social networks since the late 1980s in order to explain <em>why</em> Mastodon (and others) is designed in ways that might seem alien to new users. The hope is that by providing greater cultural understanding, users can better decide whether federated social networks fit their beliefs and goals, or if they would prefer a more centralized system like Cohost or tumblr.</p>
<h2 id="prelude-the-history-of-free-software-and-free-culture">Prelude: The history of Free Software and Free Culture</h2>
<p>Our archaeology of knowledge begins in the 1950s with the first electronic computers. At this point in time, software was distributed alongside hardware without cost and included the entire source code so that purchasers could examine, learn from, modify, and repair the machines they purchased. This custom was a practical necessity: these were largely experimental machines and users would need to tailor them for particular purposes. The knowledge was esoteric and restricted to a particular class who had the capital (economic, institutional, or social; <a href="#Bourdieu">Bourdieu 1986</a>) to access them. This class of system operators developed its own culture commonly referred to as <strong>hacker culture</strong>. Novices could be initiated, in part, because they could freely access, study, modify, and distribute software to members of the community. Large-scale projects were accomplished because of the ease with which community members could collaborate without legal or financial restrictions. In order to facilitate the distribution of software, early hackers developed the Advanced Research Projects Agency Network (ARPANet), the predecessor to the modern internet, which for the first time connected computer networks across continents.</p>
<p>While there was an economic undercurrent moving towards monetization of computer systems, prior to the 1980s computing machines were largely for scientific or hobbyist pursuits. The goal was development, monetization would come later, and come it did. These transcontinental software projects became large and complex, requiring greater financial investment to sustain the labor and hardware costs. Companies which focused exclusively on software began to take shape, and these competed with hardware companies like International Business Machines (IBM) who would bundle hardware and software. Companies began to charge money for software licenses. In 1974 the US Government declared that software was covered by copyright (all software was considered public domain prior to that point), allowing corporations to restrict licensees from sharing or modifying the software. In 1976 Bill Gates wrote an <em>Open Letter to Hobbyists</em> which criticized the sharing of Microsoft’s BASIC interpreter despite the decades-long history of that practice. In 1979 American Telephone and Telegraph (AT&T), which to this point had freely distributed the UNIX system (a predecessor to Macintosh OSX and Linux), began enforcing its license rights in preparation for commercial exploitation. In 1983 IBM announced that it would no longer distribute source code with software purchases, restricting the ability of end users to learn from and modify code to fit their needs. These changes were part of a cultural narrative of restriction contrary to the prevailing discourse of sharing that thrived in the hacker culture.</p>
<p>In response, a counter-narrative emerged in the 1980s led by the GNU Project and the Free Software Foundation (FSF). In 1986 the first definition of free software was published:</p>
<blockquote>
<p>First, the freedom to copy a program and redistribute it to your neighbors, so that they can use it as well as you. Second, the freedom to change a program, so that you can control it instead of it controlling you; for this, the source code must be made available to you (<a href="#Stallman">Stallman 1986</a>).</p>
</blockquote>
<p>In 1996 these were expanded to three freedoms, and later a zeroth freedom was added, resulting in the contemporary <em>Four Essential Freedoms of Free Software</em>:</p>
<ul>
<li>The freedom to run the program as you wish (freedom 0)</li>
<li>The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.</li>
<li>The freedom to redistribute copies so you can help your neighbor (freedom 2).</li>
<li>The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this (<a href="#FSF">Free Software Foundation 2022</a>).</li>
</ul>
<p>These freedoms, while more formalized than the original two freedoms, are clearly an artifact of the historical discourse which precipitated them. Freedom 1, for example, enshrines the existing cultural practice of peer learning and hacking which defined the early culture. The coda to freedoms 1 and 3 (analogous to the second freedom in Stallman 1986) states that it is impossible to exercise this freedom without access to the source code. This statement was also present in Stallman’s (1986) original two freedoms, and in that historical context, it is a clear rebuttal to IBM’s announcement in 1983 that they would no longer distribute source code. Freedom 2 (the first freedom in Stallman 1986) rebuts Gates’ and AT&T’s exercise of their copyright in restricting the sharing of software. This rebuttal is framed in terms of community: hackers share software to “help [our] neighbor” and this sets up the practices of Microsoft and AT&T as harmful to the neighborhood (hacker culture). This framing is reiterated in the coda to freedom 3.</p>
<p>In 1989 the FSF established <strong>copyleft</strong> as a foil to <em>copyright</em> and published the first version of the GNU General Public License (GPL) which utilized the legal framework of software licenses to protect the Four Freedoms rather than restrict them. This strategy was not merely altruistic, but militant. The goal of the GNU Project was, in part, to write software so good and so indispensable that for-profit corporations would incorporate it into their products. When they did so, the terms of the GPL would require them to liberate their code, and if they refused, the legal mechanisms they used to restrict freedom would be used against them in order to force access to their source code. In the modern day, legal non-profits like the Software Freedom Law Center have brought suit on behalf of free software developers against corporations such as BestBuy, Verizon, JVC, and Samsung. These cases usually resulted in the companies releasing their code under the GPL, though in the case of BestBuy, a default judgment resulted in the award of monetary damages and the seizure of televisions.</p>
<p>In the 1990s, the free software movement influenced the development of a parallel discourse: free art and free culture. This is the real start of our Mastodon discussion, so I will speed through the details. In 1995 the first wiki, Ward’s Wiki, was developed which allowed anyone to edit content on the website. This grew into many offshoots, most notably Wikipedia, and as these projects grew the need to prevent legal problems grew as well. At first, projects utilized licenses like the GNU Free Documentation License which was made for documenting open source software, but these were not fit to purpose. The need for licenses tailored to <em>cultural</em> works resulted in the development of the Creative Commons and their family of licenses (only some of which are considered “free”). The English Wikipedia and its sister wikis adopted the Creative Commons Attribution ShareAlike 3.0 in 2009 which was quite a big deal, requiring the FSF, Creative Commons, and Wikimedia Foundation to coordinate changes to legal license texts so that the license of existing works could be changed without needing to contact the thousands of (often unknown) editors. The adoption of Creative Commons by one of the largest websites in the world brought it serious prominence, and helped develop a commons of material that facilitated further free cultural works. Now, there exists an artistic community aligned with the free software community who mutually work towards creating a commons of material that not only facilitates sharing, but requires corporations to contribute back to the commons if they want to use it.</p>
<h2 id="a-history-of-social-networks">A history of social networks</h2>
<h3 id="the-old-net-1970-to-1993">The Old Net (1970 to 1993)</h3>
<p>For hacker culture to exist, there needed to be some way for members to communicate, socialize, and coordinate. Unsurprisingly, hackers often made their own tools for this purpose, and they are among the first electronic social networks. In fact, the history of electronic social networks predates the internet itself by over a decade. The first electronic social network, <strong>Community Memory</strong>, was founded in 1973 and hosted in a record store in Berkeley, California. Community Memory was a kind of electronic bulletin board system (BBS) accessible from computers (terminals) located around the San Francisco Bay. Users would log in to the Berkeley computer system from these terminals and post messages which would be stored on the computer, and others could log in and search through the posts for messages of interest. Imagine a Google Doc where people would just write stuff or go to find things; that’s quite literally what this was. It was a computer’s storage (Memory) accessible and writable by the whole Community. Community Memory became a cultural mainstay in the Bay Area, and was among the first times that computers were used solely for recreation rather than research. Users would post ads for band mates, share recipes, post poems, organize carpools or study groups, review restaurants, or find chess partners. The system became so popular that keyword systems and search functions were eventually added to facilitate discovery; prior to that, terminal locations would print out relevant entries to post near the terminal. At its peak Community Memory was accessed about 50 times per day from each location, with 10 additions per day per location (<a href="#Colstad">Colstad and Lipkin 1975</a>). Community Memory was shut down in 1975.</p>
<p>Rossman (<a href="#Rossman">1975</a>) explores the implications of Community Memory contemporaneously, and it is worth revisiting these early reflections on the first social network. As hinted at above, the name Community Memory is largely a reference to the technical aspect of computer memory, but Rossman suggests this name is misleading. Community Memory was not about the past; posts were in fact quite short lived due to the memory limitations of the machine hosting it. Instead, Rossman (1975) suggests the idea of <strong>community data</strong>, saying Community Memory is “an attempt to deal with the real-time complex of community data….perhaps ‘Community Data Connection’, or ‘Community Data Exchange’. ‘Community Information’ may be best.” The point Rossman emphasizes is that Community Memory fostered <em>connections</em> between people by allowing the <em>exchange</em> of data. People had these needs before Community Memory, and often they were met by community bulletin boards in coffee shops with geographically limited readership. The true genius of Community Memory is how it facilitated and leveraged existing social skills and practices already familiar to the community (cf. cultural idioms, <a href="#Obeyesekere">Obeyesekere 1981</a>). Rossman further highlights the <em>political</em> nature of Community Memory which is worth considering in light of the cultural transformations going on at this time (a year before Gates’ <em>Open Letter</em> for example). The structure of the system was, in part, a political act which highlighted the potential for success of decentralized systems: “the operational politics of the system are deeply democratic—rather than implicitly authoritarian/centralized, as most of our systems of information sharing are.” This quote is worth highlighting because it is not referring to online systems of information sharing; the internet won’t exist for another 14 years. These “authoritarian/centralized…systems of information sharing” were traditional mass media. The promise of social media, and popular computing in general, was the democratization of knowledge away from authoritarian models of control, a discursive thread we will follow through to the present day.</p>
<p>Following the success of Community Memory, other Bulletin Board Systems were set up tailored to particular places and times, but these were ultimately superseded by <strong>Usenet</strong> in 1980. A major issue with early BBS was the lack of syndication. Some posts and events were hyper local, sure, but what about postings of wide interest? Put another way, if BBS are like bulletin boards in a coffee shop, what would be the local newspaper or national magazine equivalent? The solution was, essentially, to federate these BBS. Users would post to a local server, and that server would forward the post to servers it knew, and so on and so forth, until the post propagated to every server that wanted it. Inspired in part by ARPANET, Usenet was a network not for research consortia but user interactions. Where Community Memory and other BBS were essentially local networks, Usenet was the first community-oriented <em>inter</em>network. Posts from ARPANET were syndicated to Usenet, allowing for readers to keep up with that network as well. Posts were organized by topic, and threaded conversations were invented and implemented. Jargon from this era still survives such as “spam” which Usenet popularized after importing it from BBS and Multi-User Dungeon culture (there’s a parallel history of MUDs that I’m largely ignoring here as its contributions are largely to forum, art, and gaming cultures). Usenet was an important place for disseminating information; it was where the internet was first announced. It served as an organizing space for technologists, researchers, artists, and activists, but following 1993’s Eternal September, precipitated by America On-Line, Usenet culture was unable to accommodate and socialize the influx of new users leading to its slow decline (see <a href="#Brickhouse">Brickhouse 2021</a>).</p>
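<p>The forwarding scheme described above can be sketched in a few lines. This is only an illustration of flooding with duplicate suppression, not Usenet's actual protocol: the <code>Server</code> class, the four-server topology, and the message ID are all hypothetical, and unlike the real network this sketch forwards every post to every peer rather than only to servers that want it.</p>

```python
class Server:
    """A hypothetical federated server that floods posts to its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []   # servers this one forwards posts to
        self.seen = {}    # message id -> post text

    def connect(self, other):
        # Peering is mutual in this sketch.
        self.peers.append(other)
        other.peers.append(self)

    def receive(self, msg_id, text):
        # Ignore posts we already hold so they do not circulate forever.
        if msg_id in self.seen:
            return
        self.seen[msg_id] = text
        for peer in self.peers:
            peer.receive(msg_id, text)


def build_network():
    a, b, c, d = (Server(n) for n in "abcd")
    a.connect(b)
    b.connect(c)
    c.connect(a)  # a cycle, to show that duplicates are dropped
    c.connect(d)
    return a, b, c, d


servers = build_network()
# A user posts to their local server only; the network does the rest.
servers[0].receive("post-1", "Hello from server a")
reached = sorted(s.name for s in servers if "post-1" in s.seen)
print(reached)
```

<p>The user only ever talks to their local server. No server needs a map of the whole network, only a list of the peers it knows, which is what allows the network to grow without a central coordinator.</p>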
<h3 id="the-world-wide-web-1988-to-2015">The World Wide Web (1988 to 2015)</h3>
<p>Many members of the Usenet culture found their way to new social media across the internet, an interconnection of local networks pioneered in 1989 at CERN. The major contributions of the World Wide Web to inter-networking were:</p>
<ul>
<li>uniform resource locators (URLs) which allowed any network to find a document on another network</li>
<li>HyperText Markup Language (HTML) which allowed for the formatting of documents as <em>hypermedia</em> with content other than plain text</li>
<li>HyperText Transfer Protocol which created a universal standard for document transfer replacing previous heterogeneous protocols</li>
</ul>
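<p>As a small illustration of the first item, a URL packs everything needed to find a document into one string: which protocol to speak, which host to ask, and which document to request. The sketch below uses Python's standard <code>urllib.parse.urlsplit</code> on this post's own address; the variable names are only for illustration.</p>

```python
from urllib.parse import urlsplit

url = "https://christianbrickhouse.com/weblog/2022/11/06/mastodons-place-in-internet-history"
parts = urlsplit(url)

scheme = parts.scheme  # which protocol to speak
host = parts.netloc    # which machine to ask
path = parts.path      # which document on that machine

print(scheme)
print(host)
print(path)
```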
<p>Around this same time, a new social medium was being developed: Internet Relay Chat (IRC). Released in 1988, IRC grew out of a revamp of a Finnish BBS. Jarkko Oikarinen wanted to modify the system to allow Usenet-like posting, but also enable real-time communication between users. IRC allowed for real-time communication in a manner similar to Usenet: users would connect to a server and a particular channel. A user would send a chat message to whatever server they’re connected to, and then that server would send the message along some network of servers so that it reached everyone in the channel; the message would be relayed across the internet until it reached its destination. Users had two identities: a <strong>nick name</strong> and a <strong>host name</strong>. The nickname (nick, for short) was user-defined and could be changed at any point as long as it didn’t conflict with someone already in the channel. The host name was the name of the computer you were using—the place where messages would be relayed. This could result in your IP address being publicly visible which is an obvious privacy issue, and different networks resolved this problem in different ways.</p>
<p>Freenode, an IRC network started in 1995, and LiberaChat (a fork of Freenode started in 2021) are the two main networks today, and they both use <strong>cloaks</strong> which users can choose to mask their IP address. These cloaks are stylized and identify a user’s affiliation and (to some degree) authenticity. For example, as a Wikipedia editor, I am entitled to a cloak identifying me as such, and so no matter what channel I’m chatting in, my affiliation with Wikipedia is part of my identity—it’s where I’m “from”. Other projects have cloaks as well. Debian Developers, people who write and maintain the Debian operating system, are entitled to cloaks which identify them as affiliated with that project. If you’re familiar with Academic regalia, think of a cloak like an academic robe; the colors on the robe represent your affiliation with your alma mater regardless of what institution you’re at, and your affiliation with that institution, along with the fact that you <em>have</em> those robes to wear, is a signal of your rank and status in that community. An IRC cloak is similar as it demonstrates some project views you as a member, conferring a level of prestige proportional to the prestige of your affiliated project. Another analogy might be a sticker on your laptop with the name of your home town: a cloak shows people that you’re from somewhere and gives them some information to start an interaction with you.</p>
<p>IRC culture was quite important, though its impacts on modern culture are modest. It was quite influential on wiki culture, pioneering the “trout” method of social discipline still in use on the English Wikipedia for example. On a larger scale, it was the predecessor to chat rooms and instant messaging systems like AOL Instant Messenger (AIM). IRC allowed users to set away messages, for example, which was integrated into AIM, eventually becoming a mainstay of the latter culture in the late 90s and early 00s. During its peak, IRC proved to be an important method of Community Data Exchange (recall Community Memory and Rossman 1975). During the 1991 Soviet <em>coup d’état</em> attempt, Soviet media was blacked out with state media playing <em>Swan Lake</em> on loop to prevent dissemination of news about the <em>coup</em>. Meanwhile users across the globe, not subject to the blackout, were able to transmit news into the Soviet Bloc through IRC to circumvent the media blackout, a technique pioneered a year earlier during the Gulf War (it is worth remembering that these technologies and events are not divorced from wider geopolitical moments).</p>
<p>At the same time, however, IRC also came to pioneer <strong>forking</strong> as a response to project-internal conflict. IRC, like other open protocols and free software, could be “forked” whereby a project splits in two. In the first decade of IRC, there were 5 network forks due to various technological, ideological, and cultural disagreements on the direction of networks. Forking is not always a response to conflict, of course. If you’ve ever used GitHub, for example, copying a project is called “forking” it, a term which derives from these earlier cultural conflicts. Forking is still a contemporary method of community self-governance, with LiberaChat forking from Freenode in 2021 due to disagreements with the company’s new owner. The fork was successful, and within the year LiberaChat overtook its parent as the largest IRC network in operation. Even the mere <em>possibility</em> of a fork can be used as a means of social control (<a href="#Tkacz">Tkacz 2014</a>). A fork is, in essence, a strike whereby the developers, maintainers, and users withhold their labor and participation from the project. The threat of the strike is a tool to bring about negotiations and leverage grassroots power, but it can also be difficult to pull off. Forking as both a technical matter and a political tool was pioneered on IRC, from where it eventually made its way into software and wiki culture.</p>
<p>While IRC was growing, Usenet was dying, and at this confluence was the rise of wikis. The first wiki was established in 1995 by Ward Cunningham and was, quite simply, a website that anyone could edit. The goal of Wiki (variously known as WikiWiki, WikiWikiWeb, Ward’s Wiki, and C2) was to be a place for software developers to collaboratively document “patterns”: common ways of designing software. It eventually became home to proponents of “Extreme Programming”, a software development paradigm that was a major influence on contemporary startup culture. While Cunningham was the owner of the website and server, the community was largely self-governing. Anyone could delete a page, for example, and decisions about what kinds of pages were within scope were made through community discussion. This burgeoning community on what was essentially a <em>much</em> larger Community Memory, spawned a subculture interested in the meta aspects of Wiki and community organization. Alongside pages about how to write software, discussions about programming languages, and community lore, there began to be pages about Wiki as a culture: how conflicts developed and resolved, what structural aspects promoted or prevented conflict, philosophies on dealing with unhelpful edits or outright spam. This did not last long, however, and in 2000 the community decided that these meta posts were out-of-scope, and they were forked to a new wiki: MeatBall.</p>
<p>MeatBall was at first a documentation of Wiki culture, but soon after its establishment, Wikipedia emerged, leading to the analysis of social patterns across multiple communities with various goals. Wiki, for example, was interested in software, while Wikipedia was interested in encyclopedic writing. How do these differing goals affect the way people interact in cyberspace? Eventually, Wikipedia would develop a new wiki software, MediaWiki, which replaced UseModWiki that ran Wiki and MeatBall (and other wikis), introducing a number of structural changes previously unseen on wikis but theorized by those interested in the relationship between software structure, project goals, and user behavior. MediaWiki, for example, implemented namespaces which separated user-facing content (encyclopedia articles) from project-internal discussions. Where previously, discussions took place right underneath the content, MediaWiki created a system that incentivized greater separation between discussion and product. Free linking was introduced, allowing for any phrase to be turned into a link using <code class="language-plaintext highlighter-rouge">[[</code> and <code class="language-plaintext highlighter-rouge">]]</code> (previously, links were made by typing the phrase in CamelCase), and free linking allowed for more readable prose, increasing the accessibility of the product. Beyond the technical features, many community organizational philosophies were imported to Wikipedia such as beliefs in openness and non-violent communication.</p>
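<p>The difference between the two linking conventions can be made concrete with a rough sketch. The regular expressions below are assumptions chosen for illustration; they are not the rules used by UseModWiki or MediaWiki, whose actual link syntax is more involved. The function name <code>find_links</code> and the sample sentence are likewise hypothetical.</p>

```python
import re

# Assumed CamelCase rule: two or more capitalized word parts run together.
CAMEL_CASE = re.compile(r"\b(?:[A-Z][a-z]+){2,}\b")
# Assumed free-link rule: any text wrapped in double square brackets.
FREE_LINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def find_links(text):
    """Return (CamelCase links, free links) found in a line of wiki text."""
    return CAMEL_CASE.findall(text), FREE_LINK.findall(text)


camel, free = find_links("FrontPage links to [[recent changes]] every day.")
print(camel)
print(free)
```

<p>Under the CamelCase convention the author has to bend the prose to fit the link, while with free links the link wraps whatever phrase already reads naturally.</p>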
<p>As Wikipedia grew throughout the early 00s, its peer wikis shrank, with Wiki and MeatBall shutting down editing around 2014. The major cause was an influx of spam, and an inability for the existing community to effectively handle the disruption. This was, in part, an indictment of the politics of “radical inclusion” they espoused, but it was also a result of external factors such as recruitment issues and the development of the modern information economy. By the mid 2010s, Wiki was approaching its 20th year, and many of its earliest community members had moved on to vastly different life stages. Even the contemporary English Wikipedia, which celebrated its 20th year in 2021, has lost many of its earliest contributors. Wikipedia was not spared from the spam and vandalism issues plaguing other wikis in the 10s (see Brickhouse 2021); what differed was that it had sufficient human, financial, and technical resources to manage the disruption without failing. This points to a common cause. By the mid-2010s the information economy had produced (1) an incentive to boost website rankings, and thereby ad revenue, by spamming links on open wikis, (2) a large number of internet users socialized into <em>online</em> authoritarian/centralized systems of data exchange, and (3) a cultural emphasis on individualism and monetization, following the establishment of a global neoliberal hegemony after the collapse of the Soviet Union in the 1990s.</p>
<h3 id="the-information-economy-2010-to-present">The information economy (2010 to present)</h3>
<p>In the last decade-or-so we have seen the rise of an information economy notable for its inversion of the product-consumer relationship. Prior to this point, computing was a product. Consumers would buy hardware and software to serve their computing needs. With the advent of the World Wide Web, a new kind of economic venture was pioneered: the DotCom company. Rather than hardware or software, these DotCom companies provided a service. You didn’t need to buy a contraption or install software; you opened a browser, went to a website, and received a service. These companies had a number of business models. There were companies which provided services free, but served advertisements which provided cashflow proportional to the number of customers. Others gained money by selling goods and services, a time-tested strategy. Often there was some combination of the two, but the point to be made here is that the clear “consumer pays money for product” relationship begins to fray. Following the DotCom bubble burst in the early 00s, the information economy completed this inversion: users were the product and corporations were the consumers.</p>
<p>With the rise of e-commerce and social networks like MySpace, Facebook, and Twitter, companies began to realize that they had a wealth of data that could be monetized. Advertisers would pay more to serve targeted ads, and data on purchases, likes, social ties, and interactions could be sold to advertisers either through on-platform advertising or in bulk for the companies to use for spamming. This economic incentive led to the development of more gated content; if a website can get you to log in to view their (free) content, then they can gather information on you to supplement their ad income. Even without these gates, websites can use cookies or even hidden technical details about your computer, a “browser fingerprint”, in order to track you across sites and build an advertising portfolio on you. Information, information about you, is the product.</p>
<p>This economic incentive led to various design decisions on the part of social media companies. The longer people stay on the platform, the more ads a company can serve. The more ads it serves, the more money it makes. The first real innovation was phone applications and the death of the mobile site. If a company can force you to use <em>their</em> app, it provides them with a level of system access not always possible from a web browser, allowing them to gather more information about your phone and what’s on it. After this we see the development of in-app browsers. Even if a company got users to use the app, when they click a link they <em>leave</em> the app. By building a browser into the app, users are more likely to engage with content because they never left in the first place, and <em>all</em> their browsing information can be made available because they are browsing from inside the app. Lastly we have the death of inter-operability. Making it difficult or impossible to switch services creates a barrier to leaving the ecosystem and increases user retention.</p>
<p>Beyond these macro-level design decisions, we also have micro-level structures which incentivize retention through social pressure. Conflict, for example, is often unpleasant and unhealthy, but it drives engagement. A flame war can keep two users on the site for hours or days, and is likely to draw in supporters of various factions. A friendly conversation, however, is unlikely to spiral or go on very long. By creating an algorithm (or user interface) which incentivizes conflict, companies can boost engagement and, by extension, revenue. This is why posting outrageous opinions on social media is effective. Opponents want to argue against the obviously bad point, but in doing so, spread the original message far beyond its original audience. This happens on many platforms, and an entire genre of content has arisen where creators simply look for and comment on outrageous things people do on the internet, paradoxically amplifying the message of an ideological opponent. On a large scale we have seen this cause serious social and political harm, but it’s good for engagement metrics which makes advertisers and shareholders happy.</p>
<p>In contemporary Discourse, we talk a lot about the harms of social media, but this discourse treats social media as something that arose out of nowhere in the mid-to-late 00s. My criticism is that the popular perspective on “social media” is myopic and ahistorical. My goal in this essay is to situate the modern concept of “social media” within a larger historical struggle between particular ideologies of how we ought to socialize in cyberspace, and to demonstrate that contemporary problems are the result of <em>design decisions</em> which can be mitigated through <em>alternative</em> design decisions.</p>
<h2 id="mastodon-and-the-fediverse-as-a-counter-narrative">Mastodon and the Fediverse as a counter-narrative</h2>
<p>Earlier in this essay we considered the tension between corporate monetization efforts which restricted users and the hacker culture which sought freedom for users. This struggle did not end; it has been ongoing for 50 years. Old leaders have moved on, new members have joined, and the cultures and the geopolitics surrounding them have changed. The fundamental issue is still the same. In 1975, Rossman reflected on the implications of the first electronic social network, and it points to a critical problem of the social media era: how should community data be exchanged? In this section I present a counter-narrative to the information economy and demonstrate that social media technologies arising from particular discursive histories are not simply alternatives to mass social media, but political opponents.</p>
<p>Let’s return to Rossman (1975) who argued that Community Memory, the first electronic social medium, was inherently political. Prior to the internet, dominant modes of information access were “authoritarian/centralized”. These modes were libraries, newspapers, book publishers, government reports, etc. Rossman, in 1975, chose to describe these alternatives to Community Memory as not just “centralized” but <em>authoritarian</em>. Why authoritarian? What value is there in framing Community Memory as anti-authoritarian, decentralized, or democratic? Community Memory was established in 1973, two years prior to the end of the Vietnam War. The system served Berkeley, California, and the surrounding area which had a series of protests in the decade prior opposing the Vietnam War, opposing the draft, supporting the Civil Rights Movement, and supporting freedom of speech. Rossman (1975) points to the political nature not merely because Community Memory was decentralized, rather, he points to the political value of Community Memory as a liberatory technology which could counter the social ills perceived by the community who utilized it. This ideology is fundamental to the counter-narrative: software can be designed to facilitate political and social liberation.</p>
<p>In 1974, the US Government declared software to be covered by copyright, and corporations began to restrict the liberal culture of “shareware” with government-enforced legal penalties. This corporate-led restriction of previously enjoyed freedoms was unacceptable to the culture which was symbiotic with these corporations. In response, the community organized, establishing the Free Software Foundation and declaring their <em>Four Essential Freedoms</em> which they sought to secure and protect. This was not an idle statement, but part of a political plan of action. The GNU Project created cutting-edge software which is still foundational to much of modern computing and devised a legal strategy to infiltrate and liberate corporate codebases. Through the copyleft protections of the GNU GPL, this cutting-edge software could be used by corporations, but they must publish any and all source code that interacts with it to anyone on request. This put companies in a dilemma: either waste R&D funds solving a problem that is already solved for free, release proprietary source code to everyone, or break the law. Many companies chose to break the law; developers sued them and won the liberation of entire code bases. But others chose to comply, recognizing the symbiotic relationship they have with hobbyists.</p>
<p>This strategy, pioneered in free software, was eventually imported into early social networks which themselves relied on free software to run. Usenet was largely symbiotic with ARPANET and the researchers working on the forefront of technological development. They socialized with hobbyists who learned about hacker culture and ideology. The rise of IRC led to the development of new political tactics for handling internal disagreements, with project forking becoming a political tool both as a threat and as a practice. Because the software was free and association was voluntary, irreconcilable differences could be resolved through fracturing. Unlike geopolitics which are predicated on finite territorial resources, there is no limit to cyberspace. People can, do, and should create new spaces which conform to the community they want to foster, and free software facilitates that process of community splits. Throughout this time of fracturing on IRC, the Soviet Union was collapsing as various Soviets declared their independence. IRC and other distributed social networks were able to bypass media blackouts and report on live events censored by governments in traditional (centralized/authoritarian) media.</p>
<p>Wikis came to prominence soon after, and questioned the concept of authorship and ownership of culture outright. Websites could be edited by anyone, and in fact people were <em>welcomed</em> to edit pages. Additions were initially not logged, and the additions rarely signed; it was all community property. Each text was univocal, but the result of a multivocal collaboration of unknown authors. Wikis grew to cover not just software, but metasociology, encyclopedias, news, dictionaries, and fandoms in various languages. Wikipedia eventually created the Wikimedia Foundation which supports a federation of wikis all of which are dedicated to the mission of ensuring everyone everywhere has access to the sum total of human knowledge. Leveraging the strategy of free software, Wikipedia and the Wikimedia Foundation adopted copyleft licenses for what is the largest open assemblage of human knowledge in history, creating the same dilemma: waste funds writing the same content, allow your content to be reused, or break the law. The Wikimedia Foundation’s adoption of the Creative Commons licenses helped to popularize and normalize them, leading to even wider adoption supporting an ecosystem of free cultural works. All the while, these wikis remain independent of each other: community-led and self-governing but aligned with each other in a common political purpose.</p>
<p>Since the 1970s social media have been organized around a coherent set of ideologies and often in contrast to dominant cultural notions of hierarchical organization. These social media are fiercely egalitarian and generally avoid hierarchies except where necessary to protect the community. They operate largely on systems of collaborative anarchy with few restrictions on users on the belief that most people will collaborate or simply not care enough to cause harm. They are communalist and often organized around a superordinate goal that gives a shared purpose. They are heterodox, at best allowing various ideological disagreements to exist in the community and at worst encouraging ideological minorities to establish and govern their own community. Lastly, these social media collectively constitute a fission-fusion society where membership is fluid and individual groups federate for common political goals. These principles are what underlie the counter-narrative which leads to the contemporary Fediverse.</p>
<p>The Fediverse, like Usenet and IRC, is decentralized, syndicated, and subject to local control. When a user posts to Usenet, the post is immediately available to everyone on their server as well as being forwarded to affiliated Usenet servers so that it is available to those users, and so on and so forth. The same is true of IRC, which grew out of it. Media which are part of the Fediverse operate similarly. Users belong to a local instance which manages posts for everyone within that community. When a user posts, it is immediately available to everyone on their instance, but instances are <em>federated</em>. Your post also gets sent to every instance that is federated with yours, allowing for propagation across instances. Federation is critically voluntary. Many groups have instances, and some groups are not aligned or outright hostile to each other. An instance, even an individual user, can refuse federation and prevent interactions with <em>everything</em> from another instance. This serves the dual role of giving material effect to a group’s ideology as well as providing a means to handle harassment.</p>
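<p>For readers who think in code, the posting and blocking behaviour described above can be sketched as a toy model. This is only an illustration under loud assumptions: the <code>Instance</code> and <code>Network</code> classes, the <code>.example</code> instance names, and the single-hop delivery are all invented here, and none of it reflects how Mastodon or any real federation protocol is actually implemented.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """A toy server (hypothetical): local timeline, peers, and a blocklist."""
    name: str
    peers: set = field(default_factory=set)       # instances we federate with
    blocked: set = field(default_factory=set)     # instances we refuse entirely
    timeline: list = field(default_factory=list)  # (author, text) pairs seen locally


class Network:
    """A toy registry of instances; not a real protocol implementation."""

    def __init__(self):
        self.instances = {}

    def add(self, name):
        self.instances[name] = Instance(name)
        return self.instances[name]

    def federate(self, a, b):
        # Federation is mutual in this sketch.
        self.instances[a].peers.add(b)
        self.instances[b].peers.add(a)

    def block(self, blocker, target):
        # The blocker refuses everything that originates from the target.
        self.instances[blocker].blocked.add(target)

    def post(self, origin, user, text):
        author = f"{user}@{origin}"
        home = self.instances[origin]
        # The post is immediately visible on the author's own instance...
        home.timeline.append((author, text))
        # ...and is forwarded to each federated peer that has not blocked us.
        for peer_name in sorted(home.peers):
            peer = self.instances[peer_name]
            if origin in peer.blocked:
                continue
            peer.timeline.append((author, text))


net = Network()
for name in ("books.example", "art.example", "gym.example"):
    net.add(name)
net.federate("books.example", "art.example")
net.federate("books.example", "gym.example")
net.block("books.example", "gym.example")

net.post("art.example", "ana", "New sketch!")
net.post("gym.example", "bro", "Rude comment")
print(net.instances["books.example"].timeline)
# prints [('ana@art.example', 'New sketch!')]
```

<p>The blocked instance still sees its own post locally; the block only means that nothing from it reaches the instance that refused federation.</p>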
<p>The reliance on free software and open protocols encourages a fission-fusion society as a way of both marking affiliation and resolving internal disputes. Creating instances is not particularly difficult; many people run their own instance on their website for just themselves. Moving between instances is also easy, allowing for migration as people or cultures change. The choice of instance is important primarily because it marks your affiliation. Users are identified by not only their username, but what instance they come from. This functions much like an IRC cloak. On IRC, the cloak demonstrates your affiliation with some project and confers some level of reputation proportional to the reputation of the project. Your instance likewise marks your affiliation, though not always with a project. Often it is with a topic or ideology. I am from fosstodon.org, marking me as aligned with the free software movement. Others are aligned with instances for artists, scholars, anarchists, etc. Some instances are large and well known, others are quite small; membership in some is restrictive, while others are open to membership from almost anyone. For those in-the-know, these features of your instance provide information on your trustworthiness. If you are from a well-known instance with restrictive membership, that gives you a level of prestige distinct from someone who hosts their own instance or affiliates with a generic, open instance. An IRC cloak also conveys information about your interests and facilitates conversations; so too your instance. If someone is posting from an instance for linguists, I can bet we share an interest in linguistics and have something to talk about.</p>
<p>On the other hand, this ease of setting up an instance and migrating to it allows for internal disputes to be resolved by forking. If an instance is making decisions I disagree with, I can move to another or create my own. If some minority believes the instance should be run one way, they can create their own instance run as they wish while still federating with the original. If the dispute gets resolved, they can migrate back and abandon the fork. It allows for communities to self-govern without preventing collaboration or precluding reconciliation. This was a major problem with IRC forks: you cannot communicate across networks, so when a community forked they cut off communication with each other. The Fediverse improves on this by allowing for continued interaction while still giving users and groups greater freedom and control over their policies and governance.</p>
<p>Additionally, members of the Fediverse have made particular design decisions in light of both past failures and contemporary problems with mass social media. Users new to Mastodon will notice that the software has a feature to “reblog” or “retweet” something, but there is no analogue to the “quote tweet”. This is by design. Quote tweets encourage conflict and harassment, and providing this as a built-in feature facilitates harmful interactions. If you have a comment to make about a post, then write a comment. Often quote tweets are used to display opinions the retweeter doesn’t like in order to criticize or harass the original poster, and worse, they display those opinions to a sympathetic audience who may be motivated to <em>directly</em> harass the target. As we discussed, this kind of interaction <em>benefits</em> a corporation by increasing engagement and allowing them to serve more ads, but it is harmful to the users. Recall our earlier discussion on Rossman (1975) and the ideology that social software design is a political act. Mastodon instances are not monetized and there’s no incentive to keep users engaged by any means necessary. The design decision was made to not allow users to comment on posts they reblog as a way of creating software that facilitates healthy interactions.</p>
<p>There are a lot of these design decisions, motivated in part by opposition to the tactics of the information economy but also motivated by ideological beliefs on software and socialization. These federated social media are organized around ideologies that are often opposed to dominant ideologies driving the use of mass social media, and many users participate, to some degree, as a way of participating in the political and ideological struggle between centralized and decentralized community data exchange. The Fediverse, its culture, software, and users, arise out of paradigms not in wide circulation and simply incompatible with the belief systems of some. That’s the point. My partner works for a tech startup, and sometimes he voices his frustration that a software library he wants to use for work is published under the GPL. His company has lots of proprietary code that they don’t want to release, but if they (even accidentally) use GPL code, they need to release their proprietary code. They don’t like that. That’s the point.</p>
<h2 id="epilogue">Epilogue</h2>
<h3 id="the-space-traveler">The space traveler</h3>
<p>You’re sitting at home, and you write in your journal. You go outside and see your neighbor. You both say hi and start making small talk. Another neighbor sees you and comes outside to join. You have fun, but the interaction comes to a natural end and you keep walking. You go to the library and look through the fiction section. Another library patron is browsing the same shelf. She asks if you’ve read this one. You have. You both talk about it and similar books in the genre. She pulls a book off the shelf; she thinks you’ll like it. You check out the books and leave the library. It’s getting dark. You pass the gym where you kept getting body-shamed by patrons and staff—you’ll never go there again. You go to the bar to meet up with your friends. You show them the books you got. You have some drinks and let loose. You buy a round for the bartender. Last call. Bar’s closing. Into the night. You’re home. Who were they?</p>
<p>You’re sitting at your computer, and you write in your text document. You open your browser and see a #introduction post on your instance. You say hi and make some small talk. Another user sees the conversation and joins in. You have fun, but the interaction comes to a natural end and you keep scrolling. You go to the federated feed and scroll through posts from other instances. Someone on a literature-themed instance is talking about fiction. She asks if anyone has read this one. You have. You both talk in the comments about it and similar books in the genre. She tags an author who is also on her instance; she thinks you’ll like their new book. You grab your phone and head to the bar. It’s getting dark. You pull out your phone and see someone post from an instance where the gym bros who called you fat hang out—you block the instance so you’ll never worry about seeing that again. You look at what your friends have been posting today. You reblog the comment about the book so your friends can see. You have some drinks and let loose. You give $5 to help pay for server costs. Last call. Bar’s closing. Into the night. You’re home. Who were they?</p>
<p>A social medium is a place that facilitates social interactions. As we move between social media, we will discover that different media facilitate different kinds of interaction. How these interactions turn out is partly the result of how the space is designed, and it is partly how you decide to engage with the space. We go to spaces that foster interactions we like, and we avoid spaces that foster interactions we dislike.</p>
<p>Our interactions are shaped by our relationship to people and to space. I may not know all my neighbors well, but we are invested in the success of our common space and should avoid antagonism. When we go to new spaces, the people there likely share interests because everyone came there for the same purpose.</p>
<h3 id="conclusion">Conclusion</h3>
<p>Twitter and other mass social media facilitate the development and dissemination of a brand that content creators and the platform can exploit for income. They were built for this and are extremely successful at it. Likewise, there is a market for it, and many people desire the content or notoriety that these mass social media facilitate. I participate in society. For those who found success in the attention economy, the idea of a platform designed to prevent your success is obviously unattractive. If you’re playing Major League Baseball, you will probably not enjoy a sudden switch to coaching community little league. If you’ve built an audience on Twitter garnering triple-digit interactions on the regular, moving to a platform designed to <em>prevent</em> that doesn’t make much sense.</p>
<p>I have no such clout, and I doubt many people do. Most of us have small networks of friends we keep up with, and if we move to email, Discord, Mastodon, or IRC it won’t be catastrophic, just different. Not because of who we are, but because of the design of the space we all choose. The Fediverse aligns with my politics and the vision I have for the future of cyberspace. I believe that small, self-governing communities organized around shared interests can prosper—they have for all of human history and I believe they can online as well. On the Fediverse, I’m <em>from</em> somewhere. I have neighbors, and we work together to build a community through our labor, social network, and capital. Our monthly budget surplus is donated to other open source software projects so that we can support other communities who share our ideological alignment. It is, for me, political. I am not content to simply acknowledge the problems with mass social media and do nothing; I want to participate in creating its antagonist. Others who participate in the Fediverse may have no such motivation, but like Community Memory, the mere presence of this radical, alternative organizational system is a political act.</p>
<p>I encourage others to move to Mastodon because I think it’s cool. I like how it’s designed. I believe it is <em>good</em> for humanity and fosters healthy interactions. I want my friends on the platform I use. I want my friends to find and build communities that fit them. I want to see where my friends come from, where they move to, meet their neighbors. Not everyone will join me. Maybe they don’t like the platform, maybe they aspire to mass media, or maybe they disagree with the philosophy. That’s fine. We all have different goals. My hope in explaining this cultural history is that you will come to appreciate those who built these things and the culture you are moving into.</p>
<h2 id="references">References</h2>
<ul>
<li><span id="Bourdieu">Bourdieu, Pierre (1986), “<a href="https://www.marxists.org/reference/subject/philosophy/works/fr/bourdieu-forms-capital.htm">The Forms of Capital</a>” in <em>Handbook of Theory and Research for the Sociology of Education</em>. J. Richardson ed., Greenwood: New York.</span></li>
<li><span id="Brickhouse">Brickhouse, Christian (2021), “<a href="https://www.youtube.com/watch?v=4XWsUdDa0e0">WikiBreathing: how communities respond to changes in public engagement with wikis</a>”. Talk presented at <em>Wikimania 2021</em>.</span></li>
<li><span id="Colstad">Colstad, Ken and Efrem Lipkin (1975), “<a href="https://doi.org/10.1145/958785.958788">Community Memory: a public information network</a>”. <em>Computers and Society 6</em>(4).</span></li>
<li><span id="FSF">Free Software Foundation (2022), “<a href="https://www.gnu.org/philosophy/free-sw.html.en">What is Free Software?</a>”. Accessed November 6, 2022.</span></li>
<li><span id="Obeyesekere">Obeyesekere, Gananath (1981), <em>Medusa’s Hair: an essay on personal symbols and religious experience</em>. University of Chicago Press: Chicago.</span></li>
<li><span id="Rossman">Rossman, Michael (1975), “<a href="https://doi.org/10.1145/958785.958789">Implications of Community Memory</a>”. <em>Computers and Society 6</em>(4).</span></li>
<li><span id="Stallman">Stallman, Richard (1986), “<a href="https://www.gnu.org/bulletins/bull1.txt">What is the Free Software Foundation?</a>”. <em>GNU’s Bulletin 1</em>(1).</span></li>
<li><span id="Tkacz">Tkacz, Nathaniel (2014), <em>Wikipedia and the politics of openness</em>. University of Chicago Press: Chicago.</span></li>
</ul>Christian BrickhouseWith many users looking for alternatives to Twitter, Mastodon has grown in popularity, but for many people the decentralized social network can be confusing, lonely, and unattractive. This has led to a growing sense that Mastodon is only for Computer People™ or that these federated social networks have flaws which proponents refuse to acknowledge or fix. While Mastodon and other federated social networks like PeerTube and Pixelfed certainly have their issues, I argue that the underlying complaint is one of ideology and culture.A deep dive into a Decemberists song2022-05-14T00:00:00+00:002022-05-14T00:00:00+00:00https://christianbrickhouse.com/weblog/2022/05/14/decemberists-music-theory<p>I’ve always enjoyed <em>The Decemberists</em> and one of my favorite songs is “Everything I Try to do Nothing Seems to Turn Out Right” from their EP <em>Billy Liar</em>. It’s a haunting song where the narrator recounts a past relationship which started and ended poorly. Like any song by <em>The Decemberists</em> the lyricism is worth its own analysis, but this post looks instead at the musicality. What makes the music so haunting, and why does it pair well with the lyrics at all? I decided to try my hand at music theory and dive into a song I enjoy to see what makes it tick.</p>
<p><img src="/assets/Everything_I_try_to_do.png" alt="Musical score for the song Everything I Try to do Nothing Seems to Turn out Right" /></p>
<p>Let’s start with the chord progression. Based on <a href="https://tabs.ultimate-guitar.com/tab/the-decemberists/everything-i-try-to-do-nothing-seems-to-turn-out-right-chords-3129632">a guitar tab I found</a> and <a href="https://www.youtube.com/watch?v=EQvFAJ0nLrE">a lot of listening</a>, I was able to piece together the chords:
C C7 C6 F#dim C6 F#dim G G7 Gadd4 Gadd2 G. The 6ths, 7ths, etc. are melodic so we can reduce this down to C F#dim C F#dim G (we’ll return to the melody later). Given the F#, we seem to be in the key of G making the chord progression: IV -> vii° -> IV -> vii° -> I. This looks like a common tonic->predominant->dominant->tonic progression, but with some noticeable differences. Most obviously the initial tonic chord is missing. The song <em>starts</em> on the predominant. The harmonic function of the predominant is to bridge from the stability of the tonic to the tension of the dominant, but there is nothing to bridge. This implies a tonic that has yet to be heard, creating a sense that something is missing. The song proceeds to the dominant which creates a sense of tension to be resolved, but in the first instance we return to the predominant; the tension does not get resolved. The predominant again proceeds to the dominant which finally resolves to the tonic. So why does it feel like we didn’t go anywhere?</p>
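<p>As a sanity check on the Roman numerals, the labelling can be worked out mechanically. The snippet below is a rough sketch of my own, not a general chord analyzer: the function name <code>roman_numeral</code> is made up, it assumes sharp-only note spellings and a major key, and it treats any chord not ending in “dim” as a plain major triad.</p>

```python
# Sharp-only spellings; flats are out of scope for this sketch.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]  # semitones above the tonic in a major scale
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII"]


def roman_numeral(chord, key):
    """Label a chord such as 'F#dim' with its scale degree in a major key.

    Hypothetical helper: only handles major and diminished triads whose
    roots belong to the scale (other roots raise ValueError).
    """
    diminished = chord.endswith("dim")
    root = chord[:-3] if diminished else chord
    interval = (NOTES.index(root) - NOTES.index(key)) % 12
    degree = MAJOR_STEPS.index(interval)
    numeral = NUMERALS[degree]
    # Diminished chords are conventionally lower case with a degree sign.
    return numeral.lower() + "°" if diminished else numeral


progression = ["C", "F#dim", "C", "F#dim", "G"]
labels = [roman_numeral(chord, "G") for chord in progression]
print(" -> ".join(labels))
# prints IV -> vii° -> IV -> vii° -> I
```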
<figure>
<img class="centered" src="/assets/harmony-function-diagram-major.svg" alt="Diagram of harmonic functions and chord progressions for the major scale." />
<figcaption class="centered">Copyright 2017 Robert Hutchinson. Licensed under the <a href="https://www.gnu.org/licenses/old-licenses/fdl-1.2.en.html">GFDL 1.2</a></figcaption>
</figure>
<p>The melody is where the song gets particularly interesting. I would argue that the song is in a <em>lydian</em> scale, but that’s not important for our analysis so let’s ignore sharps and flats for now. The main melody for the predominant and dominant sections is C B A F E F, and the main melody for the tonic sections is G F G/D B A G F D C B. In both sections the melody is simply a long slow decline down the scale. If we return to the original chord progression, this helps us understand the variations on the C and G chords. The chords generally remain static with the root played repeatedly across the melody creating a stationary feeling despite the melodic progression. We start on the root (C), descend to the 7th then the 6th. When the melody should hit the 5th, we instead descend to the 4th and change to the F#dim chord which has dominant function. This chord also contains C and so the harmony remains relatively static. Interestingly, the C we get in the F#dim chord is now a 5th rather than the root, so in the context of our walk down the scale, we still get the next degree, a 5th, <em>but it’s the wrong 5th</em>. The next note in our walk down the scale is E, the 3rd of the root C. This has been part of the C chords being played causing the melody to become ambiguous with the harmony, before going back up to F and transitioning back to an F#dim. This whole time, the harmony has been relatively static, the 1st (C) and 3rd (E) anchoring the melody as it drifts downwards towards them.</p>
<p><img src="/assets/Everything_I_try_to_do-melodies.png" alt="Melodies for the song." /></p>
<p>The motif continues in the tonic section with the descent starting on G and some wonkiness as we descend. Like the predominant section, the melody starts on the root of the chord, in this case G. It descends to the 7th with a G7 chord, but then returns to an inverted G chord. This creates an ambiguous melody as the phrase could proceed up to the 1st or down to the 5th. This ambiguity is not resolved as the Gadd4 chord contains both the 3rd (up from the 1st) and the 4th (down from the 5th). The ambiguity continues with the Gadd2 chord as the top part moves down to the added 2nd and the 4th moves down to the third. The top melodic phrase continues down to the root as the bottom melodic phrase remains stationary on the 3rd, resolving both before continuing to the predominant section. These two melodies are interesting when we consider one as a <em>counter-melody</em>. Distinct from a harmony, a counter-melody has its own melodic line that stands in a subordinate relationship to the main melody. But which is the main melody and which is the counter? Considering the wider melodic theme of walking down the scale, the bottom line seems to be the main melody as it continues down and resolves on the 3rd like the predominant melody. The counter-melody ascends before descending back to the note where both phrases started.</p>
<p>Throughout the intro and verse we see two ideas: (1) stagnation and (2) messing up. We see stagnation in the chord progression, harmonies, and melodies. Starting with the predominant, we begin in a liminal space between the stability of the tonic and the tension of the dominant. When we finally proceed to the dominant (a transition which breaks the expected melodic progression), we hesitate and return to the predominant before moving forward to the dominant and tonic. The whole time, the harmony is stationary. Through most of the phrase the C and E dyad are played along with every melodic step, and even when we move to the dominant the repetition of the same C holds back the tension. Lastly we see stagnation in the melody and counter-melody. The main melodic theme descends from the root to the 3rd which is also part of the monotonous harmony. The counter-melody in the tonic phrase offers a clearer picture as it ascends from the root only to return, ultimately going nowhere as the main melody resolves to the droning 3rd.</p>
<p>The second idea, “messing-up”, is more abstract but perhaps the most interesting considering the lyrics. The melodic ideas are simple in concept: step down the scale from 1st to 3rd while playing a C chord then do the same thing for the G chord. The song does not accomplish this melodic feat. The first measure sets up that idea and is accomplished successfully. The second measure however begins with an aborted dominant chord where the 5th is the root of the predominant and the root is the 4th of the predominant. We get our 1, 7, 6, 5, 4 progression, but the 5th and 4th are out of place. It’s the <em>wrong</em> fifth in the <em>right</em> spot and the <em>right</em> 4th in the <em>wrong</em> spot. Undeterred, the song continues to the 3rd in the next chord but in doing so misses the opportunity to resolve the predominant to the tonic. The tonic section builds upon this idea by introducing a counter-melody. The counter-melody is not “messing-up” in itself, but rather represents the narrator messing up their relationship. What should be a simple descending melody instead splits into two melodic lines, foreshadowing and ultimately mirroring the narrator and their former lover splitting at the end of the song.</p>
<p>But perhaps there is an upside here: viewing these as “mistakes” is only one possible perspective. An important aspect of the lyrics is that the narrator <em>feels</em> like nothing turns out right, and it is this anxiety that turns out to be self-fulfilling. The relationship described is not perfect, but there is nothing wrong with it either; it’s just fine. The narrator focuses on the aspects that are not perfect and uses this to reinforce their self-conception. When we view the music from this lens, we get a different perspective. The music is not stagnant, but instead has a vibrant melody moving within it. The melodic lines are not mistakes but interesting and complex variations on an otherwise mundane walk down the scale. The narrator’s framing restricts our analysis of the music in the same way that it restricts their own analysis of the relationship. To truly appreciate the song, and for the narrator to truly appreciate themself, we need to break out of an analytical frame which focuses only on the “mistakes”; as the card given to the narrator says: “try not to take it so hard.”</p>
<p>This song is a wonderful example of how musicality and lyrics can combine to create feelings that could not be conveyed alone, and ultimately yield insight into ourselves and our emotional development. I’ve always been moved by both the lyrics and music, and while I understood the lyrics rather well, I never sat down and analyzed the music on its own. I haven’t done music theory since high school, and I’ve never analyzed a piece of music at this level of detail before. It was educational, and I learned a lot about music theory. The best part though was just getting to spend some time picking apart one of my favorite songs, even if it didn’t turn out right.</p>Christian BrickhouseI’ve always enjoyed The Decemberists and one of my favorite songs is “Everything I Try to do Nothing Seems to Turn Out Right” from their EP Billy Liar. It’s a haunting song where the narrator recounts a past relationship which started and ended poorly. Like any song by The Decemberists the lyricism is worth its own analysis, but this post looks instead at the musicality. What makes the music so haunting, and why does it pair well with the lyrics at all? I decided to try my hand at music theory and dive into a song I enjoy to see what makes it tick.The Language Lab channel2022-04-21T00:00:00+00:002022-04-21T00:00:00+00:00https://christianbrickhouse.com/weblog/2022/04/21/lingcom-youtube<p>A short post, but exciting still! I’m excited to start on a new outreach project, <em>The Language Lab</em>, a YouTube channel
which will cover cutting-edge linguistics research for general audiences. Using creative non-fiction, multi-media demonstrations, and
expert interviews, <em>The Language Lab</em> will invite viewers to explore the frontiers of linguistic science and learn about the ideas and
passions which go into knowledge creation.</p>
<p>The project is still in its planning phase, but I’m excited for the support I’ve already received! The <a href="https://lingthusiasm.com/"><em>Lingthusiasm</em> podcast</a> has
offered a grant for start-up costs, and <a href="https://sites.google.com/berkeley.edu/eremirez/">Emily Remirez</a> is working on branding. I’ve been
talking with <a href="https://kbmcgowan.github.io/">Kevin McGowan</a> and <a href="https://www.zionmengesha.com/">Zion Mengesha</a> about their work to prepare
some early videos. I’m looking forward to working with more people and sharing the amazing work being done with the world!</p>
<p>A lot of my inspiration comes from <a href="https://www.youtube.com/channel/UCoxcjq-8xIDTYp3uz647V5A"><em>Numberphile</em></a> and <a href="https://www.youtube.com/channel/UC9_p50tH3WmMslWRWKnM7dQ">Adam Ragusea</a>,
two channels that I really enjoy. While they often deal with complex topics and questions, they work hard to engage the audience by sharing a researcher’s passion and
bringing us along for the ride. We all got into linguistics because it is interesting, and while the specific questions can become
pretty complex, the same curiosity that led us there can lead others there too. There’s a lot of work still to do, but I look forward to more updates on what’s to come!</p>Christian BrickhouseA short post, but exciting still! I’m excited to start on a new outreach project, The Language Lab, a YouTube channel which will cover cutting-edge linguistics research for general audiences. Using creative non-fiction, multi-media demonstrations, and expert interviews, The Language Lab will invite viewers to explore the frontiers of linguistic science and learn about the ideas and passions which go into knowledge creation.Regular expressions tutorial2021-09-14T00:00:00+00:002021-09-14T00:00:00+00:00https://christianbrickhouse.com/weblog/2021/09/14/regex-tutorial<p>I like regular expressions. Constructing a regular expression is like solving a puzzle, and they are a powerful tool for lots of situations. In my work on Chess software for MediaWiki, we use regular expressions to validate chess game files before committing to the full parser in order to save resources. These game files are complex, but as a file format they are highly regular and well defined. You can see in <a href="https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ChessBrowser/+/719449/3/includes/ChessBrowser.php">patch set 719449</a> the kinds of regular expression we use for validation alongside detailed documentation of how it works. The old version was one massive expression, and while it worked most of the time, the false positives and complexity made it hard to maintain. In the newer patch, it gets broken up into multiple expressions that act like filters, and overall the regular expressions are easier to understand and maintain. 
While the reasons for the change are specific to that software, the difference between the old and new versions illustrates some insights that are helpful for those new to regular expressions, so I thought it would be useful to write up some similar advice I gave to another graduate student.</p>
<p>Their regular expressions were effective for the most part, and despite a couple of missed matches, the main problem turned out to be a Python coding error. Since I was asked for more general advice on regular expressions, I wrote them the following tutorial to help improve their expressions and better understand how to approach these kinds of coding problems in the future.</p>
<p>They had the following text from the Corpus of Contemporary American English, and wanted to parse it to remove all the names.</p>
<blockquote>
<p>@@4123272 @!MITT-ROMNEY- ( R ) , PRESIDENTIAL CANDIDATE : This is time for America to choose whether they want more of the same , whether unemployment above 8 percent month after month after month is satisfactory or not . @!BARACK-OBAMA# The choice in this election could not be clearer and it could not be bigger , the stakes could not be bigger . @!UNIDENTIFIED-MALE# You can do it ! @!OBAMA# I know . @(END-VIDEO-CLIP) @!SCHIEFFER# Or can he ? We will talk about it with Republican John McCain , who found out in 2008 what it is like to run for president when the economy turns bad . We will talk politics with Haley Barbour , the former chairman of the Republican Party , and the Senate ‘s number two Democrat , Dick Durbin . And what is the fallout from the bombshell leak about Chief Justice John Roberts reversing his position to ensure that the court upheld the president ‘s health care plan ? Will it affect the court ‘s future deliberations ? CBS News Jan Crawford , the reporter who broke the story , has a follow-up @ @ @ @ @ @ @ @ @ @ House correspondent Norah O’Donnell will also join us for analysis on that and more . And at baseball ‘s All-Star Break , we will talk about America ‘s game with historian Doris Kearns Goodwin , Sports Illustrated ‘s Frank Deford , former all-star Harold Reynolds , and ESPN ‘s Jayson Stark .</p>
</blockquote>
<p>There are two important insights for regular expressions. First is to think in groups rather than words or characters. Consider the examples you gave below of inconsistent speaker identification. Think of it as two groups: a name and (optionally) a title. Using some pseudo code we can conceptualize all of your examples as: “(NAME), (TITLE)”. Now, within those groups, what are the regularities? Well, we know that names are pretty regular, and titles can be just about anything, so it’s probably best to focus on the name group and just match whatever is left over using the non-greedy wildcard (i.e., <code class="language-plaintext highlighter-rouge">.*?</code>).</p>
<p>Anyway, this gets us to the second insight: think of regular expressions as filters on a text. You want to start by matching the largest chunks of text and then match slightly more specific text, and so on until you are left with what you want. Part of your problem, I think, is that you started by matching specific text first, namely, the “@”. What you’ll notice in the example is that all speakers begin with “@!” and this is incredibly useful for matching!</p>
<p>With these in mind, we can look at the example source to try and make sense of it a little more. What is the most general group that we can start filtering by? We see the “@” repetition, so that’s likely to be helpful, then we notice the “name title” pattern. Looking closer we see that “name title” group is always preceded by “@!” and always terminated by either “#” or “:”. So our most general filter is:</p>
<p>(1) <code class="language-plaintext highlighter-rouge">r"@!.*?[#:]"</code></p>
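<p>As a rough check, filter (1) can be tried on a shortened excerpt of the sample above. This is only a sketch: it assumes Python’s built-in <code class="language-plaintext highlighter-rouge">re</code> module, and the variable names are made up for illustration.</p>

```python
import re

# Shortened excerpt of the corpus sample above (the speech text is trimmed).
sample = (
    "@@4123272 @!MITT-ROMNEY- ( R ) , PRESIDENTIAL CANDIDATE : This is time "
    "for America to choose . @!BARACK-OBAMA# The choice in this election "
    "could not be clearer . @!UNIDENTIFIED-MALE# You can do it ! "
    "@!OBAMA# I know . @(END-VIDEO-CLIP) @!SCHIEFFER# Or can he ?"
)

# Filter (1): "@!", then as little text as possible, up to the first "#" or ":".
speaker_tag = re.compile(r"@!.*?[#:]")

tags = speaker_tag.findall(sample)
for tag in tags:
    print(tag)
```

<p>On this excerpt it prints the five speaker tags, from “@!MITT-ROMNEY- ( R ) , PRESIDENTIAL CANDIDATE :” through “@!SCHIEFFER#”, and skips “@@4123272” and “@(END-VIDEO-CLIP)” because neither begins with “@!”.</p>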
<p>This works, but it is way too general and opens us up to a lot of false matches. It also doesn’t match bare “@” or “@” followed by digits, but we’ll get to that in a later filter. So we have this group; how can we break it down further? Well, we already have the “name title” pattern, so let’s use that:</p>
<p>(2) <code class="language-plaintext highlighter-rouge">r"@!.*?,.*?[#:]"</code></p>
<p>We can improve on this further by limiting what gets matched in these groups and how we define them. They seem to have gotten about this far, using <code class="language-plaintext highlighter-rouge">r"[A-Z1-9\-\s\'\(\),]+"</code>, but we can make this both more elegant and more precise. They were trying to work around the problem of “…ROMNEY- (R), …” since that breaks our pattern, but the buggy expression collapses the groups and just tries to match the whole thing. Instead, what if we redefined our groups? As written, our groups are NAME and TITLE, delimited by “, ” and surrounded by some symbols. Since we have a pretty good idea of name patterns, what if we defined the groups as NAME and NOTNAME? We take the chunk from (1), match the name using a stringent expression, and then match the rest so that we don’t need to worry about the “- (R), ” problem. This gets us:</p>
<p>(3) <code class="language-plaintext highlighter-rouge">r"@![A-Z0-9\-]+.*?[#:]"</code></p>
<p>But we can clean this up a bit more. Names are composed of alphanumeric symbols, “word characters” (which in regex also include the underscore), and regex has a shorthand notation for that: <code class="language-plaintext highlighter-rouge">\w</code>. We know that these sequences of word characters (some people call them names) are delimited by a hyphen, and that this pattern repeats for as many names as someone has. So we can simplify the above to:</p>
<p>(4) <code class="language-plaintext highlighter-rouge">r"@!((\w+)-?)+.*?[#:]"</code></p>
<p>Now, it may not look “simple” but it has the advantage of conveying all the insights we compiled above. The string we are looking for is composed of one or more word characters, <code class="language-plaintext highlighter-rouge">\w+</code>, followed by 0 or 1 hyphen, <code class="language-plaintext highlighter-rouge">(\w+)-?</code>, with one or more such sequences in a row, <code class="language-plaintext highlighter-rouge">((\w+)-?)+</code>; that regular string is initiated by “@!” and precedes some unknown sequence of characters terminated by either “#” or “:”.</p>
<p>So why go through all that, and why prefer (4) over (3), which is roughly equivalent? The reason is that (4) is easier to debug and extend. If you run into names that are separated by a space instead of a hyphen, you know exactly what to fix: use <code class="language-plaintext highlighter-rouge">[\-\s]?</code> instead of <code class="language-plaintext highlighter-rouge">-?</code>. If you find that these speaker codes are terminated by something other than <code class="language-plaintext highlighter-rouge">[#:]</code>, you just add it to the list. If you find a pattern within the NOTNAME group, you replace <code class="language-plaintext highlighter-rouge">.*?</code> with something more specific. Etc, etc. You can even run similar patterns multiple times if you find that just adding to the list matches the wrong things.</p>
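<p>To see that (3) and (4) behave alike on this kind of text, here is a small sketch using Python’s <code class="language-plaintext highlighter-rouge">re</code> module on a shortened excerpt of the sample. The helper <code class="language-plaintext highlighter-rouge">full_matches</code> and the variable names are hypothetical, chosen only for this illustration.</p>

```python
import re

# Shortened excerpt of the corpus sample above (the speech text is trimmed).
sample = (
    "@@4123272 @!MITT-ROMNEY- ( R ) , PRESIDENTIAL CANDIDATE : This is time "
    "for America to choose . @!BARACK-OBAMA# The choice in this election "
    "could not be clearer . @!UNIDENTIFIED-MALE# You can do it ! "
    "@!OBAMA# I know . @(END-VIDEO-CLIP) @!SCHIEFFER# Or can he ?"
)

pattern_3 = re.compile(r"@![A-Z0-9\-]+.*?[#:]")  # expression (3)
pattern_4 = re.compile(r"@!((\w+)-?)+.*?[#:]")   # expression (4)


def full_matches(pattern, text):
    """Return the complete text of every match (hypothetical helper)."""
    # (4) contains capture groups, so re.findall would return the groups
    # rather than the whole match; finditer plus group(0) avoids that.
    return [m.group(0) for m in pattern.finditer(text)]


tags_3 = full_matches(pattern_3, sample)
tags_4 = full_matches(pattern_4, sample)
print(tags_3 == tags_4)

# Removing the speaker tags with (4) leaves the speech plus the "@" markers.
without_tags = pattern_4.sub("", sample)
print(without_tags)
```

<p>Both expressions pick out the same speaker tags here. After the substitution, the speech text, “@@4123272”, and “@(END-VIDEO-CLIP)” are still present, along with some doubled spaces where tags used to be.</p>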
<p>Now that we’ve got that big chunk out of the way, we’re left with “@@13352534 … @ @ @ @ @ …” (I made the number up, it won’t matter). Now that we’ve removed everything else, these are simple to catch because we don’t have to worry about false positives now that the “@!…” speaker tags are gone. We see that an “@” can be followed by nothing, or it can be followed by some number of digits. Regular expressions have a symbol for digits, <code class="language-plaintext highlighter-rouge">\d</code>, so we can just do:</p>
<p>(5) <code class="language-plaintext highlighter-rouge">r"@(\d+)?"</code></p>
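<p>A minimal sketch of the two filters applied in sequence, again assuming Python’s <code class="language-plaintext highlighter-rouge">re</code> module and a trimmed excerpt of the sample; the variable names are invented for the example.</p>

```python
import re

# Trimmed excerpt in the style of the corpus sample above.
sample = (
    "@@4123272 @!BARACK-OBAMA# The choice in this election could not be "
    "clearer . @!OBAMA# I know . Jan Crawford has a follow-up @ @ @ @ "
    "House correspondent will also join us ."
)

speaker_tag = re.compile(r"@!((\w+)-?)+.*?[#:]")  # filter (4)
at_marker = re.compile(r"@(\d+)?")                # filter (5)

# Remove the speaker tags first, then the leftover "@" markers.
step_1 = speaker_tag.sub("", sample)
step_2 = at_marker.sub("", step_1)
print(step_2)

# In the opposite order, (5) strips the "@" out of every "@!",
# so (4) no longer has anything to match.
reversed_first = at_marker.sub("", sample)
print(speaker_tag.search(reversed_first))
```

<p>After both steps no “@” is left in the text, though runs of spaces remain where tags and markers were removed. The last line prints <code class="language-plaintext highlighter-rouge">None</code>.</p>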
<p>And that gets all of them. Importantly, (5) will match all the “@” symbols that start speaker tags, and if you run it before (4) then (4) will not work. This is why they’re like filters. By handling the bulky stuff first, we can use simple expressions for the truly simple stuff. If the big filters can get the fine stuff too, then that’s just a bonus. By thinking about patterns in terms of groups, we are able to complement this filtering by capturing chunks, refining those chunks into more chunks, and so on and so forth until we get an expression that not only works but has a semantic relationship to the thing it is matching.</p>Christian BrickhouseI like regular expressions. Constructing a regular expression is like solving a puzzle, and they are a powerful tool for lots of situations. In my work on Chess software for MediaWiki, we use regular expressions to validate chess game files before committing to the full parser in order to save resources. These game files are complex, but as a file format they are highly regular and well defined. I thought it would be useful to write up some similar advice I gave to another graduate student.Now with more links!2021-09-10T00:00:00+00:002021-09-10T00:00:00+00:00https://christianbrickhouse.com/weblog/2021/09/10/more-pdfs-and-software<p>Now that my proposal has been sent off to the committee, I finally had time to finish the bits I talked about in <a href="/weblog/2021/08/26/website-redone-reprise.html">my last post</a>. Have a look! You can <a href="/cv">see my CV</a> as an embedded PDF and read <a href="/software">a list of my software contributions</a>. I also fixed some links in the previous post that gave 404 errors.</p>Christian BrickhouseNow that my proposal has been sent off to the committee, I finally had time to finish the bits I talked about in my last post. Have a look! You can see my CV as an embedded PDF and read a list of my software contributions. 
I also fixed some links in the previous post that gave 404 errors.Website redone-er2021-08-26T00:00:00+00:002021-08-26T00:00:00+00:00https://christianbrickhouse.com/weblog/2021/08/26/website-redone-reprise<p>Astute readers will notice the solid year gap between this and the last post. Turns out that with the pandemic and multiple moves I forgot that I had set this up. Well, wait no more: I’ve updated most every part of the site. As of writing, I’m still working on embedding PDFs <a href="/cv">for my cv page</a>, cataloging my software contributions at <a href="/software">my software page</a>, adding some long-form citations to various pages, and in the long term I need some new headshots so that I can decorate the site with pictures of my face. I’ll try to update the blog more often, probably with content that’s too long for twitter. We’ll see how that goes.</p>Christian BrickhouseAstute readers will notice the solid year gap between this and the last post. Turns out that with the pandemic and multiple moves I forgot that I had set this up. Well, wait no more: I’ve updated most every part of the site. As of writing, I’m still working on embedding PDFs for my cv page, cataloging my software contributions at my software page, adding some long-form citations to various pages, and in the long term I need some new headshots so that I can decorate the site with pictures of my face. I’ll try to update the blog more often, probably with content that’s too long for twitter. We’ll see how that goes.Website redone2020-07-07T00:00:00+00:002020-07-07T00:00:00+00:00https://christianbrickhouse.com/weblog/2020/07/07/website-revamp<p>After a number of years of my old website, I’ve revamped it to not only be easier for visitors, but easier for me to maintain.
It uses Jekyll on GitHub and was created using <a href="http://jmcglone.com/guides/github-pages/">a guide</a> by Jonathan McGlone. If you’re looking for a way to update your website easily, I recommend reading through it as it’s geared towards people who may not be familiar with continuous integration practices.</p>Christian BrickhouseAfter a number of years of my old website, I’ve revamped it to not only be easier for visitors, but easier for me to maintain. It uses Jekyll on GitHub and was created using a guide by Jonathan McGlone. If you’re looking for a way to update your website easily, I recommend reading through it as it’s geared towards people who may not be familiar with continuous integration practices.