Participation and the Problem of Measurement

Ryan M. Omizo

Abstract

This chapter extends Critel's study on the assessment of participation in composition classrooms. Critel argues that the issue of participation in the classroom is fraught, especially when grades are involved. According to Critel's research, teachers often define the units of participation ambiguously and calculate these units intuitively rather than with evidence-based metrics, which can result in participation serving as a proxy for attendance or as a way of disciplining behavior. This chapter interrogates such commonplaces by reconfiguring participation as an emergent macro-phenomenon that can be evidenced through shared textual production, and it offers a novel method for measuring this form of participation through computation. Consequently, this chapter proposes a series of techniques for understanding participation in the composition classroom both rhetorically and computationally in the form of network graphs and network graph metrics, applying them to a case study of an online discussion forum.

Introduction

"Participation is defined when the author believes he or she is deviating from a standard definition, which demonstrates that there is assumed to be a common definition with which all writing instructors would be familiar. The standard definition is largely based on oral contribution in the classroom or a conference with the instructor.

"The fact that participation is discussed as a result of a change the instructor is making demonstrates the null state of student participation. It is a vessel, waiting to be filled. The baseline version of participation need not even be discussed because it’s something with which we are all familiar." (Critel 55)

When Critel's dissertation first began taking shape, we had several conversations about how I incorporated participation in my syllabus and grading rubric. As an early informant for Critel's research, I reported that I did grade for participation, and that this grading asked students to be "prepared to work each day in class," but I was not working with a concrete rationale for how this participation metric might best facilitate learning. I had no theory of participation.

My inclination to include participation as a graded part of the class, I said, derived from my experiences with participation components as an undergraduate student, sample syllabi that I had encountered during course preparation, and the feeling that students would be less motivated to complete daily classwork without the incentive of grades. Thus, my understanding of participation as a commonplace of the composition classroom would be considered lore (North), that is, sedimented practices enacted out of convention or routine.

Critel's findings suggest that formal assessment of participation in the composition classroom trends towards "holistic" and "impressionistic" scoring, with one respondent reporting that he/she does not feel the need "to count student participation in any particular way" (150-152). Critel argues that this trend in assessment has led to the phenomenon of grade inflation. The metric of participation grants instructors "wiggle room" to bump up grades to compensate for other scores. Thus, participation as an assessment category seems to be generally characterized by its fuzziness, one that allows instructors the flexibility to finesse the application of their own grading scales to fit specific situations. A dilemma arises when such informal or tacit conceptualizations are formalized into grading scales and oblige certain behaviors (primarily oral communication in class, according to Critel (55)). Fuzzy definitions of participation (such as my own "prepared for class") can lead to inconsistent applications of the participation standard by instructors and students, which are then crystalized in the form of grades. This situation is made even more complicated because teachers are hard-pressed to record all incidents of classroom participation as part of routine course interaction. Moreover, some valuable forms of participation might be best enacted outside the attention of the instructor. For example, students discussing their understanding of an assignment before querying the instructor could be considered a strategic and collegial form of classroom participation that leverages communal knowledge and expedites the question and answer process. In this case, instructor monitoring of such activities may inhibit the very community building that mandated classroom participation is supposed to achieve.

The quote from Critel that begins this chapter highlights another problematic feature in the measurement of participation: the notion that all participants understand what participation means in the context of specific classrooms. Critel's research enumerates several reasons why we should be suspicious of this assumption. For this understanding of participation to work, students would have to come from similar cultural and linguistic backgrounds in order for them to negotiate the participatory space in the classroom (Critel 117). The presumption that students can understand participation as instructors have sketched it in syllabi or described it throughout courses also assumes that these students come to college with the expertise to participate and are ready to be judged. Bartholomae's "Inventing the University," a guiding article in the field, suggests that these expectations may be misplaced. Participation, rather than a de facto concept, is better considered a term with which students must regularly negotiate.

Critel recommends that instructors state the requirements for participation in their syllabi and establish consistent systems of grading to address both grade inflation and potential ambiguity in the performance of participation. This would help clarify the concept of participation for their students and apply a standardized rubric for assessment. However, I think larger questions loom about how composition classrooms operationalize and measure participation, especially given Critel's finding that 53.3% of survey respondents said that the instructor was "Always" and 45.9% "Sometimes" the primary audience for student participation performances. If participation is a floating, context-dependent concept that emerges from the interrelationships between classroom agents, then does focusing on transactional activities between individual students and instructors (e.g., calling on a student for answers) present the best method for sampling this web of agents and actions? Could there be other means to understand how students are participating in composition classrooms that capture more global perspectives, which would then provide additional context to understand individual performances of participation?

In order to pose these methods, we need to invent a research object that is indicative of global rather than individual performances of participation and assessable in consistent and replicable ways. The research object I propose in this chapter is in the purview of all composition instructors: textual interactions between agents on common topics. Of course, writing output is not a novel means to measure participation. The effort expended during freewriting or brainstorming or mandatory discussion board posting on topics of the day often factors into participation grades. However, rather than treating such texts as individual student tasks, I argue in this chapter that such texts taken together can be reflective of how participation emerges globally from an assortment of actions.

Taking an aggregated look at the discursive practices surrounding common prompts can provide instructors insights into how participation is occurring class-wide. At the same time, this approach can be complicated by the ways that instructors are obliged to interpret such texts: sequentially, one at a time, with a focus that excludes context and other relational features, and with biases that privilege instructor concerns. The alternative that I propose is the use of computation, specifically natural language processing and network graph analytics, to read and represent texts and participation in a different way. While computational analyses are never free of biases and often produce blunt interpretations of human communication, the application of computation to natural language does afford the ability to rapidly process texts in regularized ways, leading to aggregated views and insights that are not easily derived through conventional, human reading practices.

The specifics of this computerized processing constitute the bulk of this chapter. In brief, I will be using natural language texts and turning these texts into a directed network graph in which words are nodes and the proximal relationships between these words create edges. Doing so, we can take a more global look at how participation is occurring.
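As a rough illustration of the transformation just described, the following Python sketch turns a short passage into a directed graph in which words are nodes and adjacency in the text creates weighted edges. The tokenizer, the window size of one word, the sample sentences, and the function name build_word_graph are all assumptions made for this example; they are not drawn from the processing pipeline used later in the chapter.

```python
import re
from collections import defaultdict


def build_word_graph(text, window=1):
    """Return {source_word: {target_word: count}} for each word and the
    words that follow it within `window` positions (an assumed rule)."""
    # Lowercase and keep alphabetic tokens only; a fuller pipeline would
    # likely also handle stop words, stemming, and sentence boundaries.
    tokens = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(lambda: defaultdict(int))
    for i, source in enumerate(tokens):
        for target in tokens[i + 1 : i + 1 + window]:
            graph[source][target] += 1  # edge weight = number of co-occurrences
    return {source: dict(targets) for source, targets in graph.items()}


# Hypothetical sample text, not taken from the case study.
sample = "Students discuss the prompt. Students revise the prompt."
word_graph = build_word_graph(sample)

for source, targets in word_graph.items():
    print(source, "->", targets)
```

Because this sketch ignores sentence boundaries, the last word of one sentence is linked to the first word of the next; whether that is desirable is a design decision rather than a given.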

Methodology

Toward a Computational Rhetoric

In Inventing Computational Rhetoric, Wojcik posits a discipline of computational rhetoric that musters theories from rhetoric, philosophy, natural language processing, big data, and computational linguistics and produces techniques and applications used in the service of automated and semiautomated text analysis and argument generation. Wojcik's project asks the field of rhetoric to see computation as a means to complicate our theories through the use of machine processing. Similarly, Hart-Davidson and McLeod augur a future for writing that involves microrobots that automate repetitive writing tasks for users (e.g., filling out forms). Hart-Davidson and McLeod offer a sketch for an app called Garden, a tool that computationally organizes collaborative tasks and provides user feedback on the status of these tasks. These writing bots would serve under the direction of users, facilitate communication, and streamline the completion of tasks.

Perhaps the most sustained work in the area of computational rhetoric has been done by David Kaufer and Suguru Ishizaki, the creators of DocuScope. DocuScope is a rhetorical text analysis platform that relies on a dictionary-based approach to reveal rhetorical patterns present within a text or within a collection of texts (Ishizaki and Kaufer). Much like a human being looking up a word to uncover multiple meanings and usages, the dictionary approach employed in DocuScope measures the concordance between an input text and a database of word strings (single words and word phrases) annotated with multiple rhetorical code categories. These categories aim to capture the sense and function of linguistic units. Ishizaki and Kaufer term these strings and associated code categories "Language Action Types" or LATs. One example provided by Ishizaki and Kaufer of an LAT is "time duration," which describes the marking of spans of time (e.g., "over the last ten years"). A composite of recorded LATs is then statistically analyzed for patterns that might be said to govern the form of the text (Ishizaki and Kaufer, n.p.) or, as Collins et al. argue, "prime distinct representational effects for readers" (18). Taken holistically, these patterns of LATs or priming mechanisms could be indicative of macroscopic genre features. Such patterns might reveal differences in style among different discourse communities or differences in style within the same discourse community (Ishizaki and Kaufer, n.p.).

Like Ishizaki and Kaufer and Collins et al., Hart explores genre through computational text analysis. Hart, employing the dictionary-based program DICTION, proposes a means to differentiate genres such as legal and financial reports through the prevalence of words that imply certainty of argument and "narrative force" (159-163). The presence of word strings that imply "tenacity," "insistence," and categorical "collectives" of objects creates a standard score for certainty; meanwhile, word strings that focus on action, time, space, and human characters combine into a standard score indicative of narrative or storytelling. The level to which a text is certain and narrative in nature is then correlated to existing genre categories. Hart's findings are suggestive. For example, DICTION reveals that genres of news reporting and political discourse (Hart's terms) trend strongly toward narrative rhetorical conventions and high certainty. In contrast, scholarly essays evidence fewer narrative elements but equal certainty as defined by DICTION (for more information on DICTION and an overview of other automated text analysis programs see Hoffman and Waisanen).

In the field of composition, Susan Lang and Craig Baehr argue that the amount of data accrued by composition departments makes for valuable but unwieldy datasets, not easily parsed in the investigations of outcomes and assessments. Data and text-mining approaches can address that problem by processing document archives that would be too large for human researchers to efficiently handle. Lang and Baehr frame composition in terms of the "big data" movement, but I would venture to say that "big data" is a relative prospect and that tracking class-wide participation on a daily basis while moderating other classroom activities represents a "big data" (or, at least, a "medium data") challenge for instructors whose goal is to account for and improve participation levels in their classroom.

Though the techniques proposed in this chapter are still exploratory in nature within the research methods of rhetoric and composition, my hope is that this computational orientation towards analysis will expand upon existing scholarship and answer calls issued by Matthew Jockers and Chris Anson (making similar cases from different disciplines) that attempt to spark humanities research in new, more robust directions.

In Macroanalysis: Digital Methods and Literary History, Jockers argues that the research game in the humanities has changed with the availability of large corpora of digitized texts. To paraphrase Jockers' argument, no longer can scholars be content with methodologies that rely on the close reading of small samples of exemplar texts to explain an entire period of literary history. Current technologies of data management, search, and statistical wrangling afford scholars the ability to sample more broadly and arrive at more comprehensive conclusions (7-8). Further, humanities scholars should take advantage of these opportunities to develop new research methodologies that might make analyses more representative of the wealth of extant textual data that otherwise would be impossible for an individual scholar to parse by hand. Jockers targets literary studies in his argument, but the relevance to composition and rhetoric should be clear: writing classrooms generate numerous texts, many of which are siloed within the confines of individual classrooms and only read by the assigned instructor. How representative is our sampling of data in the field of composition? How generalizable are our findings for the field? How can we increase the scale of composition research in order to make our findings more generalizable? Although computational or statistical methods carry specific affordances and drawbacks, as all methods do, such methods do offer us a means to dramatically change our vantage points on familiar texts, and to do so with speed and efficiency. The national studies of writing initiated by Connors and Lunsford and Lunsford and Lunsford no longer need to be the exceptions in composition research because data-mining software has significantly reduced the human labor involved in the preparation and dissemination of data.1

Computational approaches also respond to Anson's (21) urging for evidence-based or RAD (Replicable - Aggregable - Data-supported) scholarship (see also Haswell) in the theorization of composition as a means to test disciplinary truisms and defend against external naysayers. In the short run of this chapter, my aim is to define participation according to a narrow set of quantitative and qualitative parameters and offer a series of methods that will enable others to measure participation as a stable concept. In the long run, however, my goal is to inspire work on computational methodologies that might be generalized for use by other researchers on other datasets. This chapter, and this project, is an appeal to rethink our longstanding research and pedagogical commonplaces, just as Critel does in a dissertation that marshals the best of empirical data-gathering, exploration, and the rhetorical tradition.

From Text to Network Graph

. . . the metaphor of the Text is that of the network. (Barthes 161)

In "From Work to Text," Barthes ushers hermeneutic practices into the poststructuralist era by putting into tension two conceptions of writing: the "Work," which is defined by its authorial origins and membership in a tradition of other "works," and the "Text," which describes a field of intersecting meaning-making activities. My interest here is Barthes' comparison of the text to a network. The image of the network has garnered considerable interest in the field of rhetoric and composition from the likes of Hauser, Johnson-Eilola, Dingo, Spinuzzi, Hawk, Porter, and Rice ("Networked Boxes"; "Urban Mappings"). For example, in "Urban Mappings: The Rhetoric of the Network," Rice draws on Latour's configuration of the network as "bodies of relationships that shift as new bodies are introduced or subtracted" (205). In "Networked Boxes: The Logic of Too Much," Rice describes networks as a "deep interwingling, information transforming other information" (306). The analytical move derived from this framework is to associate rhetorical actions with generalized network attributes: nonlinearity, shiftiness, information flow, openness, expandability, associativity. The term "network" serves as an ideograph substantiating claims and authorizing a particular view of a text as distributed, relational, and ramified with other ideas, texts, and institutions. Further reinforcing this point is the priority Rice places on a particular view of networks:

the power of networks comes not from the identification of certain "things" and how they connect, but from the process of connections themselves. Generalized to a "thing" like a city space or map, the emphasis shifts from pure analysis or representation to working with the types of connections that may or may not be generated within the space's various processes. The emphasis, in other words, is rhetorical as it teaches another perspective regarding how spaces are organized, arranged, and delivered. (Rice, "Networked Boxes", 206)

In the above excerpt, Rice is interpreting Latour, but the affiliations with Barthes' ideas about the text as network are also applicable. For Barthes, the "Text" resists the boundaries that the "Work" attempts to erect for its own consecration; for Rice, it is not the thingness of the network, but the potential energy of the network to arrange connections that is important.2

My focus in this chapter is to explore in more detail the "things" of networks and how these "things" connect, even if these connections offer static vantage points on texts. I am interested in understanding rhetoric in terms of mathematically derived network properties (which might include density of network relations and information flow between nodes) as chartable and replicable values.

To do this, we need a graph.

Why a Network Graph?

By graph, I mean network graphs, which organize relationships between entities as a collection of nodes joined together by edges. Entities could be anything: people, cars, corporations, warehouses. Edges refer to any conceptual relation that could obtain between the components of the network. Figure 1 below demonstrates this type of graph by depicting a small network of friends, or a social network.

Social Network Example
Figure 1. Social Network

The nodes in Figure 1 represent people. The edges connecting each node-person indicate a tie of friendship. In this network, Jane is a direct friend of Katie, Jim, Bill, and Mary. We might say that Katie's social network also includes Bob because Bob is a direct friend of Mary. For Katie, Bob is a friend of a friend.

Figure 1 denotes a sociogram. Employed by Jacob Moreno at MIT in the 1930s, sociograms visualize associations between social actors as points and lines as a way to formalize configurations such as roles and influence (Scott 13-14). One formal approach to Figure 1 that will be familiar to users of Facebook or LinkedIn is to treat it as an ego network. This obliges us to focus on particular actor-nodes and determine relationships between the other nodes in the network just as Facebook users can map their friends or Google+ users can measure the size of their "circle" (see Newman, "Scientific Collaboration Networks. II. Shortest Paths, Weighted Networks, and Centrality" for more on ego networks). Let us choose Jane as our node-person in this ego network. It is clear that Jane has the most connections in the network if we count the lines that emanate from her and compare that count with those of Katie, Jim, Bill, Bob, and Mary. The number of edges attached to a node is that node's degree in the network. As shown by Figure 2, Jane's degree is 4.

NODE     DEGREE
Jane     4
Mary     3
Katie    2
Bob      1
Bill     1
Jim      1
Figure 2. Degree Distribution

If we were to define a qualitative metric such as "popularity" in terms of degree distribution, we might conclude that Jane is the most popular person in the social network (for more discussions of degree see Freeman, "A Set of Measures of Centrality Based on Betweenness"; Opsahl et al.; Wasserman and Faust). In addition, we can infer that she not only is friends with the most people, but is the necessary go-between for Bill and Jim if they wish to meet Bob, Mary, and Katie. I will discuss the significance of this brokering position in greater detail later, as it is crucial to my formulations and experiments.
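The degree counts reported in Figure 2 can be reproduced with a few lines of Python. This is a minimal sketch: the edge list is reconstructed from the prose and the degree table rather than from the figure itself, and the Mary-Katie tie in particular is an assumption inferred from Mary's degree of 3 and Katie's degree of 2.

```python
from collections import Counter

# Undirected friendship ties, reconstructed from the description of Figure 1.
edges = [
    ("Jane", "Katie"),
    ("Jane", "Jim"),
    ("Jane", "Bill"),
    ("Jane", "Mary"),
    ("Mary", "Bob"),
    ("Mary", "Katie"),  # assumed: implied by the degree table, not stated outright
]


def degree_distribution(edge_list):
    """Count how many edges touch each node in an undirected graph."""
    counts = Counter()
    for a, b in edge_list:
        counts[a] += 1
        counts[b] += 1
    return counts


degrees = degree_distribution(edges)
for node, degree in degrees.most_common():
    print(node, degree)
```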

In addition to depicting a social network, Figure 1 is also what we would describe as an undirected graph. In this case, the friendship relationship between each pair of nodes is mutual. Jane is just as much a friend to Mary as Mary is to Jane.

The model could instead contain directed relationships. In this case, edges would represent relationships that operate in only one direction (from-to). If we change the context of Figure 1 to that of a network of Twitter users and add directionality to the edges, we see a dramatic change in the nature of the social grouping.

Directed Social Network Example
Figure 3. Directed Social Network

The direction of the arrows in Figure 3 points to the actor being followed. Thus, Mary is being followed by Bob, Katie, and Jane. A count of the edges joining each node in the network reveals that the degree distribution has not changed; however, the relationship between the nodes has. Although Jane still possesses the most links to other actors in the network, her status in the network has shifted. Jane is following five other actors without reciprocity. One could argue that she is now the least popular member of the network and that all the other members are more popular. Even outliers such as Bill and Jim could be considered more popular because Jane is following both of them. With three direct followers, Mary might be considered the most popular member of this social network.

In terms of rhetorical analysis, the degree distribution and directional edges in Figure 3 can help contextualize interactions in the network based on how information can be relayed. If this is a group of Twitter users, then the information that travels throughout the network moves in reverse of the directed edges. Jane will regularly receive tweets from Mary, but Mary will only see Jane's tweets if Jane tweets at or replies to one of Mary's tweets. We can also see that actors such as Bill and Jim are completely cut off from Mary, Bob, and Katie as well as each other and can only receive information from that half of the network if Jane tweets at or replies with quoted material from Mary, Bob, or Katie. While the trope of network openness and commutability is still operant, the reality of the situation is that particular structural relationships have formed that dampen the potential for openness and, thus, for rhetorical exchange.
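The directed reading of the network can be sketched in the same way. The edge list below is one interpretation of the prose rather than a transcription of Figure 3: the first five follow relations are stated directly in the text, while the last two are inferred from the remark that Jane follows five other actors. Under this reading Jane has five ties rather than the four listed in Figure 2, so the edge list should be treated as an assumption. The variable names are hypothetical.

```python
people = ["Jane", "Mary", "Katie", "Bob", "Bill", "Jim"]

# Each pair is (follower, followee); arrows point at the actor being followed.
follows = [
    ("Bob", "Mary"),
    ("Katie", "Mary"),
    ("Jane", "Mary"),
    ("Jane", "Bill"),
    ("Jane", "Jim"),
    ("Jane", "Katie"),  # inferred from "following five other actors"
    ("Jane", "Bob"),    # inferred from "following five other actors"
]

followers = {p: set() for p in people}  # incoming edges
following = {p: set() for p in people}  # outgoing edges
for follower, followee in follows:
    following[follower].add(followee)
    followers[followee].add(follower)

in_degree = {p: len(followers[p]) for p in people}   # number of followers
out_degree = {p: len(following[p]) for p in people}  # number of accounts followed

# Tweets travel against the arrows: an actor routinely sees only the
# tweets of the accounts he or she follows.
for person in people:
    sources = sorted(following[person]) or ["nobody"]
    print(f"{person}: {in_degree[person]} followers; sees tweets from {', '.join(sources)}")
```

In this sketch Bill and Jim follow no one, so nothing reaches them by default, which is one way to express the structural isolation described above.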

We can also see from this example ego-network how graph theoretic network analysis can enlighten us about forms of participation. In the case of Figure 3, the degree distribution and directionality of the relationships show how participation is constrained by the follower-followee relationship. If we added more data to our example network, we might be able to expand our use of the network graph and draw further inferences. Figure 4 preserves the edge relations of Figure 3, but each node now also displays the average number of public tweets that actor sends per day.

Example Directed Network of Twitter Users
Figure 4. Average Tweets

Mary, now the most popular person in the network, is also the most active tweeter. Bob, Katie, and Jane hew closer to the mean tweets per day (17.5). Bill and Jim fall on the lower end of the distribution. The sample size for this example network is too small to support any claims of statistical significance, but we might begin to make qualitative claims about the roles and motivations that are animating network interactions, motivational inferences being at the heart of rhetorical production and inquiry. First, we might say that Mary, with the highest follower count, tweets in part to maintain this audience. Or, we could claim that her proportionally large follower count derives from her regular participation in the network. Inversely, we might attribute Bill and Jim's more marginal positions to their lack of active tweeting and following. The interesting anomaly here is Jane, who is the most active follower and the second most active tweeter. I will refrain from positing any more hypotheticals about Jane's rhetorical actions in the network and simply say that there could be multiple reasons why such an active participant is not attracting more followers. What is important is that network analysis can reveal actors such as Jane and focus the investigations of analysts.

Indeed, the inspiration for my example is a study of Usenet discussion board posters conducted by Welser et al. Welser et al.'s (n.p.) study seeks to use network visualizations as "structural signatures" that might predict whether or not a discussion board poster occupies the role of an "answer person" or a "discussion person." Part of this analysis relied on the use of directed graphs and degree distributions, backed with content analysis of a selection of Usenet threads. Welser et al.'s findings suggest that the behavior of "answer people" and "discussion people" produces distinct network patterns that can predict roles. "Answer people" tended to have local network ties to many other participants ("question seekers"), who themselves had few ties to others in the network. Moreover, the ties between "answer people" and "question seekers" were thin, involving few exchanges. This makes a great deal of intuitive sense. "Answer people" in Welser et al.'s study respond to questions. Once the question is answered, the interaction between the "answer person" and "question seeker" dissolves. The "answer person" then presumably waits for the next question; hence, the lack of ties to other actors with high degree.

"Discussion people," on the other hand, demonstrated a high degree of ties with other actors who also had a high degree of ties in Welser et al.'s study, suggesting that discussion people are more interested in wide-ranging conversations with other like-minded individuals.

This type of sociological study is a mainstay of network analytical approaches. In the composition classroom, analysis of questions and responses in discussion board postings in course management systems could help visualize student labor as a means of assessment (Inoue, 83-84; Inoue, 177-236).
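The contrast between "answer people" and "discussion people" can be illustrated with a toy calculation. The sketch below is not the procedure used in the Usenet study; it simply compares each actor's own degree with the average degree of that actor's neighbors, on an invented network in which one "answerer" is tied to four otherwise unconnected "askers" and four "discussants" are all tied to one another. All names and data are hypothetical.

```python
from itertools import combinations

# Hypothetical discussion-board ties (undirected for simplicity).
edges = [("answerer", f"asker{i}") for i in range(1, 5)]
edges += list(combinations(["disc1", "disc2", "disc3", "disc4"], 2))

neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)


def degree(node):
    """Number of ties attached to a node."""
    return len(neighbors[node])


def mean_neighbor_degree(node):
    """Average degree of the nodes directly tied to `node`."""
    return sum(degree(n) for n in neighbors[node]) / degree(node)


for node in ("answerer", "disc1"):
    print(node, degree(node), mean_neighbor_degree(node))
```

Here the answerer has many ties to low-degree neighbors, whereas each discussant is tied to neighbors as well connected as itself; a pattern of this kind is one simple way to make a "structural signature" computable.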

We can also use network graphs to analyze the words of texts.

From Network Graph to Text Mining

The precedent for using network graphs for text analysis has been established by Carley; Kleinnijenhuis, De Ridder, and Rietberg; Sack; Corman, Kuhn, McPhee, and Dooley; Popping; Newman ("Analysis of Weighted Networks"); Diesner and Carley; and Tambayong and Carley. Uniting these efforts is the attempt to extract the semantic meaning of texts by examining the linkages that obtain between words, word phrases, or concepts in a network. For example, Kleinnijenhuis, De Ridder, and Rietberg examine the manifest content of Dutch newspaper articles on the state of the economy in the Netherlands by segmenting the natural language text into "nuclear sentences" (i.e., a sentence that grammatically connects a subject to an object under the influence of the subject (193)). The linkage between the subject and the object of the sentence represents an instance of relational meaning that could be coded as positive, negative, or ambiguous. In addition, several nuclear sentences can be chained together, and their linkages counted and weighed in a directed graph to uncover "indirect" relationships between sentences. Carley's (and her collaborators') work has focused on a different level of abstraction. Textual units such as words or phrases indicate single ideas. The combination of two or more ideas forms a statement, which is then graphed. The associations between statements create the network, from which larger conceptual categories can be extrapolated to create cognitive mappings of the text.

My approach follows this track; however, as a rhetorician, my aim is to explicitly identify rhetorical moves and then reconcile that discovery process to the themes of participation. Specifically, I am interested in finding topoi.

Topoi can, in the words of Michael Leff, refer to “recurrent themes in literature, to heuristic devices that encourage the innovation of ideas, to regions of experience from which one draws the substance of an argument” (23-24). Carolyn Miller describes topoi as “a point in semantic space that is particularly rich in connectivity to other significant or highly connected points” and as an “aid to pattern recognition, specifically as a region that permits or invites the connection between the abstract and the concrete” (142). Thus, topoi can be viewed as elements within discourse that facilitate the transmission of information from a rhetor to an audience through the invocation of familiar or agreed upon frames of meaning-making—whether these frames refer to a literary genre to attract a devoted readership, stock language games that can give form to a business memo, or commonsense understandings of the world that might substantiate a proposition.

How do we graph topoi?

Let us return to our example social network and reconsider the metric we have been discussing: degree. We have already seen how measuring node-degree can enable a more comprehensive view of the rhetorical situation of communicative exchange. For the sake of space, let us reduce the size of the network.

Betweenness Centrality Example
Figure 5. Friends Network

Figure 5 sports five nodes (Mary, Jim, Bob, Jane, Katie) and four edges ({Mary, Jim}, {Jim, Bob}, {Bob, Jane}, {Jane, Katie}). The graph is undirected, so information passing through the edges moves without preference.

NODE     DEGREE
Bob      2
Jim      2
Jane     2
Mary     1
Katie    1
Figure 6. Friend Network Degree Distribution

Jane, Bob, and Jim have the same degree, but are these nodes equally important in this network? The answer depends on the actions and relationships undertaken within this network.

Let us suppose that we need to ship a package through the network in Figure 5. Each person-node is obliged to pass the package to its neighbor node until the package arrives at its designated recipient node. In order for this shipping network to function, each node must transfer its package, but certain nodes are more involved in the dissemination process than others. We can illustrate this by tracking those nodes that function as passthroughs in the network—those nodes that intervene between the sender of the package and the recipient of the package. Figure 7 presents those passthroughs that arise in a directed pass. Given Figure 5, we are moving left to right through each pair of sender and receiver nodes and approximating half of potential network traffic. The remaining half would reverse and repeat the passthrough node routes.

Sender/Receiver Passthrough
Mary to Jim None
Mary to Bob Jim
Mary to Jane Jim, Bob
Mary to Katie Jim, Bob, Jane
Jim to Bob None
Jim to Jane Bob
Jim to Katie Bob, Jane
Bob to Jane None
Bob to Katie Jane
Jane to Katie None
Figure 7. Betweenness Centrality Passthroughs

Counting the instances a node occurs in the Passthrough category of Figure 7, we see that Jane and Jim occur three times and Bob occurs four times. Bob occupies the most central position in the network. Thus, Bob has the most potential power to dictate how packages flow through the network even though his degree is the same as that of Jim and Jane.
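The tally in Figure 7 can also be reproduced programmatically. What follows is a minimal sketch using the Python-based NetworkX package; the variable names and the helper `passthrough_counts` are illustrative assumptions rather than the code used to produce the figures in this chapter.

```python
from collections import Counter
from itertools import combinations

import networkx as nx

# The five-person network from Figure 5 (an undirected chain).
friends = nx.Graph()
friends.add_edges_from([
    ("Mary", "Jim"), ("Jim", "Bob"), ("Bob", "Jane"), ("Jane", "Katie"),
])


def passthrough_counts(graph):
    """Count how often each node sits strictly between a sender and a receiver.

    Each unordered pair is visited once, mirroring the left-to-right pass
    in Figure 7. Only one shortest path per pair is inspected, which is
    enough here because every pair in this network has exactly one.
    """
    counts = Counter()
    for sender, receiver in combinations(graph.nodes, 2):
        path = nx.shortest_path(graph, sender, receiver)
        counts.update(path[1:-1])  # drop the two endpoints
    return counts


counts = passthrough_counts(friends)
for name in ["Bob", "Jim", "Jane", "Mary", "Katie"]:
    print(name, counts[name])
```

Mary and Katie never act as passthroughs because they sit at the ends of the chain, so their counts are zero.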

Bob functions as the “broker node” (Freeman "Centrality in Social Networks Conceptual Clarification"; Freeman, Roeder, Mulholland, 1980) in the network because it is more “between” than the others. Thus, it operates as a gatekeeper between one cluster of nodes in the network and a cluster in a different region. The relevant metric here is called betweenness centrality.

The concept of betweenness centrality was first developed by Bavelas and the Group Network Labs at M.I.T. in the 1940s (Freeman, "A Set of Measures of Centrality Based on Betweenness"; Freeman, "Centrality in Social Networks Conceptual Clarification") and has since become an indispensable measure in the field of social network analysis. As its name suggests, betweenness centrality refers to the extent to which a node falls between other nodes on the shortest paths connecting them. For each pair of other nodes, this amount is calculated by dividing the number of shortest paths that pass through the mediating node by the total number of shortest paths connecting the pair; these fractions are then summed over all pairs (Freeman, "Centrality in Social Networks Conceptual Clarification," Social Networks; Borgatti).
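For readers who want to see the measure computed rather than counted by hand, the sketch below applies NetworkX's built-in `betweenness_centrality` function to the toy network from Figure 5. The graph construction and variable names are my own illustration; only the library call itself is taken from NetworkX.

```python
import networkx as nx

# The toy friends network from Figure 5.
friends = nx.Graph()
friends.add_edges_from([
    ("Mary", "Jim"), ("Jim", "Bob"), ("Bob", "Jane"), ("Jane", "Katie"),
])

# Unnormalized scores: for an undirected graph, each sender/receiver
# pair is counted once, so the values match the Figure 7 tally.
raw = nx.betweenness_centrality(friends, normalized=False)

# Normalized scores: NetworkX rescales by the number of node pairs that
# exclude the node being scored, which puts every value between 0 and 1.
scaled = nx.betweenness_centrality(friends, normalized=True)

for name in sorted(raw, key=raw.get, reverse=True):
    print(f"{name:6s} raw={raw[name]:.1f} scaled={scaled[name]:.3f}")
```

In this small network, Bob's normalized score is four out of six possible pairs, while Jim and Jane each score three out of six.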

The conceptual overlap between betweenness centrality and rhetorical topoi, especially Miller's definition, is tantalizing. Both topoi and network nodes with high betweenness centrality theoretically control the movement of information. Topoi ground statements by posing a commonly held proposition, value, or belief, which, in turn, delimits the scope of subsequent statements and appropriate responses. For example, a business email that asks for an extension on a deadline will likely inspire a formal response agreeing to or denying the extension because of two prevailing topoi: (1) that the norms of business correspondence require that a professional request be met with a professional response and (2) that a request for action be addressed in terms of that request. Topoi set the stage for how information develops and what information will count. In a similar fashion, a node with high betweenness centrality determines how information flows through a network because it has the ability to control which node receives the transmission.

Additional support for this proposition comes from both Corman, Kuhn, McPhee, and Dooley and Tabayong and Carley. In their text mining of Sudanese newspaper articles, Tabayong and Carley use betweenness centrality to measure the prominence of political agents in the corpus (determining whose name stood out more and at what time). They then attempted to correlate the textual salience of these agents with their historical prominence in Sudanese affairs. Tabayong and Carley's results suggest that their text-mining methods did parallel historical events; however, what is important to me and my efforts is the conceptualization of betweenness centrality as a marker of significance in a textual network. Corman, Kuhn, McPhee, and Dooley's study makes the connection between textual structure and betweenness centrality more explicit. Their argument is worth quoting at length:

. . . Some words are more influential than others in channeling meaning. They are literally more meaning-full than other words in the network. Thus, identifying the structural influence of words allows one to measure this property. We operationalize this idea of influence as the centrality of a given word in the CRA network. Although a variety of measures could be used, centering theory points us most clearly toward betweenness centrality. To our knowledge, the concept was first formalized by Anthonisse (1971), who described it as the rush in a graph: "The rush in an element is the total flow through the element, resulting from a flow between each pair of vertices" (p.1). (176)

What remains for us is to determine which terms/nodes in a textual network graph demonstrate structuring betweenness centrality and test whether or not these "broker" nodes are rhetorically controlling the information circulating throughout the graph. To do this, I take as my data a durable discussion forum thread hosted by Science Buzz (described below).

Data

The primary data for this study are discussion board postings to Science Buzz, a popular science-themed forum hosted by the Science Museum of Minnesota. These posts were culled for a research project titled, Take Two: A Study of the Co-Creation of Knowledge on Museum Web 2.0 Sites conducted by Michigan State University's Writing in Digital Environments (WIDE) Research Group. WIDE's agenda was to examine Science Buzz as an online, social platform for informal learning and to trace how learning was facilitated via the rhetorical moves performed by learners and discussion leaders (see Grabill and Pigg and Grabill, Pigg, and Wittenauer for an extended description of this study and another perspective on the data). The data was segmented at the T-unit level and coded with a two-level scheme in a typical discourse analytic method.

My study focuses on one Science Buzz discussion board thread, "The chicken and the egg." "The chicken and the egg" features 151 posts submitted between September 23, 2006 and May 4, 2007 and involves various conversations about chicken and egg production. As of this writing, the thread is still active, so this span represents only a snapshot of the longer discussion.

Text Normalization

To turn “The chicken and the egg” thread into a computational object amenable to graphing, I subject the text to the following natural language processing techniques:

Results

In this section, I present the findings of an experiment that applies the network graph analytics above to the "The chicken and the egg" thread in search of topoi.

After processing, "The chicken and the egg" amounts to 7,292 non-unique tokens. I turn these 7,292 tokens into a directed, multi-edged network graph using the Python-based NetworkX graphing software.5

In comparison to the previous examples, this directed multigraph of the "The chicken and the egg" thread would resemble Figure 3. Each token-word gleaned from the normalization process would represent a unique node. An edge would be drawn from one node to the next based on the nodes' adjacency in the natural language text. For example, if an original sentence in a natural language text read “The fire engine is red,” the normalization process would return three nodes: ["fire", "engine", "red"]. Given their position in the node sequence, an edge would be drawn from "fire" > "engine" > "red" and then to whatever node followed. The normalization process returns unique nodes (token-words). There are 1,629 unique nodes in "The chicken and the egg". The disparity between this number and the original 7,292 is owed to the repetition of words in the natural language text. In the network graph, this repetition of terms (and other larger units such as phrases) is accounted for in the edges that connect the nodes; hence the use of a multi-edged graph. After processing, the network graph of the chicken thread contains 7,291 edges, including self-loops (i.e., nodes that link to themselves).
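The construction described above can be sketched in a few lines of Python. Note the assumptions: `normalize` is a deliberately simplified stand-in for the normalization pipeline (it lowercases, strips punctuation, and drops a tiny hypothetical stopword list, but it does not lemmatize), and `build_token_graph` is an illustrative helper, not the code used for this chapter.

```python
import string

import networkx as nx

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to"}


def normalize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    tokens = []
    for word in text.lower().split():
        word = word.strip(string.punctuation)
        if word and word not in STOPWORDS:
            tokens.append(word)
    return tokens


def build_token_graph(tokens):
    """Link each token to the token that follows it in the sequence.

    A MultiDiGraph keeps one edge per adjacency, so repeated word pairs
    become parallel edges and a repeated word becomes a self-loop.
    """
    graph = nx.MultiDiGraph()
    graph.add_nodes_from(tokens)
    graph.add_edges_from(zip(tokens, tokens[1:]))
    return graph


tokens = normalize("The fire engine is red. The red engine is a fire engine.")
graph = build_token_graph(tokens)

print(tokens)
print("unique nodes:", graph.number_of_nodes())
print("edges:", graph.number_of_edges())
print("self-loops:", nx.number_of_selfloops(graph))
print("fire > engine edges:", graph.number_of_edges("fire", "engine"))
```

A sequence of n tokens always yields n - 1 edges under this scheme, which is why 7,292 tokens produce 7,291 edges.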

The "The chicken and the egg" graph is too dense to visualize in any meaningful way, but Figure 8 presents a zoomed view.

Figure 8. Directed Multi-edged "The chicken and the egg" Thread Graph

For the sake of space, I present only the ten nodes with the highest betweenness centrality in Figure 9. These figures have been normalized (each value is scaled to fall between zero and one). Figure 10 provides a more global view of the betweenness centrality figures for all 1,629 unique nodes. As the curve shows, most of the nodes have low betweenness centrality, and a select few nodes disproportionately outrank the majority. Recall that betweenness centrality indicates the number of times a node appears along the shortest paths (geodesics) between other nodes, meaning that "egg" is in the middle of more potential paths of information than any other node, just as Bob (see Figure 7) possesses the highest betweenness centrality in the toy network. This suggests that as conversations develop (as words are strung together to form sentences and sentences strung together to form statements), Science Buzz posters are routinely routing their statements through the node-tokens “egg,” “chicken,” and “hen” and that these node-tokens are the primary subject words in the discussion. This inference from the betweenness centrality results is borne out by the qualitative analysis of the data: the main concerns of posters in this thread are how to store and raise chickens and eggs and which variables affect egg fertilization.
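A ranking of this kind can be sketched as follows. The token list is invented for illustration, the helper `top_betweenness` is hypothetical, and collapsing the multigraph into a simple directed graph before scoring is my own simplification: parallel edges do not change which unweighted paths are shortest, so the scores are unaffected.

```python
import networkx as nx


def top_betweenness(tokens, k=10):
    """Rank node-tokens by normalized betweenness centrality."""
    multigraph = nx.MultiDiGraph()
    multigraph.add_edges_from(zip(tokens, tokens[1:]))
    # Collapse parallel edges; unweighted shortest paths are unchanged.
    simple = nx.DiGraph(multigraph)
    scores = nx.betweenness_centrality(simple, normalized=True)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Invented toy sequence in which every exchange passes through "egg".
toy_tokens = ["egg", "hen", "egg", "rooster", "egg", "chick", "egg", "hen"]
ranked = top_betweenness(toy_tokens, k=3)

for token, score in ranked:
    print(f"{token:8s} {score:.3f}")
```

In the toy sequence, "egg" lies on every shortest path between the other three tokens, so it receives the maximum normalized score while the rest receive zero.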

NODE ID BETWEENNESS CENTRALITY VALUE (normalized)
egg 0.38136330088112713
chicken 0.11524988129773855
hen 0.11002127346866694
if 0.10040466817155196
can 0.07346898085691343
will 0.07317190184255012
rooster 0.05474743061420335
day 0.05367643407591807
chick 0.04753207822020994
fertilized 0.0389919504973716
Figure 9. Top 10 Betweenness Centrality Nodes in "The chicken and the egg" Thread
Figure 10. Betweenness Centrality Distribution

Discussion

With these findings, we have moved incrementally toward a sharper, computationally derived analysis of topoi. Given the conceptual definition of betweenness centrality, we can claim that nodes such as "egg", "chicken", "hen", "if", "can", and "will" are not only salient in the network, but also function as controller nodes. We can now begin tracing the prevailing rhetorical strategies of the chicken thread. The "The chicken and the egg" thread is primarily concerned with the topics of chickens and eggs. The high incidence of modal verbs such as "can" and "will" and the conditional "if" further suggests that the modes favored by interactants in the "The chicken and the egg" thread are inquiry and explanation. The question now is: how do these nodes structure the flow of information that is passing through the "The chicken and the egg" thread network?

One way to understand how high betweenness nodes organize the textual network is to examine how these nodes are spatially structuring the text by plotting the recurrence of the high betweenness nodes. We already know that these nodes are recurring more frequently than others due to their high betweenness centrality (a result of their sheer counts and positioning within the text). Identifying what these nodes are conveying or connecting will help illuminate what is rushing through these brokerage points. Programmatically, the next step would be to capture those node-tokens that fall between successive occurrences of the same high betweenness centrality node; I call each such span a cycle.

An example of a cycle for the highest ranked betweenness centrality node "egg" would resemble the following:

['egg', 'fertilized', 'how', 'come', 'never', 'come', 'across', 'one', 'embryo', 'little', 'chicken', 'inside', 'if', 'why', 'chicken', 'spend', 'energy', 'required', 'produce', 'unfertilized']

The initial token is "egg". The subsequent tokens proceed from "egg" and end at "unfertilized". The next instance of "egg" does not appear in this cycle, but it will be the initial token in the next cycle.

The "The chicken and the egg" thread network graph contains 435 cycles for the node-token "egg." Twelve cycles include only the instance of "egg" itself. The text normalization procedure, which removes function words such as prepositions and pronouns, gives rise to the possibility that two instances of "egg" appear side by side in the network. For example: ". . . That is how I like my eggs. These eggs are always tasty." In this case, "eggs" would be stripped of its plural "s." The period and "These" would also be deleted, leading to a listing of "egg" > "egg." While these singleton cycles may not provide much interpretive value (i.e., "Hey, look: they are talking about eggs a lot in this section"), I have retained them for fidelity and for the structural significance they hold for the topology of the network.
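The extraction of cycles can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used for this chapter: the helper name `extract_cycles` and the sample tokens are hypothetical, tokens that precede the first occurrence of the anchor are ignored, and the final cycle is assumed to run to the end of the text.

```python
def extract_cycles(tokens, anchor):
    """Split a token sequence into cycles that each begin with `anchor`.

    Every cycle starts at one occurrence of the anchor and stops just
    before the next occurrence, so back-to-back anchors yield a
    singleton cycle that contains only the anchor itself.
    """
    cycles = []
    current = None
    for token in tokens:
        if token == anchor:
            if current is not None:
                cycles.append(current)
            current = [anchor]  # open a new cycle at this occurrence
        elif current is not None:
            current.append(token)
    if current is not None:
        cycles.append(current)  # assumption: last cycle runs to the end
    return cycles


# Hypothetical normalized tokens containing an "egg" > "egg" adjacency.
sample = ["how", "like", "egg", "egg", "always", "tasty", "egg", "fertilized"]
cycles = extract_cycles(sample, "egg")

for index, cycle in enumerate(cycles):
    print(index, len(cycle), cycle)
```

Under this convention, a cycle's length counts its opening anchor, so a singleton cycle has a length of one.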

Conceptually speaking, the goal of extracting these cycles is to determine how information is "rushing" through betweenness centrality nodes, thereby structuring the network. This would then lead us to a computationally-derived assay of topoi because we would be able to discern not simply the key subject words of the conversation but also the patterns that they produce. Such patterns, my hypothesis goes, can be connected to attempts by authors to establish common ground and recognizability during interactions.

However, this theory and the results of this experiment need to be further operationalized to both better understand the data and enable additional theory building. At the most basic level, the cycles extracted from instances of "egg" in the "The chicken and the egg" thread network graph represent the information passing through critical junction points in the network. At another level of abstraction, we might already judge them to be patterns: patterns of citation. Each cycle indicates a particular path that the discourse of the original "The chicken and the egg" thread took on the way to another mention of "egg" or "eggs." The importance of this information is anticipated by Bakhtin in “The Problem of Speech Genres,” a connection which was pointed out to me by William Hart-Davidson (personal communication):

However monological the utterance may be (for example, a scientific or philosophical treatise), however much it may concentrate on its own object, it cannot but be, in some measure, a response to what has already been said about the given topic, on the given issue, even though this responsiveness may not have assumed a clear-cut external expression . . . After all, our thought itself—philosophical, scientific, and artistic—is born and shaped in the process of interaction and struggle with others’ thought, and this cannot but be reflected in the forms that verbally express our thought as well (92).

In order to be a part of a discourse community, communicators show membership by articulating precedents in the conversation—through the performance of rules and norms and/or the manifestation of dialogic markers. In a discussion board thread about chickens and egg production, the repeated cycling back to the subject of eggs would indicate one expected rhetorical strategy if posters hope to demonstrate to others that they are on topic and ready to participate.

Figure 11. Cycle Length Distribution
Figure 12. Histogram of Cycle Lengths (bin=50)

We can also consider these patterns of citation as structural constraints on the discourse that enable us to infer important features of the discussion board thread. If we examine the scatter plot in Figure 11 and the histogram in Figure 12, then we can see that most of the cycles in this thread are under twenty tokens in length (keep in mind that these length figures or token counts apply to the processed text, from which stopwords and functional punctuation have been removed). We also see that there are only a handful of outliers to this distribution. "Egg" cycle 49 is 182 node-tokens long and "Egg" cycle 269 is 274 node-tokens long. Figure 13 presents a more interactive view of the same data.6 At the most basic level, this means that the interactants in these sections of the thread deviated from the established pattern and did not utter "egg" for an extended period of time. What this tells us, then, is that the topic of the conversation has shifted, signaled by a modulation in topoi that eschewed the flow-through point of "egg."
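A distribution like the one in Figures 11 and 12 can be approximated with a short Python sketch. The function names, the toy token list, and the outlier threshold of twenty tokens (borrowed from the observation that most cycles fall under that length) are assumptions made for illustration.

```python
from collections import Counter


def cycle_lengths(tokens, anchor):
    """Return the length of each cycle, counting the opening anchor."""
    positions = [i for i, token in enumerate(tokens) if token == anchor]
    boundaries = positions + [len(tokens)]  # last cycle runs to the end
    return [end - start for start, end in zip(boundaries, boundaries[1:])]


def bin_lengths(lengths, bin_width):
    """Group lengths into histogram bins keyed by each bin's lower edge."""
    return Counter((length // bin_width) * bin_width for length in lengths)


def find_outliers(lengths, threshold):
    """List (cycle id, length) pairs for cycles longer than the threshold."""
    return [(i, n) for i, n in enumerate(lengths) if n > threshold]


# Toy sequence: three short cycles, one long digression, one short cycle.
toy = ["egg", "a", "b"] * 3 + ["egg"] + ["x"] * 30 + ["egg", "c"]

lengths = cycle_lengths(toy, "egg")
print(lengths)
print(dict(bin_lengths(lengths, bin_width=10)))
print(find_outliers(lengths, threshold=20))
```

The cycle ids returned by `find_outliers` indicate where to return to the natural language text for a closer reading.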


Figure 13. Egg Cycles
(Hover over the red circles to see cycle length in token and cycle index or "id")

People are participating differently here.

This circular representation of "Egg" cycles in the graph cannot tell us how (and that is the point of this chapter: "the rhetorical how of participation"), but it does tell us where to zoom in on the raw natural language text. In cycle 269, the largest cycle in the thread (i.e., the cycle that featured the largest gap between utterances of "egg"), we find that the conversation has indeed shifted from eggs to the topic of chicken sexing.

Figure 14 shows a comparison in cycles between the node-token "egg" and the node-token "chicken" through our customized cycle plotting. "Egg" cycles are in blue. "Chicken" cycles are in orange. Each cycle representation is offset by the individual starting point of each cycle in the "The chicken and the egg" thread network graph. The "egg" cycles start at node-token 19 and the "chicken" cycles start at node-token 29. Figure 13 represents a token-by-token progression over the length of the "The chicken and the egg" thread network graph. Recall here that the size of the circle/cycle is directly related to the proximity of one node-token to its next instantiation. Thus, larger circles/cycles indicate a more prolonged gap before a node-token is reiterated. Smaller cycles indicate a more rapid reiteration. The largest "egg" cycle (274 node-tokens) is shot through with more rapid invocations of the second-highest betweenness centrality node: "chicken."
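The comparison behind a plot of this kind rests on two values per cycle: where it starts in the token sequence and how long it lasts. The sketch below, with hypothetical helper names and an invented token list, computes those values and counts how many "chicken" cycles begin inside the longest "egg" cycle. It does not reproduce the plotting code itself.

```python
def cycle_spans(tokens, anchor):
    """Return (start index, length) for each cycle of `anchor`."""
    positions = [i for i, token in enumerate(tokens) if token == anchor]
    boundaries = positions + [len(tokens)]  # last cycle runs to the end
    return [(start, end - start)
            for start, end in zip(boundaries, boundaries[1:])]


def starts_inside(spans, window):
    """Count the spans whose start falls inside the given window span."""
    window_start, window_length = window
    window_end = window_start + window_length
    return sum(1 for start, _ in spans if window_start <= start < window_end)


# Invented sequence: one long "egg" cycle interrupted by "chicken".
toy = ["intro", "egg", "hen", "chicken", "feather", "chicken",
       "comb", "chicken", "egg", "chicken", "egg"]

egg_spans = cycle_spans(toy, "egg")
chicken_spans = cycle_spans(toy, "chicken")
longest_egg = max(egg_spans, key=lambda span: span[1])

print("egg spans:", egg_spans)
print("chicken spans:", chicken_spans)
print("chicken cycles starting inside longest egg cycle:",
      starts_inside(chicken_spans, longest_egg))
```

The start index of the first span corresponds to the offset described above, that is, the point in the sequence where a token first appears.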

Figure 14. "Egg" and "Chicken" Cycles Comparison

Taking a more holistic look at the visualization in Figure 14, we can also discern a shift in the overall conversation based on the various radii of the "egg" and "chicken" cycles. In the first 3 rows of the visualization (about 3420 node-tokens), "chicken" is mentioned only sporadically. Meanwhile, "egg" is continuously mentioned. The gap between these utterances lessens beyond this breakpoint, however, and in the final 4 rows of the visualization we see a tighter pattern of reiteration. The "how" of dialogic participation is changing.

And we can now zoom in on the portion of the thread encompassed by these shifts to understand why. Doing so, we would see that "egg" cycle 269 (274 node-tokens in length) is characterized by questions from an anonymous poster and extended responses by the thread's resident poultry expert, Dr. Jacquie Jacobs, a title which is blazoned by her poster ID. We also see a citation of other scientific research made by Jacobs. We can see this from the node-token view alone:

['egg', 'production', 'online', 'tue', 'pm', 'anonymous', 'im', 'sure', 'young', 'hen', 'hen', 'how', 'can', 'tell', 'difference', 'month', 'old', 'look', 'like', 'small', 'spers', 'leg', 'long', 'tale', 'feather', 'sat', 'pm', 'jacquie', 'jacob', 'picture', 'jacquie', 'jacob', 'doctor', 'jacquie', 'jacob', 'university', 'minnesota', 'poultry', 'specialist', 'month', 'age', 'chicken', 'taken', 'typical', 'characterists', 'specific', 'male', 'female', 'male', 'long', 'pointed', 'feather', 'around', 'neck', 'shoulder', 'tail', 'comb', 'wattle', 'larger', 'female', 'male', 'also', 'typically', 'larger', 'spur', 'back', 'leg', 'six', 'month', 'age', 'if', 'chicken', 'rooster', 'should', 'crowing', 'course', 'complicate', 'thing', 'possible', 'hen', 'take', 'characteristic', 'rooster', 'due', 'hormonal', 'problem', 'website', 'show', 'difference', 'feather', 'structure', 'male', 'female', 'tue', 'pm', 'anonymous', 'how', 'tell', 'difference', 'rooster', 'chicken', 'when', 'young', 'sat', 'pm', 'jacquie', 'jacob', 'picture', 'jacquie', 'jacob', 'doctor', 'jacquie', 'jacob', 'university', 'minnesota', 'poultry', 'specialist', 'difficult', 'tell', 'difference', 'male', 'female', 'chick', 'often', 'take', 'specially', 'trained', 'technician', 'sex', 'using', 'feather', 'vent', 'sexing', 'technique', 'according', 'keith', 'bramwell', 'university', 'arkansas', 'best', 'way', 'sex', 'chicken', 'backyard', 'flock', 'watch', 'grow', 'feed', 'water', 'observe', 'enjoy', 'mature', 'develop', 'change', 'will', 'become', 'obvious', 'male', 'will', 'begin', 'act', 'manly', 'voice', 'will', 'change', 'chirping', 'common', 'young', 'chick', 'attempted', 'crow', 'nearly', 'breed', 'chicken', 'young', 'males\xc3\xb5', 'feather', 'will', 'also', 'change', 'round', 'oval', 'shaped', 'feather', 'common', 'hen', 'young', 'bird', 'shiny', 'narrow', 'pointed', 'feather', 'found', 'neck', 'base', 'tail', 'number', 'old', 'wife', 'tale', 'exist', 'sexing', 'chick', 'method', 'better', 
'flipping', 'coin', 'feather', 'sexing', 'vent', 'sexing', 'accurate', 'method', 'determining', 'sex', 'chick', 'perhaps', 'best', 'enjoyable', 'method', 'watching', 'bird', 'grow', 'tue', 'pm', 'anonymous', 'rooster', 'thing', 'hanging', 'throat', 'sun', 'pm', 'jacquie', 'jacob', 'picture', 'jacquie', 'jacob', 'doctor', 'jacquie', 'jacob', 'university', 'minnesota', 'poultry', 'specialist', 'believe', 'referring', 'wattle', 'hen', 'rooster', 'wattle', 'larger', 'male', 'information', 'chicken', 'anatomy', 'tue', 'pm', 'anonymous', 'hey', 'pair', 'chinese', 'silkies', 'rooster', 'hen', 'three', 'week', 'ago', 'hen', 'showing', 'broody', 'behaviour', 'week', 'observed', 'stayed'] 274 8 ['egg', 'time', 'coming', 'eat', 'doctorink', 'however', 'sitting', 'one']

We can draw two conservative conclusions based on this data. First, departures from the subject of "egg" are present in this thread, but the departures encompass related subjects ("hens," "chickens," "roosters"). Second, the presence of marked expert knowledge makes this deviation more feasible.

These interpretations, however, depend on the contradistinction between the rarity of cycles (or citational pattern) of longer length compared to the prevailing cycle length of under 20 node-tokens. How might these shorter cycles be explained in terms of rhetorical moves transpiring in this thread?

One thing that is apparent from a purely structural viewpoint is that the conversation does not often move on from the word "egg." People not only frequently invoke the term as a subject of conversation but also use it in close proximity with other instances of "egg." This pattern of reiteration signals a set of pragmatic constraints that can help shed light on the rhetoric of the "The chicken and the egg" thread. We can begin this scaled-up inquiry by asking a simple question: in what discursive situation would these patterns operate? I will offer two potential types with the proviso that they are heuristic in nature. In terms of short cycles between one and five node-tokens in length, we can infer that very little is going on except for establishing the topic "egg." For example, the phrase "According to the American Egg Board, eggs . . ." would produce a cycle length of 2 (egg-board). Though not referring to an "egg" as an object, an entity is being declared in this cycle. Thus, these regions of the graph can indicate functional rhetorical moves to ground the subject of a sentence or group of sentences. In terms of slightly longer cycle lengths (5-15 node-tokens in length), more information is being conveyed because more node-tokens are traveling through the betweenness centrality node.

These cycles do more than establish "egg" as a subject, and I would argue that this "doing more" involves a procedural form of rhetoric that emphasizes explanation. For example, when describing aspects of egg fertilization or storage, a person would repeatedly write the word "egg" because it is the central object of manipulation. For instance, "egg" cycles 3 to 5 include a question and an elaboration of this question. Put another way, if a person were describing how to make an omelet instead of at what temperature to store fertilized eggs, they would likely need to say or write "egg" throughout each step until the completion of the process. What follows is an example of such cyclic usage:

. . . eggs fertilized?" you get a lot of responses. Guess lots of other people had the same question. The answer is that chickens will lay eggs even when they've had no contact with a rooster. According to the "Ask a Scientist" feature of the Howard Hughes Medical Institute, "If an egg has been fertilized, then the embryo inside has already divided several times but remains a group of unspecialized cells [at the time the egg . . .]. (The chicken and the egg)

The cumulative frequency distribution of the "egg" cycles in Figure 11 indicates that most of the "The chicken and the egg" thread network graph is characterized by these cyclic patterns. This suggests that the global rhetoric of this thread trends toward the establishment of "egg" as a subject and an explanation of this subject in ways that do not radically depart from the basic usages of the term. This further suggests that the "The chicken and the egg" thread discussion forum favors rhetorical strategies that focus on a limited set of related topics and methodical elaboration of these topics in terms of questions and explanations that involve the repetition of key terms. That is to say, the people participating in the discussion forum are participating in a way that favors repetition and elaboration of key terms.

This experiment demonstrates writers structuring their texts in ways that can/should be intelligible to readers within a particular rhetorical situation. The linguistic units of the text conform to a recognizable pattern. If such patterns exist, then when units of natural language are converted into numeric units, these numerical units should show corresponding patterns when reproduced by a computer. In this case, a pattern of citation, dependent upon token frequency and the length of token cycles within a network graph, returns a pattern that is revealing for its overall regularity and its outliers. When reconciled with the original source text, we find that the outliers in cycle length capture transitional points in the conversation, in which both the substance of the discussion board posts and the roles of the interlocutors have shifted.

Conclusions

How can we better measure participation?

That is the question that has animated this chapter and much of the trenchant discussion in Critel's work. How do we go beyond imposed standards of orality, beyond publicly performing for the "sage on the stage," beyond equating participation with attendance, beyond the duplication of already counted work? While the case in this chapter does not originate from the classroom, the Science Buzz discussion board thread does constitute an informal learning environment in which people are motivated to increase their knowledge with the help of experts and facilitators, and I believe that some of the lessons learned from a computational rhetoric approach to this forum can be instructive to the classroom.

The experiment conducted in this chapter suggests that in a threaded discussion, a type of rhetorical inertia takes hold when it comes to patterns of citation, and this pattern is likely set in the opening stages of the conversation. Consequently, instructors might reconsider how they encourage participation and pay close attention to the words they use because, as we have seen, words as mundane as "egg" and "chicken" can exert a significant anchoring force on how topics develop and how people might be recognized as active participants. Instructors might also take a lesson from the predominance of modal verbs and if-conditional constructions in the "The chicken and the egg" thread network graph. These, more than the key words of "egg," "chicken," and "hen," imply a delimited set of rhetorical moves focused on the negotiation of ideas in a subjunctive mood as opposed to an imperative or denotative one. For example, early on in the original "chicken thread," a poster cites the Howard Hughes Medical Institute on the possibility that grocery store eggs might have been fertilized:

If an egg has been fertilized, then the embryo inside has already divided several times but remains a group of unspecialized cells [at the time the egg is laid]. When the egg is incubated at about 37 to 38 °C, the embryonic cells differentiate to form a chick, which will hatch after 21 days. If the egg has not been fertilized, then the oocyte [or egg cell] within will never grow or divide, and the egg will never hatch. The eggs you buy at the supermarket are eggs that have never been fertilized. . . . ("The chicken and the egg")

In this instance, the poster has boilerplated information found on the Internet, but this does not make the selection any less compelling. Though quoted, the if-conditional statements allow the poster to provide a fuller account of egg fertilization and enable the audience to consider the various scenarios of egg production before deciding the issue. One might argue that this survey of possibilities renders the information more generalizable and more open to wide-scale response and reuse. As the discussion of betweenness centrality shows, "if" word-tokens are prevalent, suggesting that this type of knowledge transmission is a salient feature of the "The chicken and the egg" thread's rhetoric of participation. Instructors charged with supervising participation in an online discussion forum might be encouraged to consider how they are framing responses to conversations or inquiries so that they are not simply serving information but promoting participation in ways that have empirical support.

Lastly, I would like to discuss the notion of steering or intervention. Part of facilitating participation among people is to understand when to intercede in the conversation and when to recede. The rhetorical inertia that operates in the "The chicken and the egg" thread network graph suggests an adherence to discursive protocols (i.e., people should be talking about eggs, and if they're not talking about eggs, they should be talking about how hens and roosters make eggs). The above tests cannot reveal the specific motivations for such dialogism, but they can return a snapshot view that something dialogic in the Bakhtinian sense is or is not transpiring. With the ability to track the lexical units of this dialogism, instructors can gain a better understanding of how participants are adjusting the course of their participation and make more informed decisions on how they themselves can participate in the discussion.

I offer these suggestions as potential areas of exploration and experimentation, which, I think, pays homage to Critel's project. In my reading, Critel's dissertation agitates the notion that our invocations of participation in the composition classroom are or should be as common as they are, and, by calling them into question, we can find uncommon innovations that might promote learning. To close, I submit a series of more specific use cases for the computational metrics discussed in this chapter:

Acknowledgements

This work would not have been possible without the generous support from my Michigan State University colleagues in the Computational Rhetoric Group, WIDE and MATRIX: William Hart-Davidson, Jeff Grabill, Dean Rehberger, and Liza Potts.

I also cannot stop thanking Genevieve Critel, who was always a friend and whose legacy of scholarship and good deeds continues to shape the way I think about the field and life to this day.

Notes

(1) Admittedly, this is a partial view of the projects undertaken by Connors and Lunsford and Lunsford and Lunsford. Certainly, the challenges of gaining multi-institutional IRB approval referred to by Lunsford and Lunsford (786-787) cannot be obviated by text mining software. Thus, I gesture at these works not as a strict comparison to the work found in this chapter, but as a reminder of the scale composition research has pursued in its history, at a point in that history that had fewer technological advantages and less integrated media ecologies than we have now. On another interesting note, the citational network formed by Connors and Lunsford and Lunsford and Lunsford represents in miniature the scholarly presumptions of small sampling that I am warning against in this chapter. Lunsford and Lunsford refer to a 1990 study of frequency of error in student writing by Sloan. In "Frequency of Errors in Essays by College Freshmen and by Professional Writers," Sloan compares the error rate of student writers against professional writers, and suggests that students err at roughly the same percentages as the professional writers, which is consistent with the findings of Connors and Lunsford and Lunsford and Lunsford. However, Sloan's sample only consisted of twenty student essays from one class of freshman composition with a total word count of 9,392 words (300). Sloan states outright that he does not see his findings as definitive owing to this small sample. That said, no such qualification is made in Lunsford and Lunsford's (785) reference to Sloan.

(2) There are also similarities between Barthes' emphasis on the jouissance of reading and re-writing the text and Rice's sojourn through the streets of Detroit, in which he traces his own network of meanings.

(3) In many text processing procedures, a common list of words with little semantic value is automatically removed. My stopword list contains trivial words such as articles, prepositions, pronouns, and conjunctions. The point of this processing step is to reduce noise in the information channel. In nearly all natural language texts written in English, "the" and "a" occur far more often than almost any other individual word. If left in the text, the resulting object would be weighted towards relatively meaningless words.
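The effect of this step can be illustrated with a minimal Python sketch. The stopword set and the sample sentence below are hypothetical and far smaller than any working list; they are not the list used in this chapter.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for
# illustration; the chapter's actual list is not reproduced here).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "it", "is"}

def remove_stopwords(tokens):
    """Return tokens with stopwords dropped, comparing case-insensitively."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

text = "The chicken and the egg is a problem of the order in the sequence"
tokens = text.split()

# Count tokens before and after filtering to see how the weighting shifts.
before = Counter(t.lower() for t in tokens)
after = Counter(t.lower() for t in remove_stopwords(tokens))

print(before.most_common(1))  # the most frequent token before filtering
print(sorted(after))          # the content words that remain
```

Before filtering, the most frequent token in the sample is "the"; after filtering, only content words remain to be counted.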

(4) Lemmatization generally strips plurals, possessives, and affixes and returns the most basic word root. This is another effort to minimize noise in the computational object. A related procedure is called stemming. Stemming reduces words to artificial roots in order to further normalize the tokens. I have chosen lemmatization instead of stemming because lemmatization returns real words. As a rhetorician interested in rhetoric, I wish to preserve as much of the natural meaning and inflection of the original text as possible. For both tokenizing and lemmatization tasks, I use modules packaged in the Natural Language Toolkit (Bird).
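The contrast between the two procedures can be sketched with a toy example. This is not the Natural Language Toolkit: the suffix rules and the lookup table below are hypothetical stand-ins, chosen only to show why a stemmer can return a non-word while a lemmatizer returns a dictionary form.

```python
# Hypothetical lookup table standing in for a real lemmatizer's dictionary.
LEMMAS = {
    "studies": "study",
    "students": "student",
    "writing": "write",
    "wrote": "write",
}

# Suffixes are tried in this order; the first match is stripped.
SUFFIXES = ("ies", "ing", "es", "s")

def toy_stem(word):
    """Crude suffix stripping: the result may not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Dictionary lookup: returns a real word, or the input if unknown."""
    return LEMMAS.get(word, word)

for w in ["studies", "students", "writing", "wrote"]:
    print(w, "| stem:", toy_stem(w), "| lemma:", toy_lemmatize(w))
```

In this sketch the stemmer turns "studies" into "stud" and leaves the irregular form "wrote" untouched, while the lookup maps both "writing" and "wrote" to "write". Production stemmers and lemmatizers use far richer rules and dictionaries, so their output will differ from this toy version.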

(5) From the NetworkX website: "NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks." In other words, NetworkX is a software library that turns computational objects such as lists or dictionaries into graphs consisting of nodes and edges. In the case of the graphs in this chapter, the NetworkX software transforms lists of word-tokens into nodes and joins these nodes together with edges. The metrics that measure the resulting graphs are also packaged in NetworkX.
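The general idea of turning a token list into nodes and edges can be sketched in plain Python, without NetworkX itself. The linking rule here (an edge between tokens that appear next to each other, weighted by how often the pair recurs) is an assumption made for illustration, as are the function names and the sample tokens; it is not necessarily the rule used to build the graphs in this chapter.

```python
from collections import defaultdict

def build_cooccurrence_graph(tokens):
    """Map each token to its neighbors; weight = number of adjacent occurrences.

    Assumed linking rule (hypothetical): consecutive tokens share an edge.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for left, right in zip(tokens, tokens[1:]):
        if left == right:
            continue  # skip self-loops
        graph[left][right] += 1
        graph[right][left] += 1  # undirected: record the edge both ways
    # Convert to plain dicts so later lookups cannot add nodes by accident.
    return {node: dict(nbrs) for node, nbrs in graph.items()}

def degree(graph, node):
    """Number of distinct neighbors of a node (0 if the node is absent)."""
    return len(graph.get(node, {}))

tokens = ["student", "write", "post", "student", "write", "forum"]
graph = build_cooccurrence_graph(tokens)

print(graph["student"])        # neighbors of "student" with edge weights
print(degree(graph, "write"))  # how many distinct tokens "write" touches
```

Weighted pairs of this kind could be passed to NetworkX, for example through `Graph.add_edge(u, v, weight=w)`, at which point the library's own metrics become available.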

(6) The use of circles or arcs to visualize word repetitions has been previously explored by Paley; Wattenberg; and Clement. Additionally, Omizo and Hart-Davidson have created an experimental web application that builds directly from the cyclic visualization found in this chapter.

Works Cited

Adamic, Lada A., et al. "Search in Power-law Networks." Physical Review E, vol. 64, no. 4, 2001, https://doi.org/10.1103/PhysRevE.64.046135.

Anson, Chris M. "The Intelligent Design of Writing Programs: Reliance on Belief or a Future of Evidence." WPA: Writing Program Administration, vol. 32, no. 1, 2008, pp. 11-36.

Bakhtin, Mikhail Mikhaĭlovich. "Problem of Speech Genres." Speech Genres and Other Late Essays, edited by Caryl Emerson and Michael Holquist, U of Texas P, 2010, pp. 60-102.

Bartholomae, David. "Inventing the University." Journal of Basic Writing, vol. 5, no. 1, 1986, pp. 4-23.

Barthes, Roland, and Stephen Heath. Image, Music, Text: Essays Selected and Translated by Stephen Heath. Hill and Wang, 1977.

Bernhardt, Stephen A. "Seeing the Text." ACM SIGDOC Journal of Computer Documentation, vol. 16, no. 3, 1992, pp. 3-16.

Bird, Steven. "NLTK: The Natural Language Toolkit." Proceedings of the COLING/ACL on Interactive Presentation Sessions. Association for Computational Linguistics, 2006.

Borgatti, Stephen P. "Centrality and Network Flow." Social Networks, vol. 27, no. 1, 2005, pp. 55-71.

Carley, Kathleen M. "Network Text Analysis: The Network Position of Concepts." Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts, vol. 4, 1997, pp. 79-100.

"The Chicken and the Egg." Science Buzz, 3 Aug. 2006, http://www.sciencebuzz.org/blog/chicken-and-egg/.

Clauset, Aaron, Cosma Rohilla Shalizi, and Mark E. J. Newman. "Power-law Distributions in Empirical Data." SIAM Review, vol. 51, no. 4, 2009, pp. 661-703.

Clement, T. E. "'A Thing not Beginning and not Ending': Using Digital Tools to Distant-read Gertrude Stein's The Making of Americans." Literary and Linguistic Computing, vol. 23, no. 3, 2008, pp. 361-81.

Collins, Jeff, et al. "Detecting Collaborations in Text: Comparing the Authors' Rhetorical Language Choices in The Federalist Papers." Computers and the Humanities, vol. 38, no. 1, 2004, pp. 15-36.

Connors, Robert J., and Andrea A. Lunsford. "Frequency of Formal Errors in Current College Writing, or Ma and Pa Kettle Do Research." College Composition and Communication, vol. 39, no. 4, 1988, pp. 395-409.

Corman, Steven R., et al. "Studying Complex Discursive Systems." Human Communication Research, vol. 28, no. 2, 2002, pp. 157-206.

Critel, Genevieve. Investigating the Rhetoric of Student Participation: Uncovering and Historicizing Commonplaces in Composition Studies. Diss. The Ohio State University, 2012.

Diesner, Jana, and Kathleen M. Carley. "Revealing Social Structure from Texts." Causal Mapping for Research in Information Technology, vol. 81, 2004.

Dingo, Rebecca. Networking Arguments: Rhetoric, Transnational Feminism, and Public Policy Writing. U of Pittsburgh P, 2012.

Freeman, Linton C. "A Set of Measures of Centrality Based on Betweenness." Sociometry, 1977, pp. 35-41.

---. "Centrality in Social Networks Conceptual Clarification." Social Networks, vol. 1, no. 3, 1978, pp. 215-39.

Freeman, Linton C., Stephen P. Borgatti, and Douglas R. White. "Centrality in Valued Graphs: A Measure of Betweenness Based on Network Flow." Social Networks, vol. 13, no. 2, 1991, pp. 141-54.

Freeman, Linton C., Douglas Roeder, and Robert R. Mulholland. "Centrality in Social Networks: II. Experimental Results." Social Networks, vol. 2, no. 2, 1979, pp. 119-41.

Fulkerson, Richard. "Composition at the Turn of the Twenty-First Century." College Composition and Communication, vol. 56, no. 4, 2005, pp. 654-87.

Girvan, Michelle, and Mark E. J. Newman. "Community Structure in Social and Biological Networks." Proceedings of the National Academy of Sciences of the USA, vol. 99, 2001, pp. 8271-8276, arXiv:cond-mat/0112110.

Grabill, Jeffrey T., and Stacey Pigg. "Messy Rhetoric: Identity Performance as Rhetorical Agency in Online Public Forums." Rhetoric Society Quarterly, vol. 42, no. 2, 2012, pp. 99-119.

Grabill, Jeffrey T., Stacey Pigg, and Katie Wittenauer. "Take Two: A Study of the Co-Creation of Knowledge on Museum 2.0 Sites." 16 April 2009. Paper presented at Museums and the Web 2009: The International Conference for Culture and Heritage On-line. Indianapolis, Indiana. Retrieved from http://www.museumsandtheweb.com/mw2009/papers/grabill/grabill.html

Hart, Roderick. "Genre and Automated Text Analysis: A Demonstration." Rhetoric and the Digital Humanities, edited by Jim Ridolfo and William Hart-Davidson, U of Chicago P, 2015, pp. 152-68.

Haswell, Richard H. "NCTE/CCCC’s Recent War on Scholarship." Written Communication, vol. 22, no. 2, 2005, pp. 198-223.

Hauser, Gerard A. Vernacular Voices: The Rhetoric of Publics and Public Spheres. U of South Carolina P, 1999.

Hawk, Byron. "Toward a Rhetoric of Network (Media) Culture: Notes on Polarities and Potentiality." Journal of Advanced Composition, 2004, pp. 831-50.

Hoffman, David, and Don Waisanen. "At the Digital Frontier of Rhetoric Studies: An Overview of Tools and Methods for Computer-Aided Textual Analysis." Rhetoric and the Digital Humanities, edited by Jim Ridolfo and William Hart-Davidson, U of Chicago P, 2015, pp. 169-83.

Inoue, Asao B. Antiracist Writing Assessment Ecologies: Teaching and Assessing Writing for a Socially Just Future. WAC Clearinghouse, 2015.

Ishizaki, Suguru, and David Kaufer. "Computer-aided Rhetorical Analysis." Applied Natural Language Processing: Identification, Investigation and Resolution. IGI Global, 2012, pp. 276-296.

Johnson-Eilola, Johndan. Datacloud: Toward a New Theory of Online Work. Hampton Press, 2005.

Kleinnijenhuis, Jan, Jan Ridder, and Ewald Rietberg. "Reasoning in Economic Discourse: An Application of the Network Approach to the Dutch Press." Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts, edited by Carl Roberts, Erlbaum, 1997, pp. 191-208.

Lang, Susan, and Craig Baehr. "Data Mining: A Hybrid Methodology for Complex and Dynamic Research." College Composition and Communication, vol. 64, no. 1, 2012, pp. 172-94.

Leff, Michael C. "The Topics of Argumentative Invention in Latin Rhetorical Theory from Cicero to Boethius." Rhetorica: A Journal of the History of Rhetoric, vol. 1, no. 1, 1983, pp. 23-44.

Lunsford, Andrea A., and Karen J. Lunsford. "'Mistakes are a Fact of Life': A National Comparative Study." College Composition and Communication, vol. 59, no. 4, 2008, pp. 781-806.

Miller, Carolyn R. "The Aristotelian Topos: Hunting for Novelty." Rereading Aristotle’s Rhetoric, edited by Alan G. Gross and Arthur E. Walzer, Southern Illinois UP, 2002, pp. 130-46.

Newman, Mark E. J. "Scientific Collaboration Networks. II. Shortest Paths, Weighted Networks, and Centrality." Physical Review E, vol. 64, no. 1, 2001, pp. 016132.

---. "Analysis of Weighted Networks." Physical Review E, vol. 70, no. 5, 2004, pp. 056131.

NetworkX. NetworkX. N.p., 20 Sept. 2014.

North, Stephen M. The Making of Knowledge in Composition: Portrait of an Emerging Field. Boynton/Cook Publishers, 1987.

Omizo, Ryan M., and William Hart-Davidson. "The Cycletron: An Experiment in the Topological Visualization of Text." Proceedings of the 35th ACM International Conference on the Design of Communication. ACM, 2017.

Opsahl, Tore, Filip Agneessens, and John Skvoretz. "Node Centrality in Weighted Networks: Generalizing Degree and Shortest Paths." Social Networks, vol. 32, no. 3, 2010, pp. 245-51.

Paley, W. Bradford. "TextArc: Showing Word Frequency and Distribution in Text." Poster presented at IEEE Symposium on Information Visualization, Vol. 2002, 2002.

Perkins, Jacob. Python Text Processing with NLTK 2.0 Cookbook. Packt Publishing, 2010.

Popping, Roel. Computer-Assisted Text Analysis. Sage Publications, 2000.

---. "Knowledge Graphs and Network Text Analysis." Social Science Information, vol. 42, 2003, pp. 91-106.

Porter, James E. "Recovering Delivery for Digital Rhetoric." Computers and Composition, vol. 26, 2009, pp. 207-24.

Raban, Daphne R., and Eyal Rabin. "Statistical Inference from Power Law Distributed Web-Based Social Interactions." Internet Research, vol. 19, no. 3, 2009, pp. 266-78.

Rice, Jeff. "Networked Boxes: The Logic of Too Much." College Composition and Communication, vol. 59, no. 2, 2007, pp. 299-311.

---. "Urban Mappings: A Rhetoric of the Network." Rhetoric Society Quarterly, vol. 38, no. 2, 2008, pp. 198-218.

Sack, Warren. "Conversation Map: An Interface for Very-Large-Scale Conversations." Journal of Management Information Systems, vol. 17, no. 3, 2000, pp. 73-92.

Scott, John. Social Network Analysis. Sage, 2012.

Simmons, W. Michelle, and Jeffrey T. Grabill. "Toward a Civic Rhetoric for Technologically and Scientifically Complex Places: Invention, Performance, and Participation." College Composition and Communication, vol. 58, no. 3, 2007, pp. 419-48.

Sloan, Gary. "Frequency of Errors in Essays by College Freshmen and by Professional Writers." College Composition and Communication, vol. 41, no. 3, 1990, pp. 299-308.

Smagorinsky, Peter, and Michael W. Smith. "Editors' Introduction: Reconsidering Research in the Teaching of English." Research in the Teaching of English, vol. 37, no. 4, 2003, pp. 417-24.

Spinuzzi, Clay. Network: Theorizing Knowledge Work in Telecommunications. Cambridge UP, 2008.

Swales, John, and Hazem Najjar. "The Writing of Research Article Introductions." Written Communication, vol. 4, no. 2, 1987, pp. 175-91.

Tambayong, Laurent, and Kathleen M. Carley. "Network Text Analysis in Computer-Intensive Rapid Ethnography Retrieval: An Example from Political Networks of Sudan." Journal of Social Structure, vol. 13, 2012, http://www.cmu.edu/joss/content/articles/volindex.html.

Trimbur, John. "Composition and the Circulation of Writing." College Composition and Communication, vol. 52, no. 2, 2000, pp. 188-219.

Walker, Janice R., et al. "Computers and Composition 20/20: A Conversation Piece, or What Some Very Smart People Have to Say about the Future." Computers and Composition, vol. 28, no. 4, 2011, pp. 327-46.

Wasserman, Stanley, and Katherine Faust. Social Network Analysis. Cambridge UP, 1994.

Wattenberg, Martin. "Arc Diagrams: Visualizing Structure in Strings." IEEE Symposium on Information Visualization, IEEE, 2002.

Welser, Howard T., et al. "Visualizing the Signatures of Social Roles in Online Discussion Groups." Journal of Social Structure, vol. 8, no. 2, 2007, pp. 1-32.

Wojcik, Michael. Inventing Computational Rhetoric. Master's Thesis. Michigan State University, 2013.