Bryan Tarpley, Ph.D.

Associate Research Scientist
Center of Digital Humanities Research
Texas A&M University

As a Research Scientist, I leverage my full-stack development skills to create and evolve digital humanities projects, whether by developing and maintaining shared infrastructure, training stakeholders to use digital tools, or building boutique solutions. I have more than two decades of experience in the information technology sector for higher education, as well as pedagogical experience and a terminal degree in the humanities, making me an insider to both the academic and technological fields.

Education

Doctor of Philosophy, English 2019
Texas A&M University
Dissertation: "Making Room for Affect: Zadie Smith, David Foster Wallace, and the Authenticating Human"
Committee: Professors Sally Robinson (Chair), Emily Johansen, Mikko Tuhkanen, and Tommy Curry
Master of Arts, English 2008
Stephen F. Austin State University
Thesis: "Cain"
Advisor: Professor John McDermott
Bachelor of Arts, Computer Science 2005
Harding University

Employment

Associate Research Scientist of Critical Infrastructure Studies 2021-present
Center of Digital Humanities Research
Texas A&M University
My role at the center largely continues as described under my previous position below. Beginning in 2023, however, the center's mandate pivoted such that we became chiefly concerned with project development for local TAMU faculty and graduate students.
Software Applications Developer III 2016-2021
Center of Digital Humanities Research
Texas A&M University
As the lead developer for the Center of Digital Humanities Research (CoDHR), I implemented technical solutions for large, international grant initiatives; taught the “Python for Humanists” courses as part of the Programming4Humanists continuing education program; carried out Summer Technical Assistance grants that incrementally developed projects for local faculty and graduate students; and managed the Humanities Visualization Space (a black-box room with a very large screen and surround-sound speakers).
Lead Software Applications Developer 2014-2016
Information Technology, Infrastructure, and Operations
Texas A&M University
I played pivotal roles in two major campus-wide initiatives. The first was engineering a self-service portal that allowed faculty, students, and staff to transition their “Zimbra” email inboxes, calendars, and “briefcases” to Google cloud services: Gmail, Google Calendar, and Google Drive. This involved creating a performant, enterprise-grade web application with an asynchronous job queue. With this application we migrated over 50,000 Zimbra accounts, and our team was named “Team of the Semester” by TAMU Computing and Information Services. The second initiative, carried out chiefly by me, was “TAMUDirect,” another enterprise web application that allowed faculty to access and configure Google Groups mailing lists for all of their courses.
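The heart of that migration portal was the asynchronous job-queue pattern. The following is a minimal sketch of the idea, assuming a Celery worker with a Redis broker; the production stack differed in its details, and all names below are illustrative:

    from celery import Celery

    app = Celery("migrations", broker="redis://localhost:6379/0")

    class TransientError(Exception):
        """Recoverable failure, e.g., a rate limit or timeout."""

    def copy_mailbox(net_id: str) -> None:
        """Stub standing in for the Zimbra-export / Google-import step."""
        ...

    @app.task(bind=True, max_retries=3, default_retry_delay=60)
    def migrate_account(self, net_id: str) -> str:
        """Migrate a single user's account as one queued job."""
        try:
            copy_mailbox(net_id)
        except TransientError as exc:
            raise self.retry(exc=exc)  # re-queue on transient failures
        return f"{net_id}: migrated"

    # The portal enqueues work without blocking the web request:
    # migrate_account.delay("jdoe")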
Graduate Assistant Researcher 2012-2014
Initiative for Digital Humanities, Media, and Culture
Texas A&M University
As a graduate assistant researcher, I created the MySQL data schema for the Early Modern OCR Project (eMOP), as well as a simple query builder in PHP for exploring that data. I also developed an application in C# that allowed eMOP researchers to select ideal instances of glyphs from early modern texts, create synthetic training images, and train the Tesseract 3 OCR engine used to transcribe 46 million page images.
Graduate Assistant Teacher 2011-2012
Department of English
Texas A&M University
As a graduate assistant teacher, I taught a 2/2 load of composition and rhetoric and literature survey courses, designing the literature survey courses myself. I also served as a summer intern for the writing program, developing the course modules for the online section of English 210 (Technical Writing), which the department used for several years.
Adjunct Faculty 2009-2011
Department of English
Stephen F. Austin State University
As an adjunct faculty member in the Department of English, I taught a 4/4 load of composition and rhetoric courses. I designed the courses I taught and participated in assessment.
WWW Specialist 2007-2009
Columbia Regional Geospatial Service Center
Stephen F. Austin State University
My role at the CRGSC was to design the center’s web presence, which involved creating a custom content management system (CMS) capable of serving the center’s unique needs. The CMS allowed staff to schedule and orchestrate GIS training sessions for the Texas State Guard for emergency preparedness. I also helped with emergency response: I was part of a two-person team that set up a server with GIS software at the staging area of the Galveston response team after Hurricane Rita.
Systems Administrator 2005-2007
Office of Instructional Technology
Stephen F. Austin State University
I administered the university’s Learning Management System, first WebCT and later Blackboard. I managed the hardware, which consisted of production and staging instances of Red Hat Linux servers (for WebCT) and later Windows servers (for Blackboard), a fiber-connected storage area network, a load balancer, tape backups, an uninterruptible power supply, and cooling systems. I also trained and supervised an assistant systems administrator.
Programmer/Analyst 2003-2005
Management Information Systems
Harding University
My role in this position was to create custom data schemas and automated business processes to augment the university’s Student Information System (Banner). This mostly involved working with Oracle databases to write complex queries, create views and tables, and develop subroutines in PL/SQL.

Other Appointments

Note: In all three cases below, the role of “Technical Editor” is intended to connote the status of “Editor,” but with the technological requirements and affordances of the project as my chief purview.
Technical Editor 2023-present
Technical Editor 2019-present
Technical Editor 2017-present

Major Projects

Corpora 2018-present
Principal Investigator: Myself
My Role: Lead Developer
Corpora is a web-based “dataset studio” for the digital humanities, allowing scholars to build, enhance, search, transform, and explore digital humanities project data. It is the culmination of my research into infrastructure studies and has come to serve a crucial role for the following major projects, among others:
Texas Art Project (TAP) 2023-present
Principal Investigator: Tianna Uchacz
My Role: Lead Developer
TAP showcases the work of Texas artists, currently featuring the work of E.M. “Buck” Schiwetz, with more forthcoming. The site provides filtered browsing and search of the artwork, zoomable images, and a mapping interface showing the locations of Schiwetz’s subjects. Project data is hosted in Corpora. For this project, I also developed a custom WordPress plugin that queries the Corpora API for project data, presents images via IIIF tiling, and plots GIS data using Leaflet.
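Although the plugin itself is written in PHP, the API-consumption pattern it relies on can be sketched in a few lines of Python; the host, endpoint path, and field names below are hypothetical rather than Corpora’s documented interface:

    import requests

    CORPORA_HOST = "https://corpora.example.edu"  # hypothetical host

    def fetch_artworks(corpus_id: str, artist: str, page: int = 1) -> list[dict]:
        """Return one page of artwork records filtered by artist."""
        resp = requests.get(
            f"{CORPORA_HOST}/api/corpus/{corpus_id}/ArtWork/",  # illustrative route
            params={"artist": artist, "page": page, "page_size": 50},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json().get("records", [])

The plugin renders records returned this way as IIIF image tiles and Leaflet map markers.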
Maria Edgeworth Letters Project (MELP) 2023-present
Principal Investigators: Susan Egenolf, Meredith Hale, Hilary Havens, Carrie Johnson, Jessica Richard, and Robin Runia
My Role: Lead Developer and Technical Editor
MELP makes available the collected letters of Maria Edgeworth, providing interfaces to browse, search, and explore them, including an IIIF viewer for the letter images themselves. Project data is hosted in Corpora. I wrote the logic to ingest the data from TEI-encoded letters and built a custom WordPress plugin that queries the Corpora API for project data and presents it. Of note is a custom letter viewer that places a zoomable letter image side by side with its transcription and keeps the image in sync with the transcription as the user scrolls.
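A simplified sketch of that ingestion logic, using lxml; the element paths reflect common TEI correspondence practice and may differ from MELP’s actual encoding:

    from lxml import etree

    TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

    def extract_letter(path: str) -> dict:
        """Pull basic metadata and a plain-text transcription from one TEI letter."""
        tree = etree.parse(path)

        def first(xpath: str) -> str:
            hits = tree.xpath(xpath, namespaces=TEI_NS)
            return str(hits[0]).strip() if hits else ""

        text_nodes = tree.xpath("//tei:text//text()", namespaces=TEI_NS)
        return {
            "title": first("//tei:titleStmt/tei:title/text()"),
            "sent": first("//tei:correspAction[@type='sent']/tei:date/@when"),
            "recipient": first("//tei:correspAction[@type='received']/tei:persName/text()"),
            "transcription": " ".join(" ".join(text_nodes).split()),
        }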
The Carlyle Letters Online (CLO) 2022-present
Principal Investigator: Brent Kinser
My Role: Lead Developer
The CLO makes available the collected letters and photographs of Thomas and Jane Carlyle, providing interfaces to browse, search, and explore them. This project came to me as an admirable yet unsustainable boutique web application; thankfully, its data was almost entirely TEI encoded. I wrote logic to extract the TEI data into Corpora and, much as with TAP and MELP above, built a custom WordPress plugin that queries the Corpora API and presents the data, matching the legacy application’s appearance. Over time, I also implemented an advanced search feature that queries and presents results from Corpora’s API.
The New Variorum Shakespeare (NVS) 2019-present
Principal Investigator: Robert Stagg
My Role: Lead Developer
This project seeks to digitally represent the play text, textual variations over time across major editions, and scholarly commentary for every Shakespeare play. Excepting some initial HTML/CSS development, my role has been to engineer the entirety of the digital NVS, from the ingestion tasks that extract data from TEI-encoded volumes to the presentation of that data via the web-based variorum and paratext viewers. The variorum viewer is an extremely complicated undertaking, involving the reconstruction of synthetic play lines from the editors’ textual notations of variants, a histogram visualization of changes over time across editions, the highlighting of commentary lemmata, and many other ongoing challenges, including backend tools for the creation of future NVS editions. Everything pertaining to the digital NVS, from the backend to the frontend, is hosted by Corpora.
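As a toy illustration of the synthetic-line idea, assuming a drastically simplified variant model (the real NVS notation handles partial lemmata, sigla ranges, and nested variants):

    def apply_variant(base_line: str, lemma: str, reading: str) -> str:
        """Substitute one edition's reading for the copy-text lemma."""
        if lemma not in base_line:
            raise ValueError(f"lemma {lemma!r} not found in line")
        return base_line.replace(lemma, reading, 1)

    # apply_variant("To be, or not to be, that is the question:",
    #               "that is the question:", "I there's the point.")
    # roughly reconstructs the First Quarto's reading of the line.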
Linked Infrastructure for Networked Cultural Scholarship (LINCS)
Principal Investigator: Susan Brown
My Role: Principal Investigator for TAMU Sub-award; Lead Developer
LINCS is a sprawling endeavor to schematize, convert, ingest, and explore large datasets consisting of linked open data, involving many institutions, chiefly Canadian ones. My role as principal investigator for Texas A&M’s contributions was twofold: first, to convert the ~2 million bibliographic metadata entries comprising the catalog of the Advanced Research Consortium (ARC) into linked open data for ingestion into the LINCS triplestore; second, to develop a “rich prospect browser” for visually exploring the LINCS dataset. Both objectives were accomplished using Corpora.
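A minimal sketch of the conversion step, using rdflib; the predicate choices and record layout here are illustrative, not the project’s actual mapping:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS, RDF

    BF = Namespace("http://id.loc.gov/ontologies/bibframe/")

    def record_to_triples(record: dict, graph: Graph) -> None:
        """Add triples describing a single ARC catalog entry."""
        work = URIRef(record["uri"])
        graph.add((work, RDF.type, BF.Work))
        graph.add((work, DCTERMS.title, Literal(record["title"])))
        graph.add((work, DCTERMS.date, Literal(record["date"])))

    g = Graph()
    record_to_triples(
        {"uri": "https://example.org/arc/item/1", "title": "An Essay", "date": "1798"},
        g,
    )
    print(g.serialize(format="turtle"))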
Beowulf’s Afterlives Bibliographic Database (BABD)
Principal Investigator: Britt Mize
My Role: Lead Developer
BABD is the most comprehensive record of texts, representations, and adaptations of Beowulf from 1705 to the present, in all languages, genres, and media forms. It provides interfaces for filtering, searching, and viewing bibliographic records. My role was to create the data schema and web application using MySQL and Django. I also created a unique interface, akin to a network graph, that allows users to explore influences between artifacts in the database across time.
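A condensed sketch of that schema as a Django model; the field names are illustrative, and the production schema is considerably larger:

    from django.db import models

    class Artifact(models.Model):
        """One Beowulf-related text, representation, or adaptation."""
        title = models.CharField(max_length=512)
        year = models.IntegerField()
        medium = models.CharField(max_length=128)
        language = models.CharField(max_length=128)
        # self-referential link powering the influence-graph interface
        influences = models.ManyToManyField(
            "self", symmetrical=False, related_name="influenced_by", blank=True
        )

        def __str__(self) -> str:
            return f"{self.title} ({self.year})"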
Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros 2016-2017
Principal Investigator: Hannah Alpert-Abrams
My Role: Lead Developer for TAMU contributions
This project was a two-year, multi-university effort to develop tools for the automatic transcription of early modern printed books. My role was to adapt the “eMOP Dashboard” (see the eMOP project below) so that a team of scholars from UT Austin could launch massively parallel OCR tasks on the Brazos Supercomputing Cluster. This eventually entailed a complete rewrite of the software, which became the basis for Corpora, and allowed the Primeros Libros team to perform OCR on every printed volume published in the Americas before 1601 using cutting-edge neural network technology.
The Early Modern OCR Project (eMOP) 2012-2014
Principal Investigator: Laura Mandell
My Role: Graduate Assistant Researcher
This ambitious project involved training the Tesseract 3 OCR engine to produce automated transcriptions of over 46 million page images from documents published between 1475 and 1800. I initially architected and populated the relational database schema for the project, which evolved over time to store information about each of those page images. I also developed a tool called Franken+ that allowed the eMOP team to train the Tesseract OCR engine using a visual interface.

Selected Publications

2023
Tarpley, Bryan, Nancy Sumpter, and Kayley Hart. "Helping Humanists Hack: A Tale of Program Coordination, Classroom Support, Adaptive Pedagogy, and Python." Digital Humanities Workshops: Lessons Learned, edited by Jennifer Guiliano and Laura Estill, Routledge, 2023.
2023
Burdick, Anne, Laura Mandell, Bryan Tarpley, and Katayoun Torabi. "Using Data and Design to Bring the New Variorum Shakespeare Online." The Routledge Handbook of Shakespeare and Interface, edited by Clifford Werier and Paul Budra, Routledge, 2023.
2013
Torabi, Katayoun, Jessica Durgan, and Bryan Tarpley. "Early modern OCR project (eMOP) at Texas A&M University: using Aletheia to train Tesseract." ACM Document Engineering Proceedings, September 2013, pp. 23-26.
2009
Tarpley, Bryan. "The Hopeful Midwife: Facing Epistemic Limitations." Journal of Faith and the Academy, vol. 2, no. 2, 2009.

Selected Presentations

2025
“Corpora: A Dataset Studio for the Digital Humanities.” Cultures of Correspondence Symposium at Texas A&M University in College Station, TX.
2024
“Corpora: A Dataset Studio for the Digital Humanities.” TxDH Symposium at Baylor University in Waco, TX.
2024
“Corpora: A Dataset Studio for the Digital Humanities.” DH Inside Out, a pre-conference workshop for Digital Humanities at the Roy Rosenzweig Center for History and New Media in Arlington, VA.
2023
"Stratocumulus: A Network Graph Interface for Browsing Big Data." Making Links, University of Guelph, Canada. Co-presented with Akseli Palén.
2022
"Cockyboo: Archiving Harvey Matusow’s Journey from Red Baiter to Mr. Rogers." American Literature Association, Chicago, IL. Co-presented with Nick Kocurek.
2019
"Introducing the ESTC21: Converting the English Short Title Catalogue to Linked Data, Original Goals and Lessons Learned." Consortium of European Research Libraries Annual Seminar, Göttingen, Germany. Co-presented with Brian Geiger.
2017
"'So yo then man what’s your story?': David Foster Wallace, Paul Ricoeur, and Narrative Identity." Ricoeur Studies, Boston, MA. Co-presented with Greg McKinzie.
2017
"Breakdowns in Machine Reading: Attempting to De-privilege Modern English Print with the Power of Supercomputing and the DH Dashboard." Digital Frontiers at University of North Texas in Denton, TX.
2017
"The Psalter Project: Providing Mediated Access to Religio-Political Subjects in Early Modern England." Digital Humanities at McGill University in Montreal, Canada. Co-presented with Dr. Nandra Perry.
2016
"Enabling Enterprise Web Services with Asynchronous Job Queues." Texas A&M Tech Summit in Galveston, TX.
2013
"Early Modern OCR Project (eMOP) at Texas A&M." Document Engineering in Florence, Italy. Co-presented with Katayoun Torabi.

Professional Development

2024
Completed course "NLP Coding Libraries and Network Analysis for Text Corpora" at the Digital Humanities Summer Institute at the University of Victoria in Victoria, Canada.
2018
Completed course "The Frontend: Modern JavaScript & CSS Development" at the Digital Humanities Summer Institute at the University of Victoria in Victoria, Canada.
2017
Completed course "Wrangling Big Data for DH" at the Digital Humanities Summer Institute at the University of Victoria in Victoria, Canada.

Pedagogy

I have two years of experience teaching a 4/4 load of undergraduate composition and literature survey courses as an adjunct faculty member in the Department of English at Stephen F. Austin State University, and an additional year of teaching similar courses as a Graduate Assistant Teacher at Texas A&M University.

I have seven years of experience teaching Python to humanists through the Programming4Humanists continuing education program at the Center of Digital Humanities Research at Texas A&M University. This involved teaching hybrid courses (in-person and online, synchronous and asynchronous) in which I provided weekly two-hour sessions of lecture, live coding, and troubleshooting for up to a semester at a time. These courses centered on humanities applications of Python, such as natural language processing, parsing and extracting data from XML, building OCR pipelines, and querying APIs.
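As a taste of the kind of exercise these sessions built toward, the following standard-library sketch tokenizes a plain-text novel and counts its most frequent words; the input file name is a placeholder:

    import re
    from collections import Counter

    def top_words(path: str, n: int = 10) -> list[tuple[str, int]]:
        """Return the n most common words in a UTF-8 text file."""
        with open(path, encoding="utf-8") as f:
            words = re.findall(r"[a-z']+", f.read().lower())
        return Counter(words).most_common(n)

    # top_words("persuasion.txt")  # hypothetical input file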

Technical Skills