Origins of Humanities Computing


Punched Cards and a Concordance program

The 56-volume Index Thomisticus is a complete lemmatization of the works of Saint Thomas Aquinas and of a few related authors.

The aim of this project was to catalogue all the words appearing in Aquinas’ works, with cards mentioning the location of the word in the text, along with a quotation of the sentence containing the word. 

The revelation was that the context of a word made a difference to how it should be catalogued. For example, both praesens and praesentia relate to presence, but their significance varied. In Latin, too, one word could have many different meanings.
The cultural objective was to understand the author’s mind. Why Shakespeare used, or made up, the words he did is still hotly debated. Naturally, understanding a scholar’s mind is a fundamental tenet of the humanities. Robert Busa was now looking for mechanical aid, as with everything tech, to speed up the process. He had completed only 10,000 words, and there was a lot more to be done.

In an interesting, near-mythical anecdote, Father Busa recollects that Thomas J. Watson (the CEO of IBM and, yes, the Watson after whom IBM’s Watson is named) had a report deeming the concordance program impossible to bring to fruition. Father Busa then pointed towards an old IBM poster with the slogan:

“The difficult we do right away; the impossible takes a little longer”.

Watson stood by this, on the condition that IBM would remain International Business Machines and not “International Busa Machines”. I love these kinds of stories simply because of their good marketing value: an instant human touch and personal experience in a pretty momentous conversation in the history of Humanities Computing.
The first impediment was IBM machinery. Back then, in the early days of punched cards, the eighty characters recorded on a card could fit only one line of Aquinas’ hendecasyllabic poetry, that is, a line of eleven syllables. Eleven syllables per card was not nearly enough. Couple that with the processing time and the kind of quality Busa was expecting, and a side project was born: the Dead Sea Scrolls project. Now, instead of punch cards, the progression to magnetic tape was made.

The idea in the old system (apparently the IBM 705) was to first sync up the text phrase by phrase: locate the phrase, break the sentence down into words, locate each word, note the last letter of the preceding word and the first letter of the subsequent word, then record the number denoting the location, followed by a special character. Next, the duplicates had to be eliminated. Note that the duplicates could be the same word with different meanings. Father Busa elaborates in Inquisitiones Lexicologicae that each card had to be understood or interpreted by the machine. In the case of the Dead Sea Scrolls project, the data-processing machine made many rewriting attempts, especially when white space or missing words were encountered.
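To make the steps above a little more concrete, here is a minimal modern sketch of a concordance in Python. It is not Busa's or IBM's actual procedure: the function name `build_concordance`, the naive sentence splitting and the sample Latin text are all assumptions made for illustration. It only shows the core idea of breaking text into sentences and words and recording where each word occurs, along with its sentence.

```python
import re
from collections import defaultdict

def build_concordance(text):
    """Map each lowercased word to a list of
    (sentence_number, word_position, sentence) entries.
    Hypothetical helper, for illustration only."""
    concordance = defaultdict(list)
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    for s_no, sentence in enumerate(sentences, start=1):
        # Runs of letters only (no digits, underscores or punctuation).
        words = re.findall(r"[^\W\d_]+", sentence)
        for w_no, word in enumerate(words, start=1):
            concordance[word.lower()].append((s_no, w_no, sentence))
    return dict(concordance)

# Illustrative sample text, not taken from the Index Thomisticus itself.
sample = "Veritas est adaequatio rei et intellectus. Bonum est diffusivum sui."
index = build_concordance(sample)

# Every occurrence of "est", with its location and quoting sentence.
for s_no, w_no, sentence in index["est"]:
    print(s_no, w_no, sentence)
```

Each entry plays the role of one card: the word, where it sits, and the sentence that quotes it. The sketch deliberately skips the hard part described above, namely telling apart identical words with different meanings.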

Paul Tasman from IBM helped Father Robert Busa with linguistic automation. Together they formed the “Centro Automazione Analisi Linguistica” (CAAL), the “Comitato Promotore” and the “Collegio d’Iniziativa” to monitor the outcome of lemmatization as they put together the works of Thomas Aquinas.

If you are wondering what lemmatization exactly entails in linguistics, it is the process of reducing a word to its lemma, or dictionary form. It is not to be confused with stemming, which strips prefixes and suffixes without caring for the context. For example, if you enter the word “art” in the Google search bar, a result such as “artists” can be filed under the dictionary heading “artist”, which is lemmatization, while “articulate” is an unrelated entry even though it contains “art”. It is on this complex NLP problem that Busa and Tasman’s team worked, trying to come up with at least a semi-automatic way to categorize words by their dictionary heading. No wonder the Index Thomisticus took the better part of four decades to publish.
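The difference between the two can be sketched in a few lines of Python. This is a toy under stated assumptions: `LEMMA_TABLE` is a tiny hand-made lookup table, and `lemmatize`, `naive_stem` and `group_by_lemma` are hypothetical helpers, not functions from any real NLP library. A real lemmatizer also uses context and part of speech, which this lookup does not.

```python
# Tiny hand-made table of word form -> dictionary heading (an assumption).
LEMMA_TABLE = {
    "is": "be",
    "was": "be",
    "were": "be",
    "artists": "artist",
    "words": "word",
}

def lemmatize(word):
    """Look the word up under its dictionary heading; unknown words map to themselves."""
    w = word.lower()
    return LEMMA_TABLE.get(w, w)

def naive_stem(word):
    """Blindly strip a common suffix, with no dictionary and no context."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        # Keep at least two letters so very short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: -len(suffix)]
    return w

def group_by_lemma(words):
    """Group word forms under their lemma, as a lemmatized index does."""
    groups = {}
    for w in words:
        groups.setdefault(lemmatize(w), []).append(w)
    return groups

print(lemmatize("was"), naive_stem("was"))          # dictionary form vs. chopped form
print(group_by_lemma(["is", "was", "artists", "artist"]))
```

Here the stemmer turns "was" into the meaningless "wa", while the lookup files it under "be". Building and checking such a table for every form in Aquinas is the labor the paragraph above describes.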

Slowly but surely, this indexing and coding technique, in effect a literature search engine, proved faster than hand-written cards. The new era of language engineering was here. Father Busa expected these techniques to be improved for sophisticated use in libraries and analysis. He is called IBM’s pivot point for providing the trigger to make the impossible happen.

Essentially, throughout the history of Humanities Computing, scholars looked at the technology that was transforming industry and applied it in a humanities context.
The Packard Humanities Institute, founded by David W. Packard (son of Hewlett-Packard co-founder David Packard), concentrated its efforts similarly.
In fact, if one were to point out the major breakthroughs in computing, they would coincide eerily with those in Humanities Computing. The mini timeline for this particular project can be traced back from the World Wide Web.

DH Timeline

1989  World Wide Web. Tim Berners-Lee / CERN
1985  Perseus Project
1980  Ibycus mini mainframe. David Packard
1974  Index Thomisticus. R. Busa, CAAL
1972  TLG Planning Committee
1968  David Packard’s program-produced Concordance to Livy
1967  Thomas Aquinas text card-punching completed. Robert Busa
1957  (Published) Dead Sea Scrolls machine-readable. Robert Busa
1957  Magnetic-tape assisted Bible Concordance. J. Ellison / Remington Rand
1957  FORTRAN made public
1953  Computers made public (IBM ships IBM 701)
1952  Programming invented
1951  Machine-generated concordance. R. Busa / IBM
1946  ENIAC (electronic tube computer). U.S. Army
1943  Mark I, electromechanical relay computer. IBM / Harvard
1890  U.S. Census recorded on punch-cards; Hollerith founds IBM predecessor
1884  Punch-cards. Herman Hollerith patent

By the 1960s, other researchers began to index their texts of interest. 

  • Early Middle High German texts: Roy Wisbey
  • Matthew Arnold and W. B. Yeats poems: Stephen Parrish

At around the same time, computing facilities became mainstream, not only in educational institutions but also in research centers around the world, principally in Europe.

Examples include the Trésor de la Langue Française (Gorcy 1983), which was established in Nancy to build up an archive of French literary material, and the Institute of Dutch Lexicology in Leiden (De Tollenaere 1973). Consolidation was the main buzzword for all Humanities Computing work well into the mid-1980s.

The next step came with knowledge representation and semantic evaluation in visual form. 

Benefits of Apple’s Mac in the 1990s for Humanities Computing:

  • GUI
  • HyperCard
  • Simple programming tool

With Apple’s Macintosh systems, the graphical user interface proved to be of paramount importance for displaying special characters from different languages. HyperCard enabled linking between cards. Soon, collections of these consolidated documents came to be called ‘archives’.

References

Winter, Thomas Nelson, “Roberto Busa, S.J., and the Invention of the Machine-Generated Concordance” (1999). Faculty Publications, Classics and Religious Studies Department. 70.
