
Supercharged culture-geeking

This past Monday, IBM and the Office of Digital Humanities of the NEH convened a bunch of smart folks to talk about what humanities scholars would do with access to a supercomputer, real or distributed. I had been looking forward to this discussion for months, and in the abstract for years. It was a wonderful convergence of two of my life interests.

We had a broad representation of disciplines — a librarian, a historian, a few English profs, an Afro-American studies professor, some freakishly accomplished computer scientists, and a bunch of “general unclassifiable” folks who perfectly straddle the worlds of technology and culture.

The Library of Trinity College, Dublin

The topic was how to get scholars thinking in terms of problems that require high performance computing to solve. The NEH is out front on this, partnering with the Department of Energy (of all places), which runs most of the world’s fastest machines (modeling nuclear blasts, etc.) and has graciously and enthusiastically offered time on its monster boxes to humanist scholars. IBM is in the mix too, both as the maker of some of those monster boxes (e.g. Blue Gene) and through our World Community Grid project, a distributed “virtual” supercomputer.

The grid has about a million devices on it and packs some serious processing power, but to date the only projects that have run on it have been in the life sciences. We were trying to think beyond that yesterday.

My job was to pose some questions to help form problems — mostly because, outside the sciences, researchers just don’t think in terms of issues that need high performance computing. But that doesn’t mean they don’t exist. It’s funny how our tools limit how we even conceptualize problems.

On the other hand, you might argue that this is a hammer in search of a nail. OK, fine. But have you seen this hammer?

Here’s some of what I asked:

Are there long-standing problems or disputes in the humanities that remain unresolved because we have been unable to adequately analyze (rather than interpret) the material?
Where are the massive data sets in the humanities? Are they digital?
Can we think of arts and culture more broadly than is typical: across millennia, languages, or disciplines?
Is large-scale simulation valuable to humanistic disciplines?
What are some disciplinary intersections that have not been explored for lack of suitable starting points of commonality?
Where is pattern-discovery most valuable?
How do we formulate large problems with non-textual media?

I also offered some pie-in-the-sky ideas to jumpstart discussion, all completely personal fantasy projects. What if we …

Perform an analysis of the entire English literary canon looking for rivers of influence and pools of plagiarism. (Literary forensics on steroids.)
Map global linguistic “mutation” and migration to our knowledge of genetic variation and dispersal. (That’s right, get all language geek on the Genographic project!)
Analyze all French paintings ever made for commonalities of approach, color, subject, object sizes.
Map all the paintings in a given collection (or country) to their real-world inspirations (Giverny, etc.) and provide ways to slice that up over time.
Analyze satellite imagery of the jungles of Southeast Asia to try to discover ancient structures covered by overgrowth.
Determine the exact order of Plato’s dialogues by analyzing all the translations and “originals” for patterns of language use.

(Due credit for the last four of these goes to Don Turnbull, a moonlighting humanist and fully-accredited nerd.)

Discussion swirled around but eventually landed on two major topics, both having to do with the relative unavailability of ready-to-process data in the humanities (compared to the sciences). Some noted that their own data sets were, at most, a few dozen gigabytes. Not exactly something you need a supercomputer for. The question I posed (where is the data?) was always in service of another goal: doing something with it.

But we soon realized that we were getting ahead of ourselves. Perhaps the very problem that massive processing power could solve was getting the data into a usable form in the first place.

The Great Library of the Jedi, Coruscant

At present it seems to me (and I don’t speak for IBM here) that the biggest single problem we can solve with the grid in the humanities isn’t discipline-specific (yet). It is taking digital-but-unstructured data and making it useful. OCR is one way; musical notation recognition and semantic tagging of visual art are others. Basically, any form of undescribed data that can be given structure through analysis is promising. If the scope were large enough, this would be a stunning contribution to scholars and ultimately to humanity.

The possibilities make me giddy. Supercomputer-grade OCR married to 400,000 volunteer humans (the owners/users of the million devices hooked to the grid) who might be enlisted to correct OCR errors, reCAPTCHA-style. Wetware meets hardware, falls in love, discusses poetry.
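To make the reCAPTCHA-style idea a bit more concrete, here is a minimal Python sketch of one way volunteer corrections might be reconciled: show the same uncertain word to several people and accept a reading only once enough of them agree. Everything here is hypothetical. The function name `consensus` and the thresholds `min_votes` and `min_share` are my own illustrative assumptions, not anything World Community Grid or reCAPTCHA actually exposes.

```python
from collections import Counter


def consensus(transcriptions, min_votes=3, min_share=0.6):
    """Hypothetical reconciler: return the agreed reading of one OCR-uncertain
    word, or None if the volunteers have not yet reached consensus."""
    # Normalize trivial differences (surrounding whitespace, case) and skip blanks.
    votes = Counter(t.strip().lower() for t in transcriptions if t.strip())
    if not votes:
        return None
    reading, count = votes.most_common(1)[0]
    total = sum(votes.values())
    # Accept only with enough absolute votes AND a clear majority share.
    if count >= min_votes and count / total >= min_share:
        return reading
    return None


print(consensus(["Giverny", "giverny ", "Givemy", "Giverny"]))
print(consensus(["Giverny", "Givemy"]))
```

With four readings of one smudged word, three of them "giverny" and one "givemy", the sketch accepts "giverny"; with only two conflicting readings it returns None, and the word would presumably go back out to more volunteers. Lowercasing everything is a simplification; a real transcription effort would want to preserve the original capitalization.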

The other topic generating much discussion was grid-as-a-service. That is, using the grid not for a single project but for a bunch of smaller, humanities-related projects, divorcing the code that runs a project from the content that a scholar could load into it. You’d still need some sort of vetting process for the data that got loaded onto people’s machines, but individual scholars would not have to worry about whether their project was supercomputer-caliber or what program they would need to run. In a word, a service.

Who knows if either of these will happen. It’s time now to noodle on things. As always, if you have ideas for how you might use a humanitarian grid to solve a problem in arts or culture, drop a line. We’re open to anything at this point.

A few months ago Wired proclaimed “The End of Theory,” arguing that more and more science is no longer being done in the classical hypothesize-model-test mode. This, they claim, is because we now have access to such large data sets and such powerful pattern-recognition tools that there’s no need to form models beforehand.

This has not happened in arts and culture (and you can argue that Wired overstated the magnitude of the shift even in the sciences). But I have to believe that access to high performance computing will change the way insight is derived in the study of the humanities.

Hi, I’m John Tolva!
