[apologies to the typosphere for forsaking the Selectric for some o’ this digital ink. I’ll soon make it up to you with typewriter-movie-eye-candy.]
This post describes my efforts to use software that analyzes visual images in order to “re-visualize” Brad Bird’s 1999 animated film The Iron Giant, which was adapted from Ted Hughes’s 1968 children’s novella. By “re-visualize” I have in mind two things. First, it does not make quite as much sense to talk of “visualizing” a film text as it does to talk about other visualizations such as visualizations of census data, or of place names mentioned in novels, or of word frequencies in political speeches. Since a film (along with, most often, its audio track) operates primarily by visual means, we should recognize the film itself as already a “visualization.” Just as with data visualization, which demonstrates according to Manuel Lima a
broad palette of visual elements and variations that consider color, text, imagery, size, shape, contrast, transparency, position, orientation, layout, and configuration [and yet still] follow noticeable trends and common principles . . . in a type of emergent taxonomy,
film from its earliest moments has been consistently spoken of as having a language and grammar that serves to isolate and emphasize details in order to make sense of a complex world; indeed, almost all of the words Lima uses to describe principles of data visualization are the same ones used in introductory film texts to describe mise-en-scene (barring, perhaps, “transparency,” although that wouldn’t be out of place in discussing montage).
Second, by “re-visualization” I have in mind critic Victor Shklovsky’s discussion of ostranenie, which grounds his exhortation that:
The purpose of art is to impart the sensation of things as they are perceived and not as they are known. The technique of art is to make objects ‘unfamiliar,’ to make forms difficult, to increase the difficulty and length of perception because the process of perception is an aesthetic end in itself and must be prolonged.
In this, I am endorsing a claim that interpretation is art: that (re-)visualizing critics are, like the texts they study, involved in an effort to “impart the sensation of things as they are perceived” by taking the text and making it unfamiliar (just as for Shklovsky the text should take the too-familiar world and render it unfamiliar through artistic presentation). I am coming to realize that the aesthetic dimension of contemporary data visualizations comes precisely from the fact that, rather than return a defamiliarizing art back to reality (that is, to return our perceived “sensation of things” back to an intellectualized “how things are known”), visualization interprets art into a third direction (or “third meaning,” shades of Barthes), where the real world, having been presented in defamiliarized terms through art, is once again defamiliarized through a method of art/criticism which at once extends originally from the work of art but at the same time spins back out from the real world with which it is connected.
OK, how about a gif?
Mathilde Lesueur, animation. “Evolutionary schema showing relationships between characters in Eric Rohmer’s Pauline à la plage.” <http://mathildelesueur.com/spip.php?article20>.
Back to The Iron Giant. I became interested in this film and presented a paper on it at MLA 2009. At the time, I was using a psychoanalytic model to account for film adaptations of children’s literature. Children’s stories are often filled with “living dolls” and other inanimate objects that come to life, move, or talk (e.g., Pinocchio, the Tin Woodman, Toy Story, or WALL-E). While a welcome feature of childish narratives, such uncanny figures are horrific in adult literature and film (e.g., Child’s Play, the Puppet Master series, and most sci fi involving robots). In adapting children’s stories to cinema, filmmakers actualize children’s fantasies of “living dolls,” and I wondered in general how this process worked. Freud puts the puzzle like this: “Children have no fear of their dolls coming to life, they may even desire it.” I was also specifically curious about one particular narrative change (of many) that The Iron Giant made to the source text: in both, the Iron Giant’s body is fragmented and he must put himself back together; however, while the novella begins with a fragmented Giant, the film runs this process in reverse, beginning with an intact Iron Giant and ending with the robot in pieces. This theme of the fragmented body and the image of body parts in autonomous motion are at the heart of cinematic practice, which also constructs the illusion of coherent space and time by putting together discrete fragments. In order to escape an uncomfortable, uncanny meaning, it seems that the film must avoid a discussion of the machine being’s fragmentary origins and begin instead from an illusion of coherence.
That was as far as I got trying to turn my conference paper into a publishable article (new job, new house, new baby, etc.). Now, I’m hoping that by re-visualizing the film, I can take better account of the fragmentation of the Giant and of the space of the film (viz., the various locations where actions occur). Hughes subtitled his novella “A Children’s Story in Five Nights,” and I likewise wanted to see if I could take the theme of temporal fragmentation as a way to account for the process of adaptation. I have already seen the film one way (a number of times): chronologically frame-by-frame. Now I want to see it again in new ways (in literally new ways, not just new perceptions via the old way). Imagining the film arranged not as a succession of frames on a strip, but stacked up in a cube, I want to watch the film sideways, and see what comes of it.
Here is a step-by-step account of what I did to prepare the film for analysis. It is a combination of following instructions and being dunderheaded, with one very useful insight born of trial and error.
1) Working from a digital copy of the film, I used Quicktime’s “Export” feature to convert the movie into still images. The digital copy had a frame rate of 29.97 frames per second, so I chose this same rate when exporting, resulting in 139,517 png image files. This was undoubtedly overkill. I could have chosen fewer (such as one frame per second) or more (such as 100 frames per second), but the latter would have simply resulted in redundant frames and the former might have missed shot changes that occurred in less than one second. I should note that while this method approximates the conventional technique of scanning individual frames from a long strip of film, in reality my source material was a compressed digital intermediary that works by comparing reference frames to changes between frames, rather than what happens with film-based materials, where each new frame is an entirely, well, new frame. This poses a subtle philosophical problem for scholars working on visual analyses of films, since issues of compression and interpolation, although perhaps not noticeable or statistically significant, are nonetheless present. Having said all that, I have to turn right back around and note that The Iron Giant is an animated film and thus was itself never “real” (in the limited sense of the word)–the film was animated using a combination of 2D and 3D animation. [I should also note, in case you could not tell, that I lack a fundamental understanding of how Quicktime’s “Export” feature works and what it actually does when I ask it to generate “29.97 frames per second.” No doubt I should have just picked “30 frames per second,” which was the number I later used when calculating shot lengths.]
2) After removing the end credits and initial frames announcing production and distribution companies, I manually went through the files to identify shot changes. Stepping through 130,000-plus files was as time-consuming as it sounds, but in holding the “down arrow” as I browsed through the image folder I essentially watched the film in slow motion. When I noticed a shot change, I marked the first and last frame of each shot by appending an “a” or “b” respectively to the file name.
3) While I was scrolling through the film, I also noted some basic information about each shot in a separate spreadsheet: whether it was XCU, CU, MS, LS, or XLS (or other); whether it was Still, Track, Pan, Zoom (or other), and what the Transition out of the shot was (cut, dissolve). Last, I kept track of two other parameters that I hoped to use to redirect the film’s chronological organization: the Location of the shot (i.e., “The Town,” “At Sea,” “Barn,” “The Woods,” etc), and which characters were in each shot. While it was simple to predetermine the choice of basic shot information, I needed to expand the categories for Location and Character as I went along.
4) A quick sort enabled me to create a new folder with just the first frame of each shot in the film (I am going to get to the last frames later, so they’re in another folder). I ended up with 1,174 shots, and because the files retained the sequential numbering from the original 130,000-plus images, a quick Excel formula gave me a list of all the shot lengths (and the average shot length: 4.04 seconds).
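For readers who would rather script this step than use a spreadsheet, here is a minimal sketch of the same arithmetic in Python. It is not the Excel formula used for this post: the function names and the sample frame numbers are made up for illustration, and it assumes that each shot's first frame number is known and that lengths are converted at 30 frames per second, as described above.

```python
FPS = 30  # the rate the post says was used when calculating shot lengths


def shot_lengths_seconds(first_frames, last_frame, fps=FPS):
    """Return the length in seconds of each shot.

    first_frames: sorted frame numbers of the first frame of every shot.
    last_frame:   frame number of the final frame of the film.
    """
    # Each shot runs from its own first frame up to the frame just before
    # the next shot begins; the last shot runs to the end of the film.
    boundaries = list(first_frames) + [last_frame + 1]
    return [(end - start) / fps for start, end in zip(boundaries, boundaries[1:])]


def average_shot_length(lengths):
    """Mean of a list of shot lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical frame numbers, not taken from the film.
starts = [1, 91, 241, 301]
lengths = shot_lengths_seconds(starts, last_frame=420)
print(lengths)
print(average_shot_length(lengths))
```

The only subtle point is the off-by-one at the end of the film: the final shot has no following shot to subtract from, so the sketch adds an artificial boundary one frame past the last frame.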
Thus far I have actually followed a very traditional method for “marking up” a film. Other than working with a digital text, I have done very little that could not have been accomplished before the advent of computers. My next step, however, is novel and takes advantage of new means of computer-aided data visualization.
5) To analyze and display the sequence of film shots, I used the image processing program ImageJ and in particular custom macros developed by Lev Manovich and the Software Studies Initiative along with QTIP. All of this software is freely distributed online, and download links and other help documentation are here: http://lab.softwarestudies.com/p/software-for-digital-humanities.html. The central feature for my purposes was the ImagePlot macro, which creates a sort of visual bivariate spreadsheet of data–a scatterplot, say, but with actual images instead of just circles for data points.
ImageJ works by referring to a folder of images (whether film stills, paintings, or manga) and to a database stored as a CSV or Excel file. Accompanying the ImageJ package linked to above, there are additional pieces of software that help create this database, so that users can automatically generate information about a batch of images, such as their hue, saturation, and brightness values; their contrast and intensity/brightness; and even more exotic measures like “number of shapes” and “entropy.”
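To give a rough sense of what two of these measures can mean, here is a small Python sketch that computes a median brightness and a Shannon entropy for a list of grayscale pixel values (0 to 255). This is an assumption about the general idea only. The post does not document how ImageJ or QTIP define or compute their measures, and the function names and toy "images" below are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def median_brightness(pixels):
    """Median of the grayscale values in an image (flattened to a list)."""
    return median(pixels)


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the distribution of grayscale values.

    A flat single-color image scores 0; the more evenly the pixels are
    spread across many different values, the higher the score.
    """
    total = len(pixels)
    counts = Counter(pixels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Two toy 16-pixel "images": one uniform gray, one half black and half white.
flat = [128] * 16
checker = [0, 255] * 8

print(median_brightness(flat), shannon_entropy(flat))
print(median_brightness(checker), shannon_entropy(checker))
```

On this definition, entropy says nothing about where the pixels are, only how varied their values are, which is one reason to be cautious about reading too much into it.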
Importantly for my purposes, users can easily modify the resulting database in a spreadsheet program. So, after using the ImageJ program and QTIP to automatically analyze my folder of frames, I combined those results with the results I had previously come up with regarding Shot Parameters, Characters, and Locations. The result was 122,000+ cells of Excel data:
6) ImageJ works with numbers, not letters; it wasn’t going to know what to do with “CU” or “Diner” or “Annie.” Thus, I needed a system to code the variables I had myself identified into the database. I did this simply by assigning a whole number to each of the 13 Characters I identified and to each of the 17 Locations.
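The coding step can be sketched in a few lines of Python. This is an illustration rather than what was done for this post, which was by hand in the spreadsheet: the function name is hypothetical, and numbering labels from 1 in order of first appearance is an arbitrary choice.

```python
def build_codebook(labels):
    """Assign a whole number to each distinct label.

    Numbers start at 1 and follow the order in which labels first appear.
    """
    codebook = {}
    for label in labels:
        if label not in codebook:
            codebook[label] = len(codebook) + 1
    return codebook


# A hypothetical run of shot locations, using names from the post.
locations = ["The Town", "At Sea", "Barn", "The Town", "The Woods"]
codes = build_codebook(locations)
encoded = [codes[name] for name in locations]

print(codes)
print(encoded)
```

Keeping the codebook around matters as much as the numbers themselves, since it is the only way to translate an axis value like "3" back into "Barn" when reading a plot.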
One additional limitation of ImageJ was that each cell must only have one number in it. Easy enough for “Locations,” since all shots by necessity take place in only one location (OK, there is actually one shot when the Giant shoots his weapon which begins in the town and ends at sea, and it is not difficult to imagine a film with many complex shots that begin and end in different locations, much less films that use split-screen effects à la lots of De Palma or Figgis’s Timecode). But I ran into a problem with Characters, since naturally many shots have more than one character, and thus I would need to have more than one number associated with the data row for that shot. Luckily, however, ImageJ has no problem passing through data that is repetitive. So, in the instances where there were, say, two characters in one shot that was analyzed in Row 100, I simply marked one character for Row 100, and then copied the contents of that row at the bottom of the spreadsheet and marked the second character there. Via cut-and-paste and paying careful attention, I created a sort of a “manual recursivity” (and a ridiculously complex database). Since ImageJ works its way down the database row by row, once it reaches the end of the original data set, it simply starts again on the second (or third or more) Characters/Locations/whatnot.
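The cut-and-paste routine described above could also be scripted. The Python sketch below is an illustration under stated assumptions, not the procedure actually followed: the row layout, the field names, and the use of 0 for "no character" are all hypothetical. It keeps one row per shot in the original order and appends the extra copies at the bottom, as in the spreadsheet.

```python
def explode_characters(rows):
    """Turn one row per shot into one row per (shot, character) pair.

    rows: list of dicts, each with a 'characters' list of integer codes.
    The first character of every shot stays in the original row order;
    rows for second, third, ... characters are appended at the bottom.
    """
    primary, extras = [], []
    for row in rows:
        # Assumption: a shot with no characters is coded 0.
        chars = row["characters"] or [0]
        base = {key: value for key, value in row.items() if key != "characters"}
        for position, code in enumerate(chars):
            target = primary if position == 0 else extras
            target.append({**base, "character": code})
    return primary + extras


# Hypothetical data: shot 1 has two characters, shot 2 has one.
shots = [
    {"shot": 1, "location": 3, "characters": [1, 2]},
    {"shot": 2, "location": 3, "characters": [2]},
]
flat_rows = explode_characters(shots)
for row in flat_rows:
    print(row)
```

One side effect worth remembering: every duplicated row counts again in any per-shot statistic, so averages such as shot length should be computed before the rows are duplicated, not after.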
Some Initial Results
Here are some of the resulting visualizations. (Note, the original files are TIFFs each well over 100MB, so I have reduced them to JPEGs for this blog. Still, by clicking on the image you should be able to explore each visualization and make out the shot.)
First are visualizations where the film’s chronology is more-or-less maintained; the frames are ordered in sequential order along the x-axis, and thus changes or patterns in the y-axis variable are easy to spot.
Changes in Median Hue:
Changes in Median Brightness:
Changes in Shot Length:
Changes in Entropy:
These are all examples of a kind of time-series plot. In The Visual Display of Quantitative Information, Edward Tufte discusses the problem with time-series: “the simple passage of time is not a good explanatory variable: descriptive chronology is not causal explanation” (37). Narratologists might debate this point regarding the above visualizations, where a film’s narrative and visual unfolding is controlled in a way that is calculated to produce recognizable effects. This is most apparent in the second visualization (median brightness), where there is a noticeable periodic alternation between light and dark.
Next are visualizations where the x-axis variable is not chronological, but simply another data point. Tufte calls these “relational graphics,” “the greatest of all graphical designs” since they “encourag[e] and even implor[e] the viewer to assess the possible causal relationship between the plotted variables” (47). Unlike the time-series plots which present causal arguments, these next visualizations emphasize relationships between variables.
Median Brightness (x-axis) vs. Median Hue (y-axis):
Shot Length (x-axis) vs. Entropy (y-axis):
Many of these relational visualizations are not all that surprising for anyone with a passing familiarity with Hollywood films–of course there will be limited clusters of similarly-hued shots since narrative films strive to create a series of visually coherent spaces, and hue (background color, set design, costuming, color palette) is an expression of that.
Last are the visualizations I am most interested in further exploring–those that plot character or location on one axis in an effort to identify patterns or outliers along the other. I would expect there to be a noticeable difference in, say, “hue” when looking at shots taking place in The Woods (green) vs. The Barn (brown), but what about brightness, or shot length, or even a more general idea of how the film moves chronologically through various spaces?
Location (x-axis) vs. Entropy (y-axis):
Location (x-axis) vs. Shot Length (y-axis):
Shot number, i.e. chronology (x-axis) vs. Location (y-axis):
Looking at these graphs, particularly the last one (whose y-axis I forgot to label), I see the beginnings of a new kind of mapping of film space. While I am suspicious of the value of, say, Standard Deviation of Image Saturation in developing a critical argument about a particular film, the ability to use even an artificial-seeming method of sorting a huge number of images and identifying patterns or outliers is valuable. Because I hand-identified the parameters I am interested in–locations in this case–these kinds of visualizations make it easier to see how a film’s use of locations is or is not internally consistent (well, maybe).
A Few Future Questions
I need to figure out a better method for handling the co-presence of multiple qualities in a single shot (i.e., multiple entries in a single cell, like characters).
Are the shots “true”? 624 of the 1,174 shots are static, which means that just under half of the shots “ended up” someplace different than they began. By just analyzing the first frame of each shot, I am looking at a different picture than if I were to analyze the last frame (or for that matter to analyze every frame). Analyzing every frame might give a truer “picture” of the film in a conventional sense, but such a method would obscure a film that had a more ambulatory camera; comparing analyses of the first and last frames of each shot might draw out differences in such films. I need to imagine the best way to do that. I also imagine that this kind of analysis might be just as useful (albeit in a different way) as the old-fashioned marking of camera movements with “dolly right,” “pan left,” or “static.” In other words, if the first and last frames of a shot differ significantly, camera movement can be inferred.
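One crude way to try this idea is to measure how different the first and last frames of a shot are, pixel by pixel, and flag the shot as "moved" when the difference passes some cutoff. The Python sketch below is only that: a sketch. The function names are hypothetical, the frames are toy lists of grayscale values, and the threshold of 10 is an arbitrary assumption that would need tuning against shots already marked by hand.

```python
def mean_abs_difference(frame_a, frame_b):
    """Average absolute difference between two equally sized grayscale frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def looks_static(first_frame, last_frame, threshold=10.0):
    """Guess whether a shot is static from its first and last frames.

    threshold is an assumed cutoff, not a value taken from the film.
    """
    return mean_abs_difference(first_frame, last_frame) <= threshold


# Toy 4-pixel frames: dark on the left, bright on the right.
first = [10, 10, 200, 200]
last_similar = [12, 9, 198, 201]  # nearly the same picture
last_flipped = [200, 200, 10, 10]  # the picture has changed completely

print(looks_static(first, last_similar))
print(looks_static(first, last_flipped))
```

A measure like this cannot tell a moving camera from a moving character in front of a fixed camera, and it misses shots that rove and then return to where they started, so at best it narrows down which shots to check by eye.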
One of my students, Matt Power, imagined using ImageJ to trace a line where the Giant and Hogarth intersect over time throughout the film. My first response was “impossible, since that’s three variables–time, character X, and character Y–and we can only work with two.” On the car ride home, though, I realized it would be possible to pass three variables through ImageJ. I need to look into that further.
I need to do some simple, analogous processing of the text of Ted Hughes’s The Iron Giant. The next step of my research will involve identifying and pursuing an appropriately analogous method for analyzing the text so as to draw comparisons between the film and novella. The conceptual artist side of me wants to fire up my scanner and see what QTIP can do with scanned pages.
This would be infinitely more useful if I were able to compare visualizations of The Iron Giant with data sets from other films (sorted, perhaps, into categories like “animated films,” “children’s movies,” or “films from the 1990s”). I knew that before I started, so I’m not sure it’s a future question so much as a problem of scale.
This will never be a truly digital project, and the thought of scaling up to an analysis of even a dozen films is daunting considering the amount of labor still required to identify film components such as character or location. However, it is my belief that a hybrid method of hand and digital film analysis is worth the effort. Personally, as a stickler for details, I still do not have a good enough grasp on the principle of “good enough,” or how much human effort is required in order to achieve meaningful results. On the one hand, there is an aesthetic dimension to knowing I have reverse-engineered the film’s editing pattern; on the other, it took me quite a few weekends to finish stepping through every film frame.
 I cannot get it to work, and I am not sure I would trust it if I could, but here is a link to “Shotdetect . . . a free software (LGPL) which detects shots and scenes from a video.”
 Again, if you’re not able to tell yet: I actually have no real understanding of how ImageJ “works.”
 A million qualifications to this thought–the shot whose camera begins and ends in the same place but roves throughout the shot; the shot which begins one place and ends another, but settles for a time on a third or fourth place.
 “Takeaway” is a corporate word my wife has infected our home with–she says they would always end meetings with a “takeaway,” an answer to the “what’s the point” question.