We’ve Got Company!

Over on Twitter, @chopperfireball asks:

My thought: perfect question for a digital humanities approach to cinema and media studies.

I’ve been playing with the data dump of subtitles from OpenSubtitles.org, and in particular the English corpus files from here: a 6.5GB folder of subfolders of subfolders of gzipped XML files. So, why not search the OpenSubtitles corpus to see what matches we get for “We’ve got company,” and then cross-check with IMDb to find the earliest?

I decided to search just “got company,” on the assumption that “We’ve” might limit the results too much: it would exclude “I’ve got company,” “you’ve got company,” “you have got company,” “Weve got company,” and any other encoding weirdness a contraction might introduce in subtitles (a straight versus a curly apostrophe, for instance).
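A quick way to see how much the narrower phrase would miss is to run a few patterns over some sample lines. The lines below are made up for illustration, modeled on the variants just listed (they are not taken from the corpus), and the sketch assumes a UTF-8 terminal so that the curly apostrophe is passed through unchanged.

```shell
# Made-up sample lines modeled on the variants above (not from the corpus)
samples="We've got company.
We’ve got company.
Weve got company.
You've got company, Trip.
I've got company right now.
We got company."

# Narrow: only the straight-apostrophe contraction matches
printf '%s\n' "$samples" | grep -c "We've got company"            # prints 1

# Wider: straight apostrophe, curly apostrophe, or no apostrophe at all
printf '%s\n' "$samples" | grep -cE "We('|’)?ve got company"      # prints 3

# Broad: every variant matches, including "We got company"
printf '%s\n' "$samples" | grep -c "got company"                  # prints 6
```

The alternation ('|’) is used instead of a bracket expression so the curly apostrophe is matched as a whole multibyte string regardless of locale.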

My first problem was that the subtitles folder has 119 subfolders (organized by year), each with more subfolders and files within, so I needed to search recursively. The second problem was that I needed to search these files without unzipping each one first. I used this command in Terminal to do so:

find . -iname '*.gz' -exec zgrep "got company" {} +
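One caveat about this command: with {} +, find hands zgrep many files at once, and grep only prefixes a match with its file name when it receives more than one file, so a batch (or a folder) containing a single file would print matches without their paths. Below is a minimal sketch of a slightly more defensive version, assuming GNU find and the zgrep that ships with gzip (which passes grep's -H option through); the helper name search_corpus and the output file got_company.txt are hypothetical.

```shell
# search_corpus DIR PATTERN (hypothetical helper name)
# Recursively search every gzipped file under DIR for PATTERN without unzipping to disk.
#   -type f : skip any directory whose name happens to end in .gz
#   -H      : always print the file name, even when zgrep receives a single file
search_corpus() {
  find "$1" -type f -iname '*.gz' -exec zgrep -H "$2" {} +
}

# Usage, from the top of the corpus folder (got_company.txt is an example name):
#   search_corpus . "got company" > got_company.txt
#   wc -l < got_company.txt
```

Saving the hits to a file means later steps (deduplicating, sorting, reading in context) don't have to rescan 6.5GB of compressed XML each time.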

Here is what the results looked like:

./0/26801/3470214_1of1.xml.gz:We got company.
./0/26801/3491073_1of1.xml.gz:- We got company.
./0/26801/3573503_1of1.xml.gz:We got company.
./0/26801/3599826_1of1.xml.gz:- We got company!
./0/26801/3631978_1of1.xml.gz:Guys, we’ve got company.
./0/26801/3920449_1of1.xml.gz:Wake up, you’ve got company.
./0/26801/3951555_1of1.xml.gz:We got company.
./0/26801/3951555_1of1.xml.gz:We got company.
./1901/11382/128016_1of1.xml.gz:You’ve got company, Trip.
./1909/10696/118940_1of12.xml.gz:- You got company?
./1911/63709/4026450_1of1.xml.gz:Santiago has got company
./1912/11913/216415_1of1.xml.gz:So, you’ve got company.
./1913/37828/3308112_1of1.xml.gz:We’ve got company.
./1929/64395/4061542_1of1.xml.gz:we got company.
./1930/27386/3248842_1of1.xml.gz:Oh, you got company.
./1930/27386/3440567_1of1.xml.gz:Oh, you got company.
./1933/1234/2102_2of2.xml.gz:I’ve got company right now.
./1933/1234/3122453_1of1.xml.gz: I’ve got company right now.
./1933/1234/3133430_1of3.xml.gz:I’ve got company right now.
./1933/1234/3133430_3of3.xml.gz:I’ve got company right now.
./1933/1234/3137076_2of2.xml.gz:I’ve got company right now.
./1933/1234/3211259_2of2.xml.gz:I’ve got company right now.
./1937/47447/3512534_1of1.xml.gz:but we got company.
./1937/7104/240514_1of1.xml.gz: I’ve got company.
./1937/7104/3094233_1of1.xml.gz: I’ve got company.
./1937/7104/3223919_1of1.xml.gz: I’ve got company.
./1937/7104/93813_1of1.xml.gz: I’ve got company.
./1938/1000/3148641_1of1.xml.gz:I see we’ve got company.
./1938/1000/3344629_1of2.xml.gz:I see we’ve got company.
./1938/1000/3506669_1of1.xml.gz:I see we’ve got company.
./1938/1000/68331_1of1.xml.gz:I see we’ve got company.
./1938/38647/3478797_1of1.xml.gz:But you got company.
./1938/38647/3478802_1of1.xml.gz:But you got company.
./1938/7760/3087673_1of1.xml.gz:We’ve got company tonight.
./1938/7760/3213587_1of1.xml.gz:We’ve got company tonight.
./1938/7760/3213588_1of1.xml.gz:We’ve got company tonight.
./1938/7760/3361406_1of1.xml.gz:We’ve got company tonight.
./1938/7760/3822692_1of1.xml.gz:We’ve got company tonight.
./1938/7760/79317_1of1.xml.gz:We’ve got company tonight.
./1938/7760/97751_1of1.xml.gz:We’ve got company tonight.
./1940/21299/3090625_1of1.xml.gz:- Looks like we got company, Shorty.
./1940/21299/3090625_1of1.xml.gz:- You got company.
./1940/21299/3092212_1of3.xml.gz:- Looks like we got company, Shorty.
./1940/21299/3092212_1of3.xml.gz:- You got company.
./1940/21299/3092212_3of3.xml.gz:- Looks like we got company, Shorty.
./1940/21299/3092212_3of3.xml.gz:- You got company.
./1940/24924/3121380_2of3.xml.gz:We got company.
./1940/24924/3121380_3of3.xml.gz:We got company.
./1941/35861/3283752_1of1.xml.gz:Oh, he’s got company.
./1941/53816/3608420_1of1.xml.gz: He got company
./1941/53816/3608422_1of1.xml.gz: He got company
./1941/9150/101449_1of1.xml.gz:We’ve got company.
./1941/9150/158082_2of2.xml.gz:We’ve got company.
./1941/9150/94986_2of2.xml.gz:We’ve got company.
./1942/1005/1738_1of1.xml.gz:Looks like we’ve got company.
./1942/1005/3592893_1of1.xml.gz:we’ve got company.
./1942/1005/97106_1of1.xml.gz:Looks like we’ve got company.
./1942/1005/97121_2of2.xml.gz:Looks like we’ve got company.
./1942/40517/3359340_1of1.xml.gz: We got company.
./1942/40517/3368050_1of1.xml.gz: We got company.
./1942/40517/3428769_1of1.xml.gz: We got company.
./1942/40517/3437623_1of1.xml.gz: We got company.
./1944/2230/132133_1of1.xml.gz:You got company?
./1944/2230/28273_1of1.xml.gz:You got company?
./1944/2230/3081278_1of1.xml.gz:You got company?
./1944/2230/3284537_1of1.xml.gz:You got company?
./1944/2230/3305330_1of1.xml.gz:You got company?
./1944/2230/3671542_1of1.xml.gz:You got company?
./1944/2230/48650_1of1.xml.gz:You got company?
./1944/2230/48746_1of1.xml.gz:You got company?
./1944/2230/87194_1of2.xml.gz:You got company?
./1944/2230/87194_2of2.xml.gz:You got company?
./1944/789/3119319_1of1.xml.gz:He’s got company!
./1945/29969/3166121_1of1.xml.gz:We’ve got company, take off cloth now
./1945/37749/3369358_1of1.xml.gz:We’ve got company.
./1946/27947/3163942_1of1.xml.gz:Look, we’ve got company.

There are 76 hits through 1946, although the repeated entries are probably from multiple subtitle releases of the same film. Had I an actual background in computer science, I would try something such as calling IMDb’s API (or, better, the OMDb API) to parse these results, or wrangling the XML to link back to the OpenSubtitles webpages. But my hope is that, to answer the question, it’s easy enough to look at the earliest entry (since the first level of subfolders seems to be organized chronologically) to determine in what sense “got company” was being used. I do need to read the OpenSubtitles documentation more closely to understand what the “0” folder is; it turns out the first entry in our list is from Terminator 2: Judgment Day (James Cameron, 1991), so I’m guessing the 0 folder holds films they have not yet sorted into year folders.
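Because repeated entries seem to come from several releases of the same film, it can help to collapse the hits to one line per film folder before reading them. A minimal sketch, assuming the results were saved one per line in the ./YEAR/FOLDER/FILE.xml.gz:text form shown above; count_films is a hypothetical helper name, and the second-level folder number is treated simply as a per-film identifier.

```shell
# count_films RESULTS (hypothetical helper name)
# RESULTS holds one hit per line in the form ./YEAR/FOLDER/FILE.xml.gz:subtitle text
# Prints "YEAR/FOLDER HITS" once per folder, in chronological order.
count_films() {
  cut -d: -f1 "$1" |            # keep the path before the first colon
    cut -d/ -f2,3 |             # keep YEAR/FOLDER (the path starts with ./)
    sort -t/ -k1,1n -k2,2n |    # sort by year, then by folder number
    uniq -c |                   # collapse repeats, prefixing each with a count
    awk '{ print $2, $1 }'      # reorder to "YEAR/FOLDER HITS"
}

# Usage (got_company.txt is a hypothetical file of saved zgrep output):
#   count_films got_company.txt                   # one line per film folder
#   count_films got_company.txt | grep -v '^0/'   # skip the unsorted "0" folder
#   count_films got_company.txt | wc -l           # number of distinct folders
```

This only groups by folder; it can't catch misfiled titles like the ones below, which still need a human to read the dialogue.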

But the next result is from 1901?! Amazingly early result. Let’s see if the sense matches the action trope, or if this is just a butler introducing a guest or something:

Captain, there’s a small craft closing on the shuttlepod.
It’s a patrol ship.
You’ve got company, Trip.
I see them.
Bearing 1-8-4, mark 2-7.

Um, 1901? Wait a minute, why are there even subtitle results for 1901? Turns out this is another mis-filed result; the dialogue is from “The Breach,” a 2003 episode of Star Trek: Enterprise.
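To read a hit in context, as with the Star Trek exchange above, one option is to decompress a single file to standard output and let grep print a few neighboring lines. A minimal sketch, assuming each subtitle line sits on its own line of the XML file (as the zgrep output suggests); show_context is a hypothetical helper name, and depending on the file’s markup the neighboring lines may include XML tags.

```shell
# show_context FILE PATTERN [N] (hypothetical helper name)
# Print N lines (default 2) before and after each line of FILE matching PATTERN.
show_context() {
  gzip -dc -- "$1" | grep -C "${3:-2}" -- "$2"
}

# Usage:
#   show_context ./1901/11382/128016_1of1.xml.gz "got company" 3
```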

Sacrificing thoroughness for time’s sake, I’ll skip ahead a bit:

1913’s result is from the 2008 television show In Plain Sight, s1e4, but the 1929 result is promising: Applause (Rouben Mamoulian, 1929), a grim-sounding film I haven’t yet seen about “a burlesque star [who] seeks to keep her convent-raised daughter away from her low-down life and abusive lover/stage manager.”

Come on in, boys and girls, we got company.
Gee, Kitty, the baby looks great.
Oh, isn’t it cute?
It looks just like Kitty.

You can watch the scene here:

This is certainly not an action film, and “we got company” does not seem to have the euphemistic or ironic overtones it carries in later uses, but the menace and threat of this scene alone are apparent. Interestingly, the word “company” appears five times in the subtitles for this film, and the last instance (44 minutes in) in fact turns on the meaning of the word “company” itself:

Just looking for somebody to talk to.
Sailors don’t generally have much trouble finding company.
Oh, I don’t want the kind of company you mean.
Listen, all sailors ain’t a bunch of bums.
But you got no idea how tough it is to be all alone and lonesome in a city like this.
Maybe I have.
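A count like the five uses of “company” in Applause can be checked from the command line as well. In this minimal sketch, grep -o prints every match on its own line, so counting output lines counts occurrences rather than matching lines; count_word is a hypothetical helper name, and the -i and -w flags (case-insensitive, whole word only) are an assumption about what should count as an instance of the word.

```shell
# count_word FILE WORD (hypothetical helper name)
# Count occurrences of WORD in a gzipped subtitle file, case-insensitively, whole words only.
count_word() {
  gzip -dc -- "$1" | grep -o -i -w -- "$2" | awk 'END { print NR }'
}

# Usage:
#   count_word ./1929/64395/4061542_1of1.xml.gz company
```

Using awk to count lines prints a bare number with no padding, and prints 0 when there are no matches.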

If you’re interested in investigating further, 1937’s “but we got company” is from the Bing Crosby vehicle Waikiki Wedding (Frank Tuttle, 1937). The next most promising result is 1940’s “Looks like we got company, Shorty,” from Boom Town (Jack Conway, 1940). If, unlike me, you have a copy at home, fast-forward to about 10:19 and let me know the sense of this exchange:

Well, I didn’t…
Looks like we got company, Shorty.
You got company.
That’s what I thought.
I beat you by four feet, Shorty.
Wait a minute.
I got another chance.

Hopefully by now you’ve realized the major problem with this approach: that the value of this answer is limited by the faith we have in OpenSubtitles’ completeness: both the range of films they have and the thoroughness of the subtitling. But, like any good digital humanities project, this initial result can help confirm or deny our first guesses (no, it’s not Star Wars), and put us on the right track to ask, and answer, even better questions.

Last, for even more of the firehose: see this link.

About Kevin L. Ferguson

Assistant Professor of English and Director of Writing at Queens
This entry was posted in Typecast.

2 Responses to We’ve Got Company!

  1. Richard P says:

    Very entertaining!
