Hundreds of thousands of online articles from MTV News were taken offline earlier this summer by its parent company, Paramount. Some observers worried that the content would be lost forever. But much of it is still available through the nonprofit Internet Archive.
Mark Graham is director of the Wayback Machine at the Internet Archive, and says the organization archives more than a billion URLs every day. Graham says there are lots of reasons why sites can go away — some are benign and others are less so. He joins The Show to talk more about this.
Full conversation
MARK GRAHAM: Sure. Well, I mean, the process is that we are constantly, continuously archiving material from the public web. And we've been doing this for the better part of 28 years. So, in actuality, there really was no springing into action. The springing into action started decades ago.
And over that time — and on a continuous basis — we work diligently to archive large swaths of the public web. In particular, with regard to the MTV News site, we have been archiving that probably since its inception.
MARK BRODIE: Do you try to focus on content that is at risk of going away? It sounds like, in the case of MTV News, you've been doing it for a very long time with no indication that that content would go away.
GRAHAM: Sure. Yeah. No, it's a great question. And I mean, the answer is yes and no. Yes, of course, we do try to focus on exactly the term you used: at-risk information. All right? But one of the challenges is you don't know what you've got until it's gone. So, how does one know what might be at risk? In theory, anything that's digital, or anything at all in any kind of format, is at risk. But you might suggest that some things are riskier than other kinds of things. Or you might also ask, well, if we lost this, what would the impact be? Right? Is it of greater significance culturally, historically, or in some other fashion?
But that too is problematic. Because, right? I mean, one person's trash is another person's treasure. So, the answer is: while yes, we do actively pay attention to material that may be published by organizations or governments, etc., that might be at risk of going away, at the same time, we try to focus on everything.
BRODIE: So, what, to you, is the importance of saving all of that content? I mean, you mentioned one person's trash is another person's treasure, so you never know who is going to find what interesting or important, but when you're taking MTV News versus maybe something from a political organization or from a religious organization or a civic organization, what's the importance of having all that?
GRAHAM: Yeah, I guess the importance is that things do go away on the web, and there's any one of a number of studies that I could cite. There was a recent one from the Pew Research Center that looked at a collection of URLs from about 10 years ago, and they found that for that particular collection, more than 30 percent of those URLs were no longer available on the public web.
So, one way of saying that would be that about a third of the web from this particular collection from 10 years ago is gone. It's inaccessible. The good news is that when we then looked at those URLs, we had archived fully two-thirds of them. So, new headline: instead of saying a third of the web is gone for that particular set, we could say only a ninth of it is.
But that's still a ninth of billions and billions of things. So, these are pretty big numbers. So, I would say the answer would be we archive this material so that for future generations — and for researchers today, even — the material that might otherwise not be available is accessible, is discoverable, is useful to people.
And that's our North Star.
BRODIE: I want to ask a philosophical question. A lot of people hear that "if you post something online, it's there forever," mostly as a way to scare younger people out of posting pictures of themselves doing stupid things that prospective employers will then find with a quick search.
But based on what you're saying, that doesn't really seem to be the case. So how do you jibe the two things we're hearing at once: "online is forever" versus "no, it's really not"?
GRAHAM: Well, a lot of things that we're told aren't true, right? And so, some things are going to persist and other things won't. There's not necessarily a rhyme or reason to it.
Who remembers their MySpace page, for example, or could still access it? Or GeoCities. I mean, there was a time when those were all the rage, and we thought these digital environments that people were investing so much energy into would be around forever. Well, no. But, by the way, you can find almost all of GeoCities in the Internet Archive.
So, yeah, I don't know. It's kind of funny. We were told that we were building webpages. And it's almost kind of a cruel joke to think of this metaphorically as a page. You think of a piece of paper, and paper is actually pretty persistent, not least because we tend to make lots of copies of things that are on paper and spread them around.
So, even if one library burns down, for example, there's often another library that has a copy of the book. But these are pages in concept only. I mean, these are bits that live on hard drives that are spinning really fast and sometimes crash. So, yeah, it's a bit hit or miss, and some things that people think will be around for millennia, maybe not so much.
Can I tell one particular story?
BRODIE: Of course.
GRAHAM: When the Malaysia Airlines plane was shot down over Ukraine in 2014, an individual uploaded onto VK (which is kind of the Russian version of Facebook; it's a social media platform) a claim that they had just shot down a plane, with a picture of smoking ruins and the plane in the background. That was taken offline within a day, probably by the person who posted it, after they realized what they had posted and what they had done.
But not before it had been archived in the Wayback Machine. So, that particular post is still available in the Wayback Machine, and the individual who posted it was one of the three people found guilty of that crime by the courts in 2022.
BRODIE: That is Mark Graham, director of the Wayback Machine at the Internet Archive. Mark, thanks for your time. I really appreciate it.
GRAHAM: You're welcome.