When you visit pretty much any website, you’re asked to accept that site’s cookie policy: cookies basically allow sites to use data to remember you for future visits. And they’re just one example of the kind of data we all leave behind when we’re online. Increasingly, it seems Americans are kind of resigned to the fact that some amount of our personal data is out there, and that there’s not really all that much we can do about it.
A study from Pew Research Center done a couple of years ago found about 67% of respondents don’t actually understand what companies are doing with their data. And more than 70% believe that even if they did have a better sense of it, they have no control over what either companies or the government do with their data.
To talk more about this, The Show spoke with Eileen Guo, senior reporter at MIT Technology Review, who’s written about this issue.
Full conversation
EILEEN GUO: Yes, I do think there’s a lot of wariness about our data, our privacy and who has it. And it does not make my job very fun at times, because a lot of what I do is uncover privacy violations we didn’t know about.
MARK BRODIE: Well, so how right are we to be resigned about that? How right are we that some amount of our personal data is out there in places maybe we would not like it to be?
GUO: I think we should be very worried about our data being out in all sorts of places. I don’t know that we should be resigned. I think there are steps that we can take to protect ourselves, and there are also advocacy actions that we can take to try to fight for better data protection.
BRODIE: What kinds of things can we as individuals do?
GUO: Yeah. That is the question. I think first of all, everyone has different risk profiles. So that’s something to keep in mind. And I think one of the challenges is that people also think about privacy differently. Some folks are like, “Well, everything that I’ve got is fine for the entire world to know.”
And there are other people that, perhaps because of the jobs that they do or their identities or different factors like that, are rightly more concerned about privacy. So among the steps you can take, going through all of the privacy settings that are now on most phones, that’s a basic one.
Really thinking about what kind of smart devices you have in your homes, just not accepting the standard settings, I would say, is a really good starting point.
BRODIE: And in terms of policy, as you’ve reported, it seems as though right now anyway, it’s more of a state-by-state kind of thing than a federal overarching bill. I wonder if folks in the tech world think that’s the right approach, or if it might be more effective to have a nationwide law, as opposed to a state-by-state effort?
GUO: Yeah, I think the general consensus is actually that there should absolutely be a federal privacy bill. And there have been increasing efforts to get there. But it’s really challenging. There are so many different actors that want different things. And Congress is, you know, Congress is going through some things right now. And so I think people don’t expect that a federal privacy bill is coming anytime soon.
BRODIE: Do those state laws tend to be effective in doing what they are intending to do?
GUO: I think the state laws differ quite a bit between them. But I do think that they are — first of all, maybe this is obvious — but they’re better than not having the state laws there. I think the other thing that is noteworthy about state laws is that tech companies don’t want to create and manage different services and different levels of privacy protection for 50 different states.
And of course, we don’t have 50 different privacy laws, but what that tends to mean is that they will often change their settings to make sure that they’re compliant with the most privacy-protective state law. And so the great thing is that if California has a strong privacy protection — and it does; it’s not perfect, but it’s good — that also actually benefits the rest of the country.
BRODIE: So how prevalent is this issue in terms of the ability of tech companies to get and store and use data? And I ask because you go to most websites, and there’s some question about cookies or something. I mean, when I went to read one of your pieces on the MIT Technology Review website, it asked how I felt about, you know, the cookie policy.
So it seems as though, like, you can’t really do anything online without having your data at least somewhat exposed, if that’s the right word.
GUO: Yeah, I think that’s right. And I think the question kind of depends on what you mean by your data, right? Because data means so many things. Data is cookies on websites, as we’re talking about right now. So that’s what those little privacy click-throughs are asking for. But data is also the metadata of the information and photos that we upload, for example, to Instagram or to Facebook.
I recently published an article on how some of the largest AI training sets have trained on massive amounts of user data, everything from photos of driver’s licenses to resumes to everything that was uploaded to the internet 10 to 15 years ago, before we even knew that there was such a thing as generative AI.
So there’s a lot of different types of data, and it depends on what kind we’re talking about.
BRODIE: Well, it seems as though — and you alluded to this, and you obviously wrote about this — AI seems to have almost supercharged this question a little bit because in a sense, even if you are able to get some of your photos or other data off the internet somehow, once it’s in the generative AI models, like it’s almost too late.
GUO: Yeah, that’s right. What I had written about was some recent research that was just published that found that users who had uploaded a resume or a driver’s license or some kind of identity documents — sometimes just on a cloud hosting website, with the idea that only the people they shared it with would be able to see it — assumed that information was hard to access unless you have the link.
But generative AI training sets use massive amounts of data scraped from across the web. And so what these scraping programs are able to get is very different from what humans are able to search for or perhaps access easily. And so what the research found was that there were cases where someone might upload to one specific website, but a web scraper then scraped that website and uploaded the data to another hosting site.
And so suddenly, even if you want to delete your data from that original site that you uploaded to, you can’t because it’s in five other places, as well as these massive training sets that are being used to train AI.
We were talking earlier about the effect that state privacy law has on really all of us, but I think the other really important piece here is that there are really divergent paths that the United States is taking when compared to Europe, for example, which has just much stronger privacy protections.
In the United States, we take a certain amount of our data being used as just the price of using these tech goods and services. But I think Europe is really showing us that we don’t have to accept those data privacy violations as the standard. There is a better way.