AI is transforming content moderation. As systems grow in complexity and scope, we have to ask an important question: What does it mean to build ethical AI when people’s lives are at stake?

We began to explore that question at TrustCon and continued it in our recent webinar, Humans in the Loop: How to Build an Ethical AI Content Moderation Model. Leaders from across the AI ecosystem joined to discuss how to layer human insight with machine efficiency. Here are five takeaways that stood out from the discussion:

1. The goal is partnership, not perfection.

Ethical AI in moderation is not about proving whether humans or machines are better. It is about designing systems where both learn from and oversee each other. This creates the checks and balances that improve safety over time. When human oversight and machine learning evolve together, we move toward continuous improvement.

2. Define “ethical” the same way for people and machines.

Fairness, transparency, and accountability are not standards that apply only to AI models. They should also guide how human moderators make and document decisions. Ethics must live across the entire system. Now, both people and ML models shape how communities experience fairness.

3. Context requires a human touch.

AI can recognize patterns at scale, but it cannot reliably interpret cultural nuance, tone, or community norms. Humans bring that essential context, helping ensure moderation decisions align with lived realities. Context-aware systems matter most when managing sensitive topics, reclaimed language, or evolving subcultures within platforms.

4. Ask: What size of loop do you want the human to be in?

Human-in-the-loop does not mean humans review every post. It means they train and correct systems where human judgment has the greatest impact. The most effective moderation frameworks use human expertise to guide models. This helps identify new harms and shape better decision-making over time.

5. Use AI to make moderation more human.

The goal is not to replace empathy with automation. It is to use automation to remove bureaucracy, reduce burnout, and minimize exposure to harmful content. That allows humans to focus on context, care, and judgment. When technology lightens the load, moderation becomes, in many ways, more human.

AI moderation tools are a must-have.

AI moderation is not just an aspiration for Trust and Safety teams. It is an operational necessity. One solution to add to your workflow is Safer Predict, which uses state-of-the-art machine learning classification models (aka “classifiers”) to predict the likelihood that child sexual abuse material or text-based child sexual exploitation is present. Safer Predict categorizes content and assigns a risk score based on the likelihood of harmful behavior. This helps teams prioritize and escalate content with precision.
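
Here is a minimal sketch, in Python, of how a team might act on classifier risk scores like the ones described above. The risk-score field, thresholds, and queue names are illustrative assumptions, not Safer Predict's actual API.

```python
# Hypothetical triage helper: route content by classifier risk score.
from dataclasses import dataclass

@dataclass
class ModerationItem:
    content_id: str
    risk_score: float  # likelihood of harmful content, 0.0-1.0, from a classifier

def route(item: ModerationItem) -> str:
    """Map a risk score to a review queue (thresholds are illustrative)."""
    if item.risk_score >= 0.9:
        return "escalate_immediately"  # highest-risk items reach senior reviewers first
    if item.risk_score >= 0.5:
        return "priority_review"       # likely violations, reviewed ahead of the backlog
    return "standard_queue"            # low-risk items, sampled for quality assurance

queue = [ModerationItem("post-123", 0.97), ModerationItem("post-456", 0.42)]
for item in sorted(queue, key=lambda i: i.risk_score, reverse=True):
    print(item.content_id, route(item))
```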

Watch the whole conversation to see how leaders across the field are rethinking AI in their moderation strategies. Together, we can keep people at the heart of online communities.

Transcript

Cassie Coccaro: Okay, I think we're going to start. Welcome, everybody. I'm Cassie Coccaro. I lead communications at Thorn. First and foremost, you might notice that we have an unexpected panelist here. Mike Pappas from Modulate is unfortunately under the weather, so in his place, we have Ken Morino, Modulate's Director of Product. So, welcome, Ken. Thank you so much for filling in, and welcome to all of our panelists.

Cassie Coccaro: We're gonna start today with a little housekeeping. So, before we start, we wanted to survey attendees today to see how you might be utilizing humans in your own review process. So, I'm gonna ask my colleagues to pop that survey up to you right now. It's anonymous. I think it would be great if some of you could fill it out. And while you work on that, I'll also let you know that our panelists today have graciously agreed to take some questions. So, if you have questions, please put them in the Q&A box. We will get to as many as we can.

Cassie Coccaro: And after… I'm gonna give it a couple more seconds here. I see the survey on my end, and let's see if anybody responds to this, and then we'll get going with the… with the webinar. Okay, I think we can… Let's see if we can see those results, if anybody's filled it out.

Cassie Coccaro: Okay, so we asked you all, how do you currently utilize human review and input in your AI development lifecycle? And it looks like the clear winner here is providing cultural and contextual nuance. So, that's really interesting. Maybe that'll give the panelists a little bit more to work with as we talk through some of this. So, thank you all for participating in that.

Cassie Coccaro: Okay. I think that we're good to start, so now that we've had a little glimpse of that, I'm happy to be here with some powerhouse leaders in trust and safety and ethical AI development to expand upon a topic that these experts here today discussed on a TrustCon panel this year. Some of you may have been there.

Cassie Coccaro: Building and implementing ethical AI for content moderation. So there was a lot of interest in that panel, both at TrustCon and after the fact, and one central theme that stood out and sort of kept nagging at us as a more specific thread we wanted to pull on within it is what human in the loop really means in ethical AI terms for content moderation. So this includes the role of human in the loop in ensuring things like model fairness.

Cassie Coccaro: Catching emerging harms, preserving domain expertise, empowering human moderators, protecting user communities, things like that. So there's clearly a lot to unpack here, and before we dive in, I'm going to ask each of our panelists to do a very quick intro of themselves, who they are, what they do, what they're working on right now. So to save us the awkwardness of who might speak up next, I'll just cue you in no particular order. So, Alice, do you want to go ahead first and introduce yourself?

Alice Goguen Hunsberger: Sure, hi! Thank you so much, first off, for having me. I've been hearing a lot about human in the loop as well, so, really good to spend a little bit of extra time on this. I'm Alice Hunsberger, I'm Head of Trust and Safety at Musubi, which is a startup providing AI content moderation solutions, some of which are using LLMs, some of which are more traditional machine learning models with a human-in-the-loop component. I spend all day, every day, thinking about AI and trust and safety now, which is a really fun place to be, because we're learning so much, and things are changing so quickly. So, yeah, that's… that's where I'm at right now.

Cassie Coccaro: Great, thank you so much. Dave, do you want to jump in next?

Dave Willner: Yeah, absolutely. Hi folks, I'm Dave Willner. I'm a co-founder at a company called Zentropy, which works on LLMs for content labeling, as well as agentic systems to help articulate and improve standards for that content labeling to make, sort of, using LLMs to do that process easier. Historically, I've worked in trust and safety. I was at OpenAI, running their trust and safety team before this, as well as Airbnb and Facebook and a bunch of places. It's good to meet you all.

Cassie Coccaro: Thank you, Dave. Let's go to our next panelist, Ken.

Ken Morino: Hi, my name's Ken, I'm the Director of Product at Modulate. I think we are probably best known for ToxMod, which is a voice moderation platform that's used in a number of different games, from indie to AAA. I have been working with policy leaders, I've been working with our partners, and really shaping, kind of, what are the best ways to deliver value, in an ethical fashion, in AI and the trust and safety space. And I've… I'm really honored to be here with this group, and hope I can fill Mike's shoes just a little bit.

Cassie Coccaro: Great, thank you so much for joining us. I know it was a bit last minute, but we're so excited you could be here. And, we'll finish with my colleague, Dr. Rebecca Portnoff.

Dr. Rebecca Portnoff: Thanks, Cassie. Hello, everybody. Thanks so much for joining the conversation today. My name is Rebecca. I am Vice President of Data Science and AI at Thorn, so I lead the team that builds our machine learning and AI solutions to defend children from sexual abuse, and also lead our efforts around our Safety by Design initiative to prevent the misuse of generative AI in facilitating child sexual exploitation and abuse. So I, I spend a lot of my daytime and sometimes my nights when I should be sleeping, thinking about what it means to ethically build and use AI in these kinds of really high-stakes contexts, and I'm thrilled to get a chance to, you know, hear the thoughts of the panelists and also engage with the folks on the call on this important topic.

Cassie Coccaro: Thank you so much, Rebecca. I think being in this field means we all lose a bit of sleep here and there, thinking about these topics. My last disclaimer before we get going is I think this conversation will work best if we have a lot of different responses from panelists, and this can be a bit conversational. So I'm going to assign you each a question, but please feel free to jump in and comment on anything.

Cassie Coccaro: Let's start by setting the frame here. I'm gonna actually call out Ken first for this, so… Ken, in practice, what does ethical AI in content moderation look like in your experience, and how do principles like fairness, transparency, and accountability actually shape your day-to-day decisions?

Ken Morino: Yeah, so, we are a little bit more of a consulting branch. We like to work with our clients in a partnership, so we're not ultimately the ones making this decision, but I think we've… in practice, been able to influence a lot of good decisions. So I think when you're talking about the ethical framework, there are really two sides to it that you need to consider. There is the side of how is this going to impact the people who are being moderated, and then how is this going to impact the people who are moderating. And I think it's really important to balance those two different aspects when you're looking at how you set up your policies in practice.

Ken Morino: I think first and foremost, as you mentioned, transparency is incredibly important. You need to have a clear understanding of what's acceptable, what's not acceptable, and then kind of what the consequences are if you do cross those lines.

Ken Morino: And I think it's also equally important to look at, if someone is being sanctioned or punished for something, that they understand why that was unacceptable. A really big sticking point that we've kind of won some people over with is that once you tell people what they did wrong, they are much less likely to do that thing again. If people feel like they're being unjustly punished or don't understand, kind of, why someone has called them out for something, it can be really problematic. So, where AI fits into that is you need to have a predictable AI system if you're incorporating it into moderation. It needs to be explainable, it needs to be verifiable.

Ken Morino: So people can judge the fairness of that and feel like they're getting a fair shake within moderation. On the moderator side, just kind of scratching the surface on this, you need to think about how your use of AI is impacting the moderators. Is it making their experience better, or is it potentially exposing them to more harm? There are certain things that AI is going to do very well, with very high probability, so there's a really good chance for harm reduction, but there are also opportunities for harm reduction when there's a lot of gray area.

Ken Morino: Things like video, things like speech, have a much deeper emotional impact on moderators. So, if you're talking about ways that you can provide full context for those situations without exposing them to the kind of full range of sensory experiences that go along with that, we find that that can really help a moderator process things in a more appropriate way that allows them to make unbiased judgments in those gray areas, and kind of reduce their harm, reduce their exposure to things that could cause long-term harm for them.

Cassie Coccaro: Thank you so much. Do any of the other panelists want to sort of weigh in on what ethical AI in content moderation looks like in general from your perspectives?

Alice Goguen Hunsberger: I can jump in a little bit with a sort of overarching philosophical statement. I think when we're talking about emerging technology, especially technology like AI that seems so magical in so many ways, people have really impossible expectations about what outcomes are going to be. So we see on one side the expectation that AI can solve everything, and make perfect decisions, and…

Alice Goguen Hunsberger: you know, we know that there needs to be humans in the loop, which is why we're all here, so, like, that's on one side of things. But the other thing, too, is I think there's also sometimes a tendency to think that human decision-making is the absolute end-all, be-all of things, and that human moderators are perfect, and that's not true either. I say this as somebody who spent years as a frontline content moderator. I was erratic with my decision-making: if I was tired, if I was cranky, if I was mad at the world, I would make slightly different decisions, and they would also be based on my own biases, and

Alice Goguen Hunsberger: The sort of factory model of having thousands and thousands of moderators working really, really fast and making super quick decisions, not having time to think about it, that's not gonna lend itself to great, unbiased decisions either. And so I think just, like, in general, when we're talking about AI and we're talking about human in the loop,

Alice Goguen Hunsberger: the goal can't be perfection on either side. We can't expect that AI decisions are always going to be perfect. We also can't expect that human decisions are always going to be perfect. It's really about layering them together, creating checks and balances, and kind of figuring out who does what in the best way, and what the trade-offs are between precision, and recall, and accuracy, and latency, and like, how, you know, who's exposed to what, just like Ken was saying. So it's… it's complicated and nuanced, but I just think it's worth

Alice Goguen Hunsberger: saying that at the front of this conversation, because everything, I think, that all of us are gonna say should be taken in the context of: we are striving to make things better over time, and we're learning, and it's not going to be a totally perfect answer with any one particular solution. It's always going to be contextual and nuanced.

Cassie Coccaro: Yeah, that is a really important disclaimer. Dave, I saw you come off mute, so go ahead.

Dave Willner: Yeah, just building on both of those themes, and maybe tying them together, I think a lot of the framing from Ken around, sort of, the things you would want to see in a system that was good is exactly right, but I also find it

Dave Willner: very hard to articulate things that we would want out of an ethical AI moderation system that we would not also want out of an ethical, human-powered moderation system. And so the conversation, to Alice's point, very quickly collapses to, okay, what does a good moderation system that works well, that we feel good about running, or the best one we can design, actually look like, what are its features, and then what are the tools available for us to get there? Whether those things are newer versions of machine learning that we now call AI, traditional machine learning, human labor.

Dave Willner: Like, the sort of distinction, the dichotomy there doesn't really hold up when you start to dig into, okay, how do we want these systems to work, and what makes them virtuous?

Cassie Coccaro: Go ahead, Rebecca.

Dr. Rebecca Portnoff: Thanks, I just want to hop in here as well. I'm, I definitely got a lot of food for thought from what Ken shared, and I think there's one point in particular I wanted to build off around,

Dr. Rebecca Portnoff: perceptions of fairness. I think you made a really excellent point that it's not just about how equitable the developers think that the, like, the particular decision is, but in fact, how users experience that and what their perceptions are. And to me, that really gets at the heart of,

Dr. Rebecca Portnoff: the important and difficult question, which is which humans and at what point in the loop, right? Because fairness typically, you know, from a machine learning perspective, is defined as some metric against which you're looking at your demographic, and you're asking, do I make basically the same amount of mistakes, or basically the same amount of correct decisions across each of these points?

Dr. Rebecca Portnoff: And that's one way that you can plug in a human, but to your point, Ken, that won't necessarily be felt by the actual user, who is also a human, who is part of this moderation loop.
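
To make the fairness metric Rebecca describes concrete, here is a minimal sketch that compares false positive and false negative rates across demographic groups. The group labels, record format, and sample data are illustrative assumptions; large gaps between groups are the signal to dig in.

```python
# Hypothetical fairness check: compare error rates across groups.
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, true_label, predicted_label) with 0/1 labels."""
    counts = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 0:
            c["neg"] += 1
            c["fp"] += int(y_pred == 1)  # flagged content that was actually fine
        else:
            c["pos"] += 1
            c["fn"] += int(y_pred == 0)  # missed content that was actually violating
    return {
        group: {
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
            "false_negative_rate": c["fn"] / c["pos"] if c["pos"] else None,
        }
        for group, c in counts.items()
    }

sample = [("group_a", 0, 1), ("group_a", 1, 1), ("group_b", 0, 0), ("group_b", 1, 0)]
print(error_rates_by_group(sample))
```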

Cassie Coccaro: Yeah, that's great. Let's… let's dig into Human in the Loop a little bit more. So, I think… In some cases, human in the loop is seen as totally essential to ensure safety, to ensure fairness. Some people say that if automation can match or exceed human performance, especially when it comes to harmful or egregious content review, we have a responsibility to reduce human exposure. But I know from TrustCon that each of you think about this a little bit differently.

Cassie Coccaro: So, tell me, how do you each think about the role of humans in content moderation generally as AI capabilities advance, and where does a human fit into that picture as things change and advance? Let's start with Alice on this one.

Alice Goguen Hunsberger: I mean, I think I gave away… maybe I jumped the gun a little bit with my… what I was talking about before, I think… The traditional model of content moderation, people sort of making these individual decisions over and over and over on individual pieces of content, is not a useful way for people to use their brilliant human brains. Ideally, when a human is in the loop somewhere, the decisions that they're making affect everything that's gonna happen in the future, and make it better, so you don't have to make those same kinds of decisions over and over again. That said, then, it comes with a lot more responsibility to get that one decision really, really right, if it's going to be amplified over and over. And so, obviously, having people who have time, who have training, who have experience working on this makes a ton of sense there.

Alice Goguen Hunsberger: I also think… thinking about building systems, that learn from humans with current events and things that AI might not be trained on yet is important and helpful.

Alice Goguen Hunsberger: Having, you know, having that autonomy to make a decision that affects things, I think, is important, both in terms of, like, making the AI better, but also in terms of, like, less stressful work. I think one of the worst things about being a moderator is feeling as though what you're doing is just, like, a drop in an endless ocean of horrible content, and so being able to have the knowledge that your work is making a difference, that you're really contributing meaningfully, that, I think, is really important. Obviously, that all said,

Alice Goguen Hunsberger: there's a huge industry with, like, tens of thousands of people who are working these jobs, and that's, like, a pretty serious thing that I think we all have to think about in trust and safety, and also think about, like, when AI is automating so much of the work that we do, how are people going to get the experience that is needed to make these really nuanced decisions that are amplified over and over?

Alice Goguen Hunsberger: how are we going to train junior people? What are the new junior roles going to be? I think all of that stuff is things that we're still working out, but, you know, ultimately, I think, having people as a check and balance against AI, but not, not necessarily, like, just doing the hard stuff. I think AI can also do the hard stuff in a lot of cases, so I think that the escalation model of just send the hard things to the people and they can do it, like, I don't know that that's actually better.

Alice Goguen Hunsberger: Either because of exposure and, just, in general, like… humans can't necessarily do that. They can do it better, but you don't want to do it over and over.

Cassie Coccaro: Thank you for that. I'm sure the other panelists have things to say about this. Dave, I wanted to get your perspective here, because if I remember from TrustCon, and please correct me if I'm wrong, one of your main goals is to really figure out how to reduce human exposure, which obviously, like, we should all have that same goal, right? But what is the threshold for saying an AI system's performance is good enough, and how do humans validate whether that standard is actually met?

Dave Willner: Yeah, I think, building on some of what Alice said, the question here is, what is the altitude of the loop that you want humans to be in?

Dave Willner: Right, like, the framing of the question sort of assumes that the smallest micro version of the loop, where it's, like, each individual classification, does that require a human in the loop? My answer is that we already make a lot of classifications where we don't require a human in the loop, and it doesn't seem to me that using LLMs to help us do that actually fundamentally changes how we should think about that, right? Like, we're thinking about precision, we're thinking about recall, and we're thinking about the consequences of being wrong in both directions.

Dave Willner: As well as potentially the need for verification for legal purposes, particularly as it touches on some of the stuff that Thorn works on, right? Like, and I don't know that…

Dave Willner: LLMs being able to work directly from the written text of policies to make decisions, instead of being trained on a bunch of human-labeled data to make decisions, each individual one of which is not necessarily going to be reviewed by a human,

Dave Willner: actually, like, changes how we should think about that, right? Like, we do have the tools to think about what good enough is there. The more interesting question to me, picking up on Alice's theme, about how this changes what the work even looks like is… I suspect, if you sort of buy this idea that we're going to be able to do the individual classifications increasingly adeptly, and I believe

Dave Willner: Probably better than, certainly, people under the actual labor conditions in which this work is done for real, and I suspect, eventually, almost all people, simply from sort of an attention span and a number of variables in mind while deciding point of view, if you sort of buy that that's where this is going, which I think there's strong evidence is the case.

Dave Willner: Then the question becomes, how do you build systems that effectively allow a human to oversee that sort of decision-making system. And what does keeping people meaningfully informed at that sort of altitude of being in the loop look like? I think that looks something more like…

Dave Willner: Like, what we've traditionally thought of as operations management. That you move from a future where humans are both the operators and the people in the sort of hierarchy of an operations team doing the bureaucratic management to a situation where humans are both the content moderation experts, or content subject matter experts, and overseeing, like, a constellation of decision-making systems.

Dave Willner: Again, we already have some parallels here in some of the more anti-spam, anti-fraud investigative work, where our automation for catching those things at scale has been better for a longer period of time, and you do give, sort of, individual human investigators the ability to take mass action on large amounts of content, because we trust the identifications of the individual systems that are classifying particular pieces of content more than we trust those for some of the more subjective harms that we're all tasked to go after. So I think there are parallels here in a lot of places where

Dave Willner: what is good enough is a question where we can kind of reach for analogies, and we're not in a totally new situation. What's new is the domains in which we can operate in that way.

Dave Willner: And the speed at which we can change what we're looking for, which increases the sort of responsibility on the humans piloting those broader systems.

Alice Goguen Hunsberger: You also have the ability to, like, A/B test, which you can't do when you're training humans. Like, if you're using the same analogy, you're an operations manager, you're training a bunch of moderators on a new policy, you're like, okay, here's a new policy, go do it.

Alice Goguen Hunsberger: Then you have to wait for them to make a bunch of decisions, and you have to, like, look and see how well they did it, and then you have to measure how well they did it, and you have to be like, oh, we forgot this educate… you know, it can go on and on and on.

Alice Goguen Hunsberger: With LLMs, you can do all of that ahead of time, before you release a policy. You can do the testing, you can see which model is the best, you can see what language is the best, you can see what examples come up, what edge cases, you can test it on live data, you can play with it much, much more before it goes live, and have much more of a sense of what the impact of a policy change will be before it goes live, which then also lets you know

Alice Goguen Hunsberger: what the impact on the users will be, before you go live, which is really exciting. Like, that black box of, like, what happens after I send my training data to the moderation team was always so frustrating for me, as a manager, and so being able to have those tools and

Alice Goguen Hunsberger: And be part of building tools like that, is really, really exciting, and I think opens up, like, a ton of possibilities, for people in these spaces.
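
The kind of offline policy testing Alice describes can be sketched as a small evaluation harness like the one below. The `classify` function is a placeholder for whatever moderation model call a team actually uses; the labeled-sample format and metrics are assumptions for illustration.

```python
# Hypothetical harness for comparing policy-prompt variants before they go live.
def classify(policy_prompt: str, text: str) -> bool:
    """Placeholder: return True if the model would flag `text` under `policy_prompt`."""
    raise NotImplementedError("wire this up to your moderation model")

def evaluate(policy_prompt: str, labeled_sample: list[tuple[str, bool]]) -> dict:
    """Score one prompt variant against human-labeled examples."""
    tp = fp = fn = 0
    for text, should_flag in labeled_sample:
        flagged = classify(policy_prompt, text)
        tp += int(flagged and should_flag)
        fp += int(flagged and not should_flag)
        fn += int(not flagged and should_flag)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"precision": precision, "recall": recall}

# Run evaluate() on each prompt variant over the same labeled sample, then review
# the disagreements by hand before shipping the policy change.
```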

Cassie Coccaro: Great. I want to give Ken and Rebecca a chance to weigh in, and I also just want to remind everybody who's tuning in right now that there is a Q&A box in your Zoom, so if you have specific questions about any of these topics that come up, please do put them in there, and we'll do our best to get to them. But, Rebecca, I see you came off mute.

Dr. Rebecca Portnoff: Yeah, I'm happy to jump in here, in part because I really love this question about what makes something good enough, what makes an AI system good enough. I think it really gets at the heart of a lot of the,

Dr. Rebecca Portnoff: broader conversation on content moderation, where we sometimes can end up talking past each other. Is good enough something that's based on the previous baseline? Like, your… is it… as long as your model is better than the best version that's out there in the market, or better than the version that you've been using, is that good enough? Is it based on unacceptable outcomes, like false positives that derail victim identification? Is it based on minimizing content exposure to trust and safety teams, or just

Dr. Rebecca Portnoff: In general, trying to minimize content exposure, there's so many different variables and dials that we can be considering here. I know that, for me, it's really important to understand both

Dr. Rebecca Portnoff: good enough, and in general, what Human in the Loop looks like from that holistic systems perspective. So whether or not a technology is good enough, it really depends on what else is in place in your AI system. So, do you have wellness services offered to your staff? Is there opportunity for redressing or contesting a decision? Do you have, like, how often or in what capacity does repeat exposure occur? Do you have the same teams that own the initial decision and

Dr. Rebecca Portnoff: The ones that follow up for redress. These are all things that really inform whether or not a system is good enough; it's how it slots in, or how it holistically fits with the rest of what's operating, as opposed to the technology itself. You can't answer that question without looking at the entire order of operations.

Ken Morino: Yeah, I really want to second something that Rebecca just said there, because I think it's really, really important to look at the context. I think that we have seen a leap in a lot of the tools, but we really need to look at individual circumstances and resources available. What impact is a false positive going to have on an individual, on a community, and is that an acceptable risk, or is it not an acceptable risk?

Ken Morino: I think it's going to be a very individualized decision based on the community that you're operating within, and no one's going to know that community better than the community managers, the moderators that are actually looking after that community, understanding what the goals are, what risks they're willing to take, and what risks they're not willing to take. You could have something that has an incredibly low false positive rate.

Ken Morino: But there could be an extraordinary risk. If you're talking about adult situations where certain things are completely acceptable and other things are absolute red lines you can't cross, even if a system is operating at 97% or higher on those red lines, you may not think that it's worth risking it. So I think you need to make the calculus of: what are you optimizing for?

Ken Morino: What systems are in place, and what's the impact of something being incorrectly identified, or something being correctly identified too quickly, too slowly, and kind of what that entire picture looks like before you can say, okay, this is the right thing for us to do.

Ken Morino: It is, I think, definitely, to Alice and Dave's point, really important to maintain that trust, and it's really important to continually evaluate and identify ways you can improve. You can't ever say, okay, we're great here, we don't have to change. It's always going to be a continuous iteration process to get better, to do better, because there's always space for that, but you really have to look at your circumstances and figure out what's going to work best for you based on your policies. And the things that you're guarding against, and things that you're protecting.
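
One way to picture the calculus Ken describes is to choose an operating threshold by weighing the cost of each error type for a particular community, as in the minimal sketch below. The cost values and score format are illustrative assumptions, not recommendations.

```python
# Hypothetical threshold selection under asymmetric error costs.
def expected_cost(threshold, scored_examples, cost_fp, cost_fn):
    """scored_examples: (classifier_score, is_actually_violating) pairs."""
    total = 0.0
    for score, violating in scored_examples:
        flagged = score >= threshold
        if flagged and not violating:
            total += cost_fp  # e.g., a wrongly sanctioned user in a sensitive community
        elif not flagged and violating:
            total += cost_fn  # e.g., a red-line harm that slips through
    return total

def pick_threshold(scored_examples, cost_fp, cost_fn):
    """Pick the candidate threshold that minimizes expected cost on held-out data."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: expected_cost(t, scored_examples, cost_fp, cost_fn))

# A platform where red-line harms are catastrophic would set cost_fn much higher
# than cost_fp, pushing the chosen threshold lower; a community where wrongful
# sanctions are the bigger worry would do the opposite.
```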

Dave Willner: Building quickly on a point Rebecca made about, sort of, viewing this as a holistic system. It's easy when we talk about, sort of, content moderation and AI to have the, sort of, AI plug-in piece of this zoom into the decision-making part of the process, but I… and I've made this point before publicly, I think we're also in a time where there's significant change happening in parallel systems like customer service delivery, where language models and the capability of language models are changing

Dave Willner: how quickly we can communicate with people in sort of reasonable, warm, somewhat effective, adaptive ways compared to, like, traditional chatbots or IVR systems. And so when you're thinking about this question of AI and ethical AI in content moderation, don't just think about it as a… as intersecting with how the decision about whether or not something fits into some policy category gets made, that part's very important.

Dave Willner: But also, how we record the reasoning, how we're able to communicate that reasoning, what the, frankly, the user experience of encountering a moderation episode even feels like. All of those things are… Open to redesign in this moment, because having access to machinery that can do a much better job of engaging in dialogue sort of changes the constraints that we've traditionally had in terms of how we can even design any of those systems. And so, to me, the question actually ends up becoming even broader than just classification, and goes to this broader question of how is this entire

Dave Willner: Experience of what this looks like is going to change over the next… five-ish years, as we all figure out how to best employ these things.

Cassie Coccaro: Thank you so much, everyone. So much to think about there. I want to change topics just a little bit, and…

Cassie Coccaro: pull on something else that was really interesting to me last time you all talked about this, which is bias within data. And Alice, I'm going to start with you, because I remember you having a lot to say on this last time. So my question for you, and then of course everybody else can follow, is how do or can humans help identify bias in training data and guide systems to better represent vulnerable or marginalized communities or groups who are disproportionately harmed online?

Alice Goguen Hunsberger: Yeah, I have a lot to say. I'm sure Dave does too, because he sort of dealt with some of this from the inside.

Alice Goguen Hunsberger: he hired me as a freelancer to red team GPT-3 before it came out, so I got to play with, like, figuring out what is the most horrible thing that I can make this say, and so that kind of testing is a really good tactic.

Alice Goguen Hunsberger: I think there's a couple things when thinking about AI bias, and when thinking about it in terms of using it for, content moderation and sort of labeling and decision making, there's always going to be some kind of

Alice Goguen Hunsberger: built-in bias or point of view based on the data that a model is trained on, just because humans are biased, and, you know, it's just going to reflect the data that it has. But…

Alice Goguen Hunsberger: They're also… LLMs, by definition, are steerable, and so you can do a test, and you can sort of say.

Alice Goguen Hunsberger: Decide whether this… content is hate speech or not, and not give it any further instructions, and I think there was actually, like, an academic research paper on this that sort of showed, like, what the different levels of bias were in terms of, like, all the different foundational models, and some were more biased than others against different groups when making moderation decisions. But that was, like, very, very broad prompting, and you can get

Alice Goguen Hunsberger: extremely detailed and extremely, like, this is what bias means to us. Like, this is what hate speech means, these are all the examples, this is what counts, this is what doesn't, here's the reclaimed language that our particular community uses, this is the way they talk to each other, these are the things to look out for, and get incredibly, incredibly granular, and that

Alice Goguen Hunsberger: kind of steering of the model is something that's never possible with, like, fixed ML models, because there's so many different permutations that can come up.

Alice Goguen Hunsberger: So, it's really, really exciting, because I think with prompting, you can actually flip around a lot of those and sort of define exactly what your community needs.

Alice Goguen Hunsberger: But in order to get that specific with all of your policies, you need a lot of examples, you need to talk to a lot of your users, you need to really understand your community, you need to know what they think about, what they care about, what too much means, what not, you know, where the line is, and there are all of these trade-offs between safety and expression, and how much should people be able to say, and those are, like.

Alice Goguen Hunsberger: traditional… trust and safety problems that we've had forever, but through the act of having to get really precise with your policies in a way that I think people often don't when they're writing a policy for a human moderation team, you really have to think about

Alice Goguen Hunsberger: Your values, and your community, and, like, all of those things that, maybe people could afford to be a little bit more wishy-washy about.

Alice Goguen Hunsberger: Before, so it's… it's a really, like, exciting opportunity, I think.

Alice Goguen Hunsberger: To combat bias in your classification and your labeling. And it's also, obviously you want to be really, really aware of the models that you're using and how they work, and what assumptions they're going to make, so that you can counteract that if you need to.
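
The granular, community-specific policy definitions Alice describes could be captured as a structured spec that gets rendered into model instructions, along the lines of the minimal sketch below. All field names and example values are hypothetical.

```python
# Hypothetical community-specific policy spec rendered into model instructions.
policy = {
    "category": "hate_speech",
    "definition": "Attacks on people based on a protected characteristic.",
    "violating_examples": ["<example of a slur used as an attack>"],
    "non_violating_examples": ["<reclaimed term used within the community>"],
    "community_context": "Members of this community use certain terms self-referentially.",
    "escalation_rule": "If intent is ambiguous, send to human review.",
}

def render_policy_prompt(p: dict) -> str:
    """Turn the structured spec into instructions for a steerable model."""
    return (
        f"Category: {p['category']}\n"
        f"Definition: {p['definition']}\n"
        f"Violating examples: {'; '.join(p['violating_examples'])}\n"
        f"Non-violating examples: {'; '.join(p['non_violating_examples'])}\n"
        f"Community context: {p['community_context']}\n"
        f"When unsure: {p['escalation_rule']}\n"
        "Label the following content as VIOLATING or NOT_VIOLATING and explain why."
    )

print(render_policy_prompt(policy))
```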

Dr. Rebecca Portnoff: If I can hop in here and just build onto that, because Alice, definitely agree with what you're sharing about the opportunity in that evaluation stage and in that iteration stage to keep, like, refining how your models are performing specifically against these values that are core to folks. I don't want us to lose sight, though, of, like, the importance of

Dr. Rebecca Portnoff: doing that from the get-go at that training stage level, I think, not to be a broken record here, but it really is that systems approach, that if you're just focusing on the evaluation side, you've missed an opportunity to, like, build it correctly from the get-go, and I think there's a lot that can be done, even at that initial data curation and collection stage, to

Dr. Rebecca Portnoff: make challenging and important decisions around what that looks like, so that it's accurately performing against the task that you have in a way that reflects the values that we care about. Even something as simple as, well, are you trying to build a multilingual model? Then you probably shouldn't just have English in your training data set. That these are, like, really practical decisions that I think have a lot of important downstream consequences.

Dave Willner: Yeah, building on both of those points, I think we're… and, you know, I… transparently, some of this is very specifically related to where my company works, so I've got a point of view here that we're sort of…

Dave Willner: fairly firmly committed to, but I… I think we're right. The… the sort of training for not just bias, but for performance in particular. And if you make a move from thinking of this as training against bias, which is important, to more holistically training the system to be specifically fit for the particular purpose you're gonna put it to.

Dave Willner: We're in a world right now where a lot of the Frontier Foundation models, which are very impressive and can do a lot, are sort of trained to be pretty good at a lot of things, and that is part of how they become so powerful.

Dave Willner: But it also means they're not… when you really push them as an expert, sort of to the edges, they're not excellent at some of these very specific, persnickety things, absent special purpose training. And…

Dave Willner: Some of the stuff we've worked on has been basically curating very high-quality data sets, specifically to try to teach the skill of faithful

Dave Willner: Policy text interpretation and application, and to Rebecca's point, while some of that is translatable across language, some of it's not, right? And so it really emphasized, at least for me, the need for taking systems that are

Dave Willner: you know, pretty good at a general purpose level, and for these situations where, to some of Ken's points earlier, the stakes are very, very high, and actually putting in the work to curate the training data to get them to be really good at the specific thing that we are asking them to do in a particular instance makes a really big difference in terms of both the performance of the models.

Dave Willner: And how well they perform at a given size, which ends up being important, because having smaller models being good at some of these tasks

Dave Willner: changes, to sort of Alice's point, who is able to engage in running these systems, by bringing down the cost, by bringing down the hardware that you need to run these things. And so, there's, like, a broader…

Dave Willner: interesting theme here, where this technology, yes, changes how we do some of the individual, like, unit tasks we've been doing, but it also changes

Dave Willner: who can afford the bureaucracy? Who can afford to do bureaucracy?

Dave Willner: Right? Where when it previously required regimenting very large numbers of humans in very expensive ways, if you can do something that resembles what we previously did with people in a controllable way.

Dave Willner: That is provably trained to be fit for purpose, and it's very, very small, and doesn't involve either traumatizing people or spending a ton of money. You've then changed who can have the responsibility of moderating what size of space.

Dave Willner: Which if we are able to train for, you know, less well-supported languages, more marginal community needs, is actually potentially really cool, but it is, to Rebecca's point, gonna require us to put a lot of work into getting that all to work up front. It's not just gonna kind of happen, because LLMs are magic.

Ken Morino: Yeah, I kind of want to speak to this domain concept a little bit more, because I do think getting that domain baseline correct is really important, but I almost want to talk about how things evolve in communities. People will set out to build a community, and the community they end up with is not always what they had in mind when they got started, and that can be a good thing, and that can be a bad thing.

Ken Morino: But I think you really need to account for not just, kind of, you know, what are the expected targets, what are

Ken Morino: the languages, what is the domain that we are trading off against, but what is… how do we account for the culture and the subcultures, and how the different aspects of people speaking different languages and being from different regions of the world are going to interact and kind of foment into this new subculture? And then how do we optimize for that experience, and how do we evolve the tools that we've set out

Ken Morino: for this ideal community and adapt them to the community that we've ended up with. And I think part of that is you have to have a community-in-the-loop aspect.

Ken Morino: And understand how you can tie that back into your training, how you can make sure your domain and your training regimen can adapt to the community that you have, not the community that you thought you were gonna have.

Cassie Coccaro: Thank you all so much for that. Ken, since you're with us right now, I want to stick with you, just given the nature of the work you all do at Modulate.

Cassie Coccaro: I'm curious about how we can or when we should rely on humans to interpret context that AI systems may struggle with. So, like, distinguishing between playful banter and targeted harassment in voice or text, for example.

Ken Morino: Yeah, I think, AI is very good at, kind of picking out specific patterns, but it may not understand when those same patterns can have very different meanings depending on the context. So I think that the most important thing in those circumstances is to recognize, when there's a higher probability that those patterns are going to potentially fall into a gray area, and then provide as much of a context

Ken Morino: window as you can

Ken Morino: So that includes things like who's participating, what are the policies that are enforced in that area you're participating in, and then, you know, what is the probability that other people who are not directly involved in this engagement can participate, and then what impact could that have?

Ken Morino: And then looking at how people react in that scenario. So if you have somebody make a comment that could be seen in a negative or a positive light, sometimes the best thing to do is say, okay, this person said this comment, and this is how the people reacted around it. The AI may not be trained specifically to say, okay, we know

Ken Morino: that, you know, half the people laughing and half the people being silent is good or bad, but we can provide that context to a moderator, and they can now say, I have a better situational context of what was said and how people reacted, and make a judgment based on that, and then inform the AI for future cases as well.

Cassie Coccaro: Okay, awesome. Thank you so much. If no one else has comments on that, I want to turn for a moment to how we catch and respond to emerging and new harms, which I know is on our minds all the time. So, we know that AI systems degrade over time, and at the same time, new and emerging harms are constantly evolving and improving. So, I'm just going to open this up to all of you. How can we ensure that humans can continue to help spot new harms and adversarial behavior before they overwhelm automated systems, or, on the opposite end,

Cassie Coccaro: How can we effectively use AI to help us here?

Dr. Rebecca Portnoff: I'll jump in here, if that's okay, in part because I want to build off of a thought you just shared, Ken, around the importance of

Dr. Rebecca Portnoff: context, and specifically the role that, like, moderators in community, or folks who are in community and how they respond, can play for this. My short answer for how we effectively deal with these systems degrading over time is: work with other people. The longer one is, you know, somebody, somewhere, whether someone on your team or someone not on your team, knows before you do what new harms are emerging. And so the kind of, you know,

Dr. Rebecca Portnoff: investigation, being involved in the community, that intelligence gathering, that research, all of it is really critical to staying up to date, and then, you know, giving that input and feedback back into models to make sure that they're maintaining or improving performance. And so it's really that kind of, like, monitoring is not the right word to use here, but in general, that real, deep understanding and deep visibility into how folks are interacting with your system and where it's working and where it's failing.

Dr. Rebecca Portnoff: getting those feedback loops from users, you know, letting you know where you missed the mark and where you did well. Even things like anomaly detection systems that let you know if there seems like there's something shifting in your underlying data distribution, these all come down to paying attention and working with other people.
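
The anomaly-detection idea Rebecca mentions could look something like the sketch below, which compares a recent window of classifier scores against a reference window to flag a possible shift in the underlying data distribution. The window sizes, simulated score distributions, and alert threshold are illustrative assumptions.

```python
# Hypothetical drift check on classifier score distributions.
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(reference_scores, recent_scores, p_threshold=0.01):
    """Flag a possible shift between a reference window and a recent window."""
    result = ks_2samp(reference_scores, recent_scores)
    return result.pvalue < p_threshold, result.statistic, result.pvalue

rng = np.random.default_rng(0)
reference = rng.beta(2, 8, size=5000)  # stand-in for last month's risk scores
recent = rng.beta(2, 5, size=1000)     # stand-in for this week's risk scores

alert, stat, p = drift_alert(reference, recent)
if alert:
    print(f"Score distribution shifted (KS={stat:.3f}, p={p:.4f}); route samples to human review.")
```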

Ken Morino: Yeah, to kind of piggyback off that, the thing that we like to talk about is validation loops. How can you get a wide and diverse set of validation loops that can help keep you informed of, you know, what is the truth on the ground looking like right now? That can be things like having moderators randomly sampling different pieces of data, validating that the things that the AI is making determinations on are

Ken Morino: correct. That can mean looking at appeal systems, having forums where you're talking to your community members.

Ken Morino: And having them tell you what they think about judgments, what they think about policies, if they're reacting to specific events, and how they think you did about that. So you can kind of get a good gauge and a consistent gauge over time on

Ken Morino: are you doing the right thing? Do you think that people think that you're doing the right thing? Because you could be doing the right thing, and if you're not impressing upon people what you're doing, they could have a very wrong interpretation of that. So I think validation, and then making sure that you're providing what you're doing with that validation back to the community is a really important way to reinforce that.
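
The validation loops Ken describes can start as something very simple: pull a random sample of automated decisions, have moderators label them, and track agreement over time, as in the sketch below. The sample size and field names are hypothetical.

```python
# Hypothetical validation-loop helpers: random sampling plus an agreement rate.
import random

def draw_validation_sample(decisions, sample_size=200, seed=None):
    """decisions: list of dicts like {"content_id": ..., "ai_label": ...}."""
    rng = random.Random(seed)
    return rng.sample(decisions, min(sample_size, len(decisions)))

def agreement_rate(reviewed):
    """reviewed: dicts with both "ai_label" and "human_label" filled in."""
    if not reviewed:
        return None
    agreed = sum(1 for r in reviewed if r["ai_label"] == r["human_label"])
    return agreed / len(reviewed)

# Trend this number week over week: a falling agreement rate is an early signal
# that the model is drifting away from the community's ground truth.
```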

Dave Willner: Piggybacking here quickly, this is, I think, actually another place where the sort of… my prior point that AI can change some of how we think about design constraints in all of these systems is potentially really interesting, right? Because

Dave Willner: Particularly for very scaled social platforms.

Dave Willner: the ability to give input in a way that is genuinely meaningful is honestly very limited a lot of the time, because companies, given the scale that they're working at, tend to shut down avenues for, like, real dialogue. Maybe at this point you have DSA-compliant appeals surfaces, but even that is being fed into a sort of giant decision-making

Dave Willner: bureaucracy, for lack of a better way of putting it, and isn't

Dave Willner: really fine-tuned to be sensitive at, like, hearing what people are saying at scale. This is not a figured-out thing, it is not just going to happen. But again, if you think about part of good trust and safety.

Dave Willner: Design and delivery as also a service experience.

Dave Willner: Then, sort of, the evolution in systems that is also being driven by the rise of large language models to do an at least better job of at-scale

Dave Willner: Elicitation and understanding of sentiment, and giving people chances to voice, in an at least somewhat emotionally satisfying way, their more nuanced and discussed objections to a process they've been a part of.

Dave Willner: is potentially really exciting. Like, that's… this is a, like, farther out has to get figured out thing, but on this sort of point of how do you detect harms that you are maybe missing.

Dave Willner: Part of that is just working on improving your listening systems, and there, too.

Dave Willner: we are potentially in a world where the constraints that we've been under, particularly at hyperscale, are changing. I hope for the better.

Alice Goguen Hunsberger: Yeah, just to add to that, I was head of trust and safety at Grindr, and also OkCupid, so two big dating platforms. I was also head of customer support, so any customer complaint that came in at all, came through my team, whether it was Trust and Safety or not, and users love to tell you what they think.

Alice Goguen Hunsberger: The problem becomes also figuring out, is this something that matters to everybody?

Alice Goguen Hunsberger: When two people tell you opposite things about what they care about, half the people are like, don't take down posts like this, it's censorship, and the other half are like, this is terrible, take it down. You have to, you know, figure out the bureaucracy of how to deal with that, and you have to see what is the prevalence, what is the importance. That all gets really messy when you are dealing with user feedback, because users

Alice Goguen Hunsberger: Are not always the most,

Alice Goguen Hunsberger: accurate with what is a big deal to them, and what is a big deal overall on the platform, but I totally agree that, like,

Alice Goguen Hunsberger: AI, as much as it sounds like it is this inhuman layer that is going to make decisions on behalf of humans and, you know, just be our, like, robot overlords, I think it actually is enabling us to get back to basics and get back to the things that really matter, which is, like, connecting with your community, listening to what's important, thinking about their experiences and how they're affected, and then looking

Alice Goguen Hunsberger: at the data and seeing, you know, where things are really going wrong, and all of that was sort of this impossible black box, historically, for trust and safety leaders, and is slowly becoming, like, a clear cube. So, it's a very helpful and exciting time to think about that stuff.

Ken Morino: Yeah, if I can just kind of piggyback off one thing that you said, I think there's a huge difference between getting a complaint or getting a comment and saying, okay, that's the thing that we need to do, versus taking that within the context, understanding

Ken Morino: that they are representing a specific point of view, and if you can contextualize that, things like AI systems that are emerging now allow you to take the context, take the volume, look at, kind of, the statistical

Ken Morino: probability that, you know, that's a large faction that's represented in that comment, or maybe it's a smaller faction, but still taking that data point in there and figuring out, not necessarily, oh, I'm gonna do the thing that they said or not do the thing that they said, but

Ken Morino: Contextualize it and say, how can we get value from this point and put forward, you know, the best interest of our platform?

Dave Willner: quickly, drawing out a specific thing that Alice had said.

Dave Willner: this concern about the sort of AI system being an inhuman way of doing this, and I think sort of the root of my perspective on all of this is that the systems we have today are inhuman.

Dave Willner: They're just inhuman.

Dave Willner: Through, like, nightmare bureaucracy, instead of inhuman through

Dave Willner: AI. And the question to me is, like, where can we actually use these technologies that are not human to make what is already a very inhuman and alienating system feel less like it is an inhuman system by potentially not relying on humans in critical links where we've

Dave Willner: really had no choice in the past, which is different than that being the best answer. It was simply the only answer.

Alice Goguen Hunsberger: And for moderators as well as for users, like, it's on both sides of the line.

Cassie Coccaro: Thank you so much. Well, silly me as moderator, I thought we wouldn't get to all of our questions, but we are not even halfway through, and we have so much more to say, so we might need a part two of this, but, we did get a couple of audience questions, so I'm going to sort of wrap up my portion here and ask you all to respond.

Cassie Coccaro: to my final question, but I also want to remind the audience that if you do have any questions, we still have time to take a couple more, I think, so feel free to put any in the box that you want to. Okay, for the group, so as AI capabilities grow.

Cassie Coccaro: How do you see the role of human moderation evolving, and what are the risks you're most concerned about specifically, and what are you most hopeful for?

Cassie Coccaro: Anyone want to go first? I'll call on someone if not.

Cassie Coccaro: Rebecca, you want to jump in?

Dr. Rebecca Portnoff: Sure, I can volunteer here.

Dr. Rebecca Portnoff: You know, I think, candidly, my hope is that we get to a place where we have to do a lot less content moderation, period, and it's not because LLMs are magic. I'd like to go on the record officially and say they are not. And it's more because I'd love to see us continue to…

Dr. Rebecca Portnoff: adopt a posture where we are designing products, platforms, and technologies in a way that prioritizes preventing these harms in the first place, that is considering what it looks like to shape spaces that do uplift people and kids. Like, noting that, you know, really great points were made earlier about how you can have an intention with a community, and then that can go sideways when it comes to how that actually materializes, that, you know, there's that reality that there will always be offenders that are

Dr. Rebecca Portnoff: motivated to do harm as well, but I really do not think it's inevitable that we need to be designing platforms for,

Dr. Rebecca Portnoff: you know, the kind of metrics that, at the end of the day, are pretty meaningless, like metrics around engagement and metrics around clicks and eyeballs, that there are better things that we can optimize for, and I get a lot of hope when I hear about work that people are driving to pursue that, to pursue that kind of

Dr. Rebecca Portnoff: Fundamental, preventative attitude that rejects some of these existing paradigms and really lifts up those efforts as a first-class citizen in trust and safety.

Alice Goguen Hunsberger: Can I build on that, and flip it to, sort of, how the role of humans in moderation is evolving?

Alice Goguen Hunsberger: You know, for so long, trust and safety has been seen as this industry that is…

Alice Goguen Hunsberger: either, like, a back office cost center that nobody in the business takes seriously and tries to spend as little money as possible on, or it's seen as this, like, industrial censorship

Alice Goguen Hunsberger: Kind of deal. And,

Alice Goguen Hunsberger: it doesn't have to be either of those, you know. And the people who work in content moderation, and trust and safety more broadly, really, really care. Like, all of us

Alice Goguen Hunsberger: Do this work, because we want the world to be a better place, and…

Alice Goguen Hunsberger: My hope is that, you know, with new tools, with new insights, with the ability to look at more, analyze more, step back a little bit, be more strategic about the way that we're doing this work, it also will sort of uplift the industry as a whole, and the way that we function in business, and also the way that we

Alice Goguen Hunsberger: We treat our people, especially in the front lines, who are doing this work.

Alice Goguen Hunsberger: I don't know. That… that might also be totally pie in the sky, but, you know, that's… that's my hope for where all of this is heading on the… on the human side of things.

Dave Willner: Do you want some pessimism? I think there's a request for fear, as well.

Cassie Coccaro: Go for it. Share some.

Dave Willner: Cool. No, I am generally very optimistic, actually. I hope that's come through, and I echo a lot of what was just said. The thing I worry about here is that, while I think the characterization of trust and safety as censorship is deeply incorrect, it is true that techniques for better identifying what things say are a dual-purpose technology. So there's an interesting question here, less at the level of the industry, I think, and more around how the industry encounters government regulation, where things could get kind of interesting.

Dave Willner: Some amount of the freedom that has existed on the internet is downstream of the fact that we've been kind of terrible at content moderation, or historically have been. And we are now collectively very focused on getting better at it, because being terrible at it has also enabled a bunch of harm. But being better at this potentially has costs if it's directed toward ill ends. I don't think that can really be escaped. "Bad people with power are scary" is essentially what I'm saying, but in the context of us improving our ability to do this, that's a potential hazard to watch for.

Cassie Coccaro: Thank you. Ken, any final thoughts on that?

Ken Morino: Yeah, to go off what Dave said, I think there's both opportunity and risk here. There's real opportunity to establish community guidelines and actually enforce them in an appropriate way, and opportunity for people to focus on nuance and do harm reduction as opposed to policing. But I also think there's risk: with more visibility comes the risk of that visibility being used against people's privacy, and there's a risk of being accidentally punitive toward groups you don't mean to target because of introduced bias. So whatever you do, you have to be cautious and always be evaluating: are the consequences what we intend them to be, and are we doing more good than harm in every situation we're looking into?

Cassie Coccaro: All right, thank you so much. We have about five minutes and two questions, so I'm just going to select one, and if any of the panelists feel inclined to jump in and answer it first, I'd be grateful. This question says: I was wondering to what extent AI can really be trained to keep up with user slang that keeps changing and evolving, or with the circumvention labels that malicious actors might use to evade AI detection. And also, will AI without a human in the loop ever be able to identify dark humor or sarcasm? Anybody have thoughts on that?

Ken Morino: I can jump in on this one. There are a couple of different techniques that we've used successfully, and some that are trickier and maybe don't have the level of precision we'd like to see. But there is possibility here; I don't think it's going to be foolproof or 100%. When you're talking about new slang, oftentimes it's not what people are saying but how people are reacting. There's always going to be a curve you fall behind, but you're also looking at your community: if people in a group generally find something unacceptable, that's worth looking into, and then you can adapt to it.

Ken Morino: In terms of sarcasm and dark humor, it's a little more nuanced than that. The question is: are they being sarcastic or using dark humor, and is it still causing harm? What is the impact? We could say, yes, this person is being sarcastic, or they're just joking around, but they could be doing it in an unacceptable way, so it may not be acceptable even if it is humor. Just saying "oh, it's sarcasm, we should not do anything about that" misses the mark. It's more about the impact and the potential harm being caused than about whether we're catching every signal and nuance.

Dr. Rebecca Portnoff: I can add a thought to that, and sorry, Dave, didn't mean to cut you off. I definitely resonate with the point that there's always a curve you can fall behind, but what does it look like to keep rapidly iterating in response to changing dynamics? To me, that requires looking at where the areas of slowdown are, from when you are building the model, to when you are shipping it, to when you're iterating on it. For example, that handoff where you've built the model but it still has to get deployed into environments: does that take six months or two days? How efficient are you in that process? That's going to make a big difference downstream in whether or not your users are saying, "This is slang I've been using for three years already, and only now are you starting to notice it." So I just wanted to unpack that point a little more.

Dave Willner: This question also gets at some of the starting frame we talked about, which is: as opposed to what, right? Yes, these two things are challenges. Humans at scale are also extraordinarily bad at both of them. So the question isn't whether AI is going to magically solve it; it's where we can use it to make incremental progress, and really, how much are we losing? I think the funniest document I've ever written in my entire career was a detailed guide for how to tell if someone was trying to be funny, a completely straight-faced, flow-chart-oriented "is this an attempt at a joke" document, which might be the funniest document that can exist on a meta level. We wrote it because the human moderators we were asking to make that assessment weren't on the same page with each other about whether things were attempts at jokes. So it is a hard analysis to do, but the fact that it's hard is not actually that novel, and it's something that, particularly with the largest foundation models, you can do reasonably well, although not perfectly.

Dave Willner: On the slang question, this is another place where the shift in paradigm is interesting. With a black-box model that you're training on a bunch of decisions, slang evolution is a total nightmare. If, instead, you're training special-purpose language models to do a good job of following the directions you give them, you can say, "By the way, check this list of words; here's what they mean." Then getting up to date on slang becomes as simple as changing the resources you expose to the model that's trained to care about that language. So again, there are places here where different design opportunities may open up to us.

Cassie Coccaro: Thank you so much, and I think, Dave, you should volunteer to share that guide when we share the recording of the webinar, so everyone can read it.

Cassie Coccaro: Okay, well, I really just want to thank all of our panelists for diving into this topic yet again with us. I think we could have gone on for another 30 minutes, if not longer. Clearly, it's a topic people care about and want to dig deep into, so I know this isn't the end.

Cassie Coccaro: Thank you, everyone who attended today. Before you jump off, if you're still here, we're going to share a quick exit poll that I hope you'll answer really quickly. It'll help us at Thorne continue to make webinars that are useful for the trust and safety world at large. And as a reminder, this recording will be sent to participants or anybody who registered, so keep your eye on your inbox for that and any future events that might be relevant to you. And just thank you all so much for being here. I appreciate it.