The product manager’s guide to choosing content moderation solutions

Are you a product manager (PM)? Does your platform host user-generated content (UGC) or a generative AI engine? Is your platform starting to feel like the Wild West?

If you answered "yes" to any of the above, you’re in the right place. Let’s chat about why you need content moderation, and how to choose the right solution for your product. Want to skip the guide and jump right to evaluating solutions? Go to the bottom of the page for a business case and scorecard template.

Content moderation is a problem before it’s a problem

With UGC and AI-generated content comes creativity, community and risk.

  • Where there are compliments, there is hate speech
  • When facts are shared, they are countered with misinformation
  • Within each community, there are spam and scams
  • And where there are children, someone is seeking to exploit or harm them

You might be thinking, “Not on our platform! We have nice users sharing pictures of dogs.” That might be true, but if you don’t have proactive content moderation processes in place, how would you really know? Any platform that hosts user-generated content could be at risk of hosting harmful content.

Even in the early stages of development, it’s wise to plan for addressing toxic content, because at some point, it will surface. In 2023 alone, more than 104 million files related to CSAM were reported by registered electronic service providers.

Depending on the type of toxic content, you may have legal obligations for reporting that content. In the case of child sexual abuse material (CSAM), you’re legally required to review suspected content and report content violations to your local reporting agency. For many platforms, that’s the National Center for Missing & Exploited Children (NCMEC).

If you’re in the EU, you’re required by the Digital Services Act (DSA) to remove illegal content promptly and provide transparency reports. Ignoring these obligations can result in hefty fines, lawsuits, or platform bans. It also puts users in danger and, in cases of CSAM, further harms survivors.

Crafting your platform’s approach to content moderation

There are many considerations when it comes to content moderation: triaging and escalating user reports, establishing community guidelines, selecting programmatic or AI tools for content detection, and building a team of moderators, just to name a few.

When it comes to selecting content moderation tools, you’ll first want to establish what questions you need to ask and who should be involved.

What are our policies?

Other teams to include: Legal, Policy, Product Policy, Customer Support, Product Marketing

  1. Draft clear community guidelines aligned with legal requirements.
  2. Create specific categories of harmful content (e.g., hate speech, CSAM, self-harm).
  3. Prioritize the categories of harmful content that carry legal obligations or elevated risk.

What are our technical expectations and specifications?

Other teams to include: Engineering, Data Science, Infra, Ops

  1. Do we build in-house or select a vendor?
  2. How will we scale over the next 3-5 years?
  3. Do we start with hash matching for known CSAM? (A minimal sketch follows this list.)
  4. Do we need to add AI classifiers for proactive detection of emerging threats?
  5. Do we need real-time filtering for text-based UGC like comments and chat?
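
To make the hash matching question concrete, here is a minimal sketch of exact matching against a known-hash list. It is illustrative only: production systems typically use perceptual hashing and vendor-maintained hash lists rather than the plain SHA-256 comparison shown here, and the empty KNOWN_HASHES set is a hypothetical stand-in for your provider’s list.

```python
# Minimal sketch: exact hash matching against a known-hash list.
# Real deployments use perceptual hashing and provider-maintained lists;
# KNOWN_HASHES below is a hypothetical placeholder.
import hashlib
from pathlib import Path

KNOWN_HASHES: set[str] = set()  # populated from your hash-list provider

def sha256_of_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_known_hash(path: Path) -> bool:
    """True if the file's digest appears in the known-hash list."""
    return sha256_of_file(path) in KNOWN_HASHES
```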

Do we need human review?

Other teams to include: Customer Support, Account Managers

  1. Use hybrid systems where AI flags content and human moderators review borderline cases (a threshold-routing sketch follows this list).
  2. Develop training systems and frequent checkpoints to address the evolution of threats.
  3. Provide mental health support for moderators dealing with harmful content.
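
As one way to picture that hybrid setup, here is a minimal sketch of confidence-threshold routing. The thresholds, label, and action names are hypothetical; real values would come from your classifier’s precision and recall and from your policy team.

```python
# Hypothetical routing of classifier output: auto-action at high confidence,
# human review for borderline scores, no action below the floor.
from dataclasses import dataclass

AUTO_REMOVE_THRESHOLD = 0.95   # illustrative; tune against your precision targets
HUMAN_REVIEW_THRESHOLD = 0.60  # illustrative; below this, no action is taken

@dataclass
class Verdict:
    action: str   # "remove", "human_review", or "allow"
    label: str
    score: float

def route(label: str, score: float) -> Verdict:
    """Decide what happens to a piece of flagged content."""
    if score >= AUTO_REMOVE_THRESHOLD:
        return Verdict("remove", label, score)
    if score >= HUMAN_REVIEW_THRESHOLD:
        return Verdict("human_review", label, score)
    return Verdict("allow", label, score)

# Example: a hate-speech classifier returns 0.72 -> queued for moderators.
print(route("hate_speech", 0.72))
```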

How do we handle user reports and appeals?

Other teams to include: Legal, Product Policy, UX, Engineering

  1. Allow users to flag harmful content easily.
  2. Build appeal systems for mistaken removals.
  3. Establish escalation and decision-making frameworks (a minimal report-record sketch follows this list).
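
For illustration, here is one hypothetical shape for a user report record that supports escalation and appeals. The field names and statuses are placeholders, not a prescribed schema.

```python
# Hypothetical user report record with an escalation and appeal path.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class UserReport:
    """One user-submitted flag on a piece of content."""
    content_id: str
    reporter_id: str
    reason: str           # e.g., "hate_speech", "csam", "spam"
    status: str = "open"  # open -> escalated / actioned / dismissed -> appealed -> reinstated
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def escalate(self) -> None:
        """Hand the report to senior reviewers or legal, per your escalation framework."""
        self.status = "escalated"

    def appeal(self) -> None:
        """Reopen a decision the user believes was mistaken."""
        self.status = "appealed"
```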

How will we address transparency and auditing?

Other teams to include: Legal, Policy

  1. Publish your policies publicly to set user expectations.
  2. Publish regular transparency reports to meet DSA and other regulatory requirements.
  3. Review moderation accuracy and update models regularly.

The Now, Next, Laters

As the above starts to take shape, you’ll need to prioritize your roadmap. What content needs to be addressed most urgently? What content will require special tooling? A lot of this will depend on your product, but there are some constants.

Now: CSAM detection and enticement signals

For child safety, outside factors such as laws and public pressure move CSAM and child grooming to the top of every product manager’s priority list for risk mitigation. The rapid evolution of generative AI and other emerging risks, such as financial sextortion, has made addressing child safety an urgent need for most digital platforms.

  • CSAM: Digital platforms are legally required to review and report CSAM. The most economical and reliable way to proactively detect known CSAM is through hashing and matching, also known as fingerprinting. If your platform hosts any kind of image or video content, this is an important tool. Thorn has also built the Safer Predict CSAM classifier to identify potentially novel image and video CSAM, giving platforms comprehensive CSAM detection coverage. (A sketch of how these two layers fit together follows this list.)

  • AI-generated or AI-manipulated CSAM: In the US, AI-generated CSAM is federally illegal (18 U.S.C. §1466A). In the UK, four laws were recently passed that make it illegal to possess, create or distribute AI tools designed to create CSAM. Generative AI models are already being manipulated to produce custom CSAM. GenAI is being used to generate new material of historical survivors and bespoke material of kids from benign imagery for the purposes of fantasy, extortion, and, in some cases, peer-based bullying within schools and communities.

  • Deepfake nudes: AI-generated deepfake nudes are accelerating the spread of nonconsensual image abuse, reshaping how young people experience online harm and creating new areas of safety risk to mitigate. Research from Thorn found that 31% of teens are already familiar with deepfake nudes, and 1 in 8 personally knows someone who has been targeted.

  • Child grooming and enticement: In 2024, the REPORT Act was signed into law, requiring platforms to also report child grooming and enticement to NCMEC. If proactive detection of this behavior is a priority for your platform, there are machine learning classifiers available that are trained to detect these text-based harms, including the Safer Predict text classifier.
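
As a rough picture of how the layers above combine, here is a hedged sketch of a detection flow that puts hash matching first and a classifier second. The callables and threshold are placeholders for whatever tooling you adopt; this is not any vendor’s actual API.

```python
# Hypothetical layering: hash matching catches known CSAM, a classifier
# flags potentially novel CSAM for trained human review.
from pathlib import Path
from typing import Callable

def detect_csam(
    path: Path,
    is_known_hash: Callable[[Path], bool],      # hash-and-match step (see earlier sketch)
    classifier_score: Callable[[Path], float],  # hypothetical novel-CSAM classifier
    review_threshold: float = 0.80,             # illustrative threshold only
) -> str:
    """Return a coarse disposition for an uploaded image or video file."""
    if is_known_hash(path):
        return "confirmed_known_csam"   # mandatory review and NCMEC report
    if classifier_score(path) >= review_threshold:
        return "suspected_novel_csam"   # route to trained human reviewers
    return "no_match"
```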

Next: Other forms of toxic content

  • Following the above, the next threats to address depend on what your community aims to do. Is it at high risk for spam? Hate speech? The answers will shape which kinds of proactive detection you prioritize next.
  • Establish systems and solutions to implement basic content review and reporting.

Later: Automation and optimization

  • Design and develop semi-automated processes based on complex platform signals and policies.
  • Develop advanced analytics on trends—including geographies, times, types of bad actors, trending toxic phrases or keywords, etc.
  • Optimize processes for human moderators, reducing their exposure to abuse and increasing wellness opportunities.

Creating your trust and safety tooling business case

Now that you have a high-level understanding of what your platform’s content moderation needs and priorities are, you can start to drill into the tactical decisions, like whether you want to build or buy your solution, and if you buy, what is needed from your vendor. You’ll need to answer:

  1. What goals is our company trying to achieve?
  2. What departmental goals will this tool help us achieve?
  3. Who needs to be a part of the decision-making process and when do those decisions need to be made?
  4. What is our plan for implementing the tool? Which teams need to be involved? What is the sequence of those steps?
  5. How can we prioritize the most impactful types of detection, like hashing and matching for CSAM?
  6. What is our budget? What can we save on now?
  7. What risks are we assuming by not moderating? Are we willing to take those risks?
  8. What kind of technical evaluation can we complete? (Hint: use the scorecard below.) Do we want to run a trial with the top contenders?
  9. Will this solution scale with us?
  10. What kind of kickoff do we need to ensure this project is successful?

Get the trust and safety tooling business case template here to get started.

Using the trust and safety tooling scorecard

Vendor calls start to meld into one after a while. *Which hash lists were available? Which one updated its model annually? How was their model trained?*

A technical scorecard is handy to keep you on task and to give you a side-by-side comparison of the vendors. It grounds the team in the reality of the situation and puts weight behind real requirements. Assigning a score to each solution helps take the guesswork out of vendor selection. The goal with a scorecard is to:

  1. Collect and record your requirements
  2. Apply weights to what matters most
  3. Get direct answers from your vendors
  4. Record the answers in one place for all cross-functional teams to see
  5. Tally up the weighted scores (a worked example follows this list)
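
For example, a weighted tally might look like the following minimal sketch. The criteria, weights, and 1-5 scores are hypothetical placeholders, not a recommended rubric.

```python
# Minimal weighted-score tally for comparing vendors.
# Criteria, weights, and scores below are illustrative only.
weights = {"detection_coverage": 0.4, "integration_effort": 0.3, "cost": 0.3}

vendor_scores = {
    "Vendor A": {"detection_coverage": 4, "integration_effort": 3, "cost": 2},
    "Vendor B": {"detection_coverage": 3, "integration_effort": 4, "cost": 4},
}

def weighted_total(scores: dict[str, int]) -> float:
    """Sum each criterion score (1-5 scale) multiplied by its weight."""
    return sum(weights[criterion] * score for criterion, score in scores.items())

for vendor, scores in vendor_scores.items():
    print(f"{vendor}: {weighted_total(scores):.2f}")
# Vendor A: 0.4*4 + 0.3*3 + 0.3*2 = 3.10; Vendor B: 3.60
```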

Finance will want to know the price. Engineers will want to know how much work it will take. Data science will want to know whether the vendor’s models are sufficient. Most of all, the product team will need to keep track of all the internal and external requirements and, most likely, make the final decision on which vendor is the best fit for your needs.

To no PM’s surprise, the key to a successful scorecard is true cross-functional collaboration and buy-in. As a product manager, you’ll need to:

  1. Understand how your product is going to grow in the next 3-5 years, whether the vendor will scale with you, and how.
  2. Work with finance on the budget.
  3. Work with engineering on the scope of implementation.
  4. Align with policy and legal to make sure the solution will help enforce community guidelines and meet reporting obligations.

A vendor analysis is more than checking a box for classification or hashing. It requires clear goals and clear answers.

Get the trust and safety tooling scorecard template here to start your evaluation process.

Investing in responsible growth

Thoughtful moderation strategies protect your business from legal risks and reputational harm. They also create a safer digital space for your community to thrive. Remember that moderation is an ongoing journey, not a one-time implementation.

Technology evolves, threats adapt, and your moderation approach must keep pace. Successful platforms see content moderation as key to user experience, not just a regulatory task. Use our business case framework and scorecard to make choices that reflect your company values. This will protect your users and support responsible growth for your platform. Investing in effective content moderation now will build trust, safety and sustainable growth for years ahead.