Data and Social Justice

Dan Lee
Apr 6, 2021

Why Data Science?

As this is the opening post of my new Data Science blog, I’d like to share the personal experience that led me to take an interest in this field. In February 2020 I came across a two-part investigative series on The Daily, a podcast from The New York Times.

A Dark Reality

Full disclosure: the subject matter of this podcast is rather dark. Its weight affected me deeply, and in time it became the seed that grew into my leap of faith into the DS field.

The topic of this investigative report was the dramatic increase in the illegal online sharing of Child Sexual Abuse Material. The National Center for Missing & Exploited Children (NCMEC) defines Child Sexual Abuse Material (CSAM) as any visual depiction of sexually explicit conduct involving a minor. Hands-on sexual abuse of a minor is undoubtedly traumatic, with long-lasting physical and emotional repercussions. The online sharing of CSAM harms victims in a different way, because the images are permanent and their distribution is potentially unending. The horrifying reality for these victims is that they are abused again each time the material changes hands, and that abuse follows them for the rest of their lives.

Fighting the Good Fight

Of the 18.4 million reports of illegal imagery received by the National Center in 2018, 17 million (roughly 92 percent) came from Facebook’s platforms. Does this mean child predators use FB almost exclusively to exchange their illegal material? Surely not. Rather, Facebook acts far more aggressively than its peers to root out CSAM and report it to authorities. Alex Stamos, former chief security officer at FB and now a professor at Stanford University, believes that if every company with a file-sharing platform searched for this material and reported its numbers, the total would likely exceed 100 million reports per year.

The list of popular file-sharing platforms includes, but is not limited to, the following:

  • Amazon’s cloud storage with millions of uploads and downloads per day. No scanning.
  • Apple’s encrypted messaging platform iMessage, and photo/video backup platform iCloud. No scanning.
  • Snapchat and Yahoo!, both of which scan photos for CSAM but not video, even though video makes up a significant share of such material.

How Data is Used for Good

What is FB’s method for its aggressive reporting? This is where the Data Science aspect comes in (well, sort of). FB scans every image and video that travels across its platform unencrypted (see footnote 1) against a database of known child sexual abuse material. Because FB does this and many other industry leaders do not, FB finds far more material than any other company.
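The post does not say how the matching against the database is actually done, so what follows is only a minimal sketch of the general idea, assuming the simplest possible scheme: an exact cryptographic hash (SHA-256) of each file is looked up in a set of hashes of previously identified files. The function names and the placeholder byte strings are hypothetical, made up for illustration, and are not FB's implementation.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the raw file bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_known_hashes(samples) -> set:
    """Hash each previously identified file once and store only the digests."""
    return {sha256_hex(sample) for sample in samples}


def is_known(upload: bytes, known_hashes: set) -> bool:
    """Check whether an uploaded file's digest appears in the known set."""
    return sha256_hex(upload) in known_hashes


# Placeholder byte strings stand in for file contents (assumption for the demo).
known_hashes = build_known_hashes([b"flagged-file-1", b"flagged-file-2"])

uploads = {
    "a.jpg": b"flagged-file-1",  # byte-for-byte copy of a known file
    "b.jpg": b"holiday-photo",   # unrelated file
}

flagged = [name for name, data in uploads.items() if is_known(data, known_hashes)]
print(flagged)
```

One limitation of this sketch is worth noting: an exact hash changes completely if a single byte of the file changes, so a resized or re-encoded copy would not match. Real scanning systems are generally understood to rely on similarity-tolerant (perceptual) hashing for that reason, which this example does not attempt to reproduce.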

Although I am early in my studies, I’m beginning to see how the DS field can help answer some complex problems through analysis. A data scientist may work with the data itself to discover trends and insights, potentially helping federal investigators catch child predators and assist victims of child trafficking, sexual exploitation, and physical harm.

For instance, a data scientist may look at such a data set and attempt to answer the following questions: What similarities are there across confirmed victims? Can these tell us which young people are more vulnerable to predators?

Catalyzed by this podcast, I began to dream of using my affinity for mathematics, logic, and problem solving to make an impact on my community. My outlook shifted to seeing human problems through the lens of data. Data, when used strategically, has incredible potential to change the world. It underpins every major field and industry, and the fight for human equity and civil rights is just one such sphere. The work of a data scientist can improve lives (even save lives!) and be a tool for securing social justice for the most vulnerable.

1 The debate over end-to-end encryption and consumer privacy is closely related and equally relevant, but I will not discuss it in detail here. It is important to note that platforms featuring end-to-end encryption cannot be scanned: encryption prevents a company from seeing the sensitive material traveling through its platform. Additionally, FB has announced plans to move its Messenger platform toward end-to-end encryption, as stronger consumer privacy is an attractive option for many tech companies.

Linked to in this post:

https://www.nytimes.com/2020/02/19/podcasts/the-daily/child-sex-abuse.html

https://www.nytimes.com/2020/02/07/us/online-child-sexual-abuse.html

https://www.missingkids.org/theissues/csam

https://www.missingkids.org/theissues/end-to-end-encryption

https://www.wired.com/story/facebook-messenger-end-to-end-encryption-default/
