Deepfake detection challenge from R


Introduction

Working with video datasets, particularly with respect to detecting AI-generated fake objects, is challenging: one has to select the proper frames and then detect the faces in them. To approach this challenge from R, one can make use of the capabilities offered by OpenCV, magick, and keras.

Our approach consists of the following consecutive steps:

  • read all the videos
  • capture and extract images from the videos
  • detect faces from the extracted images
  • crop the faces
  • build an image classification model with Keras

Let’s quickly introduce the non-deep-learning libraries we’re using. OpenCV is a computer vision library; here, we rely on it to detect faces in the extracted frames. magick, on the other hand, is an open-source image-processing library that will help us to:

  • read video files
  • extract images per second from the video
  • crop the faces from the images

Before we go into a detailed explanation, readers should know that there is no need to copy-paste the code chunks: at the end of the post you will find a link to a Google Colab notebook with GPU acceleration. This kernel allows everyone to run and reproduce the same results.

Data exploration

The dataset that we are going to analyze is provided by AWS, Facebook, Microsoft, the Partnership on AI’s Media Integrity Steering Committee, and various academics.

It contains both real and AI-generated fake videos. The total size is over 470 GB; however, a 4 GB sample dataset is also available separately.

Frame extraction

The videos in the folders are in mp4 format and have various lengths. Our task is to decide how many images to capture per second of video; we typically extracted 1–3 frames per second from each video.

Note: Set fps to NULL if you want to extract all frames.

library(magick)

# capture 2 frames per second from the video
video = image_read_video("aagfhgtpmv.mp4", fps = 2)
vid_1 = video[[1]]
vid_1 = image_read(vid_1) %>% image_resize('1000x1000')
*Figure: the first extracted frame ([Deepfake detection challenge](https://www.kaggle.com/c/deepfake-detection-challenge/data))*

We saw just the first frame. What about the rest of them?
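One way to inspect all of them at once is to combine the frames into an animation. A minimal sketch with magick, reusing the video object read above (the playback fps is illustrative):

frames = image_resize(video, '300x300')   # shrink the frames for a lightweight preview
image_animate(frames, fps = 2)            # play them back as an animated image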

*Figure: animation of all extracted frames ([Deepfake detection challenge](https://www.kaggle.com/c/deepfake-detection-challenge/data))*

Looking at the GIF, one can observe that some fakes are very easy to tell apart, while a small fraction look pretty realistic. This is another challenge during data preparation.

Face detection

First, face locations need to be determined via bounding boxes, using OpenCV. Then, magick is used to automatically extract the faces from all images.

# get face locations and calculate bounding boxes
library(opencv)

unconf <- ocv_read('frame_1.jpg')
faces <- ocv_face(unconf)           # image annotated with detected faces
facemask <- ocv_facemask(unconf)    # mask whose 'faces' attribute holds the detections
df = attr(facemask, 'faces')

# corners of a square box centered on each face
rectX = (df$x - df$radius)
rectY = (df$y - df$radius)
x = (df$x + df$radius)
y = (df$y + df$radius)

Draw the box with a red dashed line:

# draw on the frame, then close the graphics device to finalize the image
imh = image_draw(image_read('frame_1.jpg'))
rect(rectX, rectY, x, y, border = "red",
     lty = "dashed", lwd = 2)
dev.off()

*Figure: the detected face, marked with a bounding box ([Deepfake detection challenge](https://www.kaggle.com/c/deepfake-detection-challenge/data))*

Face extraction

Once the face locations are found, it is very easy to extract them all.

# crop with a hard-coded geometry string: widthxheight+x_offset+y_offset
edited = image_crop(imh, "49x49+66+34")

# the same crop, computed from the bounding box found above
edited = image_crop(imh, paste(x-rectX+1, 'x', x-rectX+1, '+', rectX, '+', rectY, sep = ''))
edited
*Figure: the cropped face ([Deepfake detection challenge](https://www.kaggle.com/c/deepfake-detection-challenge/data))*
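The snippet above crops a single frame. To process every extracted frame, one can wrap detection and cropping in a loop. A minimal sketch, assuming the frames are named frame_1.jpg, frame_2.jpg, … and keeping only the first detected face per frame:

# detect, crop, and save a face for every extracted frame
frames = list.files(pattern = 'frame_.*\\.jpg')
for (f in frames) {
  det = attr(ocv_facemask(ocv_read(f)), 'faces')
  if (nrow(det) == 0) next                  # skip frames without a detected face
  det = det[1, ]                            # keep the first detection only
  side = 2 * det$radius + 1                 # square box around the face center
  geom = paste0(side, 'x', side, '+', det$x - det$radius, '+', det$y - det$radius)
  image_write(image_crop(image_read(f), geom), paste0('face_', f))
}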

Deep learning model

After dataset preparation, it is time to build a deep learning model with Keras. We can quickly place all the images into class folders (a minimal sketch follows) and, using image generators, feed the faces to a pre-trained Keras model.
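Placing the faces into folders is not shown explicitly here. A minimal sketch, assuming a hypothetical data frame labels with a filename column and a label column ('FAKE' or 'REAL') for every cropped face, e.g. parsed from the dataset's metadata.json:

# sort the cropped faces into one sub-folder per class
# (`labels` is an assumed data frame: one filename + label per face)
dir.create('fakes_reals/fake', recursive = TRUE, showWarnings = FALSE)
dir.create('fakes_reals/real', recursive = TRUE, showWarnings = FALSE)

for (i in seq_len(nrow(labels))) {
  dest = if (labels$label[i] == 'FAKE') 'fakes_reals/fake' else 'fakes_reals/real'
  file.copy(labels$filename[i], dest)
}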

library(keras)

train_dir = 'fakes_reals'
width = 150L
height = 150L
epochs = 10

train_datagen = image_data_generator(
  rescale = 1/255,
  rotation_range = 40,
  width_shift_range = 0.2,
  height_shift_range = 0.2,
  shear_range = 0.2,
  zoom_range = 0.2,
  horizontal_flip = TRUE,
  fill_mode = "nearest",
  validation_split = 0.2
)

train_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(width, height),
  batch_size = 10,
  class_mode = "binary"
)
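Note that validation_split = 0.2 only takes effect when a generator requests a subset. A minimal sketch of the matching validation generator; with this in place, the training generator above should additionally pass subset = "training":

validation_generator <- flow_images_from_directory(
  train_dir,
  train_datagen,
  target_size = c(width, height),
  batch_size = 10,
  class_mode = "binary",
  subset = "validation"   # the 20% held out by validation_split
)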

# Build the model ---------------------------------------------------------

conv_base <- application_vgg16(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(width, height, 3)
)

model <- keras_model_sequential() %>%
  conv_base %>%
  layer_flatten() %>%
  layer_dense(units = 256, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  loss = "binary_crossentropy",
  optimizer = optimizer_rmsprop(lr = 2e-5),
  metrics = c("accuracy")
)

history <- model %>% fit_generator(
  train_generator,
  steps_per_epoch = ceiling(train_generator$samples / train_generator$batch_size),
  epochs = epochs   # the 10 epochs defined above
)
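Once training has finished, the model can score new faces. A minimal sketch for a single cropped face (the file name face_1.jpg is illustrative); the output is the predicted probability for the positive class:

# score one cropped face with the trained model
img = image_load('face_1.jpg', target_size = c(width, height))
x = image_to_array(img) / 255                  # same rescaling as in training
x = array_reshape(x, c(1, width, height, 3))   # add the batch dimension
model %>% predict(x)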



Reproduce in a Notebook

Conclusion

This post shows how to do video classification from R. The steps were:

  • Read videos and extract images from the dataset
  • Apply OpenCV to detect faces
  • Extract faces via bounding boxes
  • Build a deep learning model

However, readers should know that implementing the following steps may drastically improve model performance:

  • extract all of the frames from the video files
  • load different pre-trained weights, or use different pre-trained models (see the sketch after this list)
  • use another technology to detect faces – e.g., the MTCNN face detector
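As an illustration of the second point, a minimal sketch that swaps the VGG16 base for Xception; any other application_* model shipped with keras can be dropped in the same way:

# swap the convolutional base for a different pre-trained model
conv_base <- application_xception(
  weights = "imagenet",
  include_top = FALSE,
  input_shape = c(width, height, 3)
)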

Feel free to try these options on the Deepfake detection challenge and share your results in the comments section!

Thanks for reading!
