Automated Coding of Political Video Ads

Summary: With the advent of new media technology and the ability to identify more information about potential voters, political campaigns have aggressively changed their strategies. Election campaigns increasingly rely on online video advertising to reach voters. Until now, the contents of these ads have been manually coded for political science research on campaign strategies. Manual coding is extremely time consuming and does not scale to the expected increase in online ads. We make a first attempt at automated coding of the content of political video ads for political science research. Specifically, we focus on the problem of classifying a political ad into one of three categories: attack ads, promoting ads, and contrast ads. Together with a domain expert, we introduce a concrete definition for each of these categories. We make available the ground truth labels of 773 political ads from the 2016 presidential primary election. We investigate the effectiveness of several classifiers using a single modality and using two modalities. The best average F1 score is 0.845, obtained with text features from the audio and from text embedded in image frames.

Definitions of Categories of Political Ads:
An attack ad is an ad that contains only attacks against others, as in the following cases.
(1) Candidate A attacks candidate B in the same party.
(2) An organization attacks a candidate.
(3) An organization attacks a party.
A promoting ad is an ad that meets one of the following criteria.
(1) Candidate A promotes himself/herself over others.
(2) Candidate A attacks candidate B in another party.
(3) Candidate A attacks another party.
A contrast ad is an ad that promotes one candidate while attacking another candidate in the same party.
The above definitions are for the primary election. The main point of attack ads and contrast ads is to criticize opponents. In the general election, the opponent will be the candidate from the other party.

[Figure panels: Promote Ad, Attack Ad, Contrast Ad]

Data Collection and Data Cleaning: We collected presidential video ads from two political websites, PCL and Political TV Ad Archive. Since some of these ads are duplicates of each other, we detected duplicates by computing the cosine similarity for each pair of ads. If the cosine similarity of two ads exceeded a threshold of 0.7, we deleted the one with the lower resolution. Next, we excluded the following ads: non-English ads, low-resolution ads, and ads from candidates who released fewer than 10 ads and suspended their campaigns. Our final dataset has ads from the following eleven presidential candidates in the 2016 primary election.
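The duplicate-removal step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not say which features the cosine similarity is computed on, so the per-ad feature vector, the dictionary keys ("name", "features", "resolution"), and the function names are all hypothetical. Only the pairwise comparison, the 0.7 threshold, and the rule of deleting the lower-resolution ad come from the description.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def deduplicate(ads, threshold=0.7):
    """Drop the lower-resolution ad of every pair whose similarity exceeds the threshold.

    `ads` is a list of dicts with hypothetical keys:
      "name"       - unique ad identifier
      "features"   - numeric feature vector (assumed; not specified in the text)
      "resolution" - a comparable resolution value, e.g. frame height in pixels
    """
    removed = set()
    for a, b in combinations(ads, 2):
        # Skip pairs where one ad has already been discarded.
        if a["name"] in removed or b["name"] in removed:
            continue
        if cosine_similarity(a["features"], b["features"]) > threshold:
            # Keep the higher-resolution copy; on a tie, keep the first one.
            loser = a if a["resolution"] < b["resolution"] else b
            removed.add(loser["name"])
    return [ad for ad in ads if ad["name"] not in removed]


ads = [
    {"name": "a", "features": [1.0, 0.0, 1.0], "resolution": 720},
    {"name": "b", "features": [1.0, 0.0, 0.9], "resolution": 1080},
    {"name": "c", "features": [0.0, 1.0, 0.0], "resolution": 480},
]
kept = [ad["name"] for ad in deduplicate(ads)]
```

Comparing every pair is quadratic in the number of ads, which is acceptable for a collection of this size; a much larger archive would likely need an approximate nearest-neighbor index instead.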


Democratic Party: Bernie Sanders, Hillary Clinton


Republican Party: Ben Carson, Carly Fiorina, Chris Christie, Donald Trump, Jeb Bush, John Kasich, Marco Rubio, Rand Paul, Ted Cruz

Ground Truth Labeling: This process was very time consuming. All the ads in the dataset were first labeled by two political science students under the guidance of a political science professor, who is one of the authors. The professor then reviewed the labels and resolved any conflicts. The names of the ads and the corresponding ground truth labels are available HERE. The final dataset consists of 773 ads (“Entire Set”) classified into 498 promoting ads (“Promoting Set”), 191 attack ads (“Attack Set”), and 84 contrast ads (“Contrast Set”).
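One way to picture the two-annotator workflow with expert adjudication is the sketch below. It is an assumption-laden illustration rather than the procedure actually used: the function names, the dictionary-based label format, and the sample ad names are hypothetical. The only figures taken from the text are the class counts, which are checked at the end.

```python
def merge_labels(labels_a, labels_b):
    """Split ads into those both annotators agree on and those needing review.

    Each argument maps an ad name to one of "promoting", "attack", "contrast".
    """
    agreed, conflicts = {}, []
    for ad, label in labels_a.items():
        if labels_b.get(ad) == label:
            agreed[ad] = label
        else:
            conflicts.append(ad)  # disagreement or missing label: send to the expert
    return agreed, conflicts


def finalize(agreed, conflicts, expert_decisions):
    """Combine agreed labels with the expert's decision for every conflicting ad."""
    final = dict(agreed)
    for ad in conflicts:
        final[ad] = expert_decisions[ad]  # raises KeyError if a conflict is unresolved
    return final


# Hypothetical labels from two student annotators.
student_1 = {"ad1": "attack", "ad2": "promoting", "ad3": "contrast"}
student_2 = {"ad1": "attack", "ad2": "contrast", "ad3": "contrast"}

agreed, conflicts = merge_labels(student_1, student_2)
final = finalize(agreed, conflicts, expert_decisions={"ad2": "contrast"})

# Class sizes reported for the final dataset; the share of each class shows
# how imbalanced the three categories are.
class_counts = {"promoting": 498, "attack": 191, "contrast": 84}
total = sum(class_counts.values())
class_shares = {label: count / total for label, count in class_counts.items()}
```

The class shares make the imbalance explicit: promoting ads form the clear majority and contrast ads the smallest class, which is worth keeping in mind when reading an averaged F1 score.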