Advanced Natural Language Processing (968G5)

March 29, 2023

Assessed coursework 1

Format: Submit a single zip file containing one PDF and an appendix of your code (which may be a .ipynb or a .py file).

Word count: 8 pages (approx. 3000 words) plus code appendix.

Marking: You will be told your mark and receive feedback via Canvas before Friday 19th May.

Weighting: This assignment is worth 60% of your mark for this module.

1.0 Practical assignment (3000 words): Propaganda Detection

You are provided with a zipfile propaganda dataset. This includes 2 files with identical format: one for training and one for testing. Each file is in tab-separated-value (tsv) format with 2 columns as illustrated below.

flag waving	I want to get <BOS> our soldiers <EOS> out.
not propaganda	Our older measure of <BOS> American Worker Displacement <EOS> understated the problem.

The first column contains a label from a set of 9 possibilities which are

  1. flag waving
  2. appeal to fear prejudice
  3. causal simplification
  4. doubt
  5. exaggeration,minimisation
  6. loaded language
  7. name calling,labeling
  8. repetition
  9. not propaganda

The first 8 labels are all propaganda techniques and are a subset of those identified in the Propaganda Techniques Corpus (Da San Martino et al., 2020). The final label not propaganda indicates that no propaganda has been identified in the text. The second column contains a sentence or chunk of text where the propaganda technique has been identified (or no propaganda has been identified in the case of not propaganda). Note the use of additional tokens <BOS> and <EOS> which indicate the beginning and end of the span of text (within the sentence) which is actually annotated with the given propaganda technique.  In the first example above, the span of text “our soldiers” has been identified as an example of flag waving in the context of the sentence “I want to get our soldiers out.”
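As an illustration of the file format described above, the following is a minimal sketch of reading the two columns and separating the annotated span from its sentence. It assumes that the files have no header row and that the markers appear exactly as <BOS> and <EOS>; the function names read_rows and split_span are hypothetical, and the sample uses an in-memory string in place of the real files, whose names are not given here.

```python
import csv
import io

BOS, EOS = "<BOS>", "<EOS>"


def read_rows(handle):
    """Yield (label, text) pairs from a tab-separated file object."""
    # QUOTE_NONE treats quotation marks inside sentences as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        yield row[0], row[1]


def split_span(text):
    """Return (sentence without markers, annotated span) for one text field."""
    start = text.index(BOS)
    end = text.index(EOS)
    span = text[start + len(BOS):end].strip()
    # Remove both markers and collapse the whitespace they leave behind.
    sentence = " ".join(text.replace(BOS, " ").replace(EOS, " ").split())
    return sentence, span


# In-memory stand-in for one of the provided files (assumption: no header row).
sample = io.StringIO(
    "flag waving\tI want to get <BOS> our soldiers <EOS> out.\n"
    "not propaganda\tOur older measure of <BOS> American Worker Displacement "
    "<EOS> understated the problem.\n"
)
rows = list(read_rows(sample))
label, text = rows[0]
sentence, span = split_span(text)
print(label, "|", span, "|", sentence)
```

If the real files do contain a header row, it would need to be skipped before the rows are used.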

Your tasks are as follows:

  1. Build and evaluate at least 2 approaches to classify whether a sentence contains propaganda or not.
  2. Given a snippet or span of text which is known to contain propaganda, build and evaluate at least 2 approaches to classifying the propaganda technique which has been used.
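The two tasks use the same rows in different ways, which can be sketched as follows. This is one possible reading, assuming that the label string for the negative class is exactly "not propaganda" (the spelling in the actual files may differ); the helper names to_binary and techniques_only are hypothetical.

```python
NOT_PROPAGANDA = "not propaganda"  # assumed spelling of the negative label


def to_binary(rows):
    """Task 1 view: collapse the 8 technique labels into one positive class."""
    return [
        (NOT_PROPAGANDA if label == NOT_PROPAGANDA else "propaganda", text)
        for label, text in rows
    ]


def techniques_only(rows):
    """Task 2 view: keep only rows annotated with a propaganda technique."""
    return [(label, text) for label, text in rows if label != NOT_PROPAGANDA]


# Two rows taken from the examples in this brief.
rows = [
    ("flag waving", "I want to get <BOS> our soldiers <EOS> out."),
    ("not propaganda",
     "Our older measure of <BOS> American Worker Displacement <EOS> "
     "understated the problem."),
]
binary_rows = to_binary(rows)
technique_rows = techniques_only(rows)
print([label for label, _ in binary_rows])
print([label for label, _ in technique_rows])
```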

In this assignment you are expected to complete both tasks above and investigate at least 2 different approaches to making classification decisions. The approaches used for task 2 may be the same or different to the approaches used in task 1. Your solution does not need to be novel. You might choose to investigate 2 of the following approaches or 1 of the following approaches and 1 of your own devising.

  • Text probability based on n-gram language models
  • Text similarity or classification based on uncontextualised word embedding methods e.g., word2vec
  • Neural language models
  • Pretrained large language models e.g., BERT

It does not matter how well your method(s) perform. However, your methods should be clearly described, any hyper-parameters (either fixed, varied or optimised) should be discussed and there should be a clear comparison of the approaches with each other – both from a practical and empirical perspective.
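An empirical comparison needs a shared metric and a baseline to compare against. The sketch below computes accuracy and macro-averaged F1 from scratch and applies them to a majority-class baseline; the choice of these two metrics, the function names and the toy labels are assumptions for illustration only.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_f1(gold, pred):
    """F1 score for each label that occurs in the gold or predicted labels."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(gold, pred)
    return sum(scores.values()) / len(scores)


def majority_baseline(train_labels, n_items):
    """Predict the most frequent training label for every test item."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n_items


# Invented toy labels, used only to show the mechanics.
gold = ["propaganda", "propaganda",
        "not propaganda", "not propaganda", "not propaganda"]
baseline_pred = majority_baseline(gold, len(gold))
print("accuracy:", accuracy(gold, baseline_pred))
print("macro F1:", macro_f1(gold, baseline_pred))
```

On this toy data the baseline scores noticeably higher on accuracy than on macro F1, which is one reason to report more than a single number when the classes are imbalanced.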

1.1 Resources

You have been provided with the training and test data for this task with the assignment. You may (and are expected to) use any of the code that you have developed throughout the labs. This includes code provided to you in the exercises or solutions. You may use any other resources to which you have access. You may also download other resources from the Internet and make use of any Python libraries with which you are familiar. All code that you use (libraries, lab solutions and open source code) should be properly credited within your code base and within your report e.g., “my function for X is adapted from code available at Y”.

1.2 Report

Your report should be in the style of an academic paper. It should include an introduction to the problem and the methods you have implemented. It might contain a brief discussion of related work in the area, but the focus here should be on your practical work rather than on producing a comprehensive literature review. Also, make sure you describe your solution and not just the theoretical background of the approach. For example, the theoretical background on how word embeddings are learnt using word2vec might be useful to motivate your approach but does not constitute a description of your method for solving the task using word2vec. There are many ways word2vec can be used to provide a solution, and it is this that you should focus on in the description of your method. You should also make sure you discuss any hyper-parameter settings, both those which you have decided to fix and any which you are investigating. Justify your design decisions. You should discuss and justify the method of evaluation. You should provide your results and compare them with any baselines. You should also provide some analysis of errors: do the approaches make the same or different mistakes, and can you comment on the types or causes of errors being made? You should end with your conclusions and areas for further work. You should also submit your code as an appendix. Your report (including figures and bibliography but not including the code appendix) should be no longer than 8 sides (3000 words of text plus figures and bibliography). Your code in the appendix should be clearly commented.
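One way to start the error analysis is to check which test items each approach gets wrong and which confusions are most frequent. The following is a minimal sketch under assumed names (error_overlap, confusions) with invented toy predictions; it is not a prescribed method.

```python
from collections import Counter


def error_overlap(gold, pred_a, pred_b):
    """Group the indices of misclassified items by which system got them wrong."""
    groups = {"both": [], "only_a": [], "only_b": []}
    for i, (g, a, b) in enumerate(zip(gold, pred_a, pred_b)):
        wrong_a, wrong_b = a != g, b != g
        if wrong_a and wrong_b:
            groups["both"].append(i)
        elif wrong_a:
            groups["only_a"].append(i)
        elif wrong_b:
            groups["only_b"].append(i)
    return groups


def confusions(gold, pred):
    """Count (gold label, predicted label) pairs for the misclassified items."""
    return Counter((g, p) for g, p in zip(gold, pred) if g != p)


# Invented toy predictions from two hypothetical systems, A and B.
gold = ["doubt", "repetition", "loaded language", "flag waving"]
pred_a = ["doubt", "loaded language", "loaded language", "doubt"]
pred_b = ["repetition", "loaded language", "loaded language", "flag waving"]

overlap = error_overlap(gold, pred_a, pred_b)
print(overlap)
print(confusions(gold, pred_a).most_common())
```

Items that both systems get wrong may point to difficult or ambiguous examples, while items that only one system gets wrong are a natural place to look for differences between the approaches.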

Marks will not be awarded simply for how well your system does or for programming wizardry. Marks will be awarded for clearly evaluating possible solutions to the tasks set out above.
