Advanced Natural Language Processing (968G5):

March 29, 2023

Assessed coursework 1

Format: Submit a single zip file containing 1 PDF and an appendix of your code (which may be a .ipynb or a .py file).

Word Count: 8 pages (approx. 3000 words) plus code appendix.

Marking: You will be told your mark and receive feedback via Canvas before Friday 19th May.

Weighting: This assignment is worth 60% of your mark for this module.

1.0 Practical assignment (3000 words): Propaganda Detection

You are provided with a zip file containing a propaganda dataset. This includes 2 files with identical format: one for training and one for testing. Each file is in tab-separated-value (tsv) format with 2 columns as illustrated below.

label	sentence
flag waving	I want to get <BOS> our soldiers <EOS> out.
not propaganda	Our older measure of <BOS> American Worker Displacement <EOS> understated the problem.
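
As a minimal sketch of reading this format (not a required implementation), the two columns can be loaded with Python's standard csv module. The function name read_tsv is hypothetical, and the in-memory sample stands in for a real file, since the file names inside the zip are not stated here.

```python
import csv
import io

def read_tsv(handle):
    """Read (label, sentence) rows from a tab-separated file with a header row."""
    # QUOTE_NONE keeps any quotation marks inside sentences as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the "label<TAB>sentence" header row
    return [(row[0], row[1]) for row in reader if len(row) == 2]

# Stand-in for an opened file; the two rows are the examples shown above.
sample = io.StringIO(
    "label\tsentence\n"
    "flag waving\tI want to get <BOS> our soldiers <EOS> out.\n"
    "not propaganda\tOur older measure of <BOS> American Worker Displacement "
    "<EOS> understated the problem.\n"
)
rows = read_tsv(sample)
```

For the real data, the same function can be passed a file opened with encoding="utf-8" and newline="".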

The first column contains a label from a set of 9 possibilities which are

flag waving
appeal to fear prejudice
causal simplification
doubt
exaggeration,minimisation
loaded language
name calling,labeling
repetition
not propaganda

The first 8 labels are all propaganda techniques and are a subset of those identified in the Propaganda Techniques Corpus (Da San Martino et al., 2020). The final label not propaganda indicates that no propaganda has been identified in the text. The second column contains a sentence or chunk of text where the propaganda technique has been identified (or no propaganda has been identified in the case of not propaganda). Note the use of additional tokens <BOS> and <EOS> which indicate the beginning and end of the span of text (within the sentence) which is actually annotated with the given propaganda technique. In the first example above, the span of text “our soldiers” has been identified as an example of flag waving in the context of the sentence “I want to get our soldiers out.”
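
The role of the <BOS> and <EOS> tokens can be illustrated with a short sketch that separates the annotated span from its sentence. The helper name split_span is hypothetical, and the sketch assumes at most one marked span per sentence.

```python
def split_span(sentence, bos="<BOS>", eos="<EOS>"):
    """Return (span, plain_sentence) for a sentence with one marked span.

    Assumes at most one <BOS> ... <EOS> pair; if the markers are missing,
    the span is empty and the sentence is returned with whitespace normalised.
    """
    before, _, rest = sentence.partition(bos)
    span, _, after = rest.partition(eos)
    span = span.strip()
    # Re-join the pieces and collapse the extra spaces left by the markers.
    plain = " ".join(f"{before} {span} {after}".split())
    return span, plain

span, plain = split_span("I want to get <BOS> our soldiers <EOS> out.")
```

Whether a classifier sees the span alone, the whole sentence, or both is a design decision that the report should justify.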

Your tasks are as follows:

1. Build and evaluate at least 2 approaches to classify whether a sentence contains propaganda or not.
2. Given a snippet or span of text which is known to contain propaganda, build and evaluate at least 2 approaches to classifying the propaganda technique which has been used.
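
One possible way to derive data for the two tasks from the same files is sketched below. It assumes rows are (label, sentence) pairs and that the negative label is spelled exactly "not propaganda" as in the list above; the function name make_task_data is hypothetical.

```python
NEGATIVE_LABEL = "not propaganda"  # assumed spelling; check it against the actual files

def make_task_data(rows):
    """Split (label, sentence) rows into data for the two tasks."""
    # Task 1: binary target, True when any propaganda technique is annotated.
    task1 = [(sentence, label != NEGATIVE_LABEL) for label, sentence in rows]
    # Task 2: only rows known to contain propaganda, keeping the technique label.
    task2 = [(sentence, label) for label, sentence in rows if label != NEGATIVE_LABEL]
    return task1, task2

# Toy rows for illustration only.
toy_rows = [
    ("flag waving", "I want to get <BOS> our soldiers <EOS> out."),
    ("not propaganda", "Our older measure of <BOS> American Worker Displacement <EOS> understated the problem."),
]
task1, task2 = make_task_data(toy_rows)
```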

In this assignment you are expected to complete both tasks above and investigate at least 2 different approaches to making classification decisions. The approaches used for task 2 may be the same or different to the approaches used in task 1. Your solution does not need to be novel. You might choose to investigate 2 of the following approaches or 1 of the following approaches and 1 of your own devising.

Text probability based on n-gram language models
Text similarity or classification based on uncontextualised word embedding methods e.g., word2vec
Neural language models
Pretrained large language models e.g., BERT
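
To make the first option concrete, here is a minimal sketch of classification by text probability: one bigram language model is trained per label and a text is assigned to the label whose model gives it the highest log probability. The class name BigramLM, the add-k smoothing, the whitespace tokenisation and the toy training sentences are all assumptions made for illustration, not a prescribed solution.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram language model with add-k smoothing (k is a hyper-parameter)."""

    def __init__(self, k=1.0):
        self.k = k
        self.bigrams = Counter()   # counts of (previous, current) token pairs
        self.contexts = Counter()  # counts of each token used as a context
        self.vocab = set()

    def _tokens(self, text):
        # Lowercased whitespace tokens with sentence boundary markers.
        return ["<s>"] + text.lower().split() + ["</s>"]

    def fit(self, texts):
        for text in texts:
            toks = self._tokens(text)
            self.vocab.update(toks)
            for prev, cur in zip(toks, toks[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        return self

    def log_prob(self, text):
        toks = self._tokens(text)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        total = 0.0
        for prev, cur in zip(toks, toks[1:]):
            num = self.bigrams[(prev, cur)] + self.k
            den = self.contexts[prev] + self.k * v
            total += math.log(num / den)
        return total

def classify(text, models):
    """Return the label whose language model scores the text highest."""
    return max(models, key=lambda label: models[label].log_prob(text))

# Toy training data, invented for illustration only.
models = {
    "propaganda": BigramLM().fit([
        "we must defend our great nation",
        "our brave soldiers defend our nation",
    ]),
    "not propaganda": BigramLM().fit([
        "the report was published on monday",
        "the committee published the figures",
    ]),
}
prediction = classify("defend our nation", models)
```

Here the smoothing constant k and the n-gram order are examples of hyper-parameters that would need to be discussed in the report.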

It does not matter how well your method(s) perform. However, your methods should be clearly described, any hyper-parameters (either fixed, varied or optimised) should be discussed and there should be a clear comparison of the approaches with each other – both from a practical and empirical perspective.
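
For the empirical side of the comparison, one option (an assumption, since no metric is prescribed here) is per-class precision, recall and F1 with a macro average, which weights every label equally regardless of how often it occurs. The function names below are hypothetical.

```python
from collections import Counter

def per_class_scores(gold, predicted):
    """Return {label: (precision, recall, f1)} for every label in gold or predicted."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p wrongly
            fn[g] += 1  # missed gold label g
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        n_pred = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precision = tp[label] / n_pred if n_pred else 0.0
        recall = tp[label] / n_gold if n_gold else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_scores(gold, predicted)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)

# Toy labels for illustration only.
gold = ["doubt", "doubt", "repetition", "repetition"]
pred = ["doubt", "repetition", "repetition", "repetition"]
score = macro_f1(gold, pred)
```

Running the same function over the test-set predictions of each approach gives directly comparable numbers.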

1.1 Resources

You have been provided with the training and test data for this task with the assignment. You may (and are expected to) use any of the code that you have developed throughout the labs. This includes code provided to you in the exercises or solutions. You may use any other resources to which you have access. You may also download other resources from the Internet and make use of any Python libraries with which you are familiar. All code that you use (libraries, lab solutions and open source code) should be properly credited within your code base and within your report e.g., “my function for X is adapted from code available at Y”.

1.2 Report

Your report should be in the style of an academic paper. It should include an introduction to the problem and the methods you have implemented. It might contain a brief discussion of related work in the area but the focus here should be on your practical work rather than producing a comprehensive literature review. Also, make sure you describe your solution and not just the theoretical background of the approach. For example, the theoretical background on how word embeddings are learnt using word2vec might be useful to motivate your approach but does not constitute a description of your method for solving the task using word2vec – there are many ways word2vec can be used to provide a solution and it is this that you should focus on in the description of your method. You should also make sure you discuss any hyper-parameter settings – both those which you have decided to fix and any which you are investigating. Justify your design decisions. You should discuss and justify the method of evaluation. You should provide your results and compare them with any baselines. You should also provide some analysis of errors – do the approaches make the same or different mistakes and can you comment on the types or causes of errors being made? You should end with your conclusions and areas for further work. You should also submit your code as an appendix. Your report (including figures and bibliography but not including code appendix) should be no longer than 8 sides (3000 words of text plus figures and bibliography). Your code in the appendix should be clearly commented.
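
The question of whether two approaches make the same or different mistakes can be examined with a simple tally, sketched below. It assumes both systems were run on the same test items in the same order; the function name compare_errors and the toy labels are hypothetical.

```python
from collections import Counter

def compare_errors(gold, preds_a, preds_b):
    """Tally which system got each item wrong, plus each system's confusions.

    Returns (overlap, confusions_a, confusions_b), where the confusion
    counters map (gold_label, predicted_label) pairs to counts.
    """
    overlap = Counter()
    confusions_a, confusions_b = Counter(), Counter()
    for g, a, b in zip(gold, preds_a, preds_b):
        wrong_a, wrong_b = a != g, b != g
        if wrong_a and wrong_b:
            overlap["both wrong"] += 1
        elif wrong_a:
            overlap["only A wrong"] += 1
        elif wrong_b:
            overlap["only B wrong"] += 1
        else:
            overlap["both right"] += 1
        if wrong_a:
            confusions_a[(g, a)] += 1
        if wrong_b:
            confusions_b[(g, b)] += 1
    return overlap, confusions_a, confusions_b

# Toy predictions for illustration only.
gold = ["doubt", "repetition", "doubt", "flag waving"]
sys_a = ["doubt", "doubt", "repetition", "flag waving"]
sys_b = ["doubt", "repetition", "repetition", "doubt"]
overlap, conf_a, conf_b = compare_errors(gold, sys_a, sys_b)
```

Items that only one system gets wrong are often the most informative ones to inspect by hand when commenting on causes of error.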

Marks will not be awarded simply for how well your system does or for programming wizardry. Marks will be awarded for clearly evaluating possible solutions to the tasks set out above.
