NLP Citations for 'The Pile'

Project Gutenberg-19

Original link:

      title={Compressive Transformers for Long-Range Sequence Modelling}, 
      author={Jack W. Rae and Anna Potapenko and Siddhant M. Jayakumar and Timothy P. Lillicrap},



  title={Big bird: Transformers for longer sequences},
  author={Zaheer, Manzil and Guruganesh, Guru and Dubey, Kumar Avinava and Ainslie, Joshua and Alberti, Chris and Ontanon, Santiago and Pham, Philip and Ravula, Anirudh and Wang, Qifan and Yang, Li and others},
  journal={Advances in Neural Information Processing Systems},

Usage: This paper introduces the BigBird Transformer, which targets the quadratic time complexity that undermines the usefulness of large Transformers on long sequences. The authors compare their new model's performance with existing approaches, such as the one introduced by Rae et al. (2020) alongside PG-19.



      title={Rethinking Attention with Performers}, 
      author={Krzysztof Choromanski and Valerii Likhosherstov and David Dohan and Xingyou Song and Andreea Gane and Tamas Sarlos and Peter Hawkins and Jared Davis and Afroz Mohiuddin and Lukasz Kaiser and David Belanger and Lucy Colwell and Adrian Weller},

Usage: The authors present the PG-19 dataset as "a long-range text modelling task" (2020, p. 15). They compare the original tokenisation of PG-19 with their own tokenisation method and take the dataset's log-likelihood multipliers into account before calculating the perplexities involved. The paper uses PG-19 as part of demonstrating 'Performers', Transformer architectures that can estimate regular (softmax) full-rank-attention Transformers with provable accuracy while using only linear, rather than quadratic, space and time complexity.
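The role of a log-likelihood multiplier when comparing perplexities across tokenisations can be sketched as follows. This is a minimal illustration, assuming the multiplier is the ratio of model tokens to words in the evaluation text; the function name and its arguments are hypothetical and are not taken from the paper.

```python
import math


def word_level_perplexity(token_nll_sum, num_tokens, num_words):
    """Rescale a token-level loss to a word-level perplexity.

    token_nll_sum: summed negative log-likelihood (in nats) over all tokens.
    num_tokens:    number of tokens produced by the model's tokeniser.
    num_words:     number of words in the same text under the reference
                   tokenisation.
    """
    if num_tokens <= 0 or num_words <= 0:
        raise ValueError("token and word counts must be positive")
    # Average loss per model token.
    token_nll_mean = token_nll_sum / num_tokens
    # Assumed meaning of the multiplier: tokens per word.
    multiplier = num_tokens / num_words
    # Scaling the per-token loss by the multiplier gives the per-word loss.
    return math.exp(token_nll_mean * multiplier)
```

When the two tokenisations produce the same number of units, the multiplier is 1 and the result is the ordinary token-level perplexity.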



      title={Recipes for building an open-domain chatbot}, 
      author={Stephen Roller and Emily Dinan and Naman Goyal and Da Ju and Mary Williamson and Yinhan Liu and Jing Xu and Myle Ott and Kurt Shuster and Eric M. Smith and Y-Lan Boureau and Jason Weston},

Usage: The authors describe their bots' inability to learn about the agent with whom they are chatting. They contrast the decision by Rae et al. (2020) to use "extended neural architectures to process longer contexts" with their own decision not to, and state their belief that the current evaluation setup is not the right one for measuring success in this respect.



author = {Huang, Yu-Siang and Yang, Yi-Hsuan},
title = {Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions},
year = {2020},
isbn = {9781450379885},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {},
doi = {10.1145/3394171.3413671},
booktitle = {Proceedings of the 28th ACM International Conference on Multimedia},
pages = {1180–1188},
numpages = {9},
keywords = {automatic music composition, transformer, neural sequence model},
location = {Seattle, WA, USA},
series = {MM '20}

Usage: The authors compare the use of existing text corpora, including PG-19, with their own metrical structure for feeding classical music into Transformers to create AI-generated compositions. In this specific case, the comparison is with a pop piano piece built with their Pop Music Transformer.



author={K. {Irie} and A. {Gerstenberger} and R. {Schlüter} and H. {Ney}},  
booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},   
title={How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers},

Irie et al. (2020) propose a simple architectural modification to Transformers that addresses the large states which must be stored at evaluation time when those models are scaled up. The paper aims to show that large, traditionally high-performing Transformer-type models can maintain their performance without the poor scaling of state management during evaluation.

Full list of found citations

| Title | URL |
| ---- | ---- |
| Big bird: Transformers for longer sequences | |
| Longformer: The long-document transformer | |
| Efficient transformers: A survey | |
| Pop Music Transformer: Beat-based modeling and generation of expressive Pop piano compositions | |
| Rethinking attention with performers | |
| How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers | |
| Recipes for building an open-domain chatbot | |
| Do Transformers Need Deep Long-Range Memory | |
| Sparse Sinkhorn Attention | |
| Improving Transformer Models by Reordering their Sublayers | |
| HiPPO: Recurrent Memory with Optimal Polynomial Projections | |
| Training with Quantization Noise for Extreme Fixed-Point Compression | |
| Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory | |
| ProGen: Language Modeling for Protein Generation | |
| Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Document Matching | |
| Zero-shot Entity Linking with Efficient Long Range Sequence Modeling | |
| CogLTX: Applying BERT to Long Texts | |
| The Deep Convolutional Neural Network for NOx Emission Prediction of a Coal-Fired Boiler | |
| Accessing Higher-level Representations in Sequential Transformers with Feedback Memory | |
| Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers | |
| Capturing Longer Context for Document-level Neural Machine Translation: A Multi-resolutional Approach | |
| ETC: Encoding Long and Structured Inputs in Transformers | |
| Stepwise Extractive Summarization and Planning with Structured Transformers | |
| Transformers for limit order books | |
| Advancing Neural Language Modeling in Automatic Speech Recognition | |
| ETC: Encoding Long and Structured Data in Transformers | |
| Cluster-former: Clustering-based sparse transformer for long-range dependency encoding | |
| Exploring Transformers for Large-Scale Speech Recognition | |
| Transformer-based Long-context End-to-end Speech Recognition | |
| GMAT: Global Memory Augmentation for Transformers | |
| Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching | |
| Multi-scale Transformer Language Models | |
| Attending to Long-Distance Document Context for Sequence Labeling | |
| HUSH: A Dataset and Platform for Human-in-the-Loop Story Generation | |
| STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation | |
| Network Representation Learning Based on Topological Structure and Vertex Attributes | |
| Memformer: The Memory-Augmented Transformer | |


OpenSubtitles

Original link:

  title={Finding alternative translations in a large corpus of movie subtitle},
  author={Tiedemann, J{\"o}rg},
  booktitle={Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)},


Information Extraction from TV Series Scripts for Uptake Prediction Original URL:

  title={Information Extraction from TV Series Scripts for Uptake Prediction},
  author={Wang, Junshu},
  school={Auckland University of Technology}

Wang (2017) used IMDb, script websites, and the OpenSubtitles2016 corpus to extract features from pilot episodes in an attempt to build a model predicting whether or not a pilot script would be green-lit. The author admits, however, that the results were not as compelling as hoped, and the thesis has been cited only twice.


Paraphrase Detection on Noisy Subtitles in Six Languages Original URL:

      title={Paraphrase Detection on Noisy Subtitles in Six Languages}, 
      author={Eetu Sjöblom and Mathias Creutz and Mikko Aulamo},

The authors describe the results from paraphrase detection models they built, once with a Word-Averaging (WA) model and once with a Gated Recurrent Averaging Network (GRAN). Of the two approaches, the GRAN proved vastly more effective, and it produced better results when trained on a larger amount of noisy data than on a smaller amount of cleaner data.


PassPort: A Dependency Parsing Model for Portuguese Original URL:

author="Zilio, Leonardo and Wilkens, Rodrigo and Fairon, C{\'e}drick",
editor="Villavicencio, Aline and Moreira, Viviane and Abad, Alberto and Caseli, Helena and Gamallo, Pablo and Ramisch, Carlos and Gon{\c{c}}alo Oliveira, Hugo and Paetzold, Gustavo Henrique",
title="PassPort: A Dependency Parsing Model for Portuguese",
booktitle="Computational Processing of the Portuguese Language",
publisher="Springer International Publishing",

This paper outlines how PassPort was created in the context of previous NLP translation models and how its parser was built using the Stanford Parser. The scores cited for PassPort, including its unlabeled attachment score (UAS), are both between 85% and 90% against the Universal Dependencies corpus. PassPort achieved a slightly better score than PALAVRAS, but the authors admitted a bottleneck in part-of-speech tagging.
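For readers unfamiliar with the metric, UAS (unlabeled attachment score) is the fraction of tokens whose predicted syntactic head matches the gold-standard head. The sketch below is a generic illustration of that definition, not PassPort's evaluation code; the function name and the list-of-head-indices input format are assumptions made for this example.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Return the fraction of tokens whose predicted head matches the gold head.

    Both arguments are lists with one entry per token, giving the index of
    that token's head (0 is used here for the root, by assumption).
    """
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    # Count tokens attached to the correct head, ignoring dependency labels.
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)
```

For example, `unlabeled_attachment_score([2, 0, 2, 3], [2, 0, 2, 2])` returns 0.75, because three of the four tokens are attached to the correct head.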


Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation Original URL:

      title={Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation}, 
      author={Mitchell A. Gordon and Kevin Duh},

Gordon and Duh (2020) propose a double-distillation process for machine translation models that must be memory-efficient. Their method applies a general distillation step followed by an in-domain distillation step, covering both domain adaptation and knowledge distillation use-cases.


Tilde at WMT 2020: News Task Systems Original URL:

  title={Tilde at WMT 2020: News Task Systems},
  author={Kri{\v{s}}lauks, Rihards and Pinnis, M{\=a}rcis},
  journal={arXiv preprint arXiv:2010.15423},

This WMT 2020 submission describes further work on translating news content in both directions for the English-Polish language pair. The team used the Marian machine translation toolkit and experimented with multiple methodologies before providing both a 'Transformer base' and a 'Transformer big' model for others to use.

Full list of found citations

| Title | URL |
| ----- | --- |
| Information Extraction from TV Series Scripts for Uptake Prediction | |
| Paraphrase Detection on Noisy Subtitles in Six Languages | |
| PassPort: A Dependency Parsing Model for Portuguese | |
| Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation | |
| Tilde at WMT 2020: News Task Systems | |
| Mapping physical stores using google street view | |
| Creating a frequency dictionary of spoken Hebrew : a reproducible use of technology to overcome scarcity of data | |
| Toward automatic improvement of language produced by non-native language learners | |
| EMPAC: an English–Spanish Corpus of Institutional Subtitles | |
| SMILLE for Portuguese: Annotation and Analysis of Grammatical Structures in a Pedagogical Context | |
| Designing the Business Conversation Corpus | |
| CoTK: An Open-Source Toolkit for Fast Development and Fair Evaluation of Text Generation | |
| Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation | |
| Measuring the Effect of Conversational Aspects on Machine Translation Quality | |
| JESC: Japanese-English Subtitle Corpus | |
| Large-Scale Machine Translation between Arabic and Hebrew: Available Corpora and Initial Results | |
| Phrase-Based SMT for Finnish with More Data, Better Models and Alternative Alignment and Translation Tools | |
| OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles | |
| OpenSubtitles2018: Statistical Rescoring of Sentence Alignments in Large, Noisy Parallel Corpora | |
| Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation | |

DeepMind Mathematics

Original URL:

  title={Analysing mathematical reasoning abilities of neural models},
  author={Saxton, David and Grefenstette, Edward and Hill, Felix and Kohli, Pushmeet},
  journal={arXiv preprint arXiv:1904.01557},


Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models Original URL:

  title={Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models},
  author={Cho, Sungjae and Lim, Jaeseo and Hickey, Chris and Park, Jung and Zhang, Byoung-Tak},
  journal={arXiv preprint arXiv:1905.03617},

This study used addition and subtraction problems from the DeepMind Mathematics dataset to analyse the differences and similarities between humans and RNNs in tackling these problems, and to gauge the difficulty curve experienced by both as problem complexity increased. Using the Jordan network model, the paper finds a striking number of similarities between humans and machines.


Human-like machine thinking: Language guided imagination Original URL:

      title={Human-like machine thinking: Language guided imagination}, 
      author={Feng Qi and Wenchuan Wu},

This paper by Qi and Wu (2019) attempts to create Language Guided Imagination (LGI), a new model architecture in which neural networks combine three human-like internal processes to enable a machine to construct fictitious mental scenarios as a source of reasoning and memory. The authors hope that this construction would enable a machine to possess general intelligence.


Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge Original URL:

  title={Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge},
  author={Talmor, Alon and Tafjord, Oyvind and Clark, Peter and Goldberg, Yoav and Berant, Jonathan},
  journal={Advances in Neural Information Processing Systems},

In this paper, Talmor et al. (2020) describe a procedure for automatically generating training sets that teach neural networks to reason over both natural language and symbolic facts such as maths. Their end goal is to establish an open-domain system that can improve itself with its own reasoning and simple corrections from user interactions in natural language.


Joint translation and unit conversion for end-to-end localization Original URL:

      title={Joint translation and unit conversion for end-to-end localization}, 
      author={Georgiana Dinu and Prashant Mathur and Marcello Federico and Stanislas Lauly and Yaser Al-Onaizan},

Dinu et al. (2020) use unit conversion as an example of an NLP task that needs to switch back and forth between natural and formal language during end-to-end localisation, and propose a data augmentation technique for models used in that context.


Performance vs. competence in human–machine comparisons Original URL:

@article {Firestone26562,
	author = {Firestone, Chaz},
	title = {Performance vs. competence in human{\textendash}machine comparisons},
	volume = {117},
	number = {43},
	pages = {26562--26571},
	year = {2020},
	doi = {10.1073/pnas.1905334117},
	publisher = {National Academy of Sciences},
	issn = {0027-8424},
	URL = {},
	eprint = {},
	journal = {Proceedings of the National Academy of Sciences}

The DeepMind Mathematics corpus forms part of the tasks compared in Chaz Firestone's (2020) paper, which argues for a change to the broader conception of species difference when humans and AIs fail at certain tasks such as maths. Firestone draws on cognitive science to argue that the distinction between competence and performance must be addressed for a more "species-fair" comparison between humans and machines.

Full list of found citations

| Title | URL |
| ----- | --- |
| Joint translation and unit conversion for end-to-end localization | |
| Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge | |
| Performance vs. competence in human–machine comparisons | |
| Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models | |
| Human-like machine thinking: Language guided imagination | |
| Automatic load sharing of distribution transformer for overload protection | |
| Semi-Lexical Languages -- A Formal Basis for Unifying Machine Learning and Symbolic Reasoning in Computer Vision | |
| Analyzing Elementary School Olympiad Math Tasks as a Benchmark for AGI | |
| Humans Keep It One Hundred: an Overview of AI Journey | |
| Domain Adversarial Fine-Tuning as an Effective Regularizer | |
| Recurrent Inference in Text Editing | |
| Automatic diagnosis of the 12-lead ECG using a deep neural network | |
| Transformer Model for Mathematical Reasoning - CS 230 Final Report | |
| Deep Differential System Stability -- Learning advanced computations from examples | |
| Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks | |
| Solving Arithmetic Word Problems with a Template-based Multi-Task Deep Neural Network (T-MTDNN) | |
| Semantic similarity and machine learning with ontologies. | |
| Machine learning with biomedical ontologies | |
| Do Neural Models Learn Systematicity of Monotonicity Inference in Natural Language? | |
| Emergence of Syntax Needs Minimal Supervision | |
| Towards Question Format Independent Numerical Reasoning: A Set of Prerequisite Tasks | |
| Achieving Machine Reasoning for Math Problems Through Step-By-Step Supervision Signal | |
| NeuReduce: Reducing Mixed Boolean-Arithmetic Expressions by Recurrent Neural Network | |
| Compositional Generalization by Learning Analytical Expressions | |
| Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge | |
| Artificial General Intelligence: A New Perspective, with Application to Scientific Discovery | |
| Generative Language Modeling for Automated Theorem Proving | |
| First Symposium on Artificial Intelligence for Mathematics Education. Book of Abstracts (AI4ME 2020) | |
| Scaling Laws for Autoregressive Generative Modeling | |
| The Challenge of Modeling the Acquisition of Mathematical Concepts | |
| Enhancing the Numeracy of Word Embeddings: A Linear Algebraic Perspective | |
| Neural Status Registers | |
| EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering | |
| Bongard-LOGO: A New Benchmark for Human-Level Concept Learning and Reasoning | |
| Solving Arithmetic Problems on a Checkered Paper | |
| Measuring Compositional Generalization: A Comprehensive Method on Realistic Data | |
| Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | |
| The Search for Equations – Learning to Identify Similarities Between Mathematical Expressions | |
| Compositional Generalization with Tree Stack Memory Units | |
| Transformers as Soft Reasoners over Language | |
| Injecting Numerical Reasoning Skills into Language Models | |
| Mathematical reasoning abilities: The Impact of Novick's Learning and Somatic, Auditory, Visual, Intellectual Learning Styles | |
| Towards Finding Longer Proofs | |
| Simulating Problem Difficulty in Arithmetic Cognition Through Dynamic Connectionist Models | |
| From Shallow to Deep Interactions Between Knowledge Representation, Reasoning and Machine Learning (Kay R. Amel group) | |
| Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning | |
| Compositionality Decomposed: How do Neural Networks Generalise? | |
| Neural Mathematical Solver with Enhanced Formula Structure | |
| Modelling High-Level Mathematical Reasoning in Mechanised Declarative Proofs | |
| Visual sense of number vs. sense of magnitude in humans and machines | |
| Dataset for Evaluation of Mathematical Reasoning Abilities in Russian | |
| An Empirical Investigation of Contextualized Number Prediction | |
| Deep Learning for Symbolic Mathematics | |
| What Can Neural Networks Reason About? | |
| Do NLP Models Know Numbers? Probing Numeracy in Embeddings | |
| A Multi-Type Multi-Span Network for Reading Comprehension that Requires Discrete Reasoning | |
| Giving BERT a Calculator: Finding Operations and Arguments with Reading Comprehension | |
| The compositionality of neural networks: integrating symbolism and connectionism | |



BookCorpus

  title={Aligning books and movies: Towards story-like visual explanations by watching movies and reading books},
  author={Zhu, Yukun and Kiros, Ryan and Zemel, Rich and Salakhutdinov, Ruslan and Urtasun, Raquel and Torralba, Antonio and Fidler, Sanja},
  booktitle={Proceedings of the IEEE international conference on computer vision},


Deep visual-semantic alignments for generating image descriptions Original URL:

  title={Deep visual-semantic alignments for generating image descriptions},
  author={Karpathy, Andrej and Fei-Fei, Li},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},

BookCorpus formed the basis of this model, which was designed to create textual descriptions of images and their composite regions by combining fundamental NLP with the Convolutional Neural Nets (CNNs) used in image processing. This paper has been cited 3,876 times and, like BERT, forms a backbone element of one of machine learning's core use-cases, further adding to the gravitas of BookCorpus as a foundational work.


Skip-thought vectors Original URL:

 author = {Kiros, Ryan and Zhu, Yukun and Salakhutdinov, Russ R and Zemel, Richard and Urtasun, Raquel and Torralba, Antonio and Fidler, Sanja},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {C. Cortes and N. Lawrence and D. Lee and M. Sugiyama and R. Garnett},
 pages = {3294--3302},
 publisher = {Curran Associates, Inc.},
 title = {Skip-Thought Vectors},
 url = {},
 volume = {28},
 year = {2015}

Members of the team behind the original BookCorpus dataset used it to train a generic, distributed sentence encoder through unsupervised learning. They evaluated the resulting off-the-shelf encoder on 8 common NLP tasks: "semantic relatedness, paraphrase detection, image-sentence ranking, question-type classification and 4 benchmark sentiment and subjectivity datasets" (Kiros et al., 2015). The encoder used an older encoder-decoder model instead of a more modern Transformer model.


Generative adversarial text to image synthesis Original URL:

  title={Generative adversarial text to image synthesis},
  author={Reed, Scott and Akata, Zeynep and Yan, Xinchen and Logeswaran, Lajanugen and Schiele, Bernt and Lee, Honglak},
  journal={arXiv preprint arXiv:1605.05396},

BookCorpus provided the foundations for Reed et al.'s seminal 2016 work on the automatic synthesis of realistic images from text, a feat that contemporary AI systems had long been unable to achieve. The authors cite many recent advances in recurrent neural network (RNN) architectures as a significant step forward for their work on text-to-image generative adversarial networks (GANs), and have themselves been cited 1,750 times by other papers.


Layer Normalisation Original URL:

      title={Layer Normalization}, 
      author={Jimmy Lei Ba and Jamie Ryan Kiros and Geoffrey E. Hinton},

Ba et al. (2016) demonstrated a much-cited new method of normalising layers in a Recurrent Neural Network (RNN) by processing BookCorpus once with and once without their new approach. The results and methods in the Layer Normalisation paper have been cited 2,424 times, showing the reliability of the BookCorpus dataset as a basis for demonstrating new methodologies.
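The normalisation step itself can be sketched in a few lines. This is a minimal pure-Python illustration of the general idea of layer normalisation, in which the activations of one layer are rescaled using the mean and variance computed across that layer's hidden units for a single input. The function name, the `eps` stabiliser, and the default gain and bias values are assumptions made for this sketch, not details of the paper's experiments.

```python
import math


def layer_norm(activations, gain=None, bias=None, eps=1e-5):
    """Normalise one layer's activations to zero mean and (near) unit variance.

    activations: list of floats, the summed inputs to the hidden units of a
                 single layer for a single example.
    gain, bias:  optional per-unit scale and shift (default 1.0 and 0.0).
    eps:         small constant added to the variance (an assumption here,
                 used to avoid division by zero).
    """
    n = len(activations)
    if n == 0:
        raise ValueError("activations must not be empty")
    gain = [1.0] * n if gain is None else gain
    bias = [0.0] * n if bias is None else bias
    # Statistics are computed over the hidden units, not over a batch.
    mean = sum(activations) / n
    variance = sum((a - mean) ** 2 for a in activations) / n
    inv_std = 1.0 / math.sqrt(variance + eps)
    return [g * (a - mean) * inv_std + b
            for a, g, b in zip(activations, gain, bias)]
```

Because the statistics come from a single example, the same computation applies unchanged at every time step of a recurrent network.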


Bert: Pre-training of deep bidirectional transformers for language understanding Original URL:

  title={Bert: Pre-training of deep bidirectional transformers for language understanding},
  author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina},
  journal={arXiv preprint arXiv:1810.04805},

BookCorpus is significant for its use in creating the BERT (Bidirectional Encoder Representations from Transformers) language representation model. The corpus is now the upstream basis of a great deal of current leading-edge work on Transformers, especially in the field of NLP, with significant attention coming from Google's own AI researchers. BERT has itself become the basis upon which many of the most-used Transformers in NLP are built, with this paper cited 12,472 times.

Full list of found citations

| Title | URL | | | | | Bert: Pre-training of deep bidirectional transformers for language understanding | | | Deep visual-semantic alignments for generating image descriptions | | | Layer normalization | | | Skip-thought vectors | | | Generative adversarial text to image synthesis | | | Improving language understanding by generative pre-training | | | Supervised learning of universal sentence representations from natural language inference data | | | Xlnet: Generalized autoregressive pretraining for language understanding | | | Glue: A multi-task benchmark and analysis platform for natural language understanding | | | Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval | | | Multimodal machine learning: A survey and taxonomy | | | End-to-end learning of action detection from frame glimpses in videos | | | An empirical evaluation of doc2vec with practical insights into document embedding generation | | | Movieqa: Understanding stories in movies through question-answering | | | Vse++: Improving visual-semantic embeddings with hard negatives | | | Representation learning with contrastive predictive coding | | | Grounding of textual phrases in images by reconstruction | | | Albert: A lite bert for self-supervised learning of language representations | | | What you can cram into a single vector: Probing sentence embeddings for linguistic properties | | | Automatic description generation from images: A survey of models, datasets, and evaluation measures | | | Roberta: A robustly optimized bert pretraining approach | | | Discovering event structure in continuous narrative perception and memory | | | Adversarial feature matching for text generation | | | A corpus and cloze evaluation for deeper understanding of commonsense stories | | | Siamese cbow: Optimizing word embeddings for sentence representations | | | Exploring the limits of transfer learning with a unified text-to-text transformer | | | Learning 
general purpose distributed sentence representations via large scale multi-task learning | | | Swag: A large-scale adversarial dataset for grounded commonsense inference | | | Unified language model pre-training for natural language understanding and generation | | | Uncovering the temporal context for video question answering | | | Linguistic knowledge and transferability of contextual representations | | | Tgif-qa: Toward spatio-temporal reasoning in visual question answering | | | Connectionist temporal modeling for weakly supervised action labeling | | | Movie description | | | Vilbert: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks | | | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | | | Generating text via adversarial training | | | From recognition to cognition: Visual commonsense reasoning | | | Transfertransfo: A transfer learning approach for neural network based conversational agents | | | Self-taught convolutional neural networks for short text clustering | | | The LAMBADA dataset: Word prediction requiring a broad discourse context | | | The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision | | | Learning generic sentence representations using convolutional neural networks | | | See, hear, and read: Deep aligned representations | | | Efficient vector representation for documents through corruption | | | Language models as knowledge bases? | | | Sentence encoders on stilts: Supplementary training on intermediate labeled-data tasks | | | Deep MIML Network. | | | Vl-bert: Pre-training of generic visual-linguistic representations | | | Enhancing video summarization via vision-language embedding | | | A corpus and evaluation framework for deeper understanding of commonsense stories | | | Generalized multi-view embedding for visual recognition and cross-modal retrieval | | | HellaSwag: Can a Machine Really Finish Your Sentence? 
| | | Improving machine reading comprehension with general reading strategies | | | Deconvolutional latent-variable model for text sequence matching | | | Neural text generation in stories using entity representations as context | | | What can you do with a rock? affordance extraction via word embeddings | | | Dynamic meta-embeddings for improved sentence representations | | | Character-level and multi-channel convolutional neural networks for large-scale authorship attribution | | | Let your photos talk: Generating narrative paragraph for photo stream via bidirectional attention recurrent neural networks | | | Bert has a mouth, and it must speak: Bert as a markov random field language model | | | Discourse-based objectives for fast unsupervised sentence representation learning | | | Learning to write with cooperative discriminators | | | Learning joint representations of videos and sentences with web image search | | | Embedding text in hyperbolic spaces | | | Probabilistic reasoning via deep learning: Neural association models | | | Connecting images and natural language | | | Dissent: Sentence representation learning from explicit discourse relations | | | Dream: A challenge data set and models for dialogue-based reading comprehension | | | Sort story: Sorting jumbled images and captions into stories | | | Creative writing with a machine in the loop: Case studies on slogans and stories | | | Grounding the neurobiology of language in first principles: The necessity of non-language-centric explanations for language comprehension | | | Socialiqa: Commonsense reasoning about social interactions | | | BoolQ: Exploring the surprising difficulty of natural yes/no questions | | | Probing for semantic evidence of composition by means of simple classification tasks | | | What makes a good conversation? 
how controllable attributes affect human judgments | | | Assessing composition in sentence vector representations | | | Neural task graphs: Generalizing to unseen tasks from a single video demonstration | | | Sceneskim: Searching and browsing movies using synchronized captions, scripts and plot summaries | | | Moviegraphs: Towards understanding human-centric situations from videos | | | Knowledge enhanced contextual word representations | | | Virtualhome: Simulating household activities via programs | | | Show, Reward and Tell: Automatic Generation of Narrative Paragraph From Photo Stream by Adversarial Training. | | | Biva: A very deep hierarchy of latent variables for generative modeling | | | Electra: Pre-training text encoders as discriminators rather than generators | | | Video summarization by learning deep side semantic embedding | | | Relevance of unsupervised metrics in task-oriented dialogue for evaluating natural language generation | | | Learning sentence representation with guidance of human attention | | | Learning visual storylines with skipping recurrent neural networks | | | Unsupervised visual-linguistic reference resolution in instructional videos | | | D3tw: Discriminative differentiable dynamic time warping for weakly supervised action alignment and segmentation | | | Idiom token classification using sentential distributed semantics | | | Big bird: Transformers for longer sequences | | | Explaining first impressions: Modeling, recognizing, and explaining apparent personality from videos | | | Bert for joint intent classification and slot filling | | | An empirical evaluation of visual question answering for novel objects | | | Broad context language modeling as reading comprehension | | | [C] Reducing BERT pre-training time from 3 days to 76 minutes | | | Ruse: Regressor using sentence embeddings for automatic machine translation evaluation | | | An Information-Theoretic Explanation of Adjective Ordering Preferences. 
| | | Discourse marker augmented network with reinforcement learning for natural language inference | | | Quickcut: An interactive tool for editing narrated video | | | Sentence similarity measures for fine-grained estimation of topical relevance in learner essays | | | Quoref: A reading comprehension dataset with questions requiring coreferential reasoning | | | Large scale language modeling: Converging on 40gb of text in four hours | | | Beyond caption to narrative: Video captioning with multiple sentences | | | Towards generalizable sentence embeddings | | | Large-scale cloze test dataset created by teachers | | | Compressive transformers for long-range sequence modelling | | | Unsupervised learning of sentence representations using convolutional neural networks | | | Computer vision and natural language processing: recent approaches in multimedia and robotics | | | Poly-encoders: Transformer architectures and pre-training strategies for fast and accurate multi-sentence scoring | | | Multimodal first impression analysis with deep residual networks | | | Representing sentences as low-rank subspaces | | | Multitask learning for cross-domain image captioning | | | Bert-dst: Scalable end-to-end dialogue state tracking with bidirectional encoder representations from transformer | | | Find and focus: Retrieve and localize video events with natural language queries | | | Story cloze evaluator: Vector space representation evaluation by predicting what happens next | | | Harnessing ai for augmenting creativity: Application to movie trailer creation | | | Don't Settle for Average, Go for the Max: Fuzzy Sets and Max-Pooled Word Vectors | | | SummCoder: An unsupervised framework for extractive text summarization based on deep auto-encoders | | | Aesthetic critiques generation for photos | | | Sample efficient text summarization using a single pre-trained transformer | | | Improving relation extraction by pre-trained language representations | | | DT_Team at SemEval-2017 
Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output | | | Real or fake? learning to discriminate machine from human generated text | | | Beyond canonical texts: A computational analysis of fanfiction | | | Towards text generation with adversarially learned neural outlines | | | Rethinking skip-thought: A neighborhood based approach | | | Extreme language model compression with optimal subwords and shared projections | | | A survey on deep learning methods for robot vision | | | Domain-specific language model pretraining for biomedical natural language processing | | | Full-GRU natural language video description for service robotics applications | | | Context mover's distance & barycenters: Optimal transport of contexts for building representations | | | Multimodal filtering of social media for temporal monitoring and event analysis | | | Fine-tuning pre-trained transformer language models to distantly supervised relation extraction | | | Inferlite: Simple universal sentence representations from natural language inference data | | | Deepdiary: Automatically captioning lifelogging image streams | | | Generating bags of words from the sums of their word embeddings | | | Video scene analysis: an overview and challenges on deep learning algorithms | | | Team papelo: Transformer networks at fever | | | Whodunnit? 
crime drama as a case for natural language understanding | | | DisSent: Learning sentence representations from explicit discourse relations | | | What's in a question: Using visual questions as a form of supervision | | | Plato: Pre-trained dialogue generation model with discrete latent variable | | | Learning to align the source code to the compiled object code | | | Understanding and improving transformer from a multi-particle dynamic system point of view | | | Dirichlet variational autoencoder for text modeling | | | Speech-based visual question answering | | | Longformer: The long-document transformer | | | AlBERTo: Italian BERT Language Understanding Model for NLP Challenging Tasks Based on Tweets. | | | Practical text classification with large pre-trained language models | | | Assessing social and intersectional biases in contextualized word representations | | | Integrating character networks for extracting narratives from multimodal data | | | Do Attention Heads in BERT Track Syntactic Dependencies? 
| | | Pun generation with surprise | | | An end-to-end approach to natural language object retrieval via context-aware deep reinforcement learning | | | Diachronic sense modeling with deep contextualized word embeddings: An ecological view | | | Learning universal sentence representations with mean-max attention autoencoder | | | Unsupervised domain adaptation of contextualized embeddings for sequence labeling | | | HiText: Text reading with dynamic salience marking | | | Trends in integration of vision and language research: A survey of tasks, datasets, and methods | | | Movienet: A holistic dataset for movie understanding | | | Incorporating structured commonsense knowledge in story completion | | | Generalization through memorization: Nearest neighbor language models | | | Hopfield networks is all you need | | | Multi-Granularity Self-Attention for Neural Machine Translation | | | A face-to-face neural conversation model | | | [C] Rare Words: A Major Problem for Contextualized Embeddings and How to Fix it by Attentive Mimicking. | | | Cause-Effect Knowledge Acquisition and Neural Association Model for Solving A Set of Winograd Schema Problems. 
| | | Scaling laws for neural language models | | | Punny captions: Witty wordplay in image descriptions | | | K-adapter: Infusing knowledge into pre-trained models with adapters | | | Parallel iterative edit models for local sequence transduction | | | Adversarial text generation without reinforcement learning | | | Deep multimodal embedding model for fine-grained sketch-based image retrieval | | | Sentence directed video object codiscovery | | | [C] The octopus approach to the Alexa competition: A deep ensemble-based socialbot | | | Trimming and improving skip-thought vectors | | | A neural multi-sequence alignment technique (neumatch) | | | Unilmv2: Pseudo-masked language models for unified language model pre-training | | | Don't Stop Pretraining: Adapt Language Models to Domains and Tasks | | | Embeddia at SemEval-2019 Task 6: Detecting hate with neural network and transfer learning approaches | | | Denoising based sequence-to-sequence pre-training for text generation | | | Memory Efficient Adaptive Optimization | | | Temporal event knowledge acquisition via identifying narratives | | | [C] Assessing the Benchmarking Capacity of Machine Reading Comprehension Datasets. | | | Journal of visual communication and image representation | | | Efficient training of bert by progressively stacking | | | Bond: Bert-assisted open-domain named entity recognition with distant supervision | | | [C] Evaluating Commonsense in Pre-Trained Language Models. 
| | | A coupled hidden conditional random field model for simultaneous face clustering and naming in videos | | | Mixed membership word embeddings for computational social science | | | Comparing character-level neural language models using a lexical decision task | | | AraBERT: Transformer-based model for Arabic language understanding | | | Deep sequential and structural neural models of compositionality | | | Adversarial training and decoding strategies for end-to-end neural conversation models | | | Sticking to the facts: Confident decoding for faithful data-to-text generation | | | On layer normalization in the transformer architecture | | | [C] CAiRE: An End-to-End Empathetic Chatbot. | | | Minilm: Deep self-attention distillation for task-agnostic compression of pre-trained transformers | | | Exploring asymmetric encoder-decoder structure for context-based sentence representation learning | | | Modelling sentence generation from sum of word embedding vectors as a mixed integer programming problem | | | [C] Do Not Have Enough Data? Deep Learning to the Rescue! 
| | | A transformer-based approach to irony and sarcasm detection | | | Regularized and retrofitted models for learning sentence representation with context | | | M-vad names: a dataset for video captioning with naming | | | Moviescope: Large-scale analysis of movies using multiple modalities | | | Nilc at cwi 2018: Exploring feature engineering and feature learning | | | Generating sentences using a dynamic canvas | | | [C] A cost-sensitive visual question-answer framework for mining a deep and-or object semantics from web images | | | Structbert: Incorporating language structures into pre-training for deep language understanding | | | Now you shake me: Towards automatic 4D cinema | | | BUT-FIT at SemEval-2019 Task 7: Determining the Rumour Stance with Pre-Trained Deep Bidirectional Transformers | | | TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks | | | StereoSet: Measuring stereotypical bias in pretrained language models | | | Well-read students learn better: On the importance of pre-training compact models | | | Analysis of adapted films and stories based on social network | | | Imagebert: Cross-modal pre-training with large-scale weak-supervised image-text data | | | Deep state space models for unconditional word generation | | | Metric for automatic machine translation evaluation based on universal sentence representations | | | Large-scale pretraining for visual dialog: A simple state-of-the-art baseline | | | Content vs. 
function words: The view from distributional semantics | | | EigenSent: Spectral sentence embeddings using higher-order Dynamic Mode Decomposition | | | Attending to entities for better text understanding | | | Label-based automatic alignment of video with narrative sentences | | | Inducing brain-relevant bias in natural language processing models | | | Compressing large-scale transformer-based models: A case study on bert | | | ClassiNet--Predicting Missing Features for Short-Text Classification | | | Social bias frames: Reasoning about social and power implications of language | | | Time of your hate: The challenge of time in hate speech detection on social media | | | Recycling a pre-trained BERT encoder for neural machine translation | | | Span selection pre-training for question answering | | | Realtoxicityprompts: Evaluating neural toxic degeneration in language models | | | Unsupervised stylish image description generation via domain layer norm | | | Lexical semantic change analysis with contextualised word representations | | | How decoding strategies affect the verifiability of generated text | | | CLUECorpus2020: A Large-scale Chinese Corpus for Pre-trainingLanguage Model | | | Embedding strategies for specialized domains: Application to clinical entity recognition | | | Machine translation evaluation with BERT regressor | | | Unsupervised question decomposition for question answering | | | Deep bidirectional transformers for relation extraction without supervision | | | Weakly supervised learning of heterogeneous concepts in videos | | | Rethinking attention with performers | | | Recent advances in natural language inference: A survey of benchmarks, resources, and approaches | | | Have a chat with BERT; passage re-ranking using conversational context | | | Levels of hate in online environments | | | Prophetnet: Predicting future n-gram for sequence-to-sequence pre-training | | | CLER: Cross-task learning with expert representation to generalize reading 
and understanding | | | An Interactive tour guide for a heritage site | | | Improving sentence representations with multi-view frameworks | | | Aligning movies with scripts by exploiting temporal ordering constraints | | | Learning semantic sentence representations from visually grounded language without lexical knowledge | | | Emptransfo: A multi-head transformer architecture for creating empathetic dialog systems | | | Does it make sense? and why? a pilot study for sense making and explanation | | | Residual energy-based models for text generation | | | BLCU_NLP at SemEval-2019 Task 7: An Inference Chain-based GPT Model for Rumour Evaluation | | | Telling stories with soundtracks: an empirical analysis of music in film | | | Are Transformers universal approximators of sequence-to-sequence functions? | | | Finding structure in figurative language: Metaphor detection with topic-based frames | | | On the benefit of combining neural, statistical and external features for fake news identification | | | LinCE: A centralized benchmark for linguistic code-switching evaluation | | | Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics | | | Generating text through adversarial training using skip-thought vectors | | | Automatic identification of character types from film dialogs | | | Tod-bert: Pre-trained natural language understanding for task-oriented dialogues | | | A comparison of pre-trained vision-and-language models for multimodal representation learning across medical images and reports | | | Automated Labelling using an Attention model for Radiology reports of MRI scans (ALARM) | | | Determining relative argument specificity and stance for complex argumentative structures | | | Context is Key: Grammatical Error Detection with Contextual Word Representations | | | A Survey on Contextual Embeddings | | | Pretrained Transformers for Text Ranking: BERT and Beyond | | | Introduction to Deep Learning Business Applications for Developers | | | 
Multi-view sentence representation learning | | | Image to Text Conversion: State of the Art and Extended Work | | | A survey of word embeddings for clinical text | | | Go wide, then narrow: Efficient training of deep thin networks | | | Saagie at semeval-2019 task 5: From universal text embeddings and classical features to domain-specific text classification | | | A joint learning framework with BERT for spoken language understanding | | | Linking artificial and human neural representations of language | | | Computer-generated music for tabletop role-playing games | | | From trailers to storylines: An efficient way to learn from movies | | | [C] Syntactic properties of skip-thought vectors | | | Supervised and unsupervised neural approaches to text readability | | | [C] Text to 3D Scene Generation | | | Reweighted proximal pruning for large-scale language representation | | | SqueezeBERT: What can computer vision teach NLP about efficient neural networks? | | | Sentiment analysis is not solved! Assessing and probing sentiment classification | | | In SUPPLEMENTARY INFORMATION, on | | | Unsupervised Evaluation Metrics and Learning Criteria for Non-Parallel Textual Transfer | |



  title={Europarl: A parallel corpus for statistical machine translation},
  author={Koehn, Philipp},
  booktitle={MT summit},


Normalized (pointwise) mutual information in collocation extraction Original URL:

  title={Normalized (pointwise) mutual information in collocation extraction},
  author={Bouma, Gerlof},
  journal={Proceedings of GSCL},

Bouma's paper addresses the difficulties of using mutual information and pointwise mutual information as association measures by introducing normalised variants that are easier to interpret and less sensitive to occurrence frequency. The paper also includes an empirical study of the impact of these new normalised variants.
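
As a rough illustration of the normalisation described above, the sketch below computes pointwise mutual information and divides it by -log p(x, y), which bounds the score between -1 (the two words never co-occur) and 1 (they only ever occur together). The function names and the count-based helper are illustrative assumptions, not code from the paper.

```python
import math


def pmi(p_xy, p_x, p_y):
    """Pointwise mutual information of a word pair, in nats."""
    return math.log(p_xy / (p_x * p_y))


def npmi(p_xy, p_x, p_y):
    """PMI divided by -log p(x, y); assumes 0 <= p_xy < 1."""
    if p_xy == 0.0:
        # Limiting value when the two words never occur together.
        return -1.0
    return pmi(p_xy, p_x, p_y) / -math.log(p_xy)


def npmi_from_counts(n_xy, n_x, n_y, n_total):
    """Hypothetical helper: estimate the probabilities from raw counts."""
    return npmi(n_xy / n_total, n_x / n_total, n_y / n_total)
```

Independent words score 0, so values can be compared across pairs with very different frequencies.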


Statistical machine translation Original URL:

  title={Statistical machine translation},
  author={Koehn, Philipp},
  publisher={Cambridge University Press}

This book is a comprehensive look at statistical approaches to machine translation, with the EuroParl corpus involved in both the empirical and machine-analytical elements of the book.


KenLM: Faster and smaller language model queries Original URL:

  title={KenLM: Faster and smaller language model queries},
  author={Heafield, Kenneth},
  booktitle={Proceedings of the sixth workshop on statistical machine translation},

The KenLM library implements two data structures, PROBING and TRIE, for faster and smaller language model queries in machine translation. PROBING uses linear probing hash tables and was considerably faster than the competing SRILM toolkit. The TRIE structure also outperformed alternatives on a number of key indicators. The library can be found in the Moses, cdec, and Joshua translation systems.
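
To make the idea of a linear probing hash table concrete, here is a minimal sketch. It is not KenLM's implementation: the class name, the fixed capacity, and the use of word tuples as keys are assumptions chosen for illustration.

```python
class LinearProbingTable:
    """Fixed-size open-addressing hash table using linear probing."""

    def __init__(self, capacity=16):
        self._capacity = capacity
        self._keys = [None] * capacity    # None marks an empty slot
        self._values = [None] * capacity
        self._size = 0

    def _slot(self, key):
        # Start at the hashed position and step forward until we find
        # the key itself or an empty slot.
        i = hash(key) % self._capacity
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % self._capacity
        return i

    def put(self, key, value):
        i = self._slot(key)
        if self._keys[i] is None:
            # Keep one slot empty so that probing always terminates.
            if self._size + 1 >= self._capacity:
                raise OverflowError("table is full")
            self._keys[i] = key
            self._size += 1
        self._values[i] = value

    def get(self, key, default=None):
        i = self._slot(key)
        return self._values[i] if self._keys[i] is not None else default


table = LinearProbingTable(capacity=8)
table.put(("the", "cat"), -1.5)   # hypothetical n-gram log-probability
```

Because colliding entries sit in neighbouring slots of one flat array, a lookup usually touches a single contiguous region of memory, which is the usual argument for this layout when speed matters.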


BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network Original URL:

	title = "BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network",
	journal = "Artificial Intelligence",
	volume = "193",
	pages = "217 - 250",
	year = "2012",
	issn = "0004-3702",
	doi = "",
	url = "",
	author = "Roberto Navigli and Simone Paolo Ponzetto",
	keywords = "Knowledge acquisition, Word sense disambiguation, Graph algorithms, Semantic networks",

Navigli and Ponzetto present an automatic approach to constructing BabelNet, a large-scale, wide-coverage multilingual semantic network. The paper highlights the use of machine translation, WordNet, Wikipedia, and their own in-vitro experiments on gold-standard datasets like the EuroParl corpus in achieving what they claim are state-of-the-art results on three SemEval evaluation tasks.


Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation Original URL:

      title={Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation}, 
      author={Kyunghyun Cho and Bart van Merrienboer and Caglar Gulcehre and Dzmitry Bahdanau and Fethi Bougares and Holger Schwenk and Yoshua Bengio},

Using two jointly trained RNNs, one encoding a source sequence and the other decoding it into a target sequence, Cho et al (2014) show an empirically more effective encoder-decoder for phrase translation. The paper also claims that the model learns a semantically and syntactically meaningful representation of linguistic phrases.

Full list of found citations

| Title | URL |
| --- | --- |
| Learning phrase representations using RNN encoder-decoder for statistical machine translation | |
| Routledge encyclopedia of translation studies | |
| Statistical machine translation | |
| BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network | |
| KenLM: Faster and smaller language model queries | |
| Normalized (pointwise) mutual information in collocation extraction | |
| Findings of the 2014 workshop on statistical machine translation | |
| The Sketch Engine: ten years on | |
| The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages | |
| PPDB: The paraphrase database | |
| Learning word vectors for 157 languages | |
| BabelNet: Building a very large multilingual semantic network | |
| Wit3: Web inventory of transcribed and translated talks | |
| langid.py: An off-the-shelf language identification tool | |
| Intelligent selection of language model training data | |
| Neural network methods for natural language processing | |
| Scalable modified Kneser-Ney language model estimation | |
| Statistical machine translation | |
| Posterior regularization for structured latent variable models | |
| Word lengths are optimized for efficient communication | |
| Learning bilingual lexicons from monolingual corpora | |
| Midge: Generating image descriptions from computer vision detections | |
| Improved statistical machine translation using paraphrases | |
| Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems | |
| Bilbowa: Fast bilingual distributed representations without word alignments | |
| Unsupervised part-of-speech tagging with bilingual graph-based projections | |
| Offline bilingual word vectors, orthogonal transformations and the inverted softmax | |
| An autoencoder approach to learning bilingual word representations | |
| Multilingual models for compositional distributed semantics | |
| Apertium: a free/open-source platform for rule-based machine translation | |

Enron Emails


  title={The enron corpus: A new dataset for email classification research},
  author={Klimt, Bryan and Yang, Yiming},
  booktitle={European Conference on Machine Learning},


Communication networks from the Enron email corpus “It's always about the people. Enron is no different” Original URL:

  title={Communication networks from the Enron email corpus “It's always about the people. Enron is no different”},
  author={Diesner, Jana and Frantz, Terrill L and Carley, Kathleen M},
  journal={Computational \& Mathematical Organization Theory},

This paper by Diesner et al (2005) explores the Enron Email Corpus to understand and model the interactions of organisation members during a survival-endangering crisis. The paper studies the topography and characteristics of the social network as well as the properties and patterns of the communicative style employed. The end result is a deeper understanding of organisational failure.


The structure of information pathways in a social communication network Original URL:

author = {Kossinets, Gueorgi and Kleinberg, Jon and Watts, Duncan},
title = {The Structure of Information Pathways in a Social Communication Network},
year = {2008},
isbn = {9781605581934},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {},
doi = {10.1145/1401890.1401945},
booktitle = {Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
pages = {435–443},
numpages = {9},
keywords = {communication latency, strength of weak ties, social network},
location = {Las Vegas, Nevada, USA},
series = {KDD '08}

The Enron corpus formed part of Kossinets et al's (2008) study of the speed of information transfer between agents in a social network. They draw on the notion of vector clocks from distributed computing to frame a temporal notion of 'distance' between nodes in a social network. Their study yielded a new understanding of the connectivity of social networks, showing that they have a sparse backbone with highly-embedded edges and long-range bridges.
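
The vector-clock idea can be sketched as follows: every node keeps, for every other node, the timestamp of the most recent state of that node it could have heard about, directly or through intermediaries. This is a simplified illustration under stated assumptions (instantaneous delivery, a hypothetical `latest_information` function and event format), not the authors' code.

```python
def latest_information(events, nodes):
    """Return clock[v][u]: the latest time at which u's state could
    have reached v, given (sender, receiver, time) communication events."""
    never = float("-inf")
    clock = {v: {u: never for u in nodes} for v in nodes}
    for sender, receiver, t in sorted(events, key=lambda e: e[2]):
        # The sender is fully up to date about itself at send time.
        clock[sender][sender] = t
        for u in nodes:
            if u != receiver:
                # The receiver inherits anything fresher the sender knows.
                clock[receiver][u] = max(clock[receiver][u], clock[sender][u])
    return clock


emails = [("alice", "bob", 1), ("bob", "carol", 2)]
clocks = latest_information(emails, ["alice", "bob", "carol"])
```

In this toy run `clocks["carol"]["alice"]` is 1: at time 2 Carol's view of Alice is one time unit out of date, even though the two never communicated directly. That lag is the kind of temporal 'distance' the summary above refers to.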


Community evolution in dynamic multi-mode networks Original URL:

author = {Tang, Lei and Liu, Huan and Zhang, Jianping and Nazeri, Zohreh},
title = {Community Evolution in Dynamic Multi-Mode Networks},
year = {2008},
isbn = {9781605581934},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {},
doi = {10.1145/1401890.1401972},
booktitle = {Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
pages = {677–685},
numpages = {9},
keywords = {dynamic heterogeneous network, community evolution, multi-mode networks, evolution, dynamic network analysis},
location = {Las Vegas, Nevada, USA},
series = {KDD '08}

The Enron Emails formed part of the real-world social networks studied by Tang et al (2008) in attempting to create an accurate model of social influencers, data shortages, and marketing needs within groups. They took these specific examples and formed a framework which, they argued, could be applied to social groups more generally.


Adaptive Regularization of Weight Vectors Original URL:

 author = {Crammer, Koby and Kulesza, Alex and Dredze, Mark},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {Y. Bengio and D. Schuurmans and J. Lafferty and C. Williams and A. Culotta},
 pages = {414--422},
 publisher = {Curran Associates, Inc.},
 title = {Adaptive Regularization of Weight Vectors},
 url = {},
 volume = {22},
 year = {2009}

This paper introduces Adaptive Regularization of Weight Vectors (AROW), an online learning algorithm with large margin training, confidence weighting, and tolerance for non-separable data. The paper has been cited 310 times since its publication in 2009.
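
A single AROW update can be sketched in plain Python as below. The learner keeps a mean weight vector and a covariance matrix that tracks confidence in each weight; the function name, the list-of-lists representation, and the default `r=1.0` are assumptions made for this sketch, which assumes labels in {-1, +1} and a symmetric covariance matrix.

```python
def arow_update(mu, sigma, x, y, r=1.0):
    """One online AROW step for example x with label y in {-1, +1}.

    mu is the mean weight vector, sigma the (symmetric) covariance
    matrix, r the regularisation parameter. Returns updated copies.
    """
    n = len(x)
    sigma_x = [sum(sigma[i][j] * x[j] for j in range(n)) for i in range(n)]
    margin = sum(mu[i] * x[i] for i in range(n))
    variance = sum(x[i] * sigma_x[i] for i in range(n))  # x^T Sigma x

    loss = max(0.0, 1.0 - y * margin)  # hinge loss
    if loss == 0.0:
        return mu, sigma  # margin already satisfied, no update

    beta = 1.0 / (variance + r)
    alpha = loss * beta
    # Move the mean towards classifying x correctly, scaled by uncertainty.
    new_mu = [mu[i] + alpha * y * sigma_x[i] for i in range(n)]
    # Shrink the covariance along the direction of x.
    new_sigma = [
        [sigma[i][j] - beta * sigma_x[i] * sigma_x[j] for j in range(n)]
        for i in range(n)
    ]
    return new_mu, new_sigma


mu, sigma = arow_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], 1)
```

Larger values of `r` make each update more conservative, which is one way to read the tolerance for non-separable data mentioned above.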


An extensive experimental comparison of methods for multi-label learning Original URL:

	title = "An extensive experimental comparison of methods for multi-label learning",
	journal = "Pattern Recognition",
	volume = "45",
	number = "9",
	pages = "3084 - 3104",
	year = "2012",
	note = "Best Papers of Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'2011)",
	issn = "0031-3203",
	doi = "",
	url = "",
	author = "Gjorgji Madjarov and Dragi Kocev and Dejan Gjorgjevikj and Sašo Džeroski",
	keywords = "Multi-label ranking, Multi-label classification, Comparison of multi-label learning methods",

The Enron Emails corpus was one of 11 benchmark datasets used by Madjarov et al (2012) to determine which of 12 multi-label learning methods should be considered the gold standard for future models. Across 16 evaluation measures, their paper posited that Random Forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER) were, at the time, the benchmarks against which future models should be tested.

Full list of found citations

| Title | URL |
| --- | --- |
| An extensive experimental comparison of methods for multi-label learning | |
| The structure of information pathways in a social communication network | |
| Communication networks from the Enron email corpus “It's always about the people. Enron is no different” | |
| Adaptive regularization of weight vectors | |
| Community evolution in dynamic multi-mode networks | |
| Automatic categorization of email into folders: Benchmark experiments on Enron and SRI corpora | |
| Toward link predictability of complex networks | |
| Contextual search and name disambiguation in email using graphs | |
| Revisiting the nystrom method for improved large-scale machine learning | |
| Spam filtering based on the analysis of text information embedded into images | |
| Exploration of communication networks from the enron email corpus | |
| Tracking recurring contexts using ensemble classifiers: an application to email filtering | |
| SMS spam filtering: Methods and data | |
| Tree ensembles for predicting structured outputs | |
| Interacting meaningfully with machine learning systems: Three experiments | |
| Spam: A shadow history of the Internet | |
| Pattern mining in frequent dynamic subgraphs | |
| Automated social hierarchy detection through email network analysis | |
| Temporal and information flow based event detection from social text streams | |
| Feature selection for multi-label classification using multivariate mutual information | |