Can Synthetic Data Make Event-Relation Systems Trainable? Answers from SEMMES 2023

May 29, 2023 • Pasquale Lisena

Synthetic data can raise F1-scores for event-relation extraction by up to 116% in relative terms.

Can Data Augmentation Train Event-Relation Systems?

Recognising events in text is only part of the challenge. To understand a narrative, a system also needs to identify how events relate: does one cause, enable, prevent, or intend another?

Training such systems is difficult because annotated examples are scarce, particularly for less common relations such as prevention and intention. In this paper, we explore whether prompt-based data augmentation can help.

We use GPT-3 to generate synthetic event-relation examples, validate them manually, and extend a small, imbalanced dataset with over 1,500 new sentences. Models are then trained on this augmented resource and evaluated on held-out, human-authored data.
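
The augmentation loop described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the paper: the `Example` class, the `build_prompt` and `augment` helpers, the prompt wording, and the toy sentences are all hypothetical. The call to the language model is left out, and manual validation is represented by a set of sentences that a human reviewer has approved.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    sentence: str
    relation: str            # one of: cause, enable, intend, prevent
    synthetic: bool = False  # True for machine-generated sentences


def build_prompt(relation, seeds, n=3):
    """Compose a few-shot prompt from seed sentences of one relation.

    The wording is an assumption; the paper's actual prompts may differ.
    """
    shots = "\n".join(f"- {s.sentence}" for s in seeds if s.relation == relation)
    return (
        f"The following sentences each contain two events linked by "
        f"the relation '{relation}':\n{shots}\n"
        f"Write {n} new sentences expressing the same relation."
    )


def augment(original, candidates, approved):
    """Extend the dataset with candidates that passed manual validation."""
    kept = [c for c in candidates if c.sentence in approved]
    return list(original) + kept


# Toy data invented for illustration only (not taken from the paper).
original = [
    Example("The storm caused the blackout.", "cause"),
    Example("Heavy rain caused the flood.", "cause"),
    Example("The vaccine prevented an outbreak.", "prevent"),
]

# Stand-in for model output; a real run would send `prompt` to the model.
candidates = [
    Example("The levee prevented the town from flooding.", "prevent", True),
    Example("The lock prevented the burglary.", "prevent", True),
    Example("The sun is bright.", "prevent", True),  # expresses no relation
]

# Sentences a human reviewer accepted; the third candidate is rejected.
approved = {candidates[0].sentence, candidates[1].sentence}

prompt = build_prompt("prevent", original)
augmented = augment(original, candidates, approved)
counts = Counter(ex.relation for ex in augmented)

print(prompt)
print(counts)
```

Keeping the `synthetic` flag on every example makes it easy to train on the augmented set while evaluating only on human-authored sentences, as the setup above requires.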

The results show substantial gains, especially for previously underrepresented relations:

Event relation	F1-score, original data	F1-score, augmented data
Cause	0.72	0.75
Enable	0.71	0.95
Intend	0.44	0.95
Prevent	0.64	0.94
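
To put the table in relative terms, the short sketch below computes the percentage gain in F1-score for each relation. It assumes that the headline figure of 116% refers to the relative improvement of the largest gain; the `relative_gain` helper and the variable names are illustrative.

```python
# F1-scores from the table: (original data, augmented data).
f1_scores = {
    "Cause": (0.72, 0.75),
    "Enable": (0.71, 0.95),
    "Intend": (0.44, 0.95),
    "Prevent": (0.64, 0.94),
}


def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


gains = {rel: relative_gain(b, a) for rel, (b, a) in f1_scores.items()}

for rel, gain in gains.items():
    print(f"{rel}: {gain:.1f}%")

# Relation with the largest relative improvement.
best = max(gains, key=gains.get)
print("Largest gain:", best)
```

Under this reading, Intend shows the largest gain (0.44 to 0.95, about 116%), while Cause, already the best-represented relation, improves by only about 4%.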

These findings suggest that carefully generated and validated synthetic data can be a practical route to training event-relation systems when manually annotated data are limited.