Synthetic data can improve event-relation F1 scores by up to 116% (relative).
Can Data Augmentation Train Event-Relation Systems?
Recognising events in text is only part of the challenge. To understand a narrative, a system also needs to identify how events relate: does one cause, enable, prevent, or intend another?
Training such systems is difficult because annotated examples are scarce, particularly for less common relations such as prevention and intention. In this paper, we explore whether prompt-based data augmentation can help.
We use GPT-3 to generate synthetic event-relation examples, validate them manually, and extend a small, imbalanced dataset with over 1,500 new sentences. Models are then trained on this augmented resource and evaluated on held-out, human-authored data.
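A minimal Python sketch of this generate, validate and extend loop might look like the following. Everything in it is an assumption for illustration, not the authors' code: the `Example` record, the `build_prompt` template and the `augment`, `label_counts` and `assert_human_only` helpers are hypothetical. No GPT-3 call is made, and manual validation is stood in for by an `is_valid` callback.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable


# Hypothetical record type; the paper's actual data format may differ.
@dataclass(frozen=True)
class Example:
    sentence: str
    relation: str  # e.g. "cause", "enable", "intend", "prevent"
    synthetic: bool = False


def build_prompt(relation: str, seeds: Iterable[Example], n: int = 5) -> str:
    """Assemble a few-shot generation prompt from seed examples of one relation.

    Illustrative template only; it is not the prompt used in the paper.
    """
    shots = "\n".join(f"- {ex.sentence}" for ex in seeds if ex.relation == relation)
    return (
        f"Here are sentences in which one event {relation}s another:\n"
        f"{shots}\n"
        f"Write {n} new sentences that express the same relation:\n"
    )


def augment(
    original: list[Example],
    generated: list[Example],
    is_valid: Callable[[Example], bool],
) -> list[Example]:
    """Return the original data plus only the generated examples that pass validation.

    In the paper validation is manual; here it is an arbitrary callback.
    """
    accepted = [ex for ex in generated if is_valid(ex)]
    return original + accepted  # new list; `original` is not mutated


def label_counts(data: list[Example]) -> Counter[str]:
    """Count examples per relation, e.g. to check how imbalanced a dataset is."""
    return Counter(ex.relation for ex in data)


def assert_human_only(eval_set: list[Example]) -> None:
    """Evaluation data must stay human-authored: reject any synthetic example."""
    if any(ex.synthetic for ex in eval_set):
        raise ValueError("evaluation set contains synthetic examples")


if __name__ == "__main__":
    train = [
        Example("The storm caused the flight to be cancelled.", "cause"),
        Example("The storm caused flooding downtown.", "cause"),
        Example("The vaccine prevented the infection.", "prevent"),
    ]
    print(build_prompt("prevent", train, n=3))

    # Model outputs would be parsed into Example objects; two are hard-coded here,
    # and the "reviewer" approves only the first.
    candidates = [
        Example("The guardrail prevented the car from falling.", "prevent", synthetic=True),
        Example("It rained yesterday.", "prevent", synthetic=True),
    ]
    approved = {"The guardrail prevented the car from falling."}
    augmented = augment(train, candidates, lambda ex: ex.sentence in approved)
    print(label_counts(augmented))
```

Keeping validation as a callback separates generation from review, so the same `augment` step works whether candidates are accepted by annotators or by an automatic filter, while `assert_human_only` guards the held-out set.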
Per-relation F1 scores on the held-out data show substantial gains, especially for previously underrepresented relations:
| Event relation | F1 (trained on original data) | F1 (trained on augmented data) |
|---|---|---|
| Cause | 0.72 | 0.75 |
| Enable | 0.71 | 0.95 |
| Intend | 0.44 | 0.95 |
| Prevent | 0.64 | 0.94 |
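The headline figure of 116% matches the relative, not absolute, F1 gain for Intend: (0.95 − 0.44) / 0.44 ≈ 1.159. A short Python sketch reproduces the relative gain for every row of the table; the helper names are illustrative.

```python
from __future__ import annotations

# F1 scores copied from the table above: (original data, augmented data).
F1_SCORES: dict[str, tuple[float, float]] = {
    "Cause": (0.72, 0.75),
    "Enable": (0.71, 0.95),
    "Intend": (0.44, 0.95),
    "Prevent": (0.64, 0.94),
}


def relative_gain(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


def gains(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Relative F1 gain per relation, rounded to one decimal place."""
    return {rel: round(relative_gain(b, a), 1) for rel, (b, a) in scores.items()}


if __name__ == "__main__":
    per_relation = gains(F1_SCORES)
    for relation, gain in per_relation.items():
        print(f"{relation:8s} {gain:+.1f}%")
    best = max(per_relation, key=per_relation.get)
    print("largest gain:", best, per_relation[best])
```

Run as a script, it reports gains of about +4.2% (Cause), +33.8% (Enable), +115.9% (Intend) and +46.9% (Prevent), so the 116% figure is the rounded Intend result.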
These findings suggest that carefully generated and validated synthetic data can be a practical route to training event-relation systems when manually annotated data are limited.