Bar-JEPA: Extracting Values from Bar Chart with Joint-Embedding Predictive Architecture

Poonam Poonam*

Ulm University

Alexander Epple*

Ulm University

Timo Ropinski

Ulm University

International Conference on Document Analysis and Recognition 2026

* authors contributed equally

Abstract

Bar charts are commonly used in data visualization, and while they are easily understood by humans, extracting the underlying data computationally is non-trivial. For a machine-learning-based approach, training chart de-rendering models usually requires labeled, real-world data. Labeling data is time-consuming, which is why annotated data is scarce. Models can learn more efficiently when provided with features of high semantic quality, which a joint-embedding predictive architecture (JEPA) is designed to learn in a self-supervised manner. We present a per-bar numerical value recovery pipeline for bar charts, in which a JEPA encoder produces semantically rich latent features. The decoder model consuming these features is simple and quick to train, and it outputs the coordinates of ticks and bars, which can be used to recover bar values. The effectiveness of self-supervised finetuning and the quality of the extracted features are evident when comparing our model to end-to-end supervised baselines.
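The abstract states that the predicted tick and bar coordinates can be used to recover bar values, without specifying how. The following is a minimal sketch of one plausible way to do this, not the authors' implementation. It assumes a vertical bar chart with a linear value axis, and it assumes the numeric value of each tick label is already known (for example from a separate text-recognition step). The function names `fit_axis` and `recover_bar_values` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_axis(tick_pixels: Sequence[float],
             tick_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of value = slope * pixel + intercept.

    Assumption: the value axis is linear, so tick pixel positions and
    tick values are related by a single affine map.
    """
    n = len(tick_pixels)
    if n < 2 or n != len(tick_values):
        raise ValueError("need at least two ticks with matching values")

    mean_p = sum(tick_pixels) / n
    mean_v = sum(tick_values) / n
    var_p = sum((p - mean_p) ** 2 for p in tick_pixels)
    if var_p == 0:
        raise ValueError("tick pixel positions must not all coincide")

    cov = sum((p - mean_p) * (v - mean_v)
              for p, v in zip(tick_pixels, tick_values))
    slope = cov / var_p
    intercept = mean_v - slope * mean_p
    return slope, intercept


def recover_bar_values(bar_top_pixels: Sequence[float],
                       tick_pixels: Sequence[float],
                       tick_values: Sequence[float]) -> List[float]:
    """Map the pixel y-coordinate of each bar top to a data value."""
    slope, intercept = fit_axis(tick_pixels, tick_values)
    return [slope * y + intercept for y in bar_top_pixels]


# Illustrative numbers only (not taken from the paper): image y grows
# downward, so larger values sit at smaller pixel coordinates.
ticks_px = [400.0, 300.0, 200.0, 100.0]
ticks_val = [0.0, 10.0, 20.0, 30.0]
bars_px = [250.0, 120.0]
values = recover_bar_values(bars_px, ticks_px, ticks_val)
```

Fitting all ticks by least squares, rather than interpolating between two of them, makes the mapping less sensitive to a small localization error in any single predicted tick. A logarithmic axis would require fitting against the logarithm of the tick values instead.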

BibTeX

@inproceedings{poonam2024bar-jepa,
	title={Bar-JEPA: Extracting Values from Bar Chart with Joint-Embedding Predictive Architecture},
	author={Poonam, Poonam and Epple, Alexander and Ropinski, Timo},
	year={2026}
}