
ARI Formalization

ARI solves a structured prediction problem over a bipartite graph between source and target field paths. The objective is to infer a globally consistent mapping via maximum a posteriori (MAP) inference in a constrained graphical model.

Problem setup

Let the sets of source and target field paths be:

S = \{s_i\} \qquad T = \{t_j\}

A mapping is a binary relation $R$:

R \subseteq S \times T

Define indicator variables:

x_{s,t} \in \{0,1\}, \quad (s,t) \in S \times T

where $x_{s,t}=1$ means that source field $s$ is mapped to target field $t$, and $x_{s,t}=0$ means that the pair is not selected.

Then, equivalently, we can define $R$ as:

R = \{(s,t) \mid x_{s,t} = 1\}
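As a minimal sketch of the equivalence between the indicator view and the relation view, the snippet below converts between the two. The field paths are made-up examples, and the helper names `relation_from_indicators` and `indicators_from_relation` are hypothetical, not part of ARI.

```python
from itertools import product

# Hypothetical source and target field paths (illustrative only).
S = ["user.name", "user.email"]
T = ["customer.full_name", "customer.email_address"]

# One indicator x[(s, t)] in {0, 1} for every pair in S x T.
x = {pair: 0 for pair in product(S, T)}
x[("user.name", "customer.full_name")] = 1
x[("user.email", "customer.email_address")] = 1


def relation_from_indicators(x):
    """R = {(s, t) | x[s, t] = 1}."""
    return {pair for pair, value in x.items() if value == 1}


def indicators_from_relation(R, S, T):
    """Inverse direction: x[s, t] = 1 exactly when (s, t) is in R."""
    return {(s, t): int((s, t) in R) for s, t in product(S, T)}


R = relation_from_indicators(x)
```

Round-tripping `R` through `indicators_from_relation` recovers the original `x`, which is the sense in which the two definitions are equivalent.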

MAP objective

ARI defines a Gibbs distribution over mappings:

P(R) \propto \exp\left(-E(R)\right)

and seeks the MAP solution:

R^* = \arg\min_R E(R)

Energy decomposition

Written in terms of the indicators $x$ (equivalently, the relation $R$), the energy decomposes into unary and pairwise terms:

E(x) = \sum_{(s,t)\in S\times T} \psi_u(s,t)\,x_{s,t} + \sum_{(s,t)\neq(s',t')} \psi_p\big((s,t),(s',t')\big)\,x_{s,t}x_{s',t'}

where:

$\psi_u$ encodes the local compatibility of a single pair (lower energy means a better match)
$\psi_p$ encodes the structural consistency between two selected pairs

This corresponds to a pairwise Markov random field over candidate matches.
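The energy can be evaluated directly from its definition. The sketch below assumes the potentials are supplied as plain dictionaries (`psi_u` keyed by pair, `psi_p` keyed by an ordered pair of matches, with missing entries treated as zero); that storage layout and the numeric values are illustrative assumptions.

```python
from itertools import permutations


def energy(x, psi_u, psi_p):
    """E(x) = sum of unary terms over active pairs
            + sum of pairwise terms over ordered pairs of distinct active pairs."""
    active = [pair for pair, value in x.items() if value == 1]
    unary = sum(psi_u[pair] for pair in active)
    # permutations(active, 2) mirrors the sum over (s,t) != (s',t').
    pairwise = sum(psi_p.get((a, b), 0.0) for a, b in permutations(active, 2))
    return unary + pairwise


# Toy potentials (made-up numbers).
psi_u = {("a", "x"): -1.0, ("a", "y"): -0.2, ("b", "x"): -0.1, ("b", "y"): -0.8}
psi_p = {
    (("a", "x"), ("b", "y")): -0.5,
    (("b", "y"), ("a", "x")): -0.5,
}

x_example = {("a", "x"): 1, ("a", "y"): 0, ("b", "x"): 0, ("b", "y"): 1}
E = energy(x_example, psi_u, psi_p)  # unary -1.8 plus pairwise -1.0
```

Because the pairwise sum ranges over ordered pairs, a symmetric interaction is counted once in each direction, which is why both orderings appear in `psi_p`.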

Feature representation

Each candidate pair is mapped to a feature vector:

\phi : S \times T \to \mathbb{R}^d

The feature vector of pair $(s,t)$ is written in bold as $\mathbf{x}_{s,t}$, to distinguish it from the scalar indicator $x_{s,t}$:

\mathbf{x}_{s,t} = \phi(s,t)

Typical features include:

Lexical similarity
Structural context
Ontology compatibility
Embedding similarity: $\phi_{\text{emb}}(s,t) = \langle f(s), g(t) \rangle$
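A minimal sketch of a feature map $\phi$ covering three of the feature families above. The specific choices are assumptions made for illustration: lexical similarity is computed with `difflib.SequenceMatcher` on the leaf field name, structural context is reduced to agreement in nesting depth, and the embeddings $f(s)$ and $g(t)$ are passed in as plain lists. Ontology compatibility is omitted because it depends on an external ontology.

```python
from difflib import SequenceMatcher


def leaf(path):
    """Last component of a dotted field path, lower-cased."""
    return path.split(".")[-1].lower()


def lexical_similarity(s, t):
    """String similarity of the leaf names, in [0, 1]."""
    return SequenceMatcher(None, leaf(s), leaf(t)).ratio()


def depth_agreement(s, t):
    """Crude structural feature: 1.0 when both paths have the same nesting depth."""
    return 1.0 / (1.0 + abs(s.count(".") - t.count(".")))


def embedding_similarity(f_s, g_t):
    """Inner product <f(s), g(t)> of precomputed embeddings."""
    return sum(a * b for a, b in zip(f_s, g_t))


def phi(s, t, f_s, g_t):
    """Feature vector for the candidate pair (s, t)."""
    return [
        lexical_similarity(s, t),
        depth_agreement(s, t),
        embedding_similarity(f_s, g_t),
    ]


features = phi("user.email", "customer.email", [1.0, 0.0], [0.5, 0.5])
```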

Unary scoring

Unary potentials are parameterized as negated scores of a learned model $f_\theta$, so a higher score means lower energy:

\psi_u(s,t) = -f_\theta(\mathbf{x}_{s,t})

Examples:

Linear: $f_\theta(\mathbf{x}) = w^\top \mathbf{x}$
Tree/MLP models (GBDT, neural scoring)

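Candidate pruning retains, for each source $s$, the $k$ highest-scoring targets from its initial candidate set $C(s) \subseteq T$: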

C_k(s) = \operatorname{Top\text{-}k}_{f_\theta(\mathbf{x}_{s,t})} \{t \in C(s)\}
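The linear scorer and the top-$k$ pruning step can be sketched as follows. The weights, feature values, and function names are illustrative assumptions; a tree or neural scorer would replace `linear_score` without changing the pruning logic.

```python
import heapq


def linear_score(w, features):
    """f_theta(x) = w^T x for a linear model."""
    return sum(wi * xi for wi, xi in zip(w, features))


def top_k_candidates(s, candidates, features, w, k):
    """C_k(s): the k targets in C(s) with the highest unary score."""
    return heapq.nlargest(
        k, candidates, key=lambda t: linear_score(w, features[(s, t)])
    )


# Toy feature vectors for one source field and three candidate targets.
features = {
    ("s", "t1"): [0.9, 0.1],
    ("s", "t2"): [0.2, 0.3],
    ("s", "t3"): [0.6, 0.8],
}
w = [1.0, 0.5]

kept = top_k_candidates("s", ["t1", "t2", "t3"], features, w, k=2)
# Unary potential of a kept pair is the negated score.
psi_u_t3 = -linear_score(w, features[("s", "t3")])
```

Pruning before inference keeps the number of indicator variables at roughly $k\,|S|$ instead of $|S|\,|T|$, at the cost of losing any correct target that falls outside the top $k$.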

Pairwise / structured scoring

Pairwise potentials capture dependencies between two candidate matches, again as negated scores:

\psi_p((s,t),(s',t')) = -p_\theta((s,t),(s',t'))

Examples:

Cross-encoder: $p_\theta = h_\theta(s,t,s',t')$
Structured models (CRF / GNN): $p_\theta = \psi_\theta(\mathcal{G}, (s,t), (s',t'))$

As soft terms in the objective, these encourage:

Structural alignment
Co-occurrence patterns
Ontological consistency
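To make the pairwise term concrete, here is a hand-written stand-in for $p_\theta$ that rewards structural alignment: if two source fields are siblings, their targets should be siblings too. This rule, the `bonus` and `penalty` values, and the function names are assumptions for illustration; the learned cross-encoder or CRF/GNN models named above would take the place of this rule.

```python
def parent(path):
    """Parent of a dotted field path ('' for a top-level field)."""
    return path.rsplit(".", 1)[0] if "." in path else ""


def sibling_alignment_score(match_a, match_b, bonus=1.0, penalty=-1.0):
    """Toy p((s,t),(s',t')): siblings in the source should map to siblings in the target."""
    (s, t), (s2, t2) = match_a, match_b
    if parent(s) != parent(s2):
        return 0.0  # unrelated source fields: no pairwise preference
    return bonus if parent(t) == parent(t2) else penalty


def psi_p(match_a, match_b):
    """Pairwise potential is the negated pairwise score."""
    return -sibling_alignment_score(match_a, match_b)


aligned = sibling_alignment_score(
    ("user.name", "customer.name"), ("user.email", "customer.email")
)
split = sibling_alignment_score(
    ("user.name", "customer.name"), ("user.email", "order.email")
)
```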

Constrained optimization

The MAP problem can be written as an integer quadratic program.

Using $\psi_u(s,t) = -f_\theta(\mathbf{x}_{s,t})$ and $\psi_p((s,t),(s',t')) = -p_\theta((s,t),(s',t'))$, the MAP objective

R^* = \arg\min_R E(R)

is equivalent to maximizing the total score:

\max_{x\in\{0,1\}^{|S|\times|T|}} \; \sum_{(s,t)} f_\theta(\mathbf{x}_{s,t})\,x_{s,t} + \sum_{(s,t)\neq(s',t')} p_\theta\big((s,t),(s',t')\big)\,x_{s,t}x_{s',t'}

subject to:

One-to-one constraints

\sum_{t} x_{s,t} \le 1 \quad \forall s \qquad \sum_{s} x_{s,t} \le 1 \quad \forall t

Type / ontology constraints

x_{s,t} = 0 \quad \text{if } \text{incompatible}(s,t)

Structural constraints

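Mutual exclusion: $x_{s,t} + x_{s',t'} \le 1$ for any two candidate matches that conflict
Hierarchical consistency: $x_{s,t} \le x_{\text{parent}(s), \text{parent}(t)}$, so a child pair can be selected only if its parent pair is selected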

This yields a 0-1 quadratic program, which reduces to an ILP when the pairwise terms are dropped or linearized.
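For intuition, the constrained problem can be solved by exhaustive enumeration on a tiny instance. The sketch below is an assumption-laden toy, not ARI's solver: it checks only the one-to-one and type constraints, takes scores as dictionaries, and runs in time exponential in the number of candidates, so a real system would hand the same objective to an ILP or QP solver.

```python
from itertools import combinations, permutations


def is_feasible(selected, incompatible):
    """One-to-one constraints plus x[s, t] = 0 for incompatible pairs."""
    sources = [s for s, _ in selected]
    targets = [t for _, t in selected]
    one_to_one = len(set(sources)) == len(sources) and len(set(targets)) == len(targets)
    return one_to_one and not any(pair in incompatible for pair in selected)


def map_by_enumeration(candidates, f, p, incompatible=frozenset()):
    """Maximize sum f[m] + sum p[(m, m')] over all feasible subsets of candidates."""
    best_score, best = float("-inf"), frozenset()
    for r in range(len(candidates) + 1):
        for selected in combinations(candidates, r):
            if not is_feasible(selected, incompatible):
                continue
            score = sum(f[m] for m in selected) + sum(
                p.get((a, b), 0.0) for a, b in permutations(selected, 2)
            )
            if score > best_score:
                best_score, best = score, frozenset(selected)
    return best, best_score


candidates = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
f = {("a", "x"): 1.0, ("a", "y"): 0.9, ("b", "x"): 0.8, ("b", "y"): 0.1}

mapping, score = map_by_enumeration(candidates, f, p={})
```

Here both sources individually prefer target `x`, but the one-to-one constraint forbids that, and the globally best feasible mapping is `{("a", "y"), ("b", "x")}` rather than the one containing the single highest-scoring pair.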

Solution

The optimal mapping is read off from the optimal assignment $x^*$, restricted to the pruned candidates:

R^* = \{(s,t) \mid t \in C_k(s),\; x^*_{s,t} = 1\}

Training objective

The models are trained on a weighted mixture of heterogeneous datasets, with mixing weights $\alpha, \beta, \gamma, \delta$:

D = \alpha D_{\text{pre}} + \beta D_{\text{gold}} + \gamma D_{\text{feedback}} + \delta D_{\text{neg}}

The training loss is a weighted sum of three terms with coefficients $\lambda_1, \lambda_2, \lambda_3$:

\mathcal{L} = \lambda_1 \mathcal{L}_{\text{contrastive}} + \lambda_2 \mathcal{L}_{\text{hard-neg}} + \lambda_3 \mathcal{L}_{\text{ranking}}
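The individual loss terms are not specified here, so the following is only a sketch under stated assumptions: the ranking term is taken to be a standard hinge (margin) ranking loss between the score of a correct pair and the scores of incorrect pairs, and the other two terms are represented by placeholder numbers. The function names and all values are hypothetical.

```python
def margin_ranking_loss(pos_score, neg_scores, margin=1.0):
    """Mean hinge loss: each negative should score at least `margin` below the positive."""
    return sum(max(0.0, margin - pos_score + n) for n in neg_scores) / len(neg_scores)


def total_loss(terms, weights):
    """L = sum_i lambda_i * L_i over the named loss terms."""
    return sum(weights[name] * value for name, value in terms.items())


ranking = margin_ranking_loss(pos_score=2.0, neg_scores=[0.5, 1.5])

# Placeholder values for the contrastive and hard-negative terms.
terms = {"contrastive": 0.4, "hard-neg": 0.2, "ranking": ranking}
weights = {"contrastive": 1.0, "hard-neg": 0.5, "ranking": 2.0}
loss = total_loss(terms, weights)
```

Only the second negative violates the margin in this example, so it alone contributes to the ranking term.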

References

Cohesive ARI Architecture