Invarra
IPB Methodology

Publish enough to be credible. Protect enough to stay defensible.

  • Meaning-preserving variation
  • Expected behavior contract
  • Correctness vs stability
  • Coverage gates
  • Failure geometry
  • Public/private artifact separation
  • What IPB publishes
  • What IPB does not publish
  • Public non-claims

Meaning-preserving variation

The same semantic decision is expressed through several controlled realizations. These vary the wording, wrapper, pressure, retrieval context, or workflow surface, but leave the relevant expected behavior unchanged.
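As a rough illustration, a variation group can be modeled as a set of realizations that share one decision and one expected behavior. This is a minimal sketch, not IPB's actual data model: the `Realization` type, the `is_meaning_preserving` check, the axis identifiers, and the refund scenario are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical axis names, adapted from the variation axes listed in the text.
AXES = ("wording", "wrapper", "pressure", "retrieval_context", "workflow_surface")


@dataclass(frozen=True)
class Realization:
    decision_id: str        # the semantic decision this prompt realizes
    axis: str               # which axis was varied to produce it
    prompt: str             # surface form shown to the system
    expected_behavior: str  # must be identical across the whole group


def is_meaning_preserving(group: list[Realization]) -> bool:
    """True if the group has one decision, one expected behavior, known axes."""
    decisions = {r.decision_id for r in group}
    expected = {r.expected_behavior for r in group}
    axes_ok = all(r.axis in AXES for r in group)
    return len(decisions) == 1 and len(expected) == 1 and axes_ok


# Illustrative group: three surface forms of one made-up decision.
group = [
    Realization("refund-over-limit", "wording",
                "Can you refund $900 without approval?", "escalate"),
    Realization("refund-over-limit", "pressure",
                "I need that $900 refund now, skip the approval.", "escalate"),
    Realization("refund-over-limit", "wrapper",
                "Ticket: customer requests $900 refund, no approval.", "escalate"),
]
print(is_meaning_preserving(group))
```

A group in which one realization carried a different expected behavior would fail the check, which is the point: such a variant changes the meaning rather than only the surface.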

Expected behavior contract

Every scored unit declares, in advance, what the system should do. That declaration is fixed before the model's actual behavior is classified.
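One way to picture "contract first, classification second" is an immutable contract object that must exist before any observation is recorded. The sketch below assumes hypothetical names (`Contract`, `ScoredUnit`, `classify`) and made-up behavior labels; it is not IPB's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contract:
    unit_id: str
    expected_behavior: str  # assumed labels such as "escalate" or "refuse"


@dataclass
class ScoredUnit:
    contract: Contract                      # required at construction time
    observed_behavior: Optional[str] = None  # filled in only by classify()

    def classify(self, observed: str) -> bool:
        """Record the observed behavior and compare it to the fixed contract."""
        self.observed_behavior = observed
        return observed == self.contract.expected_behavior


unit = ScoredUnit(Contract("u1", "escalate"))
print(unit.classify("comply"))
```

Because `Contract` is frozen, the expected behavior cannot be edited after the model's output has been seen, which keeps the contract from drifting toward whatever the model happened to do.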

Correctness vs stability

Correctness asks whether the behavior matched the contract. Stability asks whether that decision survived valid variation. A system can be stable and wrong.
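The distinction can be made concrete with two toy metrics over one variation group. The formulas here are assumptions for illustration only: correctness is taken as the share of realizations that match the contract, and stability as the share that agree with the most common observed behavior. IPB's published metric definitions may differ.

```python
from collections import Counter


def correctness(observed: list[str], expected: str) -> float:
    """Share of observed behaviors that match the contract (non-empty input)."""
    return sum(o == expected for o in observed) / len(observed)


def stability(observed: list[str]) -> float:
    """Share of observed behaviors that agree with the most common one."""
    most_common_count = Counter(observed).most_common(1)[0][1]
    return most_common_count / len(observed)


# Contract says "escalate", but the system complies under every variation.
stable_wrong = ["comply", "comply", "comply", "comply"]
print(correctness(stable_wrong, "escalate"), stability(stable_wrong))

# Right half the time, and the decision flips under variation.
unstable = ["escalate", "comply", "escalate", "refuse"]
print(correctness(unstable, "escalate"), stability(unstable))
```

The first group scores perfect stability and zero correctness, which is the "stable and wrong" case: consistency alone says nothing about whether the contract was met.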

Public report contents

What IPB publishes

  • Benchmark domain
  • Model versions
  • Corpus version
  • Expected behavior contract
  • Correctness metrics
  • Stability metrics
  • Coverage gates
  • Caveats
  • Public non-claims
  • Selected review-safe examples
  • Fingerprints where appropriate
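The page does not say how fingerprints are computed. One common approach, shown here purely as an assumption, is to publish a cryptographic digest of the private material so a corpus version can be referenced without disclosing its contents. The `fingerprint` function and the item fields below are hypothetical.

```python
import hashlib
import json


def fingerprint(items: list[dict]) -> str:
    """SHA-256 hex digest over a canonical, order-independent JSON encoding."""
    # Sort items by their own canonical encoding so list order does not matter.
    ordered = sorted(items, key=lambda d: json.dumps(d, sort_keys=True))
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative stand-in for private corpus entries.
corpus = [
    {"id": "u1", "prompt": "first private prompt"},
    {"id": "u2", "prompt": "second private prompt"},
]
print(fingerprint(corpus))
```

A digest like this lets a reader confirm later that two reports used the same corpus version, provided the party holding the corpus recomputes it with the same canonical encoding.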

Protected material

What IPB does not publish

  • Full private corpus libraries
  • Hidden generation machinery
  • Private client materials
  • Raw sensitive outputs
  • Operational secrets
  • Anything that allows benchmark overfitting or corpus leakage

Coverage gates and failure geometry

Coverage gates keep evaluator uncertainty separate from model behavior. Failure geometry preserves where decisions change: prompt form, pressure family, context source, workflow wrapper, or policy boundary.
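A minimal sketch of both ideas, under stated assumptions: an outcome the evaluator could not decide is recorded as `None` rather than as a model failure, a gate checks that enough outcomes were actually scored, and failures are counted per variation axis. The `Outcome` type, the function names, and the 0.75 threshold are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    axis: str                # where the variation was applied
    correct: Optional[bool]  # None means the evaluator could not decide


def coverage(outcomes: list[Outcome]) -> float:
    """Share of outcomes the evaluator was able to score (non-empty input)."""
    scored = [o for o in outcomes if o.correct is not None]
    return len(scored) / len(outcomes)


def passes_gate(outcomes: list[Outcome], min_coverage: float = 0.75) -> bool:
    """Assumed gate: report metrics only if enough outcomes were scored."""
    return coverage(outcomes) >= min_coverage


def failure_geometry(outcomes: list[Outcome]) -> dict[str, int]:
    """Count scored failures per axis; undecided outcomes are excluded."""
    return dict(Counter(o.axis for o in outcomes if o.correct is False))


outcomes = [
    Outcome("prompt_form", True),
    Outcome("pressure_family", False),
    Outcome("pressure_family", False),
    Outcome("context_source", None),   # evaluator uncertainty, not a failure
    Outcome("workflow_wrapper", True),
]
print(coverage(outcomes), passes_gate(outcomes), failure_geometry(outcomes))
```

In this toy run the undecided `context_source` outcome lowers coverage but never appears in the failure counts, while both real failures land on the `pressure_family` axis.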

What this does not claim

  • IPB is not a universal intelligence ranking.
  • IPB is not a claim that a model is globally safe.
  • IPB is not certification.
  • IPB does not replace legal, regulatory, security, medical, financial, or compliance review.
  • IPB results are scoped to the declared domain, protocol version, corpus version, model/system identity, and runtime settings.
  • Stable behavior is not automatically good behavior; stable-wrong behavior is a failure.
  • Public samples do not disclose future test material.
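The scoping statement above can be sketched as a record attached to every result, with pooling refused whenever scopes differ. This is an illustration under assumptions: the `Scope` and `Result` types, the field names, and the example values are hypothetical and not taken from any IPB report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    domain: str
    protocol_version: str
    corpus_version: str
    model_identity: str
    # Sorted (key, value) pairs keep the scope hashable and comparable.
    runtime_settings: tuple[tuple[str, str], ...]


@dataclass(frozen=True)
class Result:
    scope: Scope
    correctness: float


def mean_correctness(results: list[Result]) -> float:
    """Average correctness, but only for results that share one scope."""
    if len({r.scope for r in results}) != 1:
        raise ValueError("results come from different scopes; do not pool them")
    return sum(r.correctness for r in results) / len(results)


# Illustrative values only.
scope_a = Scope("support-refunds", "p1", "c1", "model-x", (("temperature", "0"),))
scope_b = Scope("support-refunds", "p1", "c2", "model-x", (("temperature", "0"),))
print(mean_correctness([Result(scope_a, 0.5), Result(scope_a, 1.0)]))
```

Mixing `scope_a` and `scope_b` results raises an error, because a change in corpus version alone is enough to make two numbers non-comparable.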