IPB Methodology

Publish enough to be credible. Protect enough to stay defensible.

Meaning-preserving variation
Expected behavior contract
Correctness vs stability
Coverage gates
Failure geometry
Public/private artifact separation
What IPB publishes
What IPB does not publish
Public non-claims

Meaning-preserving variation

The same semantic decision is expressed through controlled realizations that vary wording, wrapper, pressure, retrieval context, or workflow surface without changing the relevant expected behavior.
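The following is a minimal Python sketch of how one semantic decision could fan out into controlled realizations. The names `Realization` and `realize`, the example decision, and the choice of three axes (wording, wrapper, pressure) are illustrative assumptions, not IPB's generation machinery. Retrieval context and workflow surface would be added as further axes in the same way. The property it shows is that only surface fields vary, while every realization carries the same expected behavior.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Realization:
    # Hypothetical record: one surface form of a single semantic decision.
    decision_id: str
    expected_behavior: str  # shared by every realization of the decision
    wording: str
    wrapper: str
    pressure: str


def realize(decision_id, expected_behavior, wordings, wrappers, pressures):
    """Return every combination of surface choices for one decision.

    Only the surface axes vary. expected_behavior is copied unchanged
    into each realization, so the scoring contract never moves.
    """
    return [
        Realization(decision_id, expected_behavior, w, wr, p)
        for w, wr, p in product(wordings, wrappers, pressures)
    ]


# Illustrative example: 2 wordings x 2 wrappers x 2 pressures = 8 realizations.
variants = realize(
    "refund-over-limit",
    "refuse_and_escalate",
    wordings=["Please approve this refund.", "Can you push this refund through?"],
    wrappers=["chat", "email"],
    pressures=["none", "urgency"],
)
```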

Expected behavior contract

Every scored unit declares what the system should have done before actual model behavior is classified.
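One way to make this ordering enforceable is to refuse to classify any behavior that has no previously declared contract. The sketch below assumes a hypothetical `Contract` record and `ScoringLedger` class, with string behavior labels such as `"refuse"`. These names and this shape are illustrative only and are not IPB's contract schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contract:
    # Hypothetical contract: what the system should do for one scored unit.
    unit_id: str
    expected: str  # behavior label, e.g. "refuse", "comply", "escalate"


class ScoringLedger:
    """Declare contracts first, then classify observed behavior against them."""

    def __init__(self):
        self._contracts = {}

    def declare(self, contract):
        # Contracts are write-once, so they cannot be edited after results are seen.
        if contract.unit_id in self._contracts:
            raise ValueError(f"contract for {contract.unit_id!r} already declared")
        self._contracts[contract.unit_id] = contract

    def classify(self, unit_id, observed):
        # Refuse to score behavior that has no previously declared contract.
        if unit_id not in self._contracts:
            raise KeyError(f"no contract declared for {unit_id!r}")
        return observed == self._contracts[unit_id].expected


# Illustrative use: declare first, then classify.
ledger = ScoringLedger()
ledger.declare(Contract("unit-001", "refuse"))
```

Making contracts write-once means a contract cannot quietly shift to fit the observed results.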

Correctness vs stability

Correctness asks whether the behavior matched the contract. Stability asks whether that decision survived valid variation. A system can be stable and wrong.
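The distinction can be made concrete with two per-unit measures over the behaviors observed across all realizations of one decision. The definitions below are illustrative assumptions, not IPB's published metric formulas. Correctness is the share of realizations that match the contract, and stability is the share that agree with the most common observed behavior.

```python
from collections import Counter


def correctness(observed, expected):
    """Share of realizations whose behavior matched the contract."""
    return sum(o == expected for o in observed) / len(observed)


def stability(observed):
    """Share of realizations that agree with the most common observed behavior."""
    _, modal_count = Counter(observed).most_common(1)[0]
    return modal_count / len(observed)


def classify_unit(observed, expected):
    """Label a unit. Stability is checked first because a stable unit can still be wrong."""
    if stability(observed) < 1.0:
        return "unstable"
    return "stable-correct" if correctness(observed, expected) == 1.0 else "stable-wrong"


# Illustrative observations for four realizations of one decision.
stable_wrong = ["comply", "comply", "comply", "comply"]
unstable = ["refuse", "refuse", "comply", "refuse"]
```

With an expected behavior of `"refuse"`, the first unit scores stability 1.0 and correctness 0.0. It is perfectly consistent and consistently wrong, which is the stable-wrong case.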

Public report contents

What IPB publishes

Benchmark domain
Model versions
Corpus version
Expected behavior contract
Correctness metrics
Stability metrics
Coverage gates
Caveats
Public non-claims
Selected review-safe examples
Fingerprints where appropriate
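One plausible use of a fingerprint is to let a reader later confirm that published results refer to a specific corpus version without the corpus itself being disclosed. The following is a minimal sketch. It assumes that corpus items are JSON-serializable dicts and that SHA-256 over canonically serialized, sorted items is an acceptable commitment. The function name `corpus_fingerprint` is hypothetical, and this is not a description of how IPB computes its fingerprints.

```python
import hashlib
import json


def corpus_fingerprint(items):
    """Return a SHA-256 hex digest that commits to a collection of corpus items.

    Each item is serialized as canonical JSON (sorted keys, fixed separators)
    and the serialized items are sorted, so the digest does not depend on
    dict key order or item order. json.dumps escapes newlines inside strings,
    so the newline separator between items is unambiguous.
    """
    serialized = sorted(
        json.dumps(item, sort_keys=True, separators=(",", ":")) for item in items
    )
    digest = hashlib.sha256()
    for line in serialized:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Changing any item changes the digest, while reordering items or their keys does not.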

Protected material

What IPB does not publish

Full private corpus libraries
Hidden generation machinery
Private client materials
Raw sensitive outputs
Operational secrets
Anything that allows benchmark overfitting or corpus leakage

Coverage gates and failure geometry

Coverage gates keep evaluator uncertainty separate from model behavior. Failure geometry preserves the point at which a decision changes: prompt form, pressure family, context source, workflow wrapper, or policy boundary.
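The following is a minimal sketch of both ideas together. It assumes that each judgment carries an evaluator-confidence score and records the variation axis and value it came from. The `Judgment` record, the 0.8 threshold, and the per-(axis, value) failure rate are hypothetical choices for illustration. Judgments below the threshold reduce coverage rather than counting as model failures. The remaining failures stay broken out by where they occurred instead of being collapsed into one score.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    # Hypothetical record for one scored realization.
    axis: str                    # e.g. "pressure family", "context source"
    value: str                   # the variant on that axis, e.g. "authority claim"
    matched: bool                # did the behavior match the expected behavior contract?
    evaluator_confidence: float  # evaluator's confidence in its own classification


def gate_and_locate(judgments, min_confidence=0.8):
    """Return (coverage, failure rate per (axis, value)) for the gated judgments."""
    if not judgments:
        return 0.0, {}
    # Coverage gate: low-confidence judgments are excluded, not scored as failures.
    scored = [j for j in judgments if j.evaluator_confidence >= min_confidence]
    coverage = len(scored) / len(judgments)

    totals = defaultdict(int)
    failures = defaultdict(int)
    for j in scored:
        key = (j.axis, j.value)
        totals[key] += 1
        if not j.matched:
            failures[key] += 1
    # Failure geometry: keep failure rates broken out by where they occurred.
    geometry = {key: failures[key] / totals[key] for key in totals}
    return coverage, geometry
```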

What this does not claim

IPB is not a universal intelligence ranking.
IPB is not a claim that a model is globally safe.
IPB is not certification.
IPB does not replace legal, regulatory, security, medical, financial, or compliance review.
IPB results are scoped to the declared domain, protocol version, corpus version, model/system identity, and runtime settings.
Stable behavior is not automatically good behavior; stable-wrong behavior is a failure.
Public samples do not disclose future test material.