Hidden Naming Contracts in SWE-Agent Benchmarks

AI coding benchmarks now influence research priorities, product strategy, and engineering adoption decisions. Over the last year, SWE-bench has become a key benchmark for evaluating AI coding agents, and that momentum has pushed the community to build additional SWE-bench-style benchmarks beyond Python.

As I explored in earlier analyses of single-file saturation and difficulty distribution, benchmark scores are only as trustworthy as the instances behind them. In this post, I look at a different failure mode: hidden naming contracts.

A hidden naming contract appears when benchmark tests require specific identifiers introduced in the reference solution, even though those names were never made explicit in the issue text. In that setting, an agent can produce a behaviorally correct fix and still be graded as wrong because it chose a different symbol name.

The Core Failure Mode

A typical SWE-bench-style instance has four parts:

an issue description
a base repository snapshot
a reference solution
executable tests

The intended contract is behavioral correctness: if a submitted patch fixes the issue, the tests should pass.

The problem starts when tests directly call symbols that were newly introduced in the reference solution. Evaluation then requires not only solving the behavior, but also reproducing a naming choice that may never have been stated anywhere.

A Concrete Example

Consider scikit-learn__scikit-learn-12682 from SWE-bench_Verified.

The issue reports that SparseCoder does not expose max_iter for Lasso, which leads to convergence warnings. The name transform_max_iter is not mentioned in the issue and does not exist elsewhere in the codebase at the base commit.

The reference patch introduces a new transform_max_iter parameter:

```python
# sklearn/decomposition/dict_learning.py
class SparseCoder(BaseEstimator, SparseCodingMixin):
    def __init__(self, dictionary, transform_algorithm='omp',
                 transform_n_nonzero_coefs=None, transform_alpha=None,
                 split_sign=False, n_jobs=None, positive_code=False,
                 transform_max_iter=1000):  # new parameter added by the patch
        # Other arguments are elided ("...") in this excerpt.
        self._set_sparse_coding_params(..., transform_max_iter)
```

The test patch then calls that exact parameter name:

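```python
import pytest
from sklearn.decomposition import SparseCoder
from sklearn.exceptions import ConvergenceWarning


def test_max_iter():
    # Excerpt: D_multi, X, and transform_algorithm are defined earlier in
    # the test (not shown). Both calls hard-code transform_max_iter.
    with pytest.warns(ConvergenceWarning):
        model = SparseCoder(
            D_multi,
            transform_algorithm=transform_algorithm,
            transform_max_iter=1,
        )
        model.fit_transform(X)

    with pytest.warns(None) as record:
        model = SparseCoder(
            D_multi,
            transform_algorithm=transform_algorithm,
            transform_max_iter=2000,
        )
        model.fit_transform(X)
```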

An agent could implement the same functionality with lasso_max_iter, max_transform_iterations, or another reasonable name and still fail evaluation. The behavioral fix is there, but the hidden naming contract is not satisfied.
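To make the failure concrete without installing scikit-learn, here is a minimal, self-contained sketch. `AgentSparseCoder` and `accepts_reference_keyword` are hypothetical names for illustration only. The point is that Python rejects the unknown keyword `transform_max_iter` with a `TypeError` before the test ever reaches the convergence behavior it is meant to check.

```python
# Hypothetical agent fix: same behavior as the reference patch, but the new
# parameter is named lasso_max_iter rather than transform_max_iter.
class AgentSparseCoder:
    def __init__(self, dictionary, transform_algorithm="omp", lasso_max_iter=1000):
        self.dictionary = dictionary
        self.transform_algorithm = transform_algorithm
        self.lasso_max_iter = lasso_max_iter  # would be forwarded to Lasso in a real fix


# Hypothetical helper: construct the estimator the way the benchmark test does.
def accepts_reference_keyword(cls) -> bool:
    try:
        cls([[1.0]], transform_algorithm="lasso_cd", transform_max_iter=1)
    except TypeError:
        # Python rejects the unknown keyword before any behavior is exercised.
        return False
    return True


print(accepts_reference_keyword(AgentSparseCoder))  # False
```

The agent's constructor is never called with its own parameter name, so whether `lasso_max_iter` actually fixes the convergence warnings is irrelevant to the grade.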

What the Scan Found

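I analyzed six SWE-bench-style datasets: SWE-bench, SWE-bench_Verified, SWE-bench_Multilingual, SWE-bench_Pro, Multi-SWE-bench, and SWE-PolyBench.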

In a screening pass across all 7,567 instances, I found 2,167 instances (28.6%) where tests reference symbols newly introduced in the reference solution.

That 28.6% number is a screening signal, not a final estimate of confirmed false negatives. Some coupled symbols are fair because the name is explicit in the issue text or already established in the codebase. To isolate the highest-risk cases, I ran two refinement checks:

Is the symbol name mentioned in the issue text?
Does the symbol already exist elsewhere in the repository?

I treat the intersection as high-risk coupling: symbols that are neither mentioned in the issue nor present in the codebase.
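The two refinement checks and their intersection can be sketched as below. This is a minimal illustration, assuming each coupled instance already comes with its coupled symbols, its issue text, and the set of identifiers defined at the base commit. `CoupledInstance`, `mentioned_in`, and `is_high_risk` are hypothetical names, and whole-identifier matching is my assumption; the post does not say exactly how mentions are matched. Here an instance counts as high-risk only when no coupled symbol is mentioned and none already exists, which is one reading of "intersection" that is consistent with the per-benchmark counts.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CoupledInstance:
    instance_id: str
    coupled_symbols: set[str]   # new gold-patch symbols referenced by the tests
    issue_text: str
    base_repo_symbols: set[str] = field(default_factory=set)  # identifiers at base commit


def mentioned_in(symbol: str, text: str) -> bool:
    # Whole-identifier match: a short symbol like "ids" is not found inside "kids".
    return re.search(rf"(?<!\w){re.escape(symbol)}(?!\w)", text) is not None


def is_high_risk(inst: CoupledInstance) -> bool:
    none_mentioned = not any(mentioned_in(s, inst.issue_text) for s in inst.coupled_symbols)
    none_exist = not (inst.coupled_symbols & inst.base_repo_symbols)
    return none_mentioned and none_exist


example = CoupledInstance(
    instance_id="scikit-learn__scikit-learn-12682",
    coupled_symbols={"transform_max_iter"},
    issue_text="SparseCoder doesn't expose max_iter for Lasso",  # paraphrased title
    base_repo_symbols={"SparseCoder", "max_iter"},  # illustrative subset only
)
print(is_high_risk(example))  # True
```

Note that the issue's mention of `max_iter` does not count as a hint for `transform_max_iter`, because the match requires the full identifier.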

High-Risk Coupling by Benchmark

Benchmark	Total Instances	Coupled Instances	High-Risk Coupling (% of Total)
SWE-bench/SWE-bench	2,294	574	34 (1.5%)
SWE-bench/SWE-bench_Verified	500	87	4 (0.8%)
SWE-bench/SWE-bench_Multilingual	300	15	0 (0.0%)
ScaleAI/SWE-bench_Pro	731	363	80 (10.9%)
ByteDance-Seed/Multi-SWE-bench	1,632	593	89 (5.5%)
AmazonScience/SWE-PolyBench	2,110	535	68 (3.2%)
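As a quick consistency check, the aggregate 28.6% screening rate and each benchmark's high-risk percentage follow directly from the counts in the table. The snippet only recomputes those published numbers; `COUNTS` and `pct` are illustrative names.

```python
# (total, coupled, high-risk) instance counts copied from the table above.
COUNTS = {
    "SWE-bench": (2294, 574, 34),
    "SWE-bench_Verified": (500, 87, 4),
    "SWE-bench_Multilingual": (300, 15, 0),
    "SWE-bench_Pro": (731, 363, 80),
    "Multi-SWE-bench": (1632, 593, 89),
    "SWE-PolyBench": (2110, 535, 68),
}


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal, as reported in the table."""
    return round(100 * part / whole, 1)


total = sum(t for t, _, _ in COUNTS.values())
coupled = sum(c for _, c, _ in COUNTS.values())
print(total, coupled, pct(coupled, total))  # 7567 2167 28.6

for name, (t, _, high_risk) in COUNTS.items():
    print(f"{name}: {pct(high_risk, t)}% high-risk")
```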

The main takeaway is that this problem is concentrated rather than uniform.

SWE-bench_Pro is the clear outlier at 10.9% high-risk coupling.
Multi-SWE-bench is meaningfully elevated at 5.5%.
SWE-bench_Verified and SWE-bench_Multilingual are much cleaner on this specific failure mode.

That pattern suggests curation helps, but it does not eliminate the issue.

Where the Risk Concentrates

The high-risk subset becomes easier to interpret if we also look at the two refinement signals separately:

Benchmark	Coupled	No Coupled Symbol Mentioned in Issue	No Coupled Symbol in Codebase	High-Risk
SWE-bench/SWE-bench	574	233	38	34
SWE-bench/SWE-bench_Verified	87	32	4	4
SWE-bench/SWE-bench_Multilingual	15	9	0	0
ScaleAI/SWE-bench_Pro	363	210	92	80
ByteDance-Seed/Multi-SWE-bench	593	296	104	89
AmazonScience/SWE-PolyBench	535	280	73	68

Two things stand out.

First, many coupled instances provide no naming hint in the issue text. In five of the six datasets, roughly 40% to 60% of coupled instances fall into that bucket; SWE-bench_Verified is slightly lower at 32 of 87 (about 37%).

Second, codebase priors help in many cases, but not equally across benchmarks. In SWE-bench_Verified, only 4 of 87 coupled instances (under 5%) have no coupled symbol anywhere else in the repository, so agents usually have a chance to infer the naming pattern through exploration. In SWE-bench_Pro and Multi-SWE-bench, that safety net is much weaker: 92 of 363 (about 25%) and 104 of 593 (about 18%) coupled instances have no codebase prior at all.
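The two refinement signals can be expressed as shares of coupled instances per benchmark, which makes the contrast easier to see than raw counts. The snippet recomputes these shares from the refinement-signal table; `SIGNALS` and `share` are illustrative names.

```python
# (coupled, no coupled symbol in issue, no coupled symbol in codebase),
# copied from the refinement-signal table above.
SIGNALS = {
    "SWE-bench": (574, 233, 38),
    "SWE-bench_Verified": (87, 32, 4),
    "SWE-bench_Multilingual": (15, 9, 0),
    "SWE-bench_Pro": (363, 210, 92),
    "Multi-SWE-bench": (593, 296, 104),
    "SWE-PolyBench": (535, 280, 73),
}


def share(part: int, whole: int) -> float:
    return round(100 * part / whole, 1)


for name, (coupled, no_issue_hint, no_codebase_prior) in SIGNALS.items():
    print(f"{name}: {share(no_issue_hint, coupled)}% lack an issue hint, "
          f"{share(no_codebase_prior, coupled)}% lack a codebase prior")
```

The issue-hint share is broadly similar across datasets, while the codebase-prior share varies by more than a factor of five, which is why the high-risk counts diverge.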

This is why the raw coupling rate and the high-risk rate both matter. The raw rate tells us how often tests are structurally tied to reference naming. The high-risk rate tells us where that coupling is most likely to generate false negatives.

Why This Matters for Leaderboards

1. Scores can move at the margin

In SWE-bench_Pro, high-risk coupling appears in 10.9% of all instances. If only a fraction of those behave as false negatives in practice, that is still enough to shift scores by one to several points, which can reorder closely clustered systems.
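The margin argument is simple arithmetic. The sketch below assumes, purely for illustration, that some fraction of SWE-bench_Pro's 80 high-risk instances are correct fixes graded as failures for a given system; the fractions are hypothetical, not measured.

```python
def score_shift_points(high_risk: int, total: int, false_negative_fraction: float) -> float:
    """Leaderboard points lost if this fraction of high-risk instances are
    correct fixes graded as failures (the fraction is an assumption)."""
    return 100 * high_risk * false_negative_fraction / total


# SWE-bench_Pro: 80 high-risk instances out of 731.
for frac in (0.10, 0.25, 0.50):
    print(f"{frac:.0%} false negatives -> {score_shift_points(80, 731, frac):.1f} points")
```

Under these illustrative fractions the shift ranges from about 1.1 to 5.5 points. This is an upper bound for any single system, since it only applies to instances that system would otherwise have resolved.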

2. Cross-benchmark comparisons get noisier

A model may appear to improve or regress partly because one benchmark family embeds more hidden naming contracts than another. That makes benchmark-to-benchmark comparisons less clean than leaderboard tables suggest.

3. Curation helps, but one clean metric is not the whole story

The low rates in SWE-bench_Verified and SWE-bench_Multilingual are encouraging for this particular failure mode. But low hidden-contract rates should not be read as a blanket validation of a benchmark’s frontier-tracking quality. Benchmarks can still have other issues, including overly narrow tests, overly wide tests, or contamination.

Recent benchmark audits make the same broader point from a different direction. My analysis is narrower: it is a scalable programmatic scan for one subtype of narrow-test risk.

What Benchmark Maintainers Should Change

1. Prefer behavior-first assertions where feasible

Tests should verify the intended behavior, not accidentally require the exact reference implementation.
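A small, hypothetical example of the difference: suppose an issue asks that `parse_repo_arg()` accept repository arguments with a trailing slash or `.git` suffix, and the reference patch happens to add a private helper called `_normalize`. Both function names and the scenario are invented for illustration. The first test hard-codes the helper's name; the second checks only the public behavior the issue describes.

```python
# Hypothetical code under test. The issue only talks about the public
# function parse_repo_arg(); _normalize is a helper the reference patch added.
def _normalize(repo: str) -> str:
    return repo.strip().rstrip("/").removesuffix(".git")


def parse_repo_arg(repo: str) -> tuple[str, str]:
    owner, name = _normalize(repo).split("/")[-2:]
    return owner, name


def test_helper_directly():
    # Brittle: fails for any correct fix that names its helper differently.
    assert _normalize("owner/repo/") == "owner/repo"


def test_public_behavior():
    # Behavior-first: passes for any fix that makes the public entry point work.
    assert parse_repo_arg("owner/repo/") == ("owner", "repo")
    assert parse_repo_arg("https://github.com/owner/repo.git") == ("owner", "repo")
```

Both tests pass against this particular implementation, but only the second would still pass if an agent solved the issue with a differently named helper or with no helper at all.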

2. Make naming requirements explicit when they are part of the task

Some fixes genuinely require a specific API, method name, or parameter name for compatibility reasons. In those cases, the naming requirement should be written into the issue statement rather than hidden in the tests.

3. Publish diagnostic metadata alongside headline scores

Leaderboards should ideally report not just one aggregate score, but also benchmark diagnostics such as coupling prevalence, high-risk hidden-contract rates, and whether a score was computed on a filtered subset that excludes known problematic instances.
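One possible shape for such a report is sketched below. The field names and the `leaderboard_entry` function are hypothetical, not an existing leaderboard format; the idea is simply to publish the headline score next to a score computed on the subset that excludes flagged high-risk instances, plus the coupling and high-risk rates for the benchmark.

```python
def leaderboard_entry(system, resolved, all_ids, coupled_ids, high_risk_ids):
    """Hypothetical report format: headline score plus hidden-contract diagnostics."""
    filtered_ids = all_ids - high_risk_ids

    def score(ids):
        # Percentage of the given instances that the system resolved.
        return round(100 * len(resolved & ids) / len(ids), 1)

    return {
        "system": system,
        "score": score(all_ids),
        "score_excluding_high_risk": score(filtered_ids),
        "coupling_rate": round(100 * len(coupled_ids) / len(all_ids), 1),
        "high_risk_rate": round(100 * len(high_risk_ids) / len(all_ids), 1),
    }


ids = {f"inst-{n}" for n in range(10)}
print(leaderboard_entry(
    "agent-A",
    resolved={"inst-2", "inst-3", "inst-4", "inst-5"},
    all_ids=ids,
    coupled_ids={"inst-0", "inst-1", "inst-2"},
    high_risk_ids={"inst-0", "inst-1"},
))
```

In this toy example the headline score is 40.0 and the filtered score is 50.0. Whether the filtered score rises or falls for a real system depends on which instances it solved.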

4. Keep a human review loop in benchmark maintenance

The cleanest benchmarks in this analysis are also the ones with stronger curation signals. That supports a practical maintenance rule: programmatic scans can pre-screen risky instances, but human review is still needed before release.

Conclusion

Software engineering allows multiple valid implementations, and evaluations should reflect that reality. A benchmark should reward behaviorally correct fixes, not force agents to rediscover one unstated naming choice from a hidden reference patch.

Hidden naming contracts are not the entire benchmark-reliability story, but they are a measurable and actionable part of it. If we want benchmark scores to be more robust, one straightforward step is to identify and remove instances where tests silently depend on names that the task never actually required.

A coding agent should not be graded as wrong for solving the right problem with the wrong unstated identifier.

Methodology

The analysis uses a simple pipeline:

Parse gold patches and extract symbols introduced on added lines.
Parse test patches and look for references to those symbols.
Filter out overly generic names that are likely to match coincidentally.
For coupled instances, check whether the symbol appears in the issue text.
For coupled instances, check whether the symbol already exists elsewhere in the repository at the base commit.
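The first two steps of the pipeline can be sketched for Python diffs as follows. This is a minimal illustration under my own assumptions, not the scan's actual code: `added_lines`, `introduced_symbols`, `coupled_symbols`, and both regexes are hypothetical. Treating every `name=` binding on an added line as a new symbol is deliberately loose; the generic-name filter and the two refinement checks would then narrow the result.

```python
import re

# Simplified Python-only patterns (assumption): names after def/class, plus
# names bound with "=" (assignments and keyword defaults), excluding "==".
DEFINITION = re.compile(r"\b(?:def|class)\s+([A-Za-z_]\w*)|\b([A-Za-z_]\w*)\s*=(?!=)")
IDENTIFIER = re.compile(r"\b[A-Za-z_]\w*")


def added_lines(diff: str):
    """Yield the content of '+' lines in a unified diff, skipping '+++' headers."""
    for line in diff.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            yield line[1:]


def introduced_symbols(gold_patch: str) -> set[str]:
    # Exactly one of the two groups is set for each match.
    return {
        m.group(1) or m.group(2)
        for line in added_lines(gold_patch)
        for m in DEFINITION.finditer(line)
    }


def coupled_symbols(gold_patch: str, test_patch: str) -> set[str]:
    used_in_tests = {
        tok for line in added_lines(test_patch) for tok in IDENTIFIER.findall(line)
    }
    return introduced_symbols(gold_patch) & used_in_tests


gold = "+                 transform_max_iter=1000):\n"
test = "+            transform_max_iter=1,\n"
print(coupled_symbols(gold, test))  # {'transform_max_iter'}
```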

The symbol extraction step uses language-specific patterns for Python, Java, JavaScript or TypeScript, Go, Rust, and C or C++.
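Language-specific extraction could be organized as one definition pattern per language. The table below is a hypothetical, deliberately incomplete sketch covering a few common forms in Python, Go, Rust, JavaScript, and C macros; the actual patterns used in the scan are not published here, and `PATTERNS` and `defined_symbols` are illustrative names.

```python
import re

# Illustrative definition patterns for a few of the languages named above.
# These are simplified assumptions; real coverage needs many more forms.
PATTERNS = {
    "python": re.compile(r"^\s*(?:async\s+)?(?:def|class)\s+([A-Za-z_]\w*)"),
    "go": re.compile(r"^\s*(?:func\s+(?:\([^)]*\)\s*)?|type\s+|var\s+|const\s+)([A-Za-z_]\w*)"),
    "rust": re.compile(
        r"^\s*(?:pub(?:\([^)]*\))?\s+)?(?:async\s+)?"
        r"(?:fn|struct|enum|trait|const|static|type)\s+([A-Za-z_]\w*)"
    ),
    "javascript": re.compile(
        r"^\s*(?:export\s+)?(?:default\s+)?(?:async\s+)?"
        r"(?:function\s*\*?\s*|class\s+|const\s+|let\s+|var\s+)([A-Za-z_$][\w$]*)"
    ),
    "c": re.compile(r"^\s*#\s*define\s+([A-Za-z_]\w*)"),  # macros only
}


def defined_symbols(lines, lang):
    """Return names that the language's pattern recognizes as new definitions."""
    pattern = PATTERNS[lang]
    return {m.group(1) for line in lines if (m := pattern.match(line))}


print(defined_symbols(["func (c *Client) NewCmdCancel() {"], "go"))  # {'NewCmdCancel'}
```

The JavaScript pattern also covers these simple TypeScript forms, since a type annotation follows the name. Java methods and C/C++ function definitions are harder to capture reliably with a single regex and would likely need a proper parser.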

This post reports filtered counts throughout. The filtering step removes names that are so generic that a match is probably accidental rather than evidence of a real hidden naming contract.

Filtering details

Generic variable names: result, data, value, output, input, response, item, obj, args, kwargs, etc.
Single-letter identifiers: x, y, i, j, k, n, etc.
Common method names: get, set, add, remove, create, run, execute, parse, read, write, load, save, etc.
Common class names: Base, Error, Exception, Handler, Manager, Factory, Config, etc.
Test infrastructure names: test, setUp, tearDown, fixture, mock, patch, and any symbol matching test-naming patterns (test_*, *Test, mock_*, fake_*, stub_*)
Placeholder names: foo, bar, baz, qux
Built-ins and other language-level names: True, False, None, null, main, name, type, id
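The filter rules listed above translate naturally into a set lookup plus a few pattern checks. This sketch uses only the example names from the list (the real list is longer, as the "etc." entries indicate), and exact, case-sensitive matching is my assumption; `GENERIC_NAMES`, `TEST_LIKE`, `is_generic`, and `filter_symbols` are hypothetical names.

```python
import re

# Subset of the generic names listed above; the full filter list is longer.
GENERIC_NAMES = {
    "result", "data", "value", "output", "input", "response", "item", "obj",
    "args", "kwargs", "get", "set", "add", "remove", "create", "run", "execute",
    "parse", "read", "write", "load", "save", "Base", "Error", "Exception",
    "Handler", "Manager", "Factory", "Config", "test", "setUp", "tearDown",
    "fixture", "mock", "patch", "foo", "bar", "baz", "qux", "True", "False",
    "None", "null", "main", "name", "type", "id",
}
# Test-naming patterns: test_*, mock_*, fake_*, stub_* prefixes and *Test suffix.
TEST_LIKE = re.compile(r"^(?:test_|mock_|fake_|stub_)|Test$")


def is_generic(symbol: str) -> bool:
    return (
        len(symbol) == 1                      # single-letter identifiers
        or symbol in GENERIC_NAMES
        or TEST_LIKE.search(symbol) is not None
    )


def filter_symbols(symbols: set[str]) -> set[str]:
    """Keep only symbols specific enough to suggest a real naming contract."""
    return {s for s in symbols if not is_generic(s)}
```

For example, filtering `{"transform_max_iter", "result", "x", "test_max_iter"}` keeps only `transform_max_iter`.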

The high-risk subset reported above is the intersection of the two refinement checks: symbols not mentioned in the issue text and not present in the codebase.

Appendix: High-Risk Instances by Benchmark

The following tables list the instances in the high-risk intersection.

SWE-bench/SWE-bench

34 instances with high-risk coupling

Instance ID	Coupled Symbols
django__django-11389	`get_session_cookie_age`
django__django-11742	`choice_max_length`
django__django-13250	`supports_json_field_contains`
django__django-13350	`upload_interrupted`
django__django-13722	`get_formset_kwargs`
django__django-14430	`empty_aggregate_value`
django__django-14559	`rows_updated`
django__django-14725	`edit_only`
django__django-14894	`empty_result_set_value`
django__django-15031	`list_separator`
django__django-15108	`OrderByList`
django__django-16302	`supports_unlimited_charfield`
django__django-16369	`get_languages_for_item`
django__django-16514	`get_log_entries`
django__django-16883	`normalize_table_name`
django__django-7188	`BaseAuthConfig`
matplotlib__matplotlib-13908	`remove_overlapping_locs`, `get_remove_overlapping_locs`
matplotlib__matplotlib-18869	`_parse_to_version_info`
matplotlib__matplotlib-25746	`labelfontfamily`
psf__requests-4356	`InvalidProxyURL`
psf__requests-4718	`should_strip_auth`
pydata__xarray-4759	`maybe_coerce_to_str`
pylint-dev__pylint-4421	`get_numversion_from_version`
pylint-dev__pylint-4604	`IS_PYPY`
pylint-dev__pylint-5839	`DELETED_MESSAGES`
pytest-dev__pytest-8124	`pytest_markeval_namespace`
scikit-learn__scikit-learn-12682	`transform_max_iter`
scikit-learn__scikit-learn-14806	`skip_complete`
scikit-learn__scikit-learn-14898	`neg_brier_score`
sphinx-doc__sphinx-7593	`KeyboardTransform`
sphinx-doc__sphinx-8026	`docpath`
sphinx-doc__sphinx-8095	`napoleon_preprocess_types`
sphinx-doc__sphinx-8291	`napoleon_attr_annotations`
sympy__sympy-11818	`from_real`

SWE-bench/SWE-bench_Verified

4 instances with high-risk coupling

Instance ID	Coupled Symbols
django__django-14559	`rows_updated`
django__django-14725	`edit_only`
pylint-dev__pylint-4604	`IS_PYPY`
scikit-learn__scikit-learn-12682	`transform_max_iter`

ByteDance-Seed/Multi-SWE-bench

89 instances with high-risk coupling

Instance ID	Coupled Symbols
BurntSushi__ripgrep-2610	`hyperlink`
BurntSushi__ripgrep-723	`line_number_width`
anuraghazra__github-readme-stats-117	`ONE_DAY`, `THIRTY_MINUTES`, `CONSTANTS`
anuraghazra__github-readme-stats-293	`defaultTitle`, `customTitle`
anuraghazra__github-readme-stats-58	`retryer`, `fetcher`
clap-rs__clap-2008	`before_long_help`, `after_long_help`
clap-rs__clap-2360	`forbid_empty_values`
clap-rs__clap-3453	`get_id`
clap-rs__clap-3990	`external_subcommand_value_parser`
clap-rs__clap-4080	`ids`
cli__cli-10139	`transformSecurityAndAnalysisOpts`, `SecurityAndAnalysisStatus`, `SecurityAndAnalysisInput`
cli__cli-1155	`ErrNotOnAnyBranch`
cli__cli-1279	`StatusStringResponse`, `HTTPError`, `httpErr`
cli__cli-1282	`listURLWithQuery`, `filterOptions`
cli__cli-1534	`runPager`
cli__cli-1639	`SetNeverPrompt`
cli__cli-1867	`LabelsByNames`
cli__cli-2034	`GistOwner`
cli__cli-2058	`HostnameValidator`
cli__cli-2138	`validateConfigEntry`
cli__cli-2221	`generateChecksumFromAssets`, `generateChecksum`
cli__cli-2224	`mergeMethodSurvey`
cli__cli-2997	`getFilesToAdd`
cli__cli-3490	`getExpansion`
cli__cli-3578	`detectEmptyFiles`
cli__cli-3833	`NewCmdCancel`, `CancelOptions`, `runCancel`
cli__cli-3898	`AddOriginRemote`
cli__cli-3992	`browserLauncher`
cli__cli-4146	`ttySize`, `ForceTerminal`
cli__cli-4416	`deleteAssetRun`, `DeleteAssetOptions`, `NewCmdDeleteAsset`
cli__cli-4543	`addPage`
cli__cli-4562	`normalizeRepoName`
cli__cli-5069	`CheckContext`, `eliminateDuplicates`
cli__cli-5108	`RepoSearchParameters`, `GetCodespaceRepoSuggestions`
cli__cli-5462	`ColorFromRGB`
cli__cli-5681	`SetAlternateScreenBufferEnabled`, `StartAlternateScreenBuffer`, `StopAlternateScreenBuffer`
cli__cli-5799	`artifactsPayload`
cli__cli-6074	`changedFilesNames`
cli__cli-6158	`DefaultFilterBySimilarityOpts`, `FilterBySimilarity`, `LevenshteinDistance`, `ListRepos`, `cands`, `FilterBySimilarityOpts`
cli__cli-6292	`PrCheckStatusSummaryWithColor`
cli__cli-667	`prStateTitleWithColor`, `issueStateTitleWithColor`
cli__cli-7205	`RemoveDiacritics`, `LatinMatchingFilter`
cli__cli-727	`parseCloneArgs`
cli__cli-7314	`RepoExists`
cli__cli-7477	`sanitizeFileName`
cli__cli-7866	`PendingError`
cli__cli-810	`formatRemoteURL`
cli__cli-842	`StubRepoResponseWithDefaultBranch`
cli__cli-857	`prReopenCmd`
cli__cli-8934	`FormatSlice`
cli__cli-9008	`simplifyURL`
cli__cli-970	`ExpandAlias`
cli__cli-9933	`ErrExtensionExecutableNotFound`
elastic__logstash-13825	`getMandatoryJvmOptions`
elastic__logstash-14058	`getDroppedEvents`
facebook__zstd-1080	`ZSTD_getFrameHeader_advanced`
facebook__zstd-1105	`ZSTD_CCtx_getParameter`
facebook__zstd-1107	`ZSTD_CCtx_resetParameters`
facebook__zstd-1532	`ZSTD_CCtxParams_setParameter`, `ZSTD_CCtxParams_getParameter`
facebook__zstd-1540	`RETURN_ERROR_IF_MSG`
facebook__zstd-1733	`ZSTD_SRCSIZEHINT_MAX`, `ZSTD_SRCSIZEHINT_MIN`, `ZSTD_c_srcSizeHint`
facebook__zstd-2094	`ZSTD_d_stableOutBuffer`
facebook__zstd-3530	`ZSTD_CCtx_setParams`, `ZSTD_CCtx_setFParams`
fasterxml__jackson-core-964	`setStreamReadConstraints`
fmtlib__fmt-1361	`compute_float_boundaries`
fmtlib__fmt-3279	`is_container_adaptor_like`
grpc__grpc-go-2744	`appendH2ToNextProtos`
iamkun__dayjs-1047	`localeNameRegex`
iamkun__dayjs-379	`weekStart`
mui__material-ui-26173	`isOptionEqualToValue`
mui__material-ui-29954	`inheritViewBox`
mui__material-ui-34131	`excludeVariablesFromRoot`
mui__material-ui-36399	`unstable_level`
mui__material-ui-37118	`getItemAsString`
nlohmann__json-1314	`error_handler_t`
nlohmann__json-2225	`NLOHMANN_DEFINE_TYPE_INTRUSIVE`, `NLOHMANN_DEFINE_TYPE_NON_INTRUSIVE`
nlohmann__json-3523	`value_in_range_of`
nlohmann__json-3605	`JSON_USE_GLOBAL_UDLS`
nlohmann__json-3663	`is_c_string`
nushell__nushell-12118	`xdg_config_empty`
ponylang__ponyc-2865	`divmod_partial`, `add_partial`
ponylang__ponyc-3293	`NullablePointer`
tokio-rs__tokio-5200	`auto_advance`, `set_auto_advance`
tokio-rs__tokio-6280	`try_join_next`, `try_join_next_with_id`
zeromicro__go-zero-1907	`WithStreamClientInterceptor`
zeromicro__go-zero-1964	`PrintRoutes`
zeromicro__go-zero-2363	`DontTracingSpanName`
zeromicro__go-zero-964	`NewPublisherWithAuth`, `NewRpcPubServerWithEtcdAuth`, `KeepAliveWithAuth`, `getClusterWithAuth`, `EnableAuth`
zeromicro__go-zero-990	`ReadLink`

ScaleAI/SWE-bench_Pro

80 instances with high-risk coupling

Instance ID	Coupled Symbols
instance_ansible__ansible-106909db8b730480615f4a33de0eb5b710944e78-v0f01c69f1e2528b935359cfe578530722bca2c59	`multipart_encoding`
instance_ansible__ansible-185d41031660a676c43fbb781cd1335902024bfe-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5	`host_label`
instance_ansible__ansible-29aea9ff3466e4cd2ed00524b9e56738d568ce8b-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5	`trailing_separator`, `default_value_name`
instance_ansible__ansible-415e08c2970757472314e515cb63a51ad825c45e-v7eee2454f617569fd6889f2211f75bc02a35f9f8	`get_best_parsable_locale`
instance_ansible__ansible-42355d181a11b51ebfc56f6f4b3d9c74e01cb13b-v1055803c3a812189a1133297f7f5468579283f86	`get_delegated_vars_and_hostname`
instance_ansible__ansible-502270c804c33d3bc963930dc85e0f4ca359674d-v7eee2454f617569fd6889f2211f75bc02a35f9f8	`BaseStrategy`
instance_ansible__ansible-be2c376ab87e3e872ca21697508f12c6909cf85a-vba6da65a0f3baefda7a058ebbd0a8dcafb8512f5	`_build_doc`
instance_ansible__ansible-cd9c4eb5a6b2bfaf4a6709f001ce3d0c92c1eed2-v0f01c69f1e2528b935359cfe578530722bca2c59	`get_sysinfo_facts`
instance_ansible__ansible-e64c6c1ca50d7d26a8e7747d8eb87642e767cd74-v0f01c69f1e2528b935359cfe578530722bca2c59	`_valid_time_stamp`
instance_ansible__ansible-f86c58e2d235d8b96029d102c71ee2dfafd57997-v0f01c69f1e2528b935359cfe578530722bca2c59	`_replace_stderr_clixml`
instance_element-hq__element-web-1077729a19c0ce902e713cf6fab42c91fb7907f1-vnan	`getLastSelectedRoomIdForSpace`
instance_element-hq__element-web-33e8edb3d508d6eefb354819ca693b7accc695e7	`isKeyComboMatch`
instance_element-hq__element-web-41dfec20bfe9b62cddbbbf621bef2e9aa9685157-vnan	`delegatedAuthentication`
instance_element-hq__element-web-53b42e321777a598aaf2bb3eab22d710569f83a8-vnan	`RoomOptionsMenu`
instance_element-hq__element-web-772df3021201d9c73835a626df8dcb6334ad9a3e-vnan	`setSelectedDeviceIds`, `selectedDeviceIds`
instance_element-hq__element-web-cf3c899dd1f221aa1a1f4c5a80dffc05b9c21c85-vnan	`getLiveness`
instance_flipt-io__flipt-2ce8a0331e8a8f63f2c1b555db8277ffe5aa2e63	`preFliptAcceptServerVersion`, `FliptAcceptServerVersionFromContext`, `FliptAcceptServerVersionUnaryInterceptor`
instance_flipt-io__flipt-36e62baffae2132f78f9d34dc300a9baa2d7ae0e	`getTraceExporter`
instance_flipt-io__flipt-a0cbc0cb65ae601270bdbe3f5313e2dfd49c80e4	`envsubst`
instance_flipt-io__flipt-a42d38a1bb1df267c53d9d4a706cf34825ae3da9	`AuthenticationSessionCSRF`
instance_flipt-io__flipt-b6cef5cdc0daff3ee99e5974ed60a1dc6b4b0d67	`ErrorHandler`
instance_flipt-io__flipt-c8d71ad7ea98d97546f01cce4ccb451dbcf37d3b	`SnapshotFromFS`, `Unwrap`
instance_flipt-io__flipt-cd2f3b0a9d4d8b8a6d3d56afab65851ecdc408e8	`validateArrayValue`
instance_flipt-io__flipt-e91615cf07966da41756017a7d571f9fc0fdbe80	`NewExporter`, `NewImporter`
instance_flipt-io__flipt-f36bd61fb1cee4669de1f00e59da462bfeae8765	`NewFeaturesValidator`
instance_future-architect__vuls-2923cbc645fbc7a37d50398eb2ab8febda8c3264	`rhelRebuildOSVersionToRHEL`
instance_future-architect__vuls-36456cb151894964ba1683ce7da5c35ada789970	`searchCache`
instance_future-architect__vuls-73f0adad95c4d227e2ccfa876c85cc95dd065e13	`GetCveContentTypes`
instance_future-architect__vuls-83bcca6e669ba2e4102f26c4a2b52f78c7861f1a	`listenIPPorts`
instance_future-architect__vuls-8d5ea98e50cf616847f4e5a2df300395d1f719e9	`removeInactives`
instance_future-architect__vuls-e4728e388120b311c4ed469e4f942e0347a2689b-v264a82e2f4818e30f5a25e4da53b27ba119f62b5	`CompareSeverity`
instance_gravitational__teleport-0ecf31de0e98b272a6a2610abe1bbedd379a38a3-vce94f93ad1030e3136852817f2423c1b3ac37bc4	`NotifyExit`
instance_gravitational__teleport-2bb3bbbd8aff1164a2353381cb79e1dc93b90d28-vee9b09fb20c43af7e520f57e9239bbcf46b7113d	`billingMode`
instance_gravitational__teleport-326fd1d7be87b03998dbc53bc706fdef90f5065c-v626ec2a48416b10a88641359a169d99e935ff037	`homeEnvVar`
instance_gravitational__teleport-82185f232ae8974258397e121b3bc2ed0c3729ed-v626ec2a48416b10a88641359a169d99e935ff037	`buildKubeConfigUpdate`
instance_gravitational__teleport-baeb2697c4e4870c9850ff0cd5c7a2d08e1401c9-vee9b09fb20c43af7e520f57e9239bbcf46b7113d	`yubiHSMTestConfig`, `gcpKMSTestConfig`, `HSMTestConfig`, `awsKMSTestConfig`, `softHSMTestConfig`, `cloudHSMTestConfig`
instance_gravitational__teleport-bb69574e02bd62e5ccd3cebb25e1c992641afb2a	`LiteralNamespace`
instance_gravitational__teleport-eefac60a350930e5f295f94a2d55b94c1988c04e-vee9b09fb20c43af7e520f57e9239bbcf46b7113d	`ParseOSReleaseFromReader`, `DMIInfoFromFS`
instance_internetarchive__openlibrary-0d13e6b4bf80bced6c0946b969b9a1b6963f6bce-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c	`remove_author_honorifics`
instance_internetarchive__openlibrary-3aeec6afed9198d734b7ee1293f03ca94ff970e1-v13642507b4fc1f8d234172bf8129942da2c2ca26	`_get_wikipedia_link`, `_get_statement_values`
instance_internetarchive__openlibrary-431442c92887a3aece3f8aa771dd029738a80eb1-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c	`luqum_replace_child`
instance_internetarchive__openlibrary-4b7ea2977be2747496ba792a678940baa985f7ea-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4	`AuthorRemoteIdConflictError`
instance_internetarchive__openlibrary-5de7de19211e71b29b2f2ba3b1dff2fe065d660f-v08d8e8889ec945ab821fb156c04c7d2e2810debb	`is_valid_identifier`, `get_identifier_forms`, `get_isbn_or_asin`
instance_internetarchive__openlibrary-72321288ea790a3ace9e36f1c05b68c93f7eec43-v0f5aece3601a5b4419f7ccec1dbda2071be28ee4	`luqum_replace_field`
instance_internetarchive__openlibrary-91efee627df01e32007abf2d6ebf73f9d9053076-vbee42ad1b72fb23c6a1c874868a720b370983ed2	`within_date_range`
instance_internetarchive__openlibrary-c4eebe6677acc4629cb541a98d5e91311444f5d4-v13642507b4fc1f8d234172bf8129942da2c2ca26	`find_staged_or_pending`
instance_internetarchive__openlibrary-d40ec88713dc95ea791b252f92d2f7b75e107440-v13642507b4fc1f8d234172bf8129942da2c2ca26	`author_import_record_to_author`, `import_record_to_edition`, `check_cover_url_host`
instance_internetarchive__openlibrary-d8162c226a9d576f094dc1830c4c1ffd0be2dd17-v76304ecdb3a5954fcf13feb710e8c40fcf24b73c	`get_non_isbn_asin`, `is_asin_only`
instance_navidrome__navidrome-1e96b858a91c640fe64e84c5e5ad8cc0954ea38d	`validateCredentials`
instance_navidrome__navidrome-28389fb05e1523564dfc61fa43ed8eb8a10f938c	`IsValidPlaylist`
instance_navidrome__navidrome-31799662706fedddf5bcc1a76b50409d1f91d327	`tokenFromHeader`
instance_navidrome__navidrome-69e0a266f48bae24a11312e9efbe495a337e4c84	`DecodeArtworkID`, `EncodeArtworkID`
instance_navidrome__navidrome-874b17b8f614056df0ef021b5d4f977341084185	`validatePasswordChange`
instance_navidrome__navidrome-9c3b4561652a15846993d477003e111f0df0c585	`CRLFWriter`
instance_navidrome__navidrome-b3980532237e57ab15b2b93c49d5cd5b2d050013	`lastFMAPIKey`
instance_navidrome__navidrome-b65e76293a917ee2dfc5d4b373b1c62e054d0dca	`WithClientUniqueId`
instance_protonmail__webclients-369fd37de29c14c690cb3b1c09a949189734026f	`findHolidaysCalendarByCountryCodeAndLanguageTag`
instance_protonmail__webclients-3a6790f480309130b5d6332dce6c9d5ccca13ee3	`getCachedChildrenCount`
instance_protonmail__webclients-51742625834d3bd0d10fe0c7e76b8739a59c6b9f	`punycodeUrl`, `getHostnameWithRegex`
instance_protonmail__webclients-5f0745dd6993bb1430a951c62a49807c6635cd77	`flushPromises`
instance_protonmail__webclients-ae36cb23a1682dcfd69587c1b311ae0227e28f39	`elementsToRemove`, `elementsToBypass`
instance_qutebrowser__qutebrowser-0d2afd58f3d0e34af21cee7d8a3fc9d855594e9f-vnan	`qobj_repr`
instance_qutebrowser__qutebrowser-16de05407111ddd82fa12e54389d532362489da9-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`_get_locale_pak_path`, `_get_lang_override`
instance_qutebrowser__qutebrowser-1943fa072ec3df5a87e18a23b0916f134c131016-vafb3e8e01b31319c66c4e666b8a3b1d8ba55db24	`set_pinned`
instance_qutebrowser__qutebrowser-2dd8966fdcf11972062c540b7a787e4d0de8d372-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`qcolor_to_qsscolor`
instance_qutebrowser__qutebrowser-35168ade46184d7e5b91dfa04ca42fe2abd82717-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`template_config_variables`, `frozenset`
instance_qutebrowser__qutebrowser-473a15f7908f2bb6d670b0e908ab34a28d8cf7e2-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`_get_locale_pak_path`, `_get_lang_override`
instance_qutebrowser__qutebrowser-52708364b5f91e198defb022d1a5b4b3ebd9b563-v2ef375ac784985212b1805e1d0431dc8f1b3c171	`StatusbarWidget`
instance_qutebrowser__qutebrowser-66cfa15c372fa9e613ea5a82d3b03e4609399fb6-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`_get_locale_pak_path`, `_get_lang_override`
instance_qutebrowser__qutebrowser-8f46ba3f6dc7b18375f7aa63c48a1fe461190430-v2ef375ac784985212b1805e1d0431dc8f1b3c171	`_validate_untrusted_args`
instance_qutebrowser__qutebrowser-99029144b5109bb1b2a53964a7c129e009980cd9-va0fd88aac89cde702ec1ba84877234da33adce8a	`copy_remove_setting`, `qt_67`
instance_qutebrowser__qutebrowser-9b71c1ea67a9e7eb70dd83214d881c2031db6541-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`_get_locale_pak_path`, `_get_lang_override`
instance_qutebrowser__qutebrowser-a84ecfb80a00f8ab7e341372560458e3f9cfffa2-v2ef375ac784985212b1805e1d0431dc8f1b3c171	`for_cmd`, `EmptyCommandError`
instance_qutebrowser__qutebrowser-bf045f7ec7c27709ea3ef61cf41a24e8fdd2e7da-v059c6fdc75567943479b23ebca7c07b5e9a7f34c	`_FindFlags`, `to_qt`
instance_qutebrowser__qutebrowser-c0be28ebee3e1837aaf3f30ec534ccd6d038f129-v9f8e9d96c85c85a605e382f1510bd08563afc566	`extra_suffixes_workaround`
instance_qutebrowser__qutebrowser-ec2dcfce9eee9f808efc17a1b99e227fc4421dea-v5149fcda2a9a6fe1d35dfed1bade1444a11ef271	`_js_log_to_ui`
instance_qutebrowser__qutebrowser-ef5ba1a0360b39f9eff027fbdc57f363597c3c3b-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`_get_locale_pak_path`, `_get_lang_override`
instance_qutebrowser__qutebrowser-ff1c025ad3210506fc76e1f604d8c8c27637d88e-v363c8a7e5ccdf6968fc7ab84a2053ac78036691d	`set_defaults`
instance_tutao__tutanota-f3ffe17af6e8ab007e8d461355057ad237846d9d-vbc0d9ba8f0071fbe982809910959a6ff8884dbbf	`EntropyFacade`
instance_tutao__tutanota-fe240cbf7f0fdd6744ef7bef8cb61676bcdbb621-vc4e41fd0029957297843cb9dec4a25c7c756f029	`checkEventValidity`

AmazonScience/SWE-PolyBench

68 instances with high-risk coupling

Instance ID	Coupled Symbols
angular__angular-37484	`clearTsConfigCache`
apache__dubbo-4379	`whenCompleteWithContext`
apache__dubbo-5356	`PROMPT`
apache__dubbo-6498	`SERVICE_PATH_PREFIX`, `servicePathPrefix`
apache__rocketmq-1636	`TOPIC_MAX_LENGTH`
apache__rocketmq-3862	`incPutMessageEntireTime`, `initPutMessageTimeBuckets`, `findPutMessageEntireTimePX`
apache__rocketmq-4122	`setStorePathDLedgerCommitLog`
apache__rocketmq-4763	`getEnumByString`
apache__rocketmq-5008	`ConcurrentHashMapUtils`
apache__rocketmq-5037	`CONTROLLER_ELECT_MASTER_FAILED`
apache__rocketmq-5834	`incBrokerGetNumsWithoutSystemTopic`, `BROKER_GET_NUMS_WITHOUT_SYSTEM_TOPIC`, `getBrokerGetNumsWithoutSystemTopic`
apache__rocketmq-7455	`decodeCommandCustomHeaderDirectly`
apache__rocketmq-8051	`setTraceTopic`, `setEnableTrace`
apolloconfig__apollo-4119	`SpringCloudInnerDiscoveryService`
coder__code-server-5633	`welcomeText`, `appName`
google__gson-2420	`runTestNoDefaultConstructor`
google__gson-2549	`originalTimeZone`
huggingface__transformers-13573	`reorder_and_upcast_attn`, `scale_attn_by_inverse_layer_idx`
huggingface__transformers-15831	`resize_decoder_token_embeddings`, `share_encoder_decoder_embeddings`
huggingface__transformers-24510	`warn_if_padding_and_no_attention_mask`
huggingface__transformers-29838	`get_learning_rates`, `get_num_trainable_parameters`, `get_optimizer_group`
huggingface__transformers-31095	`on_optimizer_step`
langchain-ai__langchain-676	`save_local`, `load_local`
mrdoob__three.js-17649	`morphTargetsRelative`
mrdoob__three.js-20991	`setFromMatrix3`
mrdoob__three.js-22404	`setFromAttributeAndIndices`
mui__material-ui-13003	`StepIconComponent`
mui__material-ui-14461	`wrapsIntrinsicElement`
mui__material-ui-19257	`hasPopupIcon`, `hasClearIcon`
mui__material-ui-33812	`collapsedIcon`
mui__material-ui-36426	`getOptionKey`
prettier__prettier-15408	`GQL_QUERY_WITH_CONST`
prettier__prettier-9736	`cleanDoc`
serverless__serverless-2584	`compileRole`
serverless__serverless-3186	`setFunctionNames`
serverless__serverless-3521	`getServiceObject`, `getServiceName`
serverless__serverless-3622	`mergeResourceArrays`
serverless__serverless-3700	`loadEnvVarsForLocal`
serverless__serverless-3808	`assignDefaultOptions`
serverless__serverless-3812	`invocationId`
serverless__serverless-4120	`isArnRefOrImportValue`
serverless__serverless-4293	`canUseS3TransferAcceleration`, `disableTransferAccelerationForCurrentDeploy`, `enableS3TransferAcceleration`, `isS3TransferAccelerationEnabled`
serverless__serverless-4382	`conceal`
serverless__serverless-4531	`endpointType`
serverless__serverless-4793	`iamManagedPolicies`
serverless__serverless-5662	`getProfile`
serverless__serverless-5728	`suppressLogIfPrintCommand`
serverless__serverless-5988	`envVarsFromOptions`, `getEnvVarsFromOptions`
serverless__serverless-5994	`dockerArgsFromOptions`, `getDockerArgsFromOptions`
serverless__serverless-6293	`validateHeaderCondition`, `validateIpCondition`, `validateQueryCondition`
serverless__serverless-6322	`getAlbTargetGroupName`, `getAlbTargetGroupNameTagValue`
serverless__serverless-6823	`getDeploymentBucketPolicyLogicalId`
serverless__serverless-6869	`getValueStrToBool`
serverless__serverless-6871	`cfnRoleArn`
serverless__serverless-6960	`getResolved`, `getRejected`
sveltejs__svelte-1364	`assignTrue`
sveltejs__svelte-1627	`setData`
sveltejs__svelte-1988	`nextTick`
sveltejs__svelte-3430	`set_input_value`
sveltejs__svelte-6525	`insert_hydration`
sveltejs__svelte-6556	`claim_svg_element`
sveltejs__svelte-705	`callAll`
sveltejs__svelte-778	`setInputType`
trinodb__trino-3638	`updateExecutor`, `setMaxConcurrentMetastoreUpdates`
trinodb__trino-3771	`setDelegationTokenCacheTtl`, `setDelegationTokenCacheMaximumSize`
trinodb__trino-4393	`validateFileBuckets`
trinodb__trino-748	`setAwsSecretKey`, `setAwsAccessKey`
yt-dlp__yt-dlp-8917	`_deprecated_multivalue_fields`

Citation

If you find this analysis useful for your research, please cite it as:

@misc{ganhotra2026hiddencontracts,
  title={Hidden Naming Contracts in SWE-Agent Benchmarks},
  author={Ganhotra, Jatin},
  year={2026},
  month={April},
  url={https://jatinganhotra.dev/blog/swe-agents/2026/04/01/hidden-naming-contracts-in-swe-agent-benchmarks/},
  note={Blog post}
}