I Asked 3 AI Models to Recommend a Chiropractor. A Practice With 18 Reviews Beat One With 377. Here’s What the Local Data Revealed.
98 queries, 25 practices, 3 models, and 7 findings that upend how local businesses think about AI visibility.
Last month I published the first AEO study outside of B2B SaaS — 62 queries across three AI models, and 2,293 citations.
The headline: structure beats authority, clinical reasoning beats clinical claims, and the entire chiropractic vertical is a schema desert with zero competition for structured data signals.
That was the niche study. Educational queries. “Does chiropractic help sciatica?” “Is chiropractic safe during pregnancy?” National-level, condition-specific questions.
But that’s not how most people actually use AI to find a chiropractor.
They ask: “Who’s the best chiropractor near me?”
I decided to go local and figure out why certain chiropractors get cited over others. The results were surprising.
What I Actually Did
I expanded the study in three directions simultaneously.
The query set grew by more than half. 62 queries became 98, rebuilt from scratch using Josh Grant’s question mining framework. I pulled from the same sources as Phase 1 — Reddit, PAA boxes, Perplexity suggestions — but added a 178-question patient corpus clustered into 20 themes across 6 categories. The queries now span four intent buckets: Evaluation (“best chiropractor in Bee Cave for sciatica”), Outcome (“will chiropractic help my herniated disc”), Process (“how many visits for back pain”), and Fear (“is chiropractic safe,” “is my chiropractor scamming me”).
That last bucket didn’t exist in Phase 1. It turned out to be the most important addition.
The practice set expanded. 6 Tier 1 practices became 25, covering the full competitive landscape in Bee Cave, TX. For the top 6, I built complete signal profiles: Google Business data, Ahrefs domain metrics, Yelp reviews, Instagram presence, directory footprint, Reddit mentions, YouTube audit, and on-site structural fingerprinting across 35 attributes.
The analysis went three-dimensional. Phase 1 tested a two-variable relationship: does content structure predict citation, independent of domain authority? Phase 1.5 tested three variables simultaneously: on-site content quality, entity clarity signals, and off-site ecosystem strength. Each measured independently. Each correlated against AI citation frequency per model.
The dataset: 98 queries × 3 models × 3 runs per model = 882 AI responses. Every practice mention logged. Every citation tracked. Full correlation matrices built.
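For anyone who wants to reproduce the analysis step, here is a minimal Python sketch of how a correlation run like this can be built from a mention log. The file names, column names, and signal list are stand-ins for my working dataset, not a published schema.

```python
# Minimal sketch of the per-model correlation step.
# File and column names are hypothetical stand-ins for the working dataset.
import pandas as pd
from scipy.stats import pearsonr

mentions = pd.read_csv("mention_log.csv")      # one row per practice mention per AI response
signals = pd.read_csv("practice_signals.csv")  # one row per practice, one column per signal

# Citation frequency per practice, per model
citations = (
    mentions.groupby(["practice", "model"])
    .size()
    .unstack(fill_value=0)        # columns: one per model
    .add_prefix("citations_")
    .reset_index()
)
df = signals.merge(citations, on="practice", how="left").fillna(0)

# Correlate every signal against each model's citation count
signal_cols = ["review_count", "avg_rating", "domain_rating",
               "owner_response_rate", "location_name_match"]
for model_col in [c for c in df.columns if c.startswith("citations_")]:
    for sig in signal_cols:
        r, p = pearsonr(df[sig], df[model_col])
        print(f"{model_col} vs {sig}: r = {r:+.3f} (p = {p:.3f})")
```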
Finding #1: The Local Signal Inversion
This is the most counterintuitive result in either phase of the study.
In local AI recommendations, every volume metric I measured showed a negative correlation with citation frequency.
Every clarity metric showed a positive correlation.
Review count? Negative (r = −0.453). Yelp reviews? Negative (r = −0.520). Total platform count? Negative (r = −0.533). Domain Rating? Near-zero (r = −0.106).
Now the positive side: owner response rate (r = +0.826, p = 0.043). Location-name match (r = +0.846, p = 0.034). Average rating (r = +0.556). Condition density in reviews (r = +0.631).
Every signal that measures how much a practice has runs negative. Every signal that measures how clearly a practice’s identity resolves runs positive.
The entire local marketing industry is optimized for volume. Get more reviews. Get on more platforms. Build more backlinks. The data says that’s the wrong game. Signal clarity beats signal volume.
And it’s not close.
Finding #2: The Review Count Paradox
This one deserves its own section because the numbers are so stark.
Bee Cave Chiropractic has 18 Google reviews. Elite Wellness has 377.
Bee Cave Chiropractic received 77 AI citations across three models. Elite Wellness received 62.
That’s 4.28 citations per review versus 0.16. A roughly 26× difference in citation efficiency.
How? Bee Cave Chiropractic has the highest review condition density in the market — 14.7 conditions mentioned per 1,000 words of review text. Migraines, lower back, TMJ, lupus, car accident injuries — specific conditions named in specific reviews. Elite Wellness has 8.7 conditions per 1,000 words. More words, proportionally fewer condition-specific insights.
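Those two metrics are easy to sanity-check. Here is a rough Python sketch; the condition keyword list is illustrative, not the exact dictionary behind the study’s numbers.

```python
# Back-of-envelope check of the two Finding #2 metrics.
# The CONDITIONS list is illustrative, not the study's actual dictionary.
CONDITIONS = ["sciatica", "migraine", "tmj", "lupus", "herniated disc",
              "lower back", "car accident", "whiplash"]

def citations_per_review(citations: int, reviews: int) -> float:
    return citations / reviews

def condition_density(review_text: str) -> float:
    """Condition mentions per 1,000 words of review text."""
    words = review_text.lower().split()
    text = " ".join(words)
    hits = sum(text.count(c) for c in CONDITIONS)
    return hits / max(len(words), 1) * 1000

print(citations_per_review(77, 18))    # ~4.28  (Bee Cave Chiropractic)
print(citations_per_review(62, 377))   # ~0.16  (Elite Wellness)

sample = "Dr. Swanson fixed my lower back pain and my migraines after a car accident."
print(condition_density(sample))       # condition mentions per 1,000 words in this snippet
```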
Bee Cave Chiropractic also has perfect doctor-naming consistency. “Dr. Swanson” appears 24 times across 18 reviews — 1.33 mentions per review. No ambiguity about who the practitioner is. Elite Wellness has three different doctor names appearing across its reviews, fragmenting the entity signal.
And Bee Cave Chiropractic has the one advantage nobody can engineer after the fact: the practice name IS the search query. When a patient asks “best chiropractor in Bee Cave,” the practice name matches the geography exactly. No entity-location resolution needed. The model doesn’t have to work to figure out that this practice is in Bee Cave.
Review volume isn’t just unhelpful for AI recommendations. It’s actively dilutive when the additional reviews don’t carry condition-specific information.
Finding #3: Model-Specific Signal Divergence
Not all models weight the same signals.
When I ran per-model correlations against the signal variables, each model revealed a distinct optimization profile:
GPT-4o prioritizes entity-geography alignment. The correlation between location-name match and GPT citations: r = 0.913. GPT cares most about whether it can cleanly resolve “this practice is in this place.”
Perplexity prioritizes domain authority. DR vs. Perplexity citations: r = 0.721. This is the one model where traditional SEO metrics have measurable predictive value.
Gemini inversely weights both. Gemini’s correlations with location-name match and DR both run negative. It applies the heaviest YMYL (“Your Money or Your Life”) filter: 24.3% of its citations go to medical authority sites, 43% more than the other models. It also gave the highest number of zero-citation responses. Gemini is the skeptic.
The strategic implication is uncomfortable: optimizing for one model can hurt performance on another. A practice that invests heavily in domain authority to win on Perplexity may see no lift — or negative lift — on Gemini. The only signals that run positive across all three models are the clarity metrics: review quality, naming consistency, response rate.
Cross-model optimization requires signal clarity, not signal volume. There’s no shortcut.
Finding #4: The Schema Desert (Extended)
Phase 1 found zero schema markup across 31 practice sites in the niche study.
Phase 1.5 confirmed it extends to local practice sites. I audited all 6 Tier 1 practices across 35 structural attributes. Zero FAQ Schema. Zero LocalBusiness Schema. Zero MedicalCondition Schema. Zero MedicalBusiness Schema. Nobody.
But Phase 1.5 added a finding Phase 1 couldn’t: the Compressed Structural Landscape.
The 6 Tier 1 practices scored an average of 16.7 out of 35 on the structural audit. The range was only 10 to 22. Nobody is terrible. Nobody is good. Everyone is equally mediocre.
This creates a measurement paradox. On-site structure correlates at r = −0.178 with citations — which looks like structure doesn’t matter. But the real explanation is insufficient variance. When everyone scores between 10 and 22 on a 35-point scale, there’s not enough differentiation for structure to predict anything. It’s not that structure is irrelevant. It’s that nobody has invested enough to create structural separation.
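If “insufficient variance” feels abstract, a toy simulation makes the point concrete: the same underlying structure-to-citation relationship shrinks toward zero once scores are squeezed into a 10-to-22 band on a 35-point scale. This is synthetic data built to illustrate range restriction, not the study’s data.

```python
# Toy illustration of range restriction (synthetic data, not the study's):
# the same positive relationship looks much weaker when structure scores
# only span 10-22 instead of the full 0-35 scale.
import numpy as np

rng = np.random.default_rng(0)
structure = rng.uniform(0, 35, 500)                 # full-range structure scores
citations = 2 * structure + rng.normal(0, 20, 500)  # noisy positive relationship

full_r = np.corrcoef(structure, citations)[0, 1]

mask = (structure >= 10) & (structure <= 22)        # the compressed landscape
restricted_r = np.corrcoef(structure[mask], citations[mask])[0, 1]

print(f"full-range r:       {full_r:.2f}")
print(f"restricted-range r: {restricted_r:.2f}")    # attenuated toward zero
```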
This is the opportunity the Phase 1 study identified in theory, confirmed in practice: the first mover in the schema desert has zero competition. Not “low competition.” Zero.
Finding #5: The Entity Clarity Composite
When I combined the three strongest positive predictors — location-name match, naming consistency, and owner response rate — into a single composite score, it became the strongest predictor of local AI recommendations in the entire study.
Entity Clarity Composite vs. total AI citations: r = +0.775 (p = 0.070).
For context, the composite is a stronger predictor than review count (r = −0.453), domain authority (r = −0.106), organic traffic (r = 0.077), or even review condition density (r = +0.631) taken alone.
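For the curious, here is one plausible way to assemble a composite like this: equal-weight z-scores of the three components, correlated against total citations. The equal weighting and the column names are assumptions of this sketch, not a published formula.

```python
# One plausible construction of an Entity Clarity Composite:
# equal-weight z-scores of the three components (weighting is an assumption).
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("practice_signals.csv")   # hypothetical file and columns

components = ["location_name_match", "naming_consistency", "owner_response_rate"]
z = (df[components] - df[components].mean()) / df[components].std()
df["entity_clarity"] = z.mean(axis=1)

r, p = pearsonr(df["entity_clarity"], df["total_ai_citations"])
print(f"Entity Clarity Composite vs. total citations: r = {r:+.3f} (p = {p:.3f})")
```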
What this means in practice: AI models aren’t asking “which practice has the most reviews?” or “which practice has the best SEO?” They’re asking “which practice can I most clearly identify as being in this location, run by this person, with this level of engagement?” Entity clarity is identity resolution. And identity resolution is the prerequisite to recommendation.
Nobody had defined “entity clarity” as a measurable, composite variable before. Now there’s a number to optimize against.
Finding #6: The Fear Gap
This one emerged from the query set expansion and it’s the finding I keep coming back to.
When I mapped the 178-question patient corpus against Josh Grant’s four-bucket intent framework (Evaluation, Fear, Outcome, Process), one bucket didn’t fit cleanly. Grant’s “Fear” bucket — designed for SaaS — captures concerns like “will this product waste my budget?” or “will this tool break my workflow?”
In healthcare, Fear is categorically different.
“Will this person paralyze me?”
“Is my chiropractor scamming me with a 3x/week treatment plan?”
“Is it normal to feel worse after an adjustment?”
“Is chiropractic safe during pregnancy?”
That’s not product anxiety. That’s physical safety and bodily trust. The consequence scale is in a completely different category. SaaS Fear means “I might waste money.” Healthcare Fear means “I might get hurt.”
When I counted the questions in this cluster, it was 58 out of 178 — 33% of the entire patient corpus. The largest single intent category.
And when I audited the competitive landscape: zero practices have dedicated Fear content. Nobody addresses stroke risk honestly. Nobody has a page about whether a treatment plan is normal or a scam. Nobody explains what post-adjustment soreness means versus a sign of a real problem.
Meanwhile, the cross-model audit showed that one of the practices accidentally over-indexes on Fear queries, appearing in 33% (GPT) and 42% (Perplexity) of Fear-related responses without having any Fear content at all. The signal is coming from review language and an implied Women-owned trust badge. It’s accidental.
33% of patient questions. Zero content addressing them. The largest intent cluster in the entire corpus with no competitive supply.
This is the content moat that can’t be copied — because Fear content requires professional courage. Writing “here’s when chiropractic doesn’t work” and “here’s when you should see someone else instead of us” feels counterintuitive. Which is exactly why it’s defensible.
Finding #7: The Missing Middle
I went into Phase 1.5 expecting the 25 practices to sort into four archetypes from the research plan: Ghost (invisible everywhere), Content Champion (strong content, weak off-site), Reputation Play (strong reviews, weak content), and Full Stack (both layers active).
The data produced seven.
Ghost (7 practices, 28%) — invisible across all signal layers. No content, no reviews worth noting, no entity clarity. These practices don’t exist to AI models.
Reputation Play (7 practices, 28%) — strong reviews and directory presence, weak or nonexistent content. Getting recommended on review strength alone but vulnerable to anyone who adds the content layer.
Volume Play (3 practices, 12%) — highest raw numbers (most reviews, highest DR, most followers) but worst citation efficiency. These are the practices that did everything the local marketing industry told them to do and got the worst results per unit of investment.
SEO Legacy (2 practices, 8%) — strong organic traffic from traditional SEO but weak AI citation. One practice has $462K in estimated organic traffic value and only 32 AI citations. SEO success doesn’t translate to AI visibility.
Content Island (2 practices, 8%) — deep content in a narrow niche but limited breadth. Getting cited for specific conditions but not appearing in general recommendation queries.
Entity Match Leader (1 practice, 4%) — Bee Cave Chiropractic. Winning primarily through geographic-name alignment rather than content or off-site strategy. A naming accident creating citation dominance.
Full Stack (1 practice, 4%) — TexStar, the closest to having all three layers active. YouTube channel, multi-location SEO, decent reviews. Still has gaps but is the only practice operating across all signal dimensions.
The striking finding: zero Content Champions. Not one practice out of 25 has both strong on-site content AND a rich off-site ecosystem. The archetype that the AEO Playbook says should dominate doesn’t exist in this market. There’s a “missing middle” between having good content and having a good entity signal profile. Nobody has built both.
That gap defines the Phase 2 intervention opportunity.
What’s Coming Next: Phase 2
Phase 1 proved the patterns exist. Phase 1.5 quantified them at the local level. Phase 2 tests whether they can be changed.
The plan: take one practice currently classified as a Volume Play — strong reviews, high DR, worst citation efficiency in the market — and systematically implement everything the data says should move the needle.
That means:
Entity clarity fixes first — geographic string injection across the site, naming consolidation, claiming missing health directories (the practice is the only Tier 1 competitor missing Healthgrades, WebMD, and Vitals, and those platforms received 78 combined AI citations in the audit).
Schema deployment — FAQPage, LocalBusiness, MedicalBusiness, MedicalCondition. Into the schema desert with zero competition. (A minimal JSON-LD sketch follows this list.)
Condition page rewrites using the Publishability Hypothesis template from Phase 1 — clinical reasoning, named studies, decision frameworks, reassessment criteria, referral pathways, comparison tables, FAQ schema. Every page built to answer specific queries from the 98-query audit set.
Fear content — the 33% of patient questions nobody is addressing. Dedicated pages on safety, scope limitations, treatment plan legitimacy, post-adjustment expectations. Each with named research, honest scope boundaries, and the “when to see someone else” section that no competitor will build.
YouTube launch — condition-specific videos titled with the practice name, location, and condition. Creating surface area that 5 of 6 competitors completely lack.
Review quality coaching — not more reviews, better reviews. Engineering condition density and doctor-naming consistency through smarter review prompts.
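For the schema deployment item, here is roughly what the markup could look like, expressed as Python dictionaries serialized to JSON-LD. Every name, address, URL, and answer below is an invented placeholder, and the exact pages and properties would depend on the site.

```python
# Placeholder sketch of the JSON-LD for the schema deployment step.
# All names, addresses, URLs, and answers are invented placeholders.
import json

medical_business = {
    "@context": "https://schema.org",
    "@type": "MedicalBusiness",
    "name": "Example Chiropractic of Bee Cave",
    "url": "https://example.com",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Bee Cave",
        "addressRegion": "TX",
    },
    "employee": {"@type": "Person", "name": "Dr. Jane Example"},
}

faq_page = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "Is chiropractic safe during pregnancy?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "Placeholder answer that cites named research and states scope limits.",
        },
    }],
}

# Each block would be embedded on the relevant page inside a
# <script type="application/ld+json"> tag.
print(json.dumps(medical_business, indent=2))
print(json.dumps(faq_page, indent=2))
```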
The 98-query × 3-model baseline already exists. The structural audit scores are documented. The archetype classification is complete. We know exactly where we’re starting.
After implementation, we re-run the exact same 98 queries across all three models at 30, 60, and 90 days and measure citation lift per model, citation position changes, newly cited pages, and cross-model consistency improvement.
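The lift calculation itself is mechanical. A sketch of the comparison, with hypothetical file and column names:

```python
# Sketch of the re-measurement step: baseline citations per practice per model
# versus a 30-day re-run. File and column names are hypothetical.
import numpy as np
import pandas as pd

baseline = pd.read_csv("citations_baseline.csv")   # columns: practice, model, citations
followup = pd.read_csv("citations_day30.csv")

merged = baseline.merge(followup, on=["practice", "model"], suffixes=("_base", "_d30"))
merged["lift"] = merged["citations_d30"] - merged["citations_base"]
merged["lift_pct"] = merged["lift"] / merged["citations_base"].replace(0, np.nan)

print(merged.pivot(index="practice", columns="model", values="lift"))
```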
If the intervention moves a Volume Play practice toward Full Stack status — if the citation efficiency improves and the archetype changes — that’s a publishable before/after case study with the most granular pre-intervention baseline dataset in AEO research.
If it doesn’t work, that’s equally publishable. The AEO community needs honest data about what moves the needle and what doesn’t.
Either way, we’ll know.
Phase 1 data, methodology, and structural fingerprint scorecards are published here. Phase 1.5 data including the full correlation matrices, archetype classifications, and 98-query audit results will be published alongside the Phase 2 baseline. Subscribe to follow the experiment.
If you run a practice and want to understand where your signals stand before AI search becomes the primary discovery channel, I’m reachable here.



