The Blind Spots of AI Tutors

AI coding assistants have become popular study buddies for learning algorithms. They explain concepts, generate code, and answer follow-up questions on demand. But how reliable are they when tackling problems that require careful reasoning?

This post collects examples where AI models failed as teachers, either by producing buggy code or by incorrectly critiquing correct code.

Case 1: The Sideway Tower of Hanoi

In the classic Tower of Hanoi, you move disks between any two pegs. The sideway variant adds a constraint: disks can only move between adjacent pegs. With pegs arranged as A-B-C, you can move A↔B or B↔C, but never directly A↔C.

This seemingly small change has significant implications. The minimum number of moves jumps from 2^n - 1 to 3^n - 1. For 3 disks, that is 7 moves versus 26.
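Both counts follow from simple recurrences: the classic puzzle satisfies T(n) = 2T(n-1) + 1, while moving a stack between the two end pegs of the sideway variant satisfies T(n) = 3T(n-1) + 2 (shuttle the n-1 stack across three times, moving the big disk twice). A quick sanity check (the function names are my own):

```python
def min_moves_standard(n):
    # T(n) = 2*T(n-1) + 1, closed form 2**n - 1
    t = 0
    for _ in range(n):
        t = 2 * t + 1
    return t

def min_moves_sideway(n):
    # T(n) = 3*T(n-1) + 2, closed form 3**n - 1
    t = 0
    for _ in range(n):
        t = 3 * t + 2
    return t

print(min_moves_standard(3), min_moves_sideway(3))  # 7 26
```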

Where Things Went Wrong

Claude Opus 4.5: The Constraint Violation

Opus 4.5 quickly produced a solution that looked reasonable:

def hanoi_sideway(n, source, target, pegs=['A', 'B', 'C']):
    if n == 0:
        return

    source_idx = pegs.index(source)
    target_idx = pegs.index(target)

    if abs(source_idx - target_idx) == 1:
        # Adjacent case
        hanoi_sideway(n - 1, source, target, pegs)
        print(f"Move disk {n} from {source} to {target}")
        hanoi_sideway(n - 1, target, source, pegs)
        hanoi_sideway(n - 1, source, target, pegs)
    else:
        # Non-adjacent case
        middle = pegs[1]
        hanoi_sideway(n, source, middle, pegs)
        hanoi_sideway(n, middle, target, pegs)

The code ran and produced 26 moves for n=3, matching the expected count. The model even confirmed: “All moves are between adjacent pegs only (A↔B or B↔C). No direct A↔C moves occur.”

But there is a subtle bug in the adjacent case. Consider moving disks from A to B:

  1. hanoi_sideway(n - 1, source, target, pegs) moves n-1 smaller disks to B
  2. print(f"Move disk {n}...") attempts to move the largest disk from A to B

The problem? After step 1, the smaller disks are already on peg B. Placing the largest disk on top of them violates the fundamental rule: a larger disk cannot sit on a smaller one.
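Bugs like this surface immediately if you replay the printed moves against actual peg states. A minimal checker might look like this (my own sketch; the (disk, src, dst) move format is my convention, not what the model printed):

```python
def replay(moves, n):
    """Replay (disk, src, dst) moves, enforcing both sideway-Hanoi rules."""
    pegs = {'A': list(range(n, 0, -1)), 'B': [], 'C': []}
    order = 'ABC'
    for disk, src, dst in moves:
        # Rule 1: only moves between adjacent pegs are allowed.
        if abs(order.index(src) - order.index(dst)) != 1:
            raise ValueError(f"non-adjacent move {src}->{dst}")
        # The moved disk must actually be on top of its source peg.
        if not pegs[src] or pegs[src][-1] != disk:
            raise ValueError(f"disk {disk} is not on top of {src}")
        # Rule 2: never place a larger disk on a smaller one.
        if pegs[dst] and pegs[dst][-1] < disk:
            raise ValueError(f"disk {disk} placed on disk {pegs[dst][-1]}")
        pegs[dst].append(pegs[src].pop())
    return pegs
```

Replaying the buggy code's output through a checker like this flags the first adjacent-case move that drops a larger disk onto a smaller one, even though the move count looks right.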

When I spotted this and asked “when we move n-1 pegs from source to target, then the last one from source (the biggest) cannot put to target since it violates the constraint,” Opus 4.5 immediately acknowledged the mistake and provided a corrected version.
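The corrected version isn't reproduced in the transcript, but for completeness, here is one working recursion (my own sketch, not Opus 4.5's actual fix): the adjacent case must first park the n-1 smaller disks on the uninvolved peg, so the target is never blocked.

```python
PEGS = ['A', 'B', 'C']  # in a row; only A<->B and B<->C moves are legal

def hanoi_sideway(n, source, target, moves):
    if n == 0:
        return
    si, ti = PEGS.index(source), PEGS.index(target)
    if abs(si - ti) == 1:
        spare = PEGS[3 - si - ti]                    # the peg not involved
        hanoi_sideway(n - 1, source, spare, moves)   # park n-1 out of the way
        moves.append((n, source, target))            # adjacent, so legal
        hanoi_sideway(n - 1, spare, target, moves)
    else:
        middle = PEGS[1]
        hanoi_sideway(n - 1, source, target, moves)  # n-1 all the way across
        moves.append((n, source, middle))
        hanoi_sideway(n - 1, target, source, moves)  # n-1 back out of the way
        moves.append((n, middle, target))
        hanoi_sideway(n - 1, source, target, moves)  # n-1 across again

moves = []
hanoi_sideway(3, 'A', 'C', moves)
print(len(moves))  # 26 = 3**3 - 1
```

Unlike the buggy version, the smaller disks never sit on the target peg at the moment disk n arrives.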

GPT 5.2: The False Negative

GPT 5.2 took a more cautious approach initially, asking clarifying questions before writing code. I chose to brainstorm the solution myself, and after several iterations, arrived at a working algorithm.

When I presented the code for verification, GPT 5.2 confidently declared it incorrect:

“No. The non-adjacent branch is incorrect and will recurse forever.”

The model provided an elaborate analysis, walking through a step-by-step trace for n=2 and concluding:

“The real problem shows up at n=3: your recurrence for the non-adjacent case is not the correct one and it will not produce the required 3^n - 1 move count; it ‘composes’ A->B and B->C moves of size n, which doesn’t match the necessary structure.”

GPT 5.2 even claimed the code would produce 24 moves instead of 26.

When I pushed back with “no, it’s actually 26,” the model reversed course. It recalculated the recurrence relations and admitted the code was valid after all, noting that the formulation was “not the standard recurrence, but it can still be correct.”
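That concession checks out arithmetically. The exact code isn't shown here, but assuming the composed structure GPT 5.2 described, where an end-to-end move is two adjacent stages of size n and an adjacent stage sends n-1 disks end-to-end before moving disk n, the recurrences still yield 3^n - 1 (this pairing is my reconstruction, not the original code):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(n):
    """Moves for an adjacent shift of n disks:
    send n-1 end-to-end, move disk n, bring n-1 back one step."""
    return 0 if n == 0 else g(n - 1) + 1 + f(n - 1)

def g(n):
    """End-to-end move composed of two adjacent stages of size n."""
    return 2 * f(n)

print(g(3))  # 26, matching 3**3 - 1
```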

Case 2: The BST In-Order Successor

In another session, I asked an AI to verify my in-order successor algorithm for binary search trees:

Node* succ(Node* root, Node* target) {
    Node* res = nullptr;
    while (root != nullptr) {
        if (root->key > target->key) {
            res = root;        // candidate successor; look for a smaller one
            root = root->left;
        } else {
            root = root->right; // everything here is <= target
        }
    }
    return res;
}

The AI confidently identified two “flaws”:

  1. “The algorithm doesn’t reach the target node”
  2. “It does not handle the case where the target has a right subtree”

Both claims sound reasonable at first glance. But they’re wrong.

I asked for an example where the code fails. The AI tried several BSTs:

        20
       /  \
      10   30
     /  \
    5   15
       /  \
      12  18

Finding successor of 10? Starting from root 20:

  • 20 > 10: save res = 20, go left
  • 10 <= 10: go right to 15
  • 15 > 10: save res = 15, go left to 12
  • 12 > 10: save res = 12, go left (null)
  • Return 12. Correct.

        20
       /  \
      10   30
        \
        15
          \
          18

Finding successor of 15? Starting from root 20:

  • 20 > 15: save res = 20, go left
  • 10 <= 15: go right to 15
  • 15 <= 15: go right to 18
  • 18 > 15: save res = 18, go left (null)
  • Return 18. Correct.

Every case worked. The AI couldn’t produce a single counterexample for its own critique.

The problem: the AI analyzed what it thought the algorithm did rather than what it actually does. The algorithm doesn’t need to “reach” the target node. It finds the smallest key greater than the target by systematically narrowing the search space.
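Brute force settles disputes like this quickly. Here is a Python rendering of the same loop, checked against the sorted key order on the example tree above (the Node and insert helpers are mine):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Standard unbalanced BST insertion.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def succ(root, key):
    """Smallest key in the tree strictly greater than `key`."""
    res = None
    while root is not None:
        if root.key > key:
            res, root = root, root.left   # candidate; try to find a smaller one
        else:
            root = root.right             # this subtree's keys are all <= key
    return res

keys = [20, 10, 30, 5, 15, 12, 18]
root = None
for k in keys:
    root = insert(root, k)

# The in-order successor of each key must be the next key in sorted order.
ordered = sorted(keys)
for i, k in enumerate(ordered):
    expected = ordered[i + 1] if i + 1 < len(ordered) else None
    got = succ(root, k)
    assert (got.key if got else None) == expected
print("all successors correct")
```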

The AI admitted its mistake: “The code is actually correct for its intended purpose… My original critique was flawed.”

The Subtle Danger

These errors share a troubling characteristic: they were plausible enough to fool someone learning the material.

Consider what would have happened if I had not pushed back:

With Opus 4.5, I might have accepted the buggy code as correct. The output looked right: 26 moves, all between adjacent pegs. Without manually tracing through the logic or simulating the peg states, the constraint violation is invisible. A student implementing this solution would produce invalid move sequences while believing they understood the algorithm.

With GPT 5.2, I might have abandoned a correct solution. The model’s confident tone and detailed (but flawed) analysis could convince a learner that their working code was broken. Worse, GPT 5.2 suggested the code would produce 24 moves instead of 26, implying it found a “better” solution than the mathematical optimum. This should have been a red flag, but how many students would catch it?

With the BST successor, the AI declared correct code to be flawed without testing the hypothesis. It pattern-matched on the algorithm’s structure, made plausible-sounding criticisms, and only backtracked when forced to produce a counterexample.

The fact that both Hanoi models produced the correct move count (26 for n=3) made verification harder. A simple “does it output the right number?” check would pass. Only by understanding the problem deeply, or by simulating the actual peg states move by move, could you catch these errors.

Takeaways

AI coding assistants are powerful tools for learning, but these examples reveal their limitations:

  1. Correct output does not mean correct logic. Opus 4.5’s buggy code produced the right move count while violating fundamental constraints. Always trace through the logic, not just the results.

  2. Confidence is not correctness. GPT 5.2 delivered a detailed, authoritative analysis that was simply wrong. The BST reviewer declared code broken without testing the claim. A less experienced learner might have trusted the tone over their own working code.

  3. Ask for counterexamples. When an AI claims code is wrong, ask it to produce a failing case. If it can’t, the critique may be unfounded.

  4. Verify independently. Run the code. Test edge cases. Simulate the state changes. Cross-reference with textbooks or other sources. Do not rely on a single AI’s explanation.

  5. Domain knowledge matters. I caught these errors because I understood the problems well enough to question the answers. Without that foundation, the mistakes would have gone unnoticed.

AI assistants make excellent study companions, but they are not infallible teachers. Trust, but verify.