Memoirs of a Developer.

Why AI is a terrible pair programmer

Artificial intelligence (AI) is all the rage this decade. It can write poetry and music, generate increasingly accurate and realistic images, and has even found its way into software development. Endeavors like IntelliCode and GitHub Copilot aim to make the lives of developers easier by utilizing AI to write source code.

All of these code-spewing tools label themselves as AI, but there isn't much intelligence to be found. They utilize a technique called machine learning. This is a specific area within the wide field of AI that has gained a lot of traction in the last decade due to increasing hardware performance.

Machine learning (ML) uses large amounts of data to train a model, where it learns to see patterns and correlations between input and desired output. This enables the model, when prompted with previously unseen input, to guess the most likely output based on what it has learned from similar inputs. GitHub Copilot makes no secret about this:

Trained on billions of lines of code, GitHub Copilot turns natural language prompts into coding suggestions across dozens of languages.

It learned to code by looking at open-source code — monkey see, monkey do. It sure as hell didn't learn to program by following a Computer Science class. And this immediately brings me to a fundamental flaw in how these tools work.

Garbage in, garbage out

The machine learning models learn from analyzing other code, often open-source code found on GitHub. I'm not saying all open-source code is of poor quality, but it's probably also not the best source material to learn from.

I think the announcement video of IntelliCode painfully demonstrates what kind of issues this leads to. The second half of the very short video is annotated with "…and when I have a path to a file it has learned common patterns for checking before writing", while the AI produces the following code:

if (System.IO.File.Exists(fullPath))
    System.IO.File.Delete(fullPath);

System.IO.File.WriteAllBytes(fullPath, fileData);

These three simple lines are completely wrong, for reasons I'll briefly highlight:

  1. There is a race condition at play: between the File.Exists check and the operations that follow, another process can create, delete, or lock the file.
  2. File.Delete doesn't fail if the file doesn't exist, so there is no need for File.Exists.
  3. File deletion can still fail because of a file lock or insufficient permissions.
  4. The File.WriteAllBytes call can fail due to improper permissions or other things that 'never happen', like an offline network share.

And I'm not even mentioning the synchronous nature and possible security issues of the full code snippet presented in the video. It's painful and embarrassing to see Microsoft produce such a poor example to 'show off' what IntelliCode can do.

The above is a perfect demonstration of cargo cult programming. The AI simply repeats (incorrect) patterns without understanding what the code does. And probably plenty of developers will blindly accept the suggestion made by the AI. After all, it's intelligent and trained on billions of lines of code written by other highly capable developers, so it must be right! Right?

What's worse is that more and more of the AI's code will become part of open-source code. And what does the AI use to train itself? Exactly — open-source code. As time goes on, the machine learning model will learn more and more from its own output, reinforcing its own bad habits. Human developers have plenty of other opinions available in the form of comments and upvotes on platforms like Reddit and StackOverflow. These help developers see and understand their mistakes, and learn new concepts. But that is taken away when a developer is given a suggestion by an AI. And maybe more importantly, who is going to downvote the AI when it makes a mistake, or explain to it that file system operations are subject to race conditions?

Think like a (pair) programmer

Pair programming is intended to bring two or more people together to tackle a coding problem because two people know more than one. It allows two minds to challenge each other. It can help spot design problems and security issues early. Junior developers can greatly benefit from a good pair programmer.

An AI cannot do these things. GitHub Copilot calling itself an AI pair programmer is disrespectful to us humans. You cannot replace a human mind with a machine learning model any more than you can replace a doctor with a flow chart. Software development is hard. Any tool that makes it seem easy often undermines very important aspects of software: security, privacy, maintainability, or even something as basic as correctness.

With AI-assisted tooling, the amount of hacked-together code written by inexperienced developers will likely increase. If some manager is now able to quickly write a Python script to get insight into some data they have, that might not seem like a bad thing. But I know all too well how these one-off scripts end up being relied on for business-critical processes. AI rings in a new era for rockstar developers. Yikes!

What do we need?

Naming an AI a pair programmer is clearly giving it way, way too much credit. But how about using it as a scaffolding tool to generate repetitive code?


I have given AI tools a try. Every time a tool suggests something, I have to stop in my tracks to read the suggestion, only to realize that in most cases it's not what I need. In other cases it's exactly or close to what I need — but then I could have written it myself just as fast and without losing my train of thought. Developers have had auto completion and code snippets for decades; writing the actual code is usually not the bottleneck.

Even if an AI could perfectly write all the scaffolding code I needed, I still wouldn't want it in my code base. If you need a lot of repetitive code just to work with some framework or library, then it probably has the wrong abstractions. Applying AI is not the right solution to the problem. It would be better to write a proper abstraction instead of copy-and-pasting plumbing code everywhere. Let's be honest, AI code assistants are not much more than glorified clipboards.
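To make the abstraction point concrete, here is a minimal Python sketch (all names are invented for illustration). The lookup-or-fail plumbing at the top is exactly the kind of code a scaffolding tool will happily multiply across a code base; a tiny factory expresses the pattern once instead:

```python
class Store:
    """Tiny stand-in for a data-access layer (illustrative only)."""
    def __init__(self, tables):
        self.tables = tables

    def get(self, table, key):
        # Returns None when the table or key is missing.
        return self.tables.get(table, {}).get(key)

# The repetitive shape a code assistant would stamp out per table:
#   row = store.get("users", key)
#   if row is None: raise KeyError(...)
#   return row
# ...repeated for orders, invoices, and so on.

def make_getter(table):
    """Capture the lookup-or-fail pattern once, for any table."""
    def get(store, key):
        row = store.get(table, key)
        if row is None:
            raise KeyError(f"{table!r} has no entry {key!r}")
        return row
    return get

# Each "generated" function now collapses to a single line.
get_user = make_getter("users")
get_order = make_getter("orders")

store = Store({"users": {1: "alice"}, "orders": {7: "book"}})
```

Once the pattern lives in one place, fixing a bug or changing the error type is a one-line edit rather than a hunt through dozens of pasted copies.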

What we need is smart solutions to the barriers we developers run into. AI is not a step forward. It's just a new way to keep doing the same old thing we've done for decades: write plain-text files. Machine learning and artificial intelligence in general have great potential, I'm not denying that! But not like this. Just because we can, doesn't mean we should.
