Comparing Codex and Jules, OpenAI and Google's coding agents.

Both OpenAI and Google recently released coding agents that work off GitHub repos and push up PRs. Here’s a quick comparison of the two on the same task in the same repo.

tl;dr: Jules is a lot faster and smarter, but the second time I tried it, it got stuck on starting indefinitely, and you only get 5 tasks a day.

Initial Task

I have a gRPC client repo that I use for pulling data off Farcaster. I have been meaning to update it so it can stop at a cut-off a certain number of days back, instead of always retrieving data from the beginning of the protocol.

I gave both Codex and Jules access to the repo and the prompt:

Can you please update HubClient/HubClient.Production/Program.cs to have the ability to add in a cut off day param like --days 30 or something? so that instead of getting all casts of all time from each person it only gets the data going back 30 days or whatever the input is, by default the value is null so it gets all of them. thank u.
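To make the request concrete, here is a minimal sketch of the behavior the prompt describes: an optional `--days` flag that defaults to null (no cut-off). It is written in Python purely for illustration; the real repo is C#, and the function name `parse_cutoff` and the `now` parameter are hypothetical, not anything from the actual Program.cs or either agent's PR.

```python
import argparse
from datetime import datetime, timedelta, timezone


def parse_cutoff(argv, now=None):
    """Return the cut-off datetime, or None when --days is not given.

    Hypothetical helper: `now` is injectable only so the example is testable.
    """
    parser = argparse.ArgumentParser(description="Illustrative cast exporter")
    parser.add_argument(
        "--days",
        type=int,
        default=None,  # null by default, meaning "fetch everything"
        help="only fetch data from the last N days (default: all data)",
    )
    args = parser.parse_args(argv)
    if args.days is None:
        return None
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=args.days)
```

Calling `parse_cutoff([])` returns `None`, so the caller keeps the existing fetch-everything behavior, while `parse_cutoff(["--days", "30"])` returns a timestamp 30 days in the past to compare against.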

Performance

I figure this is the most important metric: how long did it take from initial prompt to PR?

Codex: 20 minutes, 3 messages from me.
View PR: https://github.com/jc4p/fast-hub-client/pull/2

Jules: 8 minutes, 2 messages from me.
View PR: https://github.com/jc4p/fast-hub-client/pull/1

Chats

You can view the Codex chat here: https://chatgpt.com/s/cd_682d14160c9481919b4d5271bfd0abe6

And since Jules doesn’t have built in share functionality, here is a big image showing the entire chat: https://images.kasra.codes/jules_chat.png

They both misunderstood the task initially: their approach wouldn’t stop early once it hit content from before the cut-off date. Instead it would retrieve all-time data and only filter when saving. Both understood it after being told exactly what to update and where.
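The difference between "filter when saving" and "stop early" can be sketched roughly like this. This is an illustration in Python, not the code from either PR: the function name `fetch_until_cutoff` and the `(timestamp, text)` tuples are hypothetical, and it assumes pages arrive newest-first, which is what makes stopping at the first too-old item safe.

```python
from datetime import datetime, timezone


def fetch_until_cutoff(pages, cutoff=None):
    """Collect casts page by page, stopping once one is older than `cutoff`.

    `pages` is any iterable of lists of (timestamp, text) tuples.
    Assumption: items are ordered newest-first across pages.
    Returns the kept items and how many pages were actually read.
    """
    kept = []
    pages_read = 0
    for page in pages:
        pages_read += 1
        for timestamp, text in page:
            if cutoff is not None and timestamp < cutoff:
                # Everything after this point is older still, so stop
                # requesting pages instead of filtering them out later.
                return kept, pages_read
            kept.append((timestamp, text))
    return kept, pages_read
```

With a cut-off set, the loop returns as soon as it sees an item from before the cut-off, so later pages are never requested. With `cutoff=None` it reads every page, matching the original behavior.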

It’s hard for me to express how much faster Jules was than Codex: With Codex I’d tab away into another thing and wait for it to chug along. With Jules it was near instant.

Quick Notes

Codex tries to run your code and tests for you. This means you have to set up a dev environment (adding apt-get packages manually or whatever), but they provide functionality for most things (including Bun) out of the box.

On the flip side, Codex running my code and tests for me was useless, because after the initial setup script Codex doesn’t have access to the internet anymore and all my .NET packages come from the internet!

I could’ve worked around this by running an initial build in the setup script where I do the apt-get commands, but it’d still be moot since connecting to the gRPC server would fail too.

Jules just makes a plan and writes the code. As far as I can tell, Jules doesn’t have the ability to run the code it generates; it just told me to test the stuff locally and gave me a comprehensive testing plan.

Both of these points are irrelevant to me: I don’t want it to test the code. I want it to write the code, let me read it, and then push up a PR so I can test locally. For that use case Jules wins by default, because it takes too long to do it on Codex.

Will I use them again?

I’m probably not touching Codex again unless I hear about major updates to it; it’s just way too damn slow. Jules is an accelerant, Codex is a junior dev whose shoulder I need to look over.

If I have any more modular, specific tasks, I’m going to ask Jules to work on them.

In fact I did try to use it to dynamically update the currently hardcoded max FID in the script, but it’s been stuck at “Starting shortly” for 25 minutes, so maybe it’s a case of “Jules is fast as hell if it works for you.”

I haven’t been able to identify what Codex’s usage quota is (I’ve started 5 chats with it so far today) but Jules currently has a max use of 5 tasks a day and tasks that stall and don’t actually process count towards that.

So all in all, neither is absolutely amazing right now.

If I can figure out a way to utilize them in the manner I vibe code in Cursor (saying “read the PRD and the stand-up notes from yesterday, continue with your job”) it’ll be incredible.

The other thing is if there’s a system where it can do a front-end loop (making code changes, previewing them in browser, “seeing” them, reacting), it’d be extremely useful for me but neither of these can do that yet.

For now Jules seems to be great for handing off small, concrete tasks that don’t require a lot of testing. I’ll probably use it for things like “I set up the scaffold for this API and don’t feel like filling in the rest of the methods” but it’s not a replacement for a self-sufficient dev.

Discover more from Kasra Rahjerdi