Taking GenAI from Tasks to Projects

Toy Bin Icon Test

Ellen Chisa

May 05, 2025

Like many folks, I’ve used ChatGPT for lots of small, well-scoped tasks:

Ramping up quickly on a new space, especially one that’s full of acronyms!

Parsing through academic papers by getting a summary before diving into the details.

Asking for customized advice on a very specific question (nutrition concerns while training for a triathlon and breastfeeding).

Pretty much all of my “regular” uses help me learn faster, but don’t have much to do with my output.

With all the hype around agents doing full projects, I wanted to try a project I would never do on my own: labeling our toy bins with icons so our toddler could put away his own toys. I wanted the style to be cohesive, but I definitely wasn’t going to draw them myself. I didn’t even want to find them on the Noun Project and then create a template in Canva, because I am lazy. I also knew he’d get more toys over time, and I wanted to be able to easily add icons to the set when that happened.

The prompt I started with:

I'm generating a set of labels for my toddler's toy bin. I will be printing in black and white. I'd like the labels to clearly indicate what is in the bin and be recognizable for him. They need to be 3" tall but can be up to 8" wide. I'd like them to have a cohesive aesthetic and match each other.

I did eventually get to a reasonable result. My toddler recognizes the icons and has been putting toys in bins on his own. That said, I am still waiting on a Marble Run icon whenever the image generation tool comes back again...

It’s like managing a (very) junior employee

A lot of folks have mentioned that working with GenAI for software engineering is a lot like managing a very junior person. After this experience I’m inclined to agree. There were a few particular behaviors that felt like managing someone junior to me.

Strong tendency to “avoid” the actual work

The first thing the model did was lay out the assumptions it was making (repeating back the size, style, typeface, etc.). This seemed positive and made sense.

But that went on a lot longer than I would have expected. In this case, it suggested a list of categories (I had my own list based on our toys) and a format. Then, during this process, it reiterated all the guidelines several times. It also pivoted to telling me how to do the project myself (not what I wanted).

After several rounds of that, it offered to generate the set (exactly what I wanted!) and the result was abysmal despite the assumptions seeming correct.

It reminded me of someone who isn’t sure how to do the job, so they keep trying to optimize ways to do the job vs. just doing it.

Losing the plot

Despite having all of the project’s assumptions repeatedly restated, ChatGPT would often lose the plot when it tried to move on to actually doing the work. This was particularly infuriating given the first limitation. At various points, it:

Generated labels only (above).

Created a single image with all 13 icons, no words, and the wrong size

Decided to go back to doing colored icons

Dropped word labels completely

Fundamentally changed the icon style

Forgot how many icons we’d made

The best (only?) way to avoid this was to break things down into micro steps.

I could ask for one icon and refine it. But then I had to request each icon individually (“can I get an icon in our style for X?”) and make sure nothing went awry, instead of being able to ask for all 13 at once.

This reminded me of someone who gets excited about one portion of the task and totally forgets the overall business goal of the project.

Poor awareness of own skills

Once we hit issues with the image generation tool’s availability (more on that below), ChatGPT kept helpfully offering “want me to tell you when it’s back?”, despite not having the ability to monitor in the background or proactively send me messages. This undermined trust and meant I had to keep checking in: “is it back yet?”

This also happened when it suggested I upload a PDF with images, even though ChatGPT blocks that from happening. When I corrected it on these points, it would often acknowledge the mistake, but that didn’t make it any less frustrating.

This reminded me of people who have poor overall project management skills and tend to need a lot of coaching while doing a project.

Platform & technical limitations

In addition to what I think of as the “junior employee” challenges, I also ran into some general infrastructure/platform product limitations that still exist today.

“Invisible State” technical issues

As mentioned, I got five of my thirteen icons by breaking things down into micro steps. Then ChatGPT started reporting that the image generation tool was down. If I asked whether it was up, it would say yes, but if I asked it to generate an icon, it would say it was down. Infuriating.

I eventually realized that the tool can be enabled overall, but also on a per-thread basis. ChatGPT helped me work around that by writing a custom prompt to start a new thread and continue my generation (lol). That said, the prompt it generated wasn’t specific enough, so it still needed more tweaking. I ended up having to download a lot of icons and re-upload them into the new thread.

This is still going on, and now my entire account will only use the legacy DALL-E model, so I can’t get my last icon in the style of the other twelve. Argh.

Generally, when I’m using software, I don’t expect to have to debug which pieces are up or down, and I especially don’t expect functionality to vanish completely.

Weird Bugs

It probably doesn’t need to be said, but there are plenty of other rough edges. I used the web interface, the desktop Mac app, and the mobile app, and generated images very rarely showed up across all of them. Similarly, older generated files often “expired” and were no longer accessible.

What’s Next

While there were a lot of annoyances in this experience, I did get the project done! I’d do it again, especially since with a tool I can just say things like “that’s completely wrong” without worrying about offending it. That said, I talked to a friend who can draw, and it would have taken her a lot less time than I spent finagling prompts and giving feedback.

Here’s a broader subset of the icons and styles I ended up going with:

Ellen’s Newsletter
