I had a very specific research task to do. Not the "summarise this industry for me" kind. The kind where you need to find specific companies, pull together a consistent set of structured information on each one, and get it back in a particular format you can actually use downstream.
So I wrote a detailed prompt. Specific requirements. Defined objectives. A clear output format. The kind of prompt that, if a human researcher received it, they'd know exactly what was expected.
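To make "a particular format" concrete, here's the shape of the per-company record I was after. This is purely illustrative; the field names below are placeholders, not the actual schema from my prompt.

```python
# Illustrative only: one per-company record in the shape the prompt requested.
# Every field name here is a placeholder, not my actual schema.
company_record = {
    "name": "Example Corp",
    "website": "https://example.com",  # the field every tool struggled with; more on that below
    "founded": 2019,
    "headquarters": "Berlin, Germany",
    "one_liner": "What the company actually does, in one sentence.",
    "funding_stage": "Series A",
    "sources": ["https://example.com/about"],
}
```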
Then I sent the same prompt to five tools and watched what happened.
The tools
Claude (Sonnet 4.6, Extended Thinking), ChatGPT, Manus, Gemini (Thinking), Grok (Expert mode).
Same prompt. Same task. No adjustments between tools.
1. Claude
Claude followed the prompt exactly. Produced the output in the format I asked for without any steering. The only catch: tool-call limits meant I had to split the work across two sessions.
Session one ran for 1 hour and 8 minutes and gathered 1,085 sources. Session two took 19 minutes and gathered 481. That's 1,566 sources across roughly 90 minutes of total research time. The depth was immediately obvious the moment I set the output next to everything else's.
If you're doing this kind of structured research and you need thoroughness, nothing else I tested came close.
2. ChatGPT
Good. About 400 sources in around 30 minutes. The results were solid but lacked the depth Claude produced. The more significant issue was format compliance: I had to redirect it several times to follow the output structure I'd asked for. It kept drifting. Not a dealbreaker, but if your prompt has very specific structural requirements, expect to manage that conversation.
3. Manus
This one surprised me. About 20 minutes, followed the format without much steering, clean output. Less depth than Claude or ChatGPT, and it dropped some information along the way, but it did what was asked. Relative to its overall capability, it was the most format-obedient of the five. Straightforward and honest about what it could do.
4. Gemini (Thinking)
Genuinely disappointing. Gemini has been my default research tool for a while, so I came in with high expectations. What I got instead was a report. A summary. Not the structured output I asked for.
I tried to redirect it. Multiple times. It kept going its own way. I don't know if something about the prompt style didn't suit it, or if this type of structured output task is just not where it performs well, but in practice the result wasn't usable in the format I needed. That surprised me more than anything else in the test.
5. Grok (Expert)
Basic results. Shallow. Missed a lot of information. Did not follow the format.
The most frustrating part had nothing to do with the research quality: Grok couldn't create a canvas or produce a proper markdown artifact. Every other tool handled this without any issue. If you're doing structured research and need to work with the output downstream, that limitation matters more than it might seem.
The thing none of them got right
There's one failure mode that showed up across all five tools, and it matters more than the rankings.
Specific company website domains.
Every tool was reasonably good at researching and scraping website content. But when it came to identifying the actual domain a specific company uses, all five got it wrong. Not occasionally. Consistently.
The pattern is predictable once you see it: they don't guess randomly. They construct what a company's domain probably should be. The company name with .ai or .com appended, clean and professional, the kind of domain a sensible company would register. Except it's the wrong domain, or it belongs to a different company entirely, or it doesn't exist.
To deal with this, I started a separate Claude Cowork session specifically to open and verify every domain in a browser. It added time but it worked. The problem is worst for smaller companies and recent startups. Well known companies with obvious, long-established domains are mostly fine. Anything less prominent? Verify manually.
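If you want to script the mechanical part of that first pass, a few lines of Python will do it. This is a minimal sketch of the idea, not what I ran in Cowork, and the candidate domains are placeholders:

```python
# A rough first-pass domain check: does each candidate resolve, and where
# does it actually land? The candidate domains here are hypothetical.
import socket
import urllib.request

candidates = ["examplecorp.ai", "examplecorp.com"]  # domains an AI run claimed

for domain in candidates:
    try:
        socket.gethostbyname(domain)  # step 1: does the name even resolve?
    except socket.gaierror:
        print(f"{domain}: does not resolve")
        continue
    try:
        # Step 2: request the homepage, following redirects, so parked or
        # forwarded domains reveal where they actually land.
        req = urllib.request.Request(
            f"https://{domain}",
            headers={"User-Agent": "domain-check/0.1"},
            method="HEAD",
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{domain}: HTTP {resp.status}, lands on {resp.url}")
    except Exception as exc:
        print(f"{domain}: resolves but request failed ({exc})")
```

Even a clean 200 only proves the domain is live, not that it belongs to the right company, which is why the browser check stayed as the final word.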
Here's the observation I keep coming back to: all five models assume a company's domain looks like companyname.ai or companyname.com. That's the mental model they've built. If your actual domain doesn't fit that pattern, you are less discoverable in AI-generated research.
Domain choice is now part of your AI discoverability strategy. That's not something anyone was talking about a couple of years ago. But it's real, it's structural, and it's already affecting how companies appear when someone uses one of these tools to research a space.
The ranking was roughly what I expected. The insight wasn't.
I went into this wanting to know which tool handles structured deep research best. I got that answer: Claude, clearly, followed by ChatGPT, then Manus, with Gemini and Grok at the bottom.
But the more interesting finding came from the edges of the test: the shared domain identification problem, and what it reveals about how these models construct a picture of companies on the internet.
If you're building something right now, your domain is being read (and sometimes misread) by every AI research tool your potential customers might use. Worth knowing.