Testing AI in Physics: Why Human Oversight Matters

In an era where artificial intelligence is increasingly embedded in research, engineering, and education, it's tempting to rely on these tools as oracles — quickly dispensing answers to even the most technical questions. But as academics, and admittedly a pair of engineering nerds with PhDs, we know that even the most elegant-seeming answer can be dangerously wrong if it rests on faulty assumptions.

Driven by curiosity (and a bit of scientific mischief), we decided to test Google’s Gemini — a leading AI model — on a deceptively simple fluid mechanics problem. This wasn’t just any test: the problem had previously stumped several human experts due to its subtle physical implications. We weren’t simply chasing a right answer — we wanted to see how the model reasoned, how it handled nuance, and whether it could revise its thinking when challenged.

This blog post walks through our collaborative investigation — from the initial setup to the surprising detours and eventual breakthroughs. More importantly, it surfaces a critical lesson: no matter how capable AI becomes, the human mind remains essential in steering scientific reasoning, validating conclusions, and catching the subtleties a machine might miss.

Setting the Stage: The Fluid Dynamics Challenge

We initiated our experiment by presenting Gemini with a fluid mechanics problem. While the problem might appear basic at first glance, its correct solution was not immediately obvious, even to four human experts to whom it had previously been posed.

After a few attempts and an iterative dialogue, which we will outline and discuss, Gemini managed to correctly identify an important part of the reasoning. However, it ultimately failed to make the final conceptual leap required to fully address the original question. The complete, correct answer and its underlying physics will be detailed towards the end of this post.

The Initial Prompt and Gemini's First Interpretation

“You are a world-class fluid dynamics expert and physicist with full reasoning and mathematical-solving capabilities. Walk me through every step of your analysis—show your chain of thought, derive all equations, explain your physical assumptions, and interpret each result.

A water hose pouring water into a water tank is fitted with a perfectly accurate flowmeter. The water tank is cylindrical with a cross-sectional area, A. The water surface in the tank is rising at an accurately measured rate of U. The flowmeter reads a flow rate of Q through the hose.

Question: Why would it become increasingly dangerous to stand near the tank as the difference Q−U⋅A approaches 0?”

Following a lengthy chain of reasoning (inspectable in full here: Gemini Conversation 1), where Gemini correctly arrived at some relevant conservation equations, it concluded that as Q approaches U⋅A, the danger would stem from flooding or hydrostatic pressure possibly compromising the tank’s structural integrity.

While reasonable on the surface, this interpretation overlooked a more profound physical implication (which we reveal later). Notably, Gemini interpreted Q = U⋅A as if the tank were perfectly sealed.
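Gemini's first reading amounts to the textbook continuity statement for a sealed fill: every bit of Q raises the surface over the full area A, so U = Q/A exactly. A minimal sketch of that naive balance (the numeric values for Q and A are illustrative assumptions, not from the problem):

```python
# Naive volume balance: all of Q goes into raising the surface over the
# full cross-section A, ignoring the area the incoming stream occupies.
A = 2.0      # tank cross-sectional area, m^2 (assumed value)
Q = 0.05     # measured flow rate, m^3/s (assumed value)

U = Q / A    # rise rate under the naive reading Q = U * A
print(f"naive U = {U} m/s")   # Q - U*A is identically zero under this reading
```

Under this reading, Q − U⋅A = 0 is not a limit to be approached but an identity, which is why Gemini's answer had to reach for flooding and structural failure instead.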

Probing Deeper: Does Q = U⋅A Necessarily Hold?

We pivoted to explore whether Gemini truly believed this relationship under realistic conditions:

Should Q be U⋅A? Answer by YES or NO.

Gemini replied “YES” and followed with flawed logic (Gemini Conversation 2).

We then posed:

Don’t you think you are wrong? What about the stream pouring into the tank already occupying a cross section out of A when it meets the water surface?

Gemini doubled down: “The relationship Q = U⋅A still holds true under the conditions described...” — continuing with flawed assumptions despite a significant hint embedded in our follow-up.

A Breakthrough with a Hint

We tried again, this time embedding a parenthetical clue:

A water hose pouring water into a water tank from the top into the tank (meaning there is a water stream that you need to take into account)...

This time, Gemini correctly accounted for the incoming stream’s impact area and derived:

Q = U ⋅ (A − A_impact), meaning Q < U⋅A. Full exchange: Gemini Conversation 3.
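The corrected balance is easy to check numerically: for any nonzero impact area, the rise rate implied by Q = U ⋅ (A − A_impact) exceeds the naive Q/A, and Q − U⋅A comes out strictly negative. A small sketch, with assumed illustrative values for Q, A, and the impact area:

```python
# Corrected volume balance: the incoming stream occupies area a = A_impact,
# so the free surface only rises over the remaining area (A - a).
# Hence Q = U * (A - a), which forces Q < U * A whenever a > 0.

A = 2.0        # tank cross-sectional area, m^2 (assumed value)
a = 0.01       # stream impact area, m^2 (assumed value)
Q = 0.05       # measured flow rate, m^3/s (assumed value)

U = Q / (A - a)        # rise rate implied by the corrected balance
naive_U = Q / A        # rise rate implied by the naive reading Q = U * A

print(f"corrected U = {U:.6f} m/s")
print(f"naive U     = {naive_U:.6f} m/s")
print(f"Q - U*A     = {Q - U * A:.6f} m^3/s  (negative: Q < U*A)")
```

Shrinking `a` toward zero in this sketch drives Q − U⋅A toward zero from below, which is exactly the regime the original question asks about.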

Exploring the Physical Limits

Next, we asked:

Can you think of a physical limit on how close Q can get to U⋅A?

Despite the prompt, Gemini overlooked the critical implication: as the impact area of the stream approaches zero, the stream velocity must approach infinity to maintain a constant Q. We asked again, explicitly:

If Q and A are constant, what’s the limit on Q−U⋅A considering stream speed?

Gemini again failed to capture the relativistic restrictions.

The Real Danger: Unveiling the Physics

Let a = A_impact. The naive expression Q = U⋅A ignores that the incoming stream already occupies area a. The correct formula is:

Q = U ⋅ (A − a)

If Q ≈ U⋅A, then a → 0. Given Q = v ⋅ a, maintaining a finite Q as a → 0 demands v → ∞. As velocity nears light speed, relativistic energy approaches infinity. This is the true danger—not flooding, but physics-defying velocities and energies.
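The divergence is easy to see numerically: holding Q fixed and shrinking the impact area a forces the stream speed v = Q/a to blow up, and the Lorentz factor with it, while U⋅A − Q shrinks toward zero. A sketch with assumed illustrative values:

```python
import math

C = 299_792_458.0   # speed of light, m/s

Q = 0.05            # fixed flow rate, m^3/s (assumed value)
A = 2.0             # tank cross-sectional area, m^2 (assumed value)

# As a -> 0: the gap U*A - Q (with U = Q/(A - a)) closes,
# but the stream speed v = Q/a diverges without bound.
for a in (1e-2, 1e-6, 1e-10, 1e-12):
    v = Q / a                    # speed needed to push Q through area a
    U = Q / (A - a)              # rise rate from the corrected balance
    gap = U * A - Q              # approaches 0 as a -> 0
    beta = v / C
    gamma = 1 / math.sqrt(1 - beta**2) if beta < 1 else float("inf")
    print(f"a={a:.0e}  v={v:.3e} m/s  U*A - Q={gap:.3e} m^3/s  gamma={gamma}")
```

Well before a reaches zero, v overtakes the speed of light and the Lorentz factor is undefined: the regime Q − U⋅A → 0 is physically unreachable, which is the real answer to "why dangerous."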

Human in the Loop—Always

Our experiment with Gemini on this fluid dynamics problem was illuminating. While the AI demonstrated an impressive ability to process information and apply fundamental conservation laws, its struggles with the more nuanced aspects of the problem—particularly the implications of the impact area of the incoming stream and the extreme physical limits—highlight a critical takeaway.

The journey to the correct answer, involving multiple prompts and subtle hints, underscores a vital principle in the current era of AI: the indispensable role of the human in the loop. This is particularly true in scientific research and complex problem-solving. AI models, even highly advanced ones, can generate responses that appear coherent and convincing, yet may harbor subtle flaws or overlook crucial edge cases.

In academic research, where results build upon each other and foundational assumptions must be beyond reproach, the allure of a quick, AI-generated solution can be a siren's call. This experiment serves as a potent reminder of the importance of:

  • Deep Domain Expertise: Only through a thorough understanding of the underlying principles could we identify the AI’s errors and guide it.
  • Rigorous Verification: Every assumption, every step in reasoning, and every 'convincing' output from an AI must be meticulously checked and validated against known principles and potential boundary conditions.
  • Critical Thinking: We must resist the urge to accept AI-generated solutions at face value, especially when they seem to simplify complex issues. The 'dialogue' with AI is not a one-way street of instruction; it's an interactive process where human scrutiny steers the course.

Ultimately, AI offers powerful tools that can augment human intellect and accelerate discovery. However, the 'final leap' in reasoning, the creative insight, and the responsibility for the veracity of a solution still rest firmly with the human expert. As we integrate AI more deeply into our research workflows, maintaining this vigilant, human-centric approach will be paramount to ensuring the integrity and progress of science.
