Post ID: 45967211 Title: Gemini 3 Points: 1312 Total Comments: 814 Model: google/gemini-3-pro-preview Generated: 2025-11-19 15:23:34 JST
- Prompt tokens: 60,178
- Completion tokens: 3,776
- Reasoning tokens: 1,622
- Total tokens: 63,954
The discussion surrounding the release of Gemini 3 is characterized by a mix of technical admiration for the model’s raw power, skepticism regarding Google’s corporate motives and benchmarking methods, and confusion over the product's convoluted access tiers. While many users report "step-change" improvements in specific reasoning tasks—particularly in math and complex SVG generation—there remains a strong cohort of developers who prefer Anthropic’s Claude for "elegant" coding and instruction following.
Here is a summary of the themes expressed in the discussion.
A dominant theme in the thread is the community’s rejection of standard industry benchmarks (like SWE-bench or ARC-AGI) in favor of idiosyncratic, private tests. Users suspect that public benchmarks have leaked into training data, making high scores meaningless. The "Pelican on a Bicycle" SVG test became a central meme of the discussion, serving as a proxy for the model's ability to generalize visual reasoning. While Gemini 3 impressed many by solving verified "unseen" problems, skepticism remains high regarding how much of this performance is genuine reasoning versus rote memorization of established internet challenges.
prodigycorp argues that relying on standardized scores is a fool's errand: "This is a reminder that benchmarks are meaningless – you should always curate your own out-of-sample benchmarks. A lot of people are going to say 'wow, look how much they jumped in x, y, and z benchmark'... Meanwhile.. I'm still wondering how they're still getting this problem wrong."
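For readers unfamiliar with the practice prodigycorp describes, a private "out-of-sample" benchmark can be nothing more than a handful of curated problems scored locally. The sketch below is purely illustrative: `query_model` is a hypothetical stand-in for whatever provider call you use, and the test cases are placeholders for problems you would keep unpublished.

```python
# Minimal sketch of a private, out-of-sample benchmark harness (illustrative only).
# "query_model" is a hypothetical stand-in for whatever provider API you call;
# the prompts and expected answers are placeholders for a suite you curate
# yourself and never publish, so it cannot leak into training data.
from typing import Callable

PRIVATE_SUITE = [
    {"prompt": "A question the model has never seen ...", "expected": "42"},
    {"prompt": "Another unpublished problem ...", "expected": "pelican"},
]

def run_suite(query_model: Callable[[str], str]) -> float:
    """Return the fraction of private cases whose expected answer appears in the reply."""
    correct = 0
    for case in PRIVATE_SUITE:
        reply = query_model(case["prompt"])
        if case["expected"].strip().lower() in reply.strip().lower():
            correct += 1
    return correct / len(PRIVATE_SUITE)

if __name__ == "__main__":
    # Plug a real provider call in here; the lambda just keeps the sketch runnable.
    accuracy = run_suite(lambda prompt: "placeholder reply")
    print(f"private-suite accuracy: {accuracy:.0%}")
```

Keeping the suite unpublished is the whole point: once the problems circulate, the thread's data-leakage concern applies to them as well.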
panarky interprets the high scores, specifically in math and logic, as evidence of an architectural shift rather than just more data: "This isn't an incremental gain, it's a step-change leap in reducing hallucinations. And it's exactly what you'd expect to see if there's an underlying shift from probabilistic token prediction to verified search, with better error detection and backtracking when it finds an error."
simonw comments on the specific "Pelican" visual test he popularized, noting the community's suspicion that labs are now optimizing for it: "That would mean my dastardly scheme has finally come to fruition: [link to blog post about training for pelicans]."
m3kw9 summarizes the developer sentiment regarding hype versus reality: "That looks great, but we all care how it translate to real world problems like programming where it isn't really excelling by 2x."
Opinions on Gemini 3’s coding abilities are starkly divided. Users generally agree that the model has immense "cognitive horsepower" and context retention, making it excellent for one-shotting complex tasks like creating 3D clocks or niche widgets. However, for day-to-day software engineering, many users still prefer Claude Sonnet 4.5. The criticism often centers on Gemini's tendency to produce "over-engineered" or "defensive" code, its failure to follow negative constraints (e.g., "don't do X"), and agentic failures in CLI environments.
syspec provides a nuanced comparison between Gemini and Claude: "Time and time again Gemini spits out reams and reams of code so over engineered, that totally works, but I would never want to have to interact with. When looking at the code, you can't tell why it looks 'gross', but then you ask Claude to do the same task... and the code also works, but there's a lot less of it and it has a more 'elegant' feeling to it."
ogig expresses the "wow" factor regarding rapid prototyping: "I just gave it a short description of a small game I had an idea for. It was 7 sentences. It pretty much nailed a working prototype, using React, clean css, Typescript and state management... I'm more than impressed, I'm terrified."
mparis critiques the practical application of the model in command-line interfaces: "People copy and paste text in terminals. Someone at Gemini clearly thought about this as they have an annoying ctrl-s hotkey... But they then also provide the stellar experience of copying 'a line of text where you then get | random pipes | in the middle of your content'. Codex figured this out. Claude took a while but eventually figured it out. Google, you should also figure it out."
adastra22 counters the enthusiasm for newer models by pointing out regression in instruction following: "Sonnet 4.5 fails literally every time. The last point is how it usually fails in my testing... It usually ends up borking something up, and rather than back out and fix it, it does a 'git restore' on the file - wiping out thousands of lines of unrelated, unstaged code."
A significant portion of the thread focuses on Google’s data practices. Users discussed a leaked model card suggesting that Gemini 3’s training data includes user data from Google products "in accordance with terms of service." This sparked anxiety that Gmail, Drive, and Workspace content is being used to train the models that act as "replacements" for the users creating that data. There is a palpable distrust of Google's corporate direction, with users feeling like "hostages" in an ecosystem that forces AI features (like AI Overviews) on them without a meaningful way to opt out.
rvz highlights the specific language in the leaked documentation: "So your Gmails are being read by Gemini and is being put on the training set for future models. Oh dear and Google is being sued over using Gemini for analyzing user's data which potentially includes Gmails by default. Where is the outrage?"
Dquiroga voices a darker, systemic critique of the enthusiastic reception: "The cognitive dissonance in this thread is staggering. We are sitting here cheering for a model that effectively closes the loop on Google’s total information dominance, while simultaneously training our own replacements... Why is the sentiment here 'Wow, cool clock widget' instead of 'We just handed the keys to the kingdom to the biggest ad-tech surveillance machine in history'?"
stefs offers a counter-interpretation regarding how data is used: "'gmail being read by gemini' does NOT mean 'gemini is trained on your private gmail correspondence'. it can mean gemini loads your emails into a session context so it can answer questions about your mail, which is quite different."
Users expressed significant confusion regarding how to actually access and pay for the new model; the split between Gemini Advanced, Google Cloud Vertex AI, and AI Studio API keys makes for a fragmented experience. Additionally, there is a sense of "AI fatigue," where the constant leapfrogging of models (Gemini overtaking GPT-5, which in turn had overtaken Claude) is becoming exhausting rather than exciting for practitioners who just want stable tools.
mccoyb captures the confusion affecting potential customers: "I truly do not understand what plan to use so I can use this model for longer than ~2 minutes. Using Anthropic or OpenAI's models are incredibly straightforward -- pay us per month, here's the button you press, great. Where do I go for this for these Google models?"
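Of the access paths named above, the AI Studio route is the one that behaves most like the "pay us, here's the button" flow mccoyb asks for: generate an API key in AI Studio and call the model directly. The sketch below assumes the `google-genai` Python client and reuses the model identifier from this post's header; the exact identifier and availability may differ by account and region.

```python
# Minimal sketch of the AI Studio API-key path (assumes the google-genai
# Python SDK; the model identifier below is taken from this post's header
# and may vary by account and region).
import os
from google import genai

# Key generated in AI Studio and exported as an environment variable.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Generate an SVG of a pelican riding a bicycle.",
)
print(response.text)
```

Gemini Advanced, by contrast, is a consumer subscription to the Gemini app rather than API access, and Vertex AI routes billing through a Google Cloud project, which is part of why the thread finds the tiers hard to compare.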
srameshc articulates the feeling of burnout regarding new model hype: "I think I am in this AI fatigue phase. I am past all hype with models, tools and agents and back to problem and solution approach... But not offloading to AI and buying all the bs, waiting it to do magic with my codebase."
coffeecoders notes that distribution power matters more than model quality: "The winners aren’t necessarily those with the best models, but those who already control the surface where people live their digital lives... Open models and startups can innovate, but the platforms can immediately put their AI in front of billions of users without asking anyone to change behavior."
The following quotes represent distinct, niche, or highly specific viewpoints that diverge from the general consensus on coding capability or corporate policy.
DrNosferatu asks a question that highlights the reliance on aggregator tools over direct provider access: "Anyone has any idea if/when it’s coming to paid Perplexity?"
SXX on moving beyond static benchmarks to complex animation generation: "Static Pelican is boring. First attempt: Generate SVG animation... Camera view must be from behind of goblin back so we basically look at tower in front of us... [Link to CodePen animation]."
kmeisthax uses satire to mock the absurdity of automated benchmarks: "The most devastating news out of this announcement is that Vending-Bench 2 came out and it has significantly less clanker meltdowns than the first one. I mean, seriously? Not even one run where the model tried to stock goods that hadn't arrived yet... and then e-mail the FBI about the $2 daily fee being deducted from the bot?"
TechDebtDevin connects AI usage to broader societal health metaphors: "If there was an unlimited pizza machine that cost $20.00 a month to create unlimited food, people would see that as a miracle! It would greatly benefit the percentage of the population that is food insecure, but could they be trusted to not eat themselves into obesity after getting their fill? ... Both of these scenarios look great on the surface but are terrible for society in the long run."
etrem notices a peculiar hallucination where the model questions reality based on search results: "It's now zeroing in on the temporal aspect. Examining the search snippets reveals dates like '2025-10-27,' suggesting a future context relative to 2024... I am now treating the provided timestamps as accurate for a simulated 2025."
kldg shares a whimsical failure of the previous model that they actually enjoyed: "in defense of 2.5 (Pro, at least), it was able to generate for me a metric UNIX clock as a webpage which I was amused by. it uses kiloseconds/megaseconds/etc... I can't seem to get anyone interested in this very serious venture, though."