OpenAI and Microsoft add new image generation and AI agent features, and showcase their platform advantage
Hello and welcome to Eye on AI. In today’s edition…OpenAI releases a more capable image generator, while Microsoft plants its stake in reasoning agents; Google debuts Gemini 2.5 “thinking” models; Amazon tests AI assistants for health and shopping; Character.AI gives parents a lens into their kid’s time on the platform; and AI companies’ aggressive crawlers overwhelm open-source projects.
This week, both OpenAI and its backer (and sometime rival…it’s complicated) Microsoft rolled out some of their newest and most powerful AI capabilities into their main platforms. OpenAI debuted a new image generation capability for GPT-4o on its main ChatGPT service, offering enhanced image creation and granular editing capabilities based on text prompts. This makes the image generation far superior to what was possible with its earlier DALL-E model. And Microsoft announced it’s boosting its Microsoft 365 Copilot offering with two “deep reasoning agents,” as well as “agent flows” designed to remove some of the unpredictability that comes with using AI agents.
ChatGPT already had an image generator, and Microsoft has already rolled out various sorts of agents geared toward the enterprise. Both releases, however, offer a new twist on what the companies were already offering, and show the power of being able to instantly roll out a new feature on a platform that already has hundreds of millions of users. Having that kind of distribution is a huge edge as the competition among similar products heats up.
4o Image Generation raises the bar
Now rolling out to ChatGPT Plus, Pro, Team, and free users, the new integration is what OpenAI describes as its “most advanced image generator yet.” And I have to say, the results are impressive.
Overall, 4o Image Generation can produce vivid, realistic scenes and impressive “style transfer” transformations of uploaded images based on prompts. (You can also edit key features of uploaded images using prompts alone.) Based on the plethora of images flooding the ChatGPT subreddit, this style transfer capability is proving popular. An image the model created after a user prompted it to change the “Distracted boyfriend” meme into the style of “South Park,” for example, is honestly kind of shocking in how spot-on it is to the show’s visual look. It’s no wonder companies creating generative AI models are being inundated with copyright lawsuits. On another note, users are already testing the boundaries of creating images of public figures such as Donald Trump and Elon Musk. OpenAI confirmed to Eye on AI that it isn’t restricting the new image model from creating images of real people except in cases of nudity or graphic violence. This represents a shift from its restrictions for DALL-E, which would refuse to generate images of real people.
Perhaps the most interesting advancement, however, is the massive leap in the model’s ability to generate text. DALL-E and other previous image-generating models would usually create garbled text, but 4o Image Generation can create long, detailed, and accurate strings of text inside images. The first example in OpenAI’s blog post shows an entire whiteboard of text that is easily readable and accurate to the prompt.
A ‘researcher’ and ‘analyst’ join your 365 workspace
Microsoft describes its new “deep reasoning agents” for Microsoft 365 Copilot as being designed to “handle complicated tasks that require detailed analysis, methodical thinking, and nuanced understanding.” Based on OpenAI’s o1 reasoning model, the Researcher agent is geared toward multi-step research and integrates with external platforms like Salesforce, ServiceNow, and Confluence to garner insights from across a company’s data. Then there’s the Analyst agent, which is based on OpenAI’s o3-mini reasoning model. Microsoft claims it’s optimized to do advanced data analysis at work, uses chain-of-thought reasoning, and can run Python to tackle complex data queries. Both are set to start rolling out in April.
In addition to the new agents, Microsoft also announced a new capability it’s calling “agent flows” that’s meant to add predictability to the use of agents. Agent flows provide structured, rule-based workflows that incorporate AI actions, following predefined and deterministic paths. This is important because, as I wrote in last Thursday’s newsletter, AI agents have serious issues with reliability and can be risky, especially when it comes to critical actions or sensitive data.
Various companies have been releasing AI agents touted for “deep research” lately, but Microsoft 365’s role as many businesses’ central platform, along with its integration with all the other data products they use, gives Microsoft a unique advantage. The AI field is crowded with companies competing with similar products, each jockeying for toeholds of differentiation. These updates make advanced AI features easily accessible right where the users of popular products already operate, which is likely to be a significant market advantage. What’s more, both Microsoft and Google moved to bundle their AI features into their enterprise software by default, raising the prices of the core products, after previously allowing customers to opt in to the AI features for an extra cost. That’s the platform advantage.
And with that, here’s more AI news.
Sage Lazzaro sage.lazzaro@consultant.fortune.com sagelazzaro.com
Google debuts Gemini 2.5 “thinking” models. The new models are the latest releases from the major AI players designed to perform chain-of-thought reasoning, wherein they pause to “reason”—producing a step-by-step plan for answering a prompt—before responding. The method produces better results for some complex tasks. Gemini 2.5 Pro is the first to roll out and will initially be available to developers and Gemini Advanced subscribers. Google says the model surpassed competitors on several benchmarks, and developer and blogger Simon Willison wrote that it’s “very, very impressive” at coding tasks following a hands-on test. You can read more from 9to5Google.
The U.S. announces new export restrictions to prevent China from acquiring U.S. computing tech. The new controls announced yesterday by the Trump administration target 80 companies, 50 of them Chinese firms, with others located in Iran, Taiwan, Pakistan, South Africa, and the United Arab Emirates. The list includes many customers of Nvidia, Intel, and AMD. The Beijing Academy of Artificial Intelligence is also on the blacklist. As AI technology advances quickly in both the U.S. and China, the technology has been at the forefront of growing national security concerns and trade escalations between the two countries. You can read more from the New York Times.
Amazon tests AI assistants for health and shopping. The shopping assistant, called Interests AI, allows users to describe what they’re looking for in conversational language and receive a curated product selection. The Health AI chatbot is designed to answer common health-related questions. The company is currently testing the new features with a small subset of users, CNBC reported.
Anthropic scores an early win in a music copyright lawsuit. According to Reuters, the company convinced a California judge to reject a preliminary bid by Universal Music Group to prevent it from using lyrics owned by the group and other music publishers to train its AI model. The judge said the publishers’ request was too broad and failed to show Anthropic’s actions caused them “irreparable harm.” The lawsuit claims the company used the lyrics without permission and is one of many AI-related copyright cases currently making their way through the courts.
Character.AI adds a Parental Insights feature to give parents some clue about what their teens might be up to. The new feature sends parents a weekly report of their kid’s usage, including average time spent on the platform, the characters they most frequently interacted with, and how much time they spent talking to each character. The reports do not include transcripts of the chats or give any insight into chat content. The Parental Insights feature is part of Character.AI’s attempt to make the platform safer for underage users following a slew of concerns and lawsuits, as well as the death by suicide of a Florida teenager who had developed a romantic infatuation with one of Character.AI’s chatbots. You can read more from TechCrunch.
April 9-11: Google Cloud Next, Las Vegas
April 24-28: International Conference on Learning Representations (ICLR), Singapore
May 6-7: Fortune Brainstorm AI London. Apply to attend here.
May 20-21: Google IO, Mountain View, Calif.
July 13-19: International Conference on Machine Learning (ICML), Vancouver
July 22-23: Fortune Brainstorm AI Singapore. Apply to attend here.
97% That’s how much of some open source software projects’ traffic is coming from AI crawlers, according to a report from LibreNews.
The developers who maintain open source software projects (code that is freely available and that keeps much of the internet running) say aggressive crawlers belonging to AI companies like OpenAI, Amazon, and Alibaba are overwhelming their infrastructure. Such bots have become the backbone of the AI industry, scraping the internet for training data as well as conducting real-time searches to keep models current. Because coding is one of the most widely adopted uses of generative AI right now, and coders are looking for models that can code the best, open source code has become a particularly juicy data target for the crawlers. Open source developers say they often see the bots crawling pages over and over again, suggesting ongoing data collection.
Ars Technica further detailed their experiences in a recent story, describing how the crawlers are designed to deliberately circumvent measures to block them. Open source software is typically maintained by individual volunteers with very limited resources, so the financial and technical burdens AI crawlers are putting on them weigh heavily. Open source developers say AI companies’ overwhelming and “predatory” crawlers dramatically increase their bandwidth costs, cause service outages, and have left them feeling desperate.