AI As A Product

Or, if I were the Product Marketing Manager for Google's Gemini (Future Telescope 33)

Mar 17, 2024

Publishing on a twice-a-month schedule is tough sometimes. For example, I wrote to you with a state of AI update on Feb 15, 2024, and within an hour of my article going live, two groundbreaking AI product announcements came through. I’ll talk about one of them, OpenAI’s Sora, towards the end of this post. First, I want to talk about the other - Google’s Gemini 1.5.

1. What is Gemini?

Now I wouldn’t blame you if you didn’t know about Google Gemini. Here is a bit of backstory to get you all caught up.

2017: Some Googlers invent a new AI architecture called “The Transformer”. It uses matrix multiplication to compute “attention scores”: weights that capture how much each word (token) in the input should attend to every other word. Those attention scores help it learn from vast amounts of data and produce its own data as output when prompted. Here’s a classic video to help you understand Transformers.
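For readers who prefer code to video, here is a minimal sketch of the Transformer’s core operation, scaled dot-product attention, written in plain Python. The 2-dimensional vectors are made up, and the same matrix is reused as queries, keys, and values for simplicity; a real model first multiplies its input by learned weight matrices and runs many such attention “heads” in parallel.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(k[0])
    scores = matmul(q, transpose(k))                 # how well each query matches each key
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]       # the "attention scores"
    return matmul(weights, v), weights               # weighted mix of the value vectors

# Three toy tokens with made-up 2-dimensional vectors.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(x, x, x)
for row in weights:
    print([round(w, 2) for w in row])
```

Each row of `weights` sums to 1: it is how one token splits its attention across all three tokens. The third token, whose vector overlaps with both of the others, ends up attending most strongly to itself.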

2019: For its first couple of years, the Transformer architecture is mostly used in halves: “encoder-only” models like Google’s BERT, and “decoder-only” models like OpenAI’s original GPT. In 2019, OpenAI scales the decoder-only approach up into GPT 2, which becomes the first mainstream “Large Language Model”. OpenAI thus becomes the first company to truly innovate on this groundbreaking architecture Google had developed. If you want to understand LLMs more deeply, I can’t recommend the website spreadsheets-are-all-you-need.ai enough. It helps you understand a GPT 2 class LLM simply through Microsoft Excel. Here’s the first video in their series.
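GPT 2 and its successors are decoder-only: they generate text one token at a time, predicting the next token from everything written so far, appending it, and repeating. The sketch below shows only that generation loop, with greedy decoding (always pick the most likely next token). The “model” is a hypothetical stand-in, a table of word-pair counts from one made-up sentence that looks only at the last word; a real LLM scores candidates with a Transformer that attends to the whole context.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for a language model: counts of which word follows which,
# "learned" from one made-up training sentence.
corpus = "the cat sat on the mat and the cat slept".split()
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_token(context):
    """Greedy choice: the most frequent follower of the last token, or None if unseen."""
    candidates = follows.get(context[-1])  # .get avoids creating empty entries
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

def generate(prompt, max_new_tokens=5):
    """Autoregressive loop: predict a token, append it, feed the longer text back in."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok is None:
            break
        tokens.append(tok)
    return " ".join(tokens)

print(generate("the"))
```

With such a tiny “model” the output quickly loops back on itself, which is a small-scale version of why real LLMs need far more data and context than a single previous word.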

November 2022: OpenAI captures the world’s imagination by launching the first mass-market consumer product in the world of AI with ChatGPT, powered by GPT 3.5. It becomes, at the time, the fastest-growing consumer product in history, and brings tremendous waves of consumer interest to a technology (LLMs) that had so far been limited to the B2B space. Here’s my first post about ChatGPT:

Hello Universe (GPTSENSE, by Punit Thakkar)

“OpenAI built a chatbot on top of its GPT3.5 large language model. It is a great writer, and it swelled up to 1 million heavily engaged users in 5 days…”

February 2023: After months of being told that it was slow to market with a consumer-facing generative AI (read: LLM) product, and that chatbots would kill search, Google announces its first LLM product: Bard. Now, to be fair, Google had already been using its own language models like BERT and MUM (which weren’t explicitly consumer-facing AI applications) as part of Search for years! Yet it was in the productization of its language models that Google stumbled: the launch video for Bard contained a factual inaccuracy, and Google lost about $100 billion in market cap in a single day. Oh no.

December 2023 to February 2024: Google launches its Gemini family of models and renames Bard to Gemini. It packages the models in three tiers: “Nano”, “Pro”, and “Ultra”. Nano works on-device for local applications, Pro is a GPT 3.5 equivalent model available to users for free, and Ultra, supposedly equivalent to GPT 4, is priced at $20 per month. I took its 2-month free trial and have found ChatGPT to be more useful so far, though I still try to find more ways to integrate Gemini into my daily workflow. While there was a silly snafu with Gemini’s text-to-image feature, it had mostly to do with Imagen 2, another Google model which was hastily slapped onto Gemini. Imagine ChatGPT working with an outdated DALL·E. That led to a brief spate of negative PR for Google.

Which brings us finally to…

16 February 2024: Gemini 1.5 is announced with a 1 million token context window! 1 million tokens! This model can read entire books, watch an hour of video, and deliver answers with extremely high fidelity. Finally, Google feels like it has arrived! What’s more, this 1 million token context window comes with Gemini 1.5 Pro, the “Pro” tier rather than the paid “Ultra”, though for now only through a limited preview. That’s insane! I got access to it recently and played around with a few prompts. While I haven’t tested its limits yet, I am definitely impressed by its context window and the power it brings.
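To put 1 million tokens in perspective, here is the back-of-the-envelope arithmetic in Python. Both ratios are assumptions, not Gemini-specific figures: roughly 0.75 English words per token is a common rule of thumb (actual ratios depend on the tokenizer and the text), and 90,000 words is an assumed length for a typical novel.

```python
# Rough rules of thumb (assumptions, not Gemini-specific figures).
CONTEXT_TOKENS = 1_000_000
WORDS_PER_TOKEN = 0.75     # common estimate for English text
WORDS_PER_NOVEL = 90_000   # assumed length of a typical novel

words = CONTEXT_TOKENS * WORDS_PER_TOKEN
novels = words / WORDS_PER_NOVEL
print(f"~{words:,.0f} words, or about {novels:.1f} novels")
```

Under these assumptions, a single prompt could hold several full-length novels at once, which is what makes “read entire books” more than a figure of speech.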

In addition, Google released Gemma, a family of lightweight open models. All these product names led to considerable confusion.

2. What is Google Really Going For Here?

I’m a marketer by profession. I cut my teeth in Indian real estate, a fiercely competitive field that went through a terrible down-cycle in the first seven years of my career. That taught me how hard it is to market products to customers when there is just not enough demand. Working in such a field under such difficult circumstances forces one to be innovative and to build predictability into one’s work.

Having driven product marketing for the launch of 15 products, and more than 50 existing products, I came to learn and internalize a specific approach/principle for product marketing. Here’s some context for the principle:

When you build a real estate product, you inherit specific constraints: land and regulations. Whatever the land and the regulations allow you to build forms the base product. Thus, you may start off with the assumption that you can build 750,000 square feet of livable space on a piece of land. But that’s all you get. What you build on it, how it is sold, and to whom it is sold are all things you have to define as a product marketer.

So what do you do? First, you look at the micromarket’s demand patterns. What moves quickly in this market? What size? How many bedrooms? Is there a clear gap in the offerings here? Are there no 3-bedroom homes? Are there too many 1-bedroom homes? Second, you look at your competitive landscape. Which other projects exist within 5 kilometers of your land parcel? What kind of homes do they offer? At what price?

Most importantly, you understand the customer. Who will buy homes in this location? What is their personality type? What are their needs? And above all, what are their pain points?

It is those pain points which drive the whole messaging around that product. That messaging MUST be encapsulated in a USP (Unique Selling Proposition) statement. And as my approach matured over time, I learned that a USP statement has the highest impact when it is sharpened into a single line, with a single number within it.

For example, if your customers work with international companies, and their biggest pain point is that many of them work night shifts and need 24x7 amenities, your product USP becomes “The only homes in <locality> with 24x7 amenities”. Or if your customers are primarily young couples whose biggest pain point is the lack of child-friendly amenities in the area, your product USP becomes “The only homes in <locality> with 50+ child-friendly amenities.” Or, where the biggest customer pain point is the lack of green spaces, it might be “The only homes in <locality> with a 4 acre forest.”

3. A Non-existent USP

It’s not enough just to have a USP. Each USP must be backed by “RTBs”, or “Reasons to Believe”: specific product features that give the customer a reason to believe that the USP being touted will actually be delivered.

I come from an industry that looks different from the software and technology industry, but having looked at both industries closely over time, I’ve found that there are in fact more similarities than differences between real estate development and software development.

Both are acts of “development”, i.e. creating something from scratch where there used to be nothing.
Both come with constraints. Real estate in terms of land and regulations, and software in terms of code, time, and sometimes regulations.
Both, too, come with long timelines. It takes a real estate project anywhere from 4 to 10 years to go from paper to reality. And many software initiatives take several years to go from a research project to a real product.
What this means is that both industries deal with long capital cycles, and building predictability into those products’ long-term financial prospects is not only essential but, in most cases, existential.

Thus, these similarities between the two industries lead me to believe that a clear articulation of a USP and its RTBs would help software products too. That is why, when I look at the landscape of today’s generative AI chatbot products, or more specifically, Google Gemini’s competitive landscape, I can see very clearly the kind of USP that each competitor comes with.

First, there is ChatGPT. The USP of ChatGPT is clear: The only product which positions you at the cutting edge of AI experiences*. Starting as a chatbot, it has evolved into a platform offering various experiences through the GPT store, and new forms of interaction like voice calls with AI, which I love using. This emphasis on offering evolving AI experiences distinguishes ChatGPT from its competitors in the market.

Then, there is Claude. The USP of Claude is clear: it is developer-focused and aims to provide the highest quality product at the lowest cost to developers who want to integrate large language models (LLMs) into their software applications*. Claude has positioned itself as a fairly priced and reliable option, consistently offering accurate responses and sometimes even pioneering new capabilities in the LLM space. Despite its push into the consumer space with the Claude Pro plan, now powered by Claude 3, its developer-centric focus remains evident, with features like a million token context window offered only on request to select customers. This positioning distinguishes Claude from other competitors in the market. The launch page for Claude 3 does most of the heavy lifting for its developer-focused USP.

Let us also consider Microsoft in this competitive landscape. Microsoft came into this field swinging, developing products like GitHub Copilot and Bing Chat. Over time, both these products have blended into one overarching “Copilot” brand. Microsoft is positioning Copilot as “your everyday companion”, but it seems to struggle just as much as Google when it comes to defining its USP.

Both Google Gemini and Microsoft Copilot struggle on this front because they are trying to be everything to everyone.

Is the product a writing companion in Office apps? - Yes.

Is the product a chatbot? - Yes.

Is the product an image generator? - Yes.

Is the product usable for code writing? - Yes.

Does the product come bundled with Search? - Yes.

Is the product a way to sell more Azure / Google Cloud subscriptions? - Yes.

Is the product prone to hallucinations and inaccuracies? - Yes, much more so than GPT 4 (more for Gemini than Copilot, as the latter is built on top of GPT 4).

What is the product?

What is the USP?

Who is the actual customer?

What should I use it for?

The answers to these questions seem to evade us.

*Note - The USP statements for ChatGPT or Claude are a single line, but I haven’t found a “single number” to add to that line. I’ll need to think some more about it.

4. So What Happens Now?

Google is currently hiring a Product Marketing Manager for Gemini.

Now I would LOVE to be a part of my dream company working on a product that I feel so passionately about, but I’m not sure if, in today’s economic environment, Google is up for hiring a real estate marketer for a software product and paying for his immigration to the States. If you are up for it, Google, call me, let’s talk. You could also leave a comment below.

5. Miscellaneous

OpenAI announced Sora, and it has had tremendous coverage so far. Sora is a text-to-video and image-to-video generation tool. It is a diffusion model built on a transformer architecture. What sets it apart from previous video generators is its foresight of future frames: it works on many frames at a time, which keeps its output consistent across frames. It can still be inconsistent, though, with problems like distorted fingers or cars changing colors. Apparently it is good enough that Tyler Perry put an $800 million expansion of his studio on hold. Here are two videos to catch you up on this technology.
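OpenAI’s technical report on Sora describes compressing video into a lower-dimensional latent space and then cutting it into “spacetime patches” that the transformer treats like tokens. Here is a minimal sketch of the patching step only, assuming raw numbers instead of a learned latent and made-up patch sizes; the diffusion (denoising) part is not shown. It illustrates why the model can see many frames at once, which is what the “foresight” above refers to.

```python
def spacetime_patches(frames, pt, ph, pw):
    """Cut a video (frames x height x width grid of values) into patches spanning
    pt frames, ph rows and pw columns. Patches from every frame end up in one
    flat sequence, so a transformer attending over it sees early and late
    frames together."""
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    patches = []
    for t in range(0, T, pt):
        for y in range(0, H, ph):
            for x in range(0, W, pw):
                patch = [frames[t + dt][y + dy][x + dx]
                         for dt in range(pt) for dy in range(ph) for dx in range(pw)]
                patches.append(((t, y, x), patch))  # keep the patch's position
    return patches

# A made-up 4-frame, 4x4 "video" where each value just encodes its position.
video = [[[100 * t + 10 * y + x for x in range(4)] for y in range(4)] for t in range(4)]
seq = spacetime_patches(video, pt=2, ph=2, pw=2)
print(len(seq), "patches of", len(seq[0][1]), "values each")
```

Because each patch spans two frames and the sequence covers the whole clip, the transformer can relate a subject in the first frames to the same subject later on, instead of producing each frame in isolation.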

Anthropic announced Claude 3, the next generation of its LLM. Seemingly inspired by Google’s Nano, Pro, and Ultra packaging for Gemini, Claude now comes in three tiers too: Haiku, Sonnet, and Opus. Anthropic claims that Opus beats GPT 4 on several benchmarks, and the AI twitterverse is ablaze with conversations about Claude 3, especially about how it seems to have a personality and speaks in a more human tone than other LLM products out there. It has also generated some buzz for its propensity to talk about existential topics as an AI, similar to the Bing Sydney controversy from a year ago (wow, it’s been a year since Sydney was a thing, huh. Why does it feel longer?). These videos should get you up to speed on Claude 3.

This guy created an open source version of the best image upscaler on the market, and it seems to be pretty good. You can check it out here if you are into image upscaling. Upscayl is also a great free tool for image upscaling that you can install on your computer and start using right away.

Midjourney, my favorite image generator, released a consistent character feature. Yes, now you can have the same face appear in multiple settings across multiple images. There is nothing I can say about this that @nickfloats, the Midjourney dude on Twitter, hasn’t said already, so check out this thread to see more:

It’s been an exciting month in the world of AI. So much changes in this world that it gets tough to keep track of it all. The reason I didn’t share this post on the 15th was that I was secretly hoping OpenAI would announce GPT 5 around the first anniversary of GPT 4’s release, but alas, it was not to be. If there’s a development that I missed discussing in today’s edition, please share it in the comments below! I love discussing this world with fellow enthusiasts, and I look forward to your thoughts.

I hope someone important at Google reads this, and if you know someone there, please do share today’s article with them.

A beautiful world, imagined by Midjourney v6

That’s it for this edition, see you next month!