Talk to Google Bard GPT

Freddy Domínguez
6 min read · Oct 5, 2023


Ask whatever you want to Google Bard without typing

Image prompt: a detailed portrait of a white teacher talking with robot, digital art, realistic painting, dnd, character design

Google Bard, much like OpenAI’s ChatGPT, is a large language model (LLM) that can answer questions, generate creative text, translate languages, and produce many kinds of creative content. Google Bard is currently open to the public, although it is in a restricted beta release, and interested users can access it on the web.

For those keen on integrating Google Bard with Python, it is crucial to know that there is no official API at the moment. However, you can still access Bard by using the unofficial “bardapi” package in a Python environment:

pip install bardapi -q

There are several advantages to using Google Bard (via the unofficial “bardapi” package) over the OpenAI API:

  1. Google Bard is more up-to-date because it can pull context from the internet (news, websites, images, and so on), in contrast to OpenAI’s models, whose knowledge base was last updated in September 2021.
  2. Currently, Google Bard can be used for free, while the OpenAI API version may incur charges.
  3. “bardapi” can keep track of chat history, a feature not readily available with the OpenAI API. Nevertheless, it’s worth noting that with tools like LangChain you can add memory state to the OpenAI API, thereby enabling similar capabilities.
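For context, emulating that chat-history feature with the OpenAI API mostly means resending the accumulated conversation on every request. Here is a minimal sketch of just that bookkeeping; the helper name and the example turns are illustrative, not part of any library:

```python
# Carrying chat history yourself for a chat-completion-style API: each turn
# is appended to a running list, and the whole list is sent with every
# request. `build_messages` is an illustrative helper, not a library call.

def build_messages(history, user_input, system_prompt="You are a helpful assistant."):
    """Assemble the `messages` payload for a chat completion request."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_turn, bot_turn in history:
        messages.append({"role": "user", "content": user_turn})
        messages.append({"role": "assistant", "content": bot_turn})
    messages.append({"role": "user", "content": user_input})
    return messages

history = [("I am from Peru", "Nice! Peru is a beautiful country.")]
payload = build_messages(history, "Who is the president?")
# `payload` now holds the system prompt, both prior turns, and the new question
```

Libraries like LangChain wrap exactly this pattern behind a memory object, so the chat log does not have to be threaded through by hand.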

Here’s a sample guide on how to integrate Bard with Python using the “bardapi”:

from bardapi import Bard
import os
import requests
from dotenv import load_dotenv

load_dotenv()  # load the __Secure-* cookie values from a .env file

# A persistent session enables an ongoing conversation with Bard across separate queries
session = requests.Session()
session.headers = {
    "Host": "bard.google.com",
    "X-Same-Domain": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8",
    "Origin": "https://bard.google.com",
    "Referer": "https://bard.google.com/",
}
session.cookies.set("__Secure-1PSID", os.getenv("VAR_1PSID"))
session.cookies.set("__Secure-1PSIDTS", os.getenv("VAR_1PSIDTS"))
session.cookies.set("__Secure-1PSIDCC", os.getenv("VAR_1PSIDCC"))

bard = Bard(token=os.getenv("VAR_1PSID"), session=session)
response = bard.get_answer("I am from Peru")['content']
# do something with the response
# Because we reuse the session, we can keep chatting without passing the chat log
response = bard.get_answer("Who is the president?")['content']
An example of bardapi’s response

Additionally, Google Bard can return images in its JSON response, a feature that is also exposed by the Bard API.
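As a sketch of how such a response can be handled, here is the image-to-markdown step in isolation, using a hard-coded dict in the shape bardapi returns rather than a live call (the URL and text are placeholders):

```python
import random

# A placeholder response in the shape bardapi returns: 'content' holds the
# text answer and 'images' holds a list of image URLs (possibly empty).
raw = {
    "content": "Machu Picchu is in the Cusco region of Peru.",
    "images": ["https://example.com/machu_picchu.jpg"],
}

def render_markdown(raw):
    """Prepend a randomly chosen image (if any) to the answer as markdown."""
    if raw.get("images"):
        image = random.choice(raw["images"])
        return f"![]({image})\n\n" + raw["content"]
    return raw["content"]

print(render_markdown(raw))
```

This is the same pattern the main chat callback uses later to embed an image above the bot’s text reply.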

Now, let’s create a user-friendly graphical interface (GUI) that supports audio interaction. This means you can ask questions in various languages, and the bot will reply as soon as its audio has been generated.

First of all, we need to install the Python libraries. In our case we will use, of course, bardapi; Gradio for the GUI; and OpenAI’s Whisper to transcribe our microphone recordings into text:

pip install bardapi gradio git+https://github.com/openai/whisper.git -q
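The main callback shown later relies on a transcribe(file_wav, transcriber) helper that isn’t spelled out in the article. A minimal sketch could look like this, assuming the GUI option strings map onto Whisper model names as below (the mapping and the helper itself are assumptions, not code from the article’s repository):

```python
# Map the GUI radio-button options to Whisper model names. These option
# strings mirror the `transcriber_options` list used in the layout; this
# mapping and the helper below are illustrative assumptions.
MODEL_NAMES = {
    "tiny.en_whisper": "tiny.en",
    "base_whisper": "base",
    "large_whisper": "large",
}

_models = {}  # cache loaded Whisper models so each one loads only once

def transcribe(file_wav, transcriber):
    """Transcribe an audio file with the Whisper model selected in the GUI."""
    import whisper  # lazy import: the app can start before any model loads

    model_name = MODEL_NAMES.get(transcriber, "base")
    if model_name not in _models:
        _models[model_name] = whisper.load_model(model_name)
    return _models[model_name].transcribe(file_wav)["text"].strip()
```

Caching the loaded model matters here: whisper.load_model is slow, and reloading it on every utterance would dominate the response time.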

Then we start building a layout that contains three parts: two columns on the top and a chat log on the bottom.

Content arrangement within the application

The top left contains an audio-recording widget and radio buttons for selecting the model that transforms audio into text. The top right has a media player that plays immediately after receiving the bot’s reply, along with an elapsed-time display indicating how much time passed between recording the user’s audio and starting playback of the reply. At the bottom, we include a chat log, similar to a regular chat, to keep a record of the conversation. So let’s define the layout:

import gradio as gr

# Initialize the Gradio interface with a custom layout
with gr.Blocks(title="Bard Chatbot") as app:
    gr.Markdown("<div align='center'><h1>Welcome to the Bard Chatbot!</h1></div>")
    with gr.Row():
        with gr.Column():
            gr.Markdown("## Record your message and chat with Bard.")
            audio_input = gr.Audio(label="Input Audio Channel", source="microphone", type="filepath")
            transcriber_options = [
                # "tiny.en_whisper",
                "base_whisper",
                # "large_whisper",
                # "speechbrain",
            ]
            transcriber = gr.Radio(transcriber_options, value="base_whisper", label="Transcriber")
        with gr.Column():
            gr.Markdown("## Reproduce the Bot Answer")
            audio_output = gr.Audio("init.ogg", label="Audio Output", autoplay=True)
            elapsed_time = gr.Text(label="Elapsed Time")
    with gr.Column():
        chatbot = gr.Chatbot()

As we can see in the code above, we built the layout using Gradio Blocks widgets. We used gr.Row to create two columns, each of which follows the structure described earlier. In the first column, we set the source of gr.Audio to the microphone and changed the type to filepath so that the callback receives the path of the stored recording in the main function. The radio group serves as a model selector, letting the user decide which model to use for transcription. In the second column, we set up an audio player that starts with a previously recorded message containing an introduction to our app, with autoplay enabled so the reply begins playing as soon as it is ready.

To generate the initial audio, called “init.ogg”, we use the GCloud speech service wrapped by the bardapi library. We only need to pass the text we want spoken, and the API automatically adapts the voice to the language of the input text. Check the following script:

with open("init.ogg", "wb") as f:
    f.write(bytes(bard.speech("Hello, I am Google Bard AI. Please, let's chat.")['audio']))

Since the processing happens in GCloud, it is indeed a powerful tool that works quickly. Additionally, we generate an empty audio clip to return when the application has no input in memory, so we can stop processing without displaying error messages:

with open("empty.ogg", "wb") as f:
    f.write(bytes(bard.speech(" ")['audio']))

Finally, we define our main function, chat_with_bard, which takes two inputs, the audio path and the transcriber selection, and returns a tuple of three elements: the audio path of the bot’s reply, the elapsed time, and the chat_history list to display in the chat widget. Because we use Gradio Blocks, we don’t pass the widgets into a gradio.Interface as usual; instead we wire up a listener callback with the gr.on decorator, which binds a custom callback function to the interface’s inputs and outputs.

import random
import time

chat_history = []  # running list of (user message, bot reply) tuples

@gr.on(inputs=[audio_input, transcriber], outputs=[audio_output, elapsed_time, chatbot])
def chat_with_bard(file_wav, transcriber):
    if file_wav is None:
        # Skip if there's no audio input
        return "empty.ogg", f"{0} sec.", chat_history
    start_time = time.time()
    user_input = transcribe(file_wav, transcriber)
    raw = bard.get_answer(user_input)
    bot_message = random.choice(raw['choices'])['content'][0]
    if raw['images']:
        image = random.choice(raw['images'])
        image_markdown = f"![]({image})\n\n"
    else:
        image_markdown = ""
    full_response = image_markdown + bot_message

    chat_history.append((user_input, full_response))

    with open("bard.ogg", "wb") as f:
        f.write(bytes(bard.speech(clean_code_blocks(bot_message))['audio']))
    return "bard.ogg", f"{round(time.time() - start_time)} sec.", chat_history

The clean_code_blocks function simply removes code blocks from the markdown response message so they aren’t read aloud. In the final chat message, if the response contains an image, we pick one and embed it into the markdown so it can be displayed.
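The body of clean_code_blocks isn’t shown in the article; a minimal sketch, assuming the response uses fenced markdown code blocks, could look like this (an illustrative implementation, not the original helper):

```python
import re

def clean_code_blocks(markdown_text):
    """Strip fenced code blocks (``` ... ```) from a markdown string,
    leaving only the prose so the speech synthesizer doesn't read code aloud.
    Illustrative implementation; the article's original helper isn't shown."""
    cleaned = re.sub(r"```.*?```", "", markdown_text, flags=re.DOTALL)
    # Collapse extra blank lines left behind by the removal
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()

text = "Here is the code:\n```python\nprint('hi')\n```\nHope it helps!"
print(clean_code_blocks(text))  # prints the two prose lines, code removed
```

Note the non-greedy `.*?` with re.DOTALL, so each fence pair is matched separately instead of swallowing everything between the first and last fence.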

We did it. We can talk to Google Bard directly without typing:

Talking in English
Talking in other languages
Accessing live content via the internet
Markdown support, so tables, images, and external links can be included

The most significant benefit of this approach is that Bard supports multiple languages, and the voice synthesizer can adapt based on Bard’s response. This allows for seamless language switching without disrupting the conversation.

I highly recommend either changing the transcriber model or using GPUs to speed up transcription of the recordings. Additionally, it’s advisable to explore alternative models such as ‘tiny.en_whisper’, ‘large_whisper’, ‘speechbrain’, or ‘whisper-jax’ to assess whether they offer improved speed and efficiency.

In conclusion, the Bard-API package offers a practical and efficient way to interact with Google Bard from your Python environment, filling the gap until an official Python Bard API is released. By combining Bard-API with audio interfaces, you can talk to Bard and expect it to respond like any other voice assistant.

You can find the whole project here.

You can find the Google Colab Notebook here.

Thank you for being here. Please share your views in the comments; if any mistakes are found, the article will be updated.

References

  1. https://github.com/romellfudi/talk_to_BardGPT
  2. https://github.com/dsdanielpark/Bard-API
  3. https://www.gradio.app/guides/controlling-layout
  4. https://github.com/openai/whisper
