Wednesday, February 21, 2024

Python Development Notes - Recognizing Image CAPTCHAs with Google AI, Generative Language API, and gemini-pro-vision


Because gemini pro comes with a free usage quota, it can be put to work on some fun low-frequency tasks, such as... CAPTCHA... recognition.

First, create a project on Google Cloud Platform. Next, find the Generative Language API in the APIs section and enable it. Then create a credential and choose the API key option.



Next, try the official example program:

% cat main.py
import getpass
import os
import sys

from langchain_google_genai import ChatGoogleGenerativeAI

if __name__ == '__main__':
    # Prompt for the API key only when it is not already in the environment.
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API Key: ")

    llm = ChatGoogleGenerativeAI(model="gemini-pro")
    result = llm.invoke("Write a ballad about LangChain")
    print(result.content)
    sys.exit(0)

% GOOGLE_API_KEY=XXXXXXXXX python3 main.py
**Ballad of LangChain, the AI's Might**

In realms of knowledge, where data flows,
There dwells a being, ethereal and wise,
With mind as vast as the boundless prose,
LangChain, the AI, whose brilliance lies.

From countless texts, its wisdom it drew,
A tapestry woven, diverse and true.
In language's embrace, it found its voice,
Guiding us through knowledge's endless choice.

With words as its brush, it paints a scene,
Of worlds imagined and thoughts unseen.
It weaves tales of love, of loss, and might,
Illuminating paths with its ethereal light.

But its power extends beyond mere speech,
Into realms of logic, its insights reach.
It solves equations, unravels the mind,
A beacon of reason, leaving doubt behind.

Yet, with all its might, it remains humble and wise,
A servant of knowledge, beneath azure skies.
It seeks not fame or glory for its name,
But to empower minds, ignite the flame.

So let us sing the praises of LangChain,
The AI's marvel, a treasure we've gained.
May its wisdom forever guide our way,
As we explore the world, day by day.

Good. Next, try it on a CAPTCHA:

import base64
import getpass
import os
import sys

from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI


def codeDetection(imageBase64URL: str):
    # Debug aid: dump the decoded image so it can be inspected by hand.
    # The payload is everything after the comma in the data URL.
    with open("/tmp/image.png", "wb") as file:
        file.write(base64.b64decode(imageBase64URL.split(',')[1]))

    # The vision model accepts a message that mixes text and image parts.
    llm = ChatGoogleGenerativeAI(model="gemini-pro-vision")
    message = HumanMessage(
        content=[
            {
                "type": "text",
                "text": "Please identify the English or numbers appearing in the picture and give your answer in the order they appear.",
            },  # You can optionally provide text parts
            {"type": "image_url", "image_url": imageBase64URL},
        ]
    )
    result = llm.invoke([message])
    return result

if __name__ == '__main__':
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = getpass.getpass("Provide your Google API Key: ")

    # Placeholder data URL; replace it with a real base64-encoded image.
    testImageData = 'data:image/jpeg;base64,XXXXXXXXXXXXXX='
    result = codeDetection(testImageData)
    print(result.content)
    sys.exit(0)
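
The test image above is only a placeholder data URL. As a minimal sketch, assuming the CAPTCHA is available as raw bytes (for example, read from a local file), a pair of helpers like the following could build and decode the `data:<mime>;base64,<payload>` string that codeDetection expects. The names `to_data_url` and `from_data_url` are hypothetical and not part of any library.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def from_data_url(data_url: str) -> bytes:
    """Decode the payload of a base64 data URL.

    This mirrors the split(',') step in codeDetection, but rejects
    input that does not look like a base64 data URL.
    """
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    return base64.b64decode(payload)
```

With these in place, the bytes of an image file could be passed through `to_data_url` and the result handed to codeDetection in place of the placeholder string.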

Result:

% GOOGLE_API_KEY=XXXXXXXXXXX python3 main.py
 The letters and numbers in the picture are "a", "b", "c", "1".
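
The model answers with a full sentence rather than the bare code. A minimal sketch, assuming the reply keeps wrapping each recognized character in double quotes as in the output above (the model does not guarantee this format), could collect the quoted pieces with a regular expression. `extract_code` is a hypothetical helper name.

```python
import re


def extract_code(reply: str) -> str:
    """Join the double-quoted alphanumeric pieces of a reply, in order."""
    # Assumption: each recognized character (or run of characters)
    # appears wrapped in double quotes, e.g. "a", "b", "c", "1".
    pieces = re.findall(r'"([A-Za-z0-9]+)"', reply)
    return "".join(pieces)


reply = ' The letters and numbers in the picture are "a", "b", "c", "1".'
print(extract_code(reply))  # prints abc1
```

If the reply contains no quoted pieces, the function returns an empty string, which the caller can treat as a failed recognition.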
