Using Windows 11 and Xojo 2024 release 4.1
Does anyone have an example of how to send a PNG image file to gpt-4o-mini so that it can analyze the image and return specified info from the image in JSON format? I need to do this in a console app if possible and store the returned info in a database. Some of the images are rotated. I have been able to send text to the AI but would prefer to just send the image data without having to extract the text.
Thanks
I don't have an OpenAI API account to test with, but according to the docs it looks like you can encode your image into a Base64 string using something like:
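Var base64Image As String
Var f As New FolderItem("image.jpg")
If f <> Nil And f.Exists Then
  ' Read the whole file as raw bytes
  Var imageStream As BinaryStream = BinaryStream.Open(f, False)
  Var imageData As MemoryBlock = imageStream.Read(f.Length)
  imageStream.Close
  ' Pass 0 to turn off EncodeBase64's default 76-character line wrapping,
  ' since the string goes into a data URL and should not contain line breaks
  base64Image = EncodeBase64(imageData, 0)
End If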
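Then pass base64Image into the API in the request JSON as a data URL, as one part of the message content (replace <base64Image> with the encoded string, and use image/png instead of image/jpeg for a PNG file):
{
"type": "image_url",
"image_url": {"url": "data:image/jpeg;base64,<base64Image>"}
}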
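In Xojo, the request fragment above could be built with JSONItem. This is only a sketch, untested against the live API: it assumes `prompt` is a String holding your instructions and `base64Image` is the encoded file from the earlier snippet. The point it illustrates is that, per the OpenAI docs, the image travels as one part of the message's `content` array, next to a text part, rather than under a separate key on the message.

```xojo
' Text part: the instructions for the model
' (prompt is an assumed String variable)
Var textPart As New JSONItem
textPart.Value("type") = "text"
textPart.Value("text") = prompt

' Image part: the Base64 data wrapped in a data URL
' (image/png is assumed here because the source files are PNGs)
Var imageUrl As New JSONItem
imageUrl.Value("url") = "data:image/png;base64," + base64Image

Var imagePart As New JSONItem
imagePart.Value("type") = "image_url"
imagePart.Value("image_url") = imageUrl

' Calling Add makes this JSONItem a JSON array
Var parts As New JSONItem
parts.Add(textPart)
parts.Add(imagePart)

' The user message carries the array as its content
Var message As New JSONItem
message.Value("role") = "user"
message.Value("content") = parts
```

The resulting `message` can then be added to the `messages` array of the request in the usual way.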
OK, so I Base64-encoded the PNG image file as "imagedata" and tried to pass it to the AI. I am using the ChatGPT example project and modifying the prompt info. Can you help me get this figured out? Here is the code that I am using. In the prompt I am asking it to return specific info in JSON format.
If Not maintainContext Or ContextHistory = Nil Then
ContextHistory = New JSONItem
End If
'Get the connection ready to post
Var j As New JSONItem
j.Value("model") = Model
RequestHeader("Authorization") = "Bearer " + APIKey
RequestHeader("Content-Type")= "application/json"
'Create a message from the prompt
Var message As New JSONItem
message.Value("role") = "user"
message.Value("content") = prompt
if imageData <> "" then
message.Value("image") = imagedata
end if
ContextHistory.Add(message)
'Add all messages
j.Value("messages") = ContextHistory
'Set the temperature (0 to 2 - amount of creativity/hallucination)
j.Value("temperature") = Temperature
SetRequestContent(j.ToString, "application/json")
Var response As String
response = SendSync("POST", "https://api.openai.com/v1/chat/completions", TimeOut)
Var status As Integer = HTTPStatusCode
Var r As New JSONItem(response)
If status = 200 Then
'Get the reply message
Var choices As JSONItem = r.Value("choices")
Var firstChoice As JSONItem = choices.ValueAt(0)
Var theMessage As JSONItem = firstChoice.Value("message")
Var answer As String = theMessage.Value("content")
'Add the reply message
message.Value("role") = "assistant"
message.Value("content") = answer
ContextHistory.Add(message)
'If necessary, trim the context so we don't send one that is too big
TrimContext
imagedata = ""
Return answer
Else
'Remove the message the user just sent since it resulted in an error
ContextHistory.RemoveAt(ContextHistory.LastRowIndex)
Var e As New ChatGPTException
Var error As JSONItem = r.Value("error")
Var errorMessage As String = error.Value("message")
e.Message = errorMessage
Raise e
imagedata = ""
Return ""
End If
What you are looking for is called "Structured Outputs". Basically, you need to tell the model the type of response you need, which in your case is a JSON response. More info at: https://platform.openai.com/docs/guides/structured-outputs?lang=curl&example=chain-of-thought#examples
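As a rough, untested sketch of what that might look like in Xojo, assuming `j` is the request JSONItem from the code above: a `response_format` entry of type `json_schema` is added to the request before it is sent. The schema name `image_info` and the single `title` field are placeholders invented for illustration, so substitute the fields you actually want extracted.

```xojo
' Hypothetical field: one string property named "title"
Var titleField As New JSONItem
titleField.Value("type") = "string"

Var props As New JSONItem
props.Value("title") = titleField

' Strict mode expects every property to be listed as required
Var requiredKeys As New JSONItem
requiredKeys.Add("title")

Var schema As New JSONItem
schema.Value("type") = "object"
schema.Value("properties") = props
schema.Value("required") = requiredKeys
schema.Value("additionalProperties") = False

' "image_info" is a placeholder schema name
Var jsonSchema As New JSONItem
jsonSchema.Value("name") = "image_info"
jsonSchema.Value("strict") = True
jsonSchema.Value("schema") = schema

Var responseFormat As New JSONItem
responseFormat.Value("type") = "json_schema"
responseFormat.Value("json_schema") = jsonSchema

' j is assumed to be the request JSONItem built earlier
j.Value("response_format") = responseFormat
```

With this in place, the `content` string in the reply should itself be JSON text matching the schema, so it can be parsed with `New JSONItem(answer)` before the values are written to the database. A simpler alternative described in the same docs is JSON mode, where `response_format` is just `{"type": "json_object"}` and the prompt itself has to ask for JSON.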