Send image to gpt-4o-mini to analyze and return JSON

Using Windows 11 and Xojo 2024 release 4.1
Does anyone have an example of how to send a PNG image file to gpt-4o-mini so that it can analyze the image and return specified info from it in JSON format? I need to do this in a console app if possible and store the returned info in a database. Some of the images are rotated. I have been able to send text to the AI, but I would prefer to send the image data directly rather than extract the text first.
Thanks

I don’t have an API account for OpenAI to test with, but according to the docs you can encode your image as a Base64 string using something like:

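Var base64Image As String
Var f As New FolderItem("image.jpg")
If f <> Nil And f.Exists Then
  ' Read the whole file as raw bytes
  Var imageStream As BinaryStream = BinaryStream.Open(f, False)
  Var imageData As MemoryBlock = imageStream.Read(f.Length)
  imageStream.Close
  ' A line wrap of 0 keeps the Base64 on a single line; the default wraps
  ' at 76 characters, which puts line breaks inside the data URL
  base64Image = EncodeBase64(imageData, 0)
End If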

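Then pass base64Image into the API as one part of the message's "content" array in the request JSON. The docs show this in Python with an f-string; in Xojo you would concatenate the prefix and base64Image yourself. Use the MIME type that matches your file (image/png for a PNG):

{
  "type": "image_url",
  "image_url": {"url": "data:image/jpeg;base64,<base64Image>"}
}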
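As a rough, untested sketch of how that fragment could be assembled with JSONItem: the helper name BuildImageMessage and its parameters are hypothetical (they are not part of Xojo or of the ChatGPT example project), and it assumes the Chat Completions format in which "content" is an array holding a text part and an image_url part.

```xojo
' Hypothetical helper: builds one user message containing text plus an image.
' base64Image must be Base64 without line breaks; mimeType is e.g. "image/png".
Function BuildImageMessage(promptText As String, base64Image As String, mimeType As String) As JSONItem
  ' Text part: the instructions for the model
  Var textPart As New JSONItem
  textPart.Value("type") = "text"
  textPart.Value("text") = promptText
  
  ' Image part: the file passed inline as a data URL
  Var imageUrl As New JSONItem
  imageUrl.Value("url") = "data:" + mimeType + ";base64," + base64Image
  Var imagePart As New JSONItem
  imagePart.Value("type") = "image_url"
  imagePart.Value("image_url") = imageUrl
  
  ' "content" becomes a JSON array instead of a plain string
  Var content As New JSONItem
  content.Add(textPart)
  content.Add(imagePart)
  
  Var message As New JSONItem
  message.Value("role") = "user"
  message.Value("content") = content
  Return message
End Function
```

The returned item would then be added to the "messages" array in place of a text-only message.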

OK, so I Base64-encoded the PNG image file as “imagedata” and tried to pass it to the AI. I am using the ChatGPT example project and modifying the prompt info. Can you help me figure this out? Here is the code I am using. In the prompt I am asking it to return specific info in JSON format.

If Not maintainContext Or ContextHistory = Nil Then
  ContextHistory = New JSONItem
End If

'Get the connection ready to post
Var j As New JSONItem
j.Value("model") = Model
RequestHeader("Authorization") = "Bearer " + APIKey
RequestHeader("Content-Type")= "application/json"

'Create a message from the prompt
Var message As New JSONItem
message.Value("role") = "user"
message.Value("content") = prompt

if imageData <> "" then
  message.Value("image") = imagedata
end if

ContextHistory.Add(message)

'Add all messages
j.Value("messages") = ContextHistory

'Set the temperature (0 to 2 - amount of creativity/hallucination)
j.Value("temperature") = Temperature

SetRequestContent(j.ToString, "application/json")

Var response As String
response = SendSync("POST", "https://api.openai.com/v1/chat/completions", TimeOut)

Var status As Integer = HTTPStatusCode

Var r As New JSONItem(response)

If status = 200 Then
  'Get the reply message
  Var choices As JSONItem = r.Value("choices")
  Var firstChoice As JSONItem = choices.ValueAt(0)
  Var theMessage As JSONItem = firstChoice.Value("message")
  Var answer As String = theMessage.Value("content")
  
  'Add the reply message
  message.Value("role") = "assistant"
  message.Value("content") = answer
  ContextHistory.Add(message)
  
  'If necessary, trim the context so we don't send one that is too big
  TrimContext
  imagedata = ""
  Return answer
  
Else 
  'Remove the message the user just sent since it resulted in an error
  ContextHistory.RemoveAt(ContextHistory.LastRowIndex)
  Var e As New ChatGPTException
  Var error As JSONItem = r.Value("error")
  Var errorMessage As String = error.Value("message")
  e.Message = errorMessage
  'Clear the image before raising; nothing placed after Raise would run
  imagedata = ""
  Raise e
End If

What you are looking for is called ‘Structured Outputs’. Basically, you need to tell the model what type of response you need, in your case a JSON response. More info at: https://platform.openai.com/docs/guides/structured-outputs?lang=curl&example=chain-of-thought#examples
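A minimal, untested sketch of the simpler "JSON mode" variant, assuming j is the request JSONItem and answer is the reply string from the code earlier in the thread. The two helper names are hypothetical. JSON mode also expects the prompt itself to mention JSON, and the strict schema variant described in the linked guide uses the type "json_schema" with a schema object instead.

```xojo
' Hypothetical helper: asks the API to reply with a single JSON object.
' Call it on the request item before SetRequestContent.
Sub RequestJSONReply(j As JSONItem)
  Var responseFormat As New JSONItem
  responseFormat.Value("type") = "json_object"
  j.Value("response_format") = responseFormat
End Sub

' Hypothetical helper: parses the reply text, returning Nil if it is not valid JSON.
Function ParseReply(answer As String) As JSONItem
  Try
    Return New JSONItem(answer)
  Catch e As JSONException
    Return Nil
  End Try
End Function
```

The parsed item's values can then be read with Value("...") and written to the database.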
