Heidi API
Transcription

Upload a full audio file

Method: POST

Path: /sessions/{session_id}/upload-audio

Description: Upload a single audio file for transcription. Use this when you already have the full recording and don't need streaming or segmented uploads. The file is stored against the session, and the transcript is generated lazily on the next GET /sessions/{session_id}/transcript call.

For chunked uploads while a consult is still in progress, use the segment transcription endpoints instead.

Preferred file types: Most audio file types are accepted. .mp3 and .ogg produce the best transcription quality.

Request

POST /sessions/1234567890/upload-audio
Authorization: Bearer <your_token>
Content-Type: multipart/form-data
 
{
  "file": "consult.mp3",
  "start_time": "2024-12-11T03:57:57Z",
  "end_time": "2024-12-11T04:12:57Z"
}

Request Fields:

  • file (file, required): Audio file to upload. Must have an audio/* content type.
  • start_time (string, optional): ISO 8601 timestamp marking when the recording started. Used to align the audio against the session timeline.
  • end_time (string, optional): ISO 8601 timestamp marking when the recording ended. If omitted, it is derived from start_time plus the detected duration.
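
Because the request is sent as multipart/form-data, the JSON-style body shown in the request sample is best read as a list of form fields rather than a literal payload. The following is a minimal sketch of building such a request with only the Python standard library. The base URL https://api.example.com and the helper name build_upload_request are placeholders assumed for illustration; they are not part of the documented API.

```python
import mimetypes
import uuid
from urllib.request import Request

BASE_URL = "https://api.example.com"  # assumption: replace with your real API base URL


def build_upload_request(session_id, token, filename, audio_bytes,
                         start_time=None, end_time=None):
    """Build (but do not send) a multipart POST for /sessions/{id}/upload-audio."""
    content_type = mimetypes.guess_type(filename)[0] or "audio/mpeg"
    if not content_type.startswith("audio/"):
        # Mirrors the documented rule that the file must be audio/*.
        raise ValueError(f"{filename} is not an audio file ({content_type})")

    boundary = uuid.uuid4().hex
    parts = []
    # Optional ISO 8601 timestamps are sent as plain text form fields.
    for name, value in (("start_time", start_time), ("end_time", end_time)):
        if value is not None:
            parts.append(
                f'--{boundary}\r\nContent-Disposition: form-data; '
                f'name="{name}"\r\n\r\n{value}\r\n'.encode()
            )
    # The file part carries the audio/* content type.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n').encode()
        + audio_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return Request(
        f"{BASE_URL}/sessions/{session_id}/upload-audio",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Building the request separately from sending it keeps the field layout easy to inspect.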

Response

{
  "uploaded_audio": {
    "id": "65f0e6b1a2b3c4d5e6f70123",
    "index": 0,
    "status": "INITIALIZED",
    "created_at": "2024-12-11T03:57:57.921000",
    "access_url": null,
    "expiry": 0
  }
}

Response Fields:

  • id (string): Unique identifier of the stored audio record.
  • index (integer): Position of this audio within the session's audio list.
  • status (string): Processing status. INITIALIZED means the file is stored and queued for transcription on the next transcript fetch.
  • created_at (string): ISO 8601 timestamp of when the audio record was created.
  • access_url (string|null): Presigned URL for the uploaded file. null unless expiry is set to request one.

Error responses

{
  "detail": "only audio file is allowed"
}
{
  "detail": "error: failed to upload audio"
}
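
Both error bodies share the same shape, a single detail string. As a rough sketch, a client could extract that message as shown here. The helper name error_detail is hypothetical, and because the HTTP status codes that accompany these bodies are not stated here, the sketch inspects only the body.

```python
import json


def error_detail(body):
    """Return the "detail" string from an error body, or None if it is absent."""
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON at all (or not a str/bytes value): nothing to extract.
        return None
    if isinstance(payload, dict) and isinstance(payload.get("detail"), str):
        return payload["detail"]
    return None
```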

Initialise audio transcription

Method: POST

Path: /sessions/{session_id}/restful-segment-transcription

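Description: Initialise a segmented audio transcription for a session. The returned recording_id identifies the recording in the subsequent :transcribe and :finish calls.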

Request

POST /sessions/1234567890/restful-segment-transcription
Authorization: Bearer <your_token>

Response

{
  "recording_id": "123"
}

Upload audio to transcribe

Method: POST

Path: /sessions/{session_id}/restful-segment-transcription/{recording_id}:transcribe

Description: Upload an audio chunk to transcribe. Although an entire consult recording can be uploaded in a single call, we recommend splitting the audio into chunks of 45 seconds to 1 minute to improve transcription accuracy.

Preferred file types: Most audio file types are supported. .mp3 and .ogg are preferred because they produce better-quality transcriptions.

Request

POST /sessions/1234567890/restful-segment-transcription/123:transcribe
Authorization: Bearer <your_token>
Content-Type: multipart/form-data
 
{
  "file": "audio.mp3",
  "index": "0"
}

Response

{
  "is_success": true
}
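
The recommended split of 45 seconds to 1 minute can be planned before any audio is cut. The sketch below only computes the chunk boundaries; slicing the audio itself is left to whichever audio tool you use. The function name chunk_bounds is hypothetical, and the zero-based index that increases by one per chunk is an assumption extrapolated from the "index": "0" shown in the request sample.

```python
def chunk_bounds(total_seconds, chunk_seconds=60.0):
    """Return (index, start, end) tuples that cover the recording in order."""
    if total_seconds < 0 or chunk_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_seconds > 0")
    total = float(total_seconds)
    bounds = []
    start = 0.0
    index = 0  # assumption: chunk indices start at 0 and increase by 1
    while start < total:
        # The final chunk is simply shorter when the length is not a multiple.
        end = min(start + chunk_seconds, total)
        bounds.append((index, start, end))
        start = end
        index += 1
    return bounds
```

For a 150-second recording with the default 60-second chunk length, this yields three chunks, the last one 30 seconds long.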

End audio transcription

Method: POST

Path: /sessions/{session_id}/restful-segment-transcription/{recording_id}:finish

Description: Complete an audio transcription once all chunks for the recording have been uploaded.

Request

POST /sessions/1234567890/restful-segment-transcription/123:finish
Authorization: Bearer <your_token>

Response

{
  "is_success": true
}
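
Taken together, the three segment endpoints form an initialise, transcribe, finish sequence. The sketch below shows that ordering with the HTTP layer injected as a callable, so the flow can be read and tested without a network. The send(method, path, fields) hook and the function name transcribe_in_segments are hypothetical. Treating a falsy is_success as a failure is also an assumption, since failure responses for these endpoints are not documented here.

```python
def transcribe_in_segments(send, session_id, chunks):
    """Run init -> transcribe each chunk -> finish through an injected transport.

    `send(method, path, fields)` is a hypothetical hook that performs the HTTP
    call and returns the decoded JSON response as a dict.
    """
    base = f"/sessions/{session_id}/restful-segment-transcription"

    # Step 1: initialise and keep the recording_id for the later calls.
    recording_id = send("POST", base, None)["recording_id"]

    # Step 2: upload the chunks in order, each with its index as a form field.
    for index, chunk in enumerate(chunks):
        reply = send("POST", f"{base}/{recording_id}:transcribe",
                     {"file": chunk, "index": str(index)})
        if not reply.get("is_success"):
            raise RuntimeError(f"chunk {index} was not accepted")

    # Step 3: mark the recording as finished.
    reply = send("POST", f"{base}/{recording_id}:finish", None)
    if not reply.get("is_success"):
        raise RuntimeError("finish call was not accepted")
    return recording_id
```

Injecting send keeps the sequencing independent of any particular HTTP client.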

Retrieve transcript

Method: GET

Path: /sessions/{session_id}/transcript

Description: Retrieve the transcript for a session.

Request

GET /sessions/1234567890/transcript
Authorization: Bearer <your_token>

Response

{
  "transcript": "Transcript text..."
}

Retrieve live transcript

Method: GET

Path: /open-api/sessions/{session_id}/transcript/live

Description: Get the session's live transcript with chunks. This endpoint returns transcript data from multiple recording sources (user uploaded, live recording, and backup) along with their chunked segments.

Query Parameters

  • recording_status (string, optional, default: in_progress): Status of the recording. Possible values: in_progress, complete.
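
As a small illustration of the query parameter, the sketch below builds the request path and rejects values outside the two documented ones before any call is made. The function name live_transcript_path is hypothetical.

```python
from urllib.parse import urlencode

VALID_RECORDING_STATUSES = ("in_progress", "complete")


def live_transcript_path(session_id, recording_status="in_progress"):
    """Return the live-transcript path with a validated recording_status."""
    if recording_status not in VALID_RECORDING_STATUSES:
        raise ValueError(
            f"recording_status must be one of {VALID_RECORDING_STATUSES}"
        )
    query = urlencode({"recording_status": recording_status})
    return f"/open-api/sessions/{session_id}/transcript/live?{query}"
```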

Request

GET /open-api/sessions/1234567890/transcript/live?recording_status=in_progress
Authorization: Bearer <your_token>

Response

{
  "response": {
    "user_uploaded": [
      {
        "id": "recording_123",
        "chunks": [
          {
            "id": 1,
            "status": "COMPLETE",
            "transcript": "Patient presents with...",
            "duration": 45.5
          },
          {
            "id": 2,
            "status": "IN_PROGRESS",
            "transcript": "The patient reports...",
            "duration": 30.2
          }
        ],
        "created_at": "2024-01-15T10:30:00Z"
      }
    ],
    "live": [],
    "backup": [],
    "transcript_in_use": "user_uploaded",
    "transcript": "Patient presents with... The patient reports..."
  }
}

Response Schema

response (object): The main response object containing transcript data

  • user_uploaded (array): Array of chunked recordings from user-uploaded audio
  • live (array): Array of chunked recordings from live recording
  • backup (array): Array of chunked recordings from backup audio
  • transcript_in_use (string|null): Indicates which transcript source is currently being used (user_uploaded, live, or backup)
  • transcript (string|null): The full concatenated transcript text

Chunked Recording Object:

  • id (string): Unique identifier for the recording
  • chunks (array): Array of transcript chunks
  • created_at (string|null): ISO 8601 timestamp of when the recording was created

Chunk Object:

  • id (integer): Unique identifier for the chunk
  • status (string): Transcription status of the chunk
  • transcript (string|null): The transcribed text for this chunk
  • duration (number): Duration of the audio chunk in seconds
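
To show how the schema above fits together, the sketch below reads a live-transcript response, follows transcript_in_use to the matching source array, and summarises its chunks. The function name summarise_live_transcript is hypothetical. The only status values it relies on are COMPLETE and IN_PROGRESS from the sample response; any other values the API may return are not covered here.

```python
import json


def summarise_live_transcript(body):
    """Summarise the chunks of the transcript source currently in use."""
    response = json.loads(body)["response"]
    source = response.get("transcript_in_use")  # may be null
    recordings = (response.get(source) or []) if source else []

    # Flatten the chunks of every recording in the selected source.
    chunks = [c for rec in recordings for c in rec.get("chunks", [])]
    complete = [c for c in chunks if c.get("status") == "COMPLETE"]

    return {
        "source": source,
        "chunk_count": len(chunks),
        "complete_count": len(complete),
        "complete_seconds": sum(c.get("duration", 0) for c in complete),
        "transcript": response.get("transcript") or "",
    }
```

Applied to the sample response, this reports user_uploaded as the source, with two chunks of which one is complete.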