Transcribe interviews with speaker diarization

Once you have uploaded a recording, you can start a transcription job from Pulse Qualitative. Smartinterview offers two modes: a fast standard mode for general transcription and an advanced mode that identifies who said what throughout the interview. Both modes support automatic language detection and produce a downloadable transcript.

Transcription modes

Standard
Advanced (with speaker diarization)

Standard transcription converts speech to text quickly and accurately. It is a good choice when you need a full transcript without speaker attribution, or when the recording has a single speaker or all participants share the same role.

Fast turnaround
High accuracy for clear audio
Language taken from your selection
No speaker labels

What is speaker diarization?

Speaker diarization is the process of automatically segmenting an audio recording by speaker, figuring out “who spoke when.” Smartinterview’s advanced mode analyzes the audio and assigns a label to each voice it detects. You will see output like:

[00:01:12] Interviewer: Can you tell me about your experience with the product?
[00:01:19] Respondent 1: Sure. I started using it about three months ago...
[00:02:45] Respondent 2: I had a different experience at first...

Speaker labels are assigned automatically. If the detected labels do not match the actual speakers, note the corrections when you analyze the transcript. Editing labels directly in the interface is not yet supported.
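If you work with a downloaded plain-text transcript outside Smartinterview, the labeled lines can be split into structured turns. The following is a minimal sketch in Python, assuming every line follows the [HH:MM:SS] Speaker: text layout of the sample above. The names Turn and parse_transcript are hypothetical helpers and are not part of Smartinterview.

```python
import re
from dataclasses import dataclass

# Assumed line layout, copied from the sample output: [HH:MM:SS] Speaker: text
TURN_PATTERN = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+([^:]+):\s*(.*)$")


@dataclass
class Turn:
    seconds: int   # offset from the start of the recording
    speaker: str   # label assigned by diarization
    text: str      # spoken text for this turn


def parse_transcript(raw: str) -> list[Turn]:
    """Parse labeled transcript lines, skipping lines that do not match."""
    turns = []
    for line in raw.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match is None:
            continue
        hours, minutes, secs, speaker, text = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(secs)
        turns.append(Turn(offset, speaker.strip(), text))
    return turns


sample = """[00:01:12] Interviewer: Can you tell me about your experience with the product?
[00:01:19] Respondent 1: Sure. I started using it about three months ago...
[00:02:45] Respondent 2: I had a different experience at first..."""

turns = parse_transcript(sample)
for turn in turns:
    print(turn.seconds, turn.speaker)
```

Lines that do not match the assumed layout are skipped rather than raising an error, so the sketch tolerates blank lines and headers in the file.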

Starting a transcription job

Select your file

In Pulse Qualitative, find the uploaded file in the sidebar or the upload area.

Choose the language

Select the primary language of the interview from the Language dropdown. Choosing the correct language improves accuracy, especially for French, German, and other non-English recordings.

Select the transcription mode

Choose Standard for a plain transcript, or Advanced (with speaker diarization) if you need to identify who said what.

Click Transcribe

Click the Transcribe button. Smartinterview compresses the audio and sends it for processing. A progress bar appears to indicate how far along the job is.

Monitoring transcription progress

Transcription runs asynchronously. You do not need to keep the page open, but if you do, a progress bar shows the current stage and percentage. Status labels include:

Status	Meaning
Pending	The file is queued and waiting to start
Processing	Transcription is actively running
Completed	The transcript is ready to view and download
Failed	An error occurred — see the troubleshooting section below

Viewing the transcript

When transcription completes, open the transcript to see:

Timestamped speaker turns: each line shows the timestamp (e.g., [00:02:34]) and the speaker label, followed by the spoken text.
Full transcript view: read the complete conversation in sequence.
Q&A pairs view: Smartinterview extracts question-and-answer pairs from the conversation, making it easy to review what the interviewer asked and how respondents replied.

Downloading the transcript

From the completed transcript view or the transcription history, click Download to save the transcript. Available formats:

Text / TSV: a plain-text version suitable for importing into spreadsheets or analysis tools.
Word document (.docx): a formatted document ready to share or annotate.

Transcription history

All your past transcriptions appear in the sidebar under Transcriptions. Standalone files are listed under Files, and workspace-grouped recordings appear under Folders. Select any entry to see its status, download the transcript, or delete the record. You can refresh the list at any time using the refresh button at the top of the sidebar.

Troubleshooting

Transcription failed

If a job shows a Failed status, an error message appears in the detail panel. Common causes include unsupported audio codecs or very low audio quality. Try converting the file to MP3 or WAV using a tool like Audacity or FFmpeg, then re-upload and retry.
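The conversion step can also be scripted. The following is a minimal sketch in Python that calls FFmpeg, assuming FFmpeg is installed and available on your PATH. The file name interview.m4a and the helper names are placeholders for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(source: Path, target: Path) -> list[str]:
    """Build the FFmpeg argument list for an audio-only conversion."""
    # -y overwrites an existing target, -i names the input,
    # -vn drops any video stream so only audio is written.
    return ["ffmpeg", "-y", "-i", str(source), "-vn", str(target)]


def convert_audio(source: Path, target_suffix: str = ".wav") -> Path:
    """Convert source to the format implied by target_suffix (.wav or .mp3)."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    target = source.with_suffix(target_suffix)
    subprocess.run(build_ffmpeg_command(source, target), check=True)
    return target


# Placeholder file name; replace with the recording that failed.
command = build_ffmpeg_command(Path("interview.m4a"), Path("interview.wav"))
print(" ".join(command))
```

Calling convert_audio(Path("interview.m4a")) would write interview.wav next to the original file, which you can then re-upload.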

Speaker labels are wrong or missing

Speaker diarization works best with clear audio and distinct voices. If the recording has heavy background noise, overlapping speech, or more than four or five participants, the model may merge or mislabel speakers. In these cases, use the full transcript view and note the correct attributions manually.
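Because labels cannot be edited in the interface, one way to apply manual corrections is to relabel a downloaded plain-text transcript. The following is a minimal sketch in Python, assuming each line starts with a bracketed timestamp followed by a speaker label and a colon. The function relabel_speakers and the corrections mapping are hypothetical and are not part of Smartinterview.

```python
import re

# Matches the timestamp, the speaker label, and the colon at the start of a line.
LABEL_PATTERN = re.compile(r"^(\[[\d:]+\][ \t]+)([^:\n]+)(:)", re.MULTILINE)


def relabel_speakers(transcript: str, corrections: dict[str, str]) -> str:
    """Replace detected speaker labels using a detected -> correct mapping."""

    def swap(match: re.Match) -> str:
        label = match.group(2).strip()
        # Labels without an entry in the mapping are left unchanged.
        return match.group(1) + corrections.get(label, label) + match.group(3)

    return LABEL_PATTERN.sub(swap, transcript)


# Example: the model swapped the interviewer and the first respondent.
sample = (
    "[00:01:12] Respondent 1: Can you tell me more?\n"
    "[00:01:19] Interviewer: Sure."
)
fixed = relabel_speakers(
    sample,
    {"Respondent 1": "Interviewer", "Interviewer": "Respondent 1"},
)
print(fixed)
```

Each label is looked up once per line, so swapping two labels in a single mapping works without one replacement overwriting the other. The sketch cannot repair merged speakers; those turns still need manual review.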

The transcript is in the wrong language

Make sure you selected the correct language before starting transcription. If the language was wrong, delete the transcript and re-run transcription with the correct language selected.

Transcription is taking a long time

Processing time scales with the length of the recording. A one-hour interview may take several minutes. If the job has been in Processing status for more than 30 minutes, use the refresh button to check for an updated status, or contact support.
