Most likely, you have faced a situation where you're enjoying the seamless flow of an application—for instance, while making a train or hotel reservation. Then, suddenly—bam!—a never-ending form appears, disrupting the experience.
I'm not saying that filling out such forms is irrelevant for the business—quite the opposite. However, as an app owner, you may notice in your analytics a significant drop in user conversions at this stage.
In this post, I want to introduce a more seamless and user-friendly text input option to improve the experience of filling out multiple fields in a form.
Base project
To begin entering text, long-press the desired text field. When the bottom line turns orange, speech-to-text mode has been activated. Release your finger once you see the text correctly transcribed. If the transcribed text is valid, the line turns green; otherwise, it turns red.
Let's dig into the code...
The view is built with a language picker, which is a crucial feature. It allows you to select the language you will use later, especially when interacting with a form containing multiple text fields.
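The code below references `appSingletons.localeManager`, which the post never shows. For context, here is a minimal sketch of what such a `LocaleManager` might look like; the type names, the locale list, and the `AppSingletons` container are my own assumptions, not the project's actual implementation:

```swift
import Foundation
import Combine

// Hypothetical sketch — the article does not show LocaleManager or appSingletons.
final class LocaleManager: ObservableObject {
    // Identifiers offered in the segmented picker (assumed values).
    let locales = ["en-US", "es-ES", "fr-FR"]

    // The currently selected identifier, bound to the Picker.
    @Published var localeIdentifier: String = "en-US"

    // Locale handed to SFSpeechRecognizer when recording starts.
    func getCurrentLocale() -> Locale {
        Locale(identifier: localeIdentifier)
    }
}

// A simple global container, as implied by `appSingletons.localeManager`.
struct AppSingletons {
    let localeManager = LocaleManager()
}

let appSingletons = AppSingletons()
```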
struct VoiceRecorderView: View {
    @StateObject private var localeManager = appSingletons.localeManager
    @State var name: String = ""
    @State var surname: String = ""
    @State var age: String = ""
    @State var email: String = ""

    var body: some View {
        Form {
            Section {
                Picker("Select language", selection: $localeManager.localeIdentifier) {
                    ForEach(localeManager.locales, id: \.self) { Text($0).tag($0) }
                }
                .pickerStyle(SegmentedPickerStyle())
            }
            Section {
                TextFieldView(textInputValue: $name,
                              placeholder: "Name:",
                              invalidFormatMessage: "Text must be greater than 6 characters!") { textInputValue in
                    textInputValue.count > 6
                }
                TextFieldView(textInputValue: $surname,
                              placeholder: "Surname:",
                              invalidFormatMessage: "Text must be greater than 6 characters!") { textInputValue in
                    textInputValue.count > 6
                }
                TextFieldView(textInputValue: $age,
                              placeholder: "Age:",
                              invalidFormatMessage: "Age must be between 18 and 65") { textInputValue in
                    if let number = Int(textInputValue) {
                        return number >= 18 && number <= 65
                    }
                    return false
                }
            }
            Section {
                TextFieldView(textInputValue: $email,
                              placeholder: "Email:",
                              invalidFormatMessage: "Must be a valid email address") { textInputValue in
                    let emailRegex = #"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"#
                    let emailPredicate = NSPredicate(format: "SELF MATCHES %@", emailRegex)
                    return emailPredicate.evaluate(with: textInputValue)
                }
            }
        }
        .padding()
    }
}
For every text field, we need a binding variable to hold the text field’s value, a placeholder for guidance, and an error message to display when the acceptance criteria function is not satisfied.
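Because the acceptance-criteria function has the plain shape `(String) -> Bool`, validators can also be defined once and reused across fields instead of repeating trailing closures. A small sketch of that idea (the validator names here are my own, not from the project):

```swift
import Foundation

// Hypothetical reusable validators matching the `(String) -> Bool` shape
// that TextFieldView expects for its acceptance-criteria closure.

// Returns a validator that accepts text longer than `minimum` characters.
let minimumLength: (Int) -> (String) -> Bool = { minimum in
    { $0.count > minimum }
}

// Returns a validator that accepts integers within `range`.
let ageInRange: (ClosedRange<Int>) -> (String) -> Bool = { range in
    { Int($0).map(range.contains) ?? false }
}
```

A field could then be declared as, for example, `TextFieldView(textInputValue: $age, placeholder: "Age:", invalidFormatMessage: "Age must be between 18 and 65", isValid: ageInRange(18...65))`.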
When we examine the TextFieldView, we see that it is essentially a text field enhanced with additional features to improve user-friendliness.
struct TextFieldView: View {
    @State private var isPressed = false
    @State private var borderColor = Color.gray
    @StateObject private var localeManager = appSingletons.localeManager
    @Binding var textInputValue: String
    let placeholder: String
    let invalidFormatMessage: String?
    var isValid: (String) -> Bool = { _ in true }

    var body: some View {
        VStack(alignment: .leading) {
            if !textInputValue.isEmpty {
                Text(placeholder)
                    .font(.caption)
            }
            TextField(placeholder, text: $textInputValue)
                .accessibleTextField(text: $textInputValue, isPressed: $isPressed)
                .overlay(
                    Rectangle()
                        .frame(height: 2)
                        .foregroundColor(borderColor),
                    alignment: .bottom
                )
                .onChange(of: textInputValue) { _, newValue in
                    borderColor = getColor(text: newValue, isPressed: isPressed)
                }
                .onChange(of: isPressed) {
                    borderColor = getColor(text: textInputValue, isPressed: isPressed)
                }
            if !textInputValue.isEmpty,
               !isValid(textInputValue),
               let invalidFormatMessage {
                Text(invalidFormatMessage)
                    .foregroundColor(Color.red)
            }
        }
    }

    func getColor(text: String, isPressed: Bool) -> Color {
        guard !isPressed else { return Color.orange }
        guard !text.isEmpty else { return Color.gray }
        return isValid(text) ? Color.green : Color.red
    }
}
The key point in the above code is the .accessibleTextField modifier, where all the magic of converting voice to text happens. We have encapsulated all speech-to-text functionality within this modifier.
extension View {
    func accessibleTextField(text: Binding<String>, isPressed: Binding<Bool>) -> some View {
        modifier(AccessibleTextField(text: text, isPressed: isPressed))
    }
}

struct AccessibleTextField: ViewModifier {
    @StateObject private var viewModel = VoiceRecorderViewModel()
    @Binding var text: String
    @Binding var isPressed: Bool
    private let lock = NSLock()

    func body(content: Content) -> some View {
        content
            .onChange(of: viewModel.transcribedText) {
                guard !viewModel.transcribedText.isEmpty else { return }
                text = viewModel.transcribedText
            }
            .simultaneousGesture(
                DragGesture(minimumDistance: 0)
                    .onChanged { _ in
                        lock.withLock {
                            if !isPressed {
                                isPressed = true
                                viewModel.startRecording(locale: appSingletons.localeManager.getCurrentLocale())
                            }
                        }
                    }
                    .onEnded { _ in
                        lock.withLock {
                            if isPressed {
                                isPressed = false
                                viewModel.stopRecording()
                            }
                        }
                    }
            )
    }
}
The voice-to-text functionality is implemented in the VoiceRecorderViewModel. In the view, it is controlled by detecting a long press from the user to start recording and releasing to stop the recording. The transcribed voice text is then forwarded upward via the text Binding attribute.
Finally, here is the view model that handles the transcription:
import Foundation
import AVFoundation
import Speech

class VoiceRecorderViewModel: ObservableObject {
    @Published var transcribedText: String = ""
    @Published var isRecording: Bool = false

    private let audioSession = AVAudioSession.sharedInstance()
    // A recognition request cannot be reused after endAudio(), so a fresh one
    // is created for every recording.
    private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest?
    private var recognitionTask: SFSpeechRecognitionTask?
    private var audioEngine = AVAudioEngine()
    var speechRecognizer: SFSpeechRecognizer?

    func startRecording(locale: Locale) {
        do {
            speechRecognizer = SFSpeechRecognizer(locale: locale)
            recognitionTask?.cancel()
            recognitionTask = nil
            try audioSession.setCategory(.record, mode: .measurement, options: .duckOthers)
            try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
            guard let recognizer = speechRecognizer, recognizer.isAvailable else {
                transcribedText = "Speech recognition is not available for the selected language."
                return
            }
            let request = SFSpeechAudioBufferRecognitionRequest()
            recognitionRequest = request
            let inputNode = audioEngine.inputNode
            let recordingFormat = inputNode.outputFormat(forBus: 0)
            inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, _ in
                request.append(buffer)
            }
            audioEngine.prepare()
            try audioEngine.start()
            recognitionTask = recognizer.recognitionTask(with: request) { result, _ in
                if let result {
                    // Recognition callbacks can arrive off the main thread;
                    // publish updates on main for SwiftUI.
                    DispatchQueue.main.async {
                        self.transcribedText = result.bestTranscription.formattedString
                    }
                }
            }
            isRecording = true
        } catch {
            transcribedText = "Error starting the recording: \(error.localizedDescription)"
        }
    }

    func stopRecording() {
        audioEngine.stop()
        audioEngine.inputNode.removeTap(onBus: 0)
        recognitionRequest?.endAudio()
        recognitionTask?.cancel()
        isRecording = false
    }
}
Key Components
Properties:
- @Published var transcribedText: Holds the real-time transcribed text, allowing SwiftUI views to bind and update dynamically.
- @Published var isRecording: Indicates whether the application is currently recording.
- audioSession, recognitionRequest, recognitionTask, audioEngine, speechRecognizer: These manage audio recording and speech recognition.
Speech Recognition Workflow:
- SFSpeechRecognizer: Recognizes and transcribes speech from audio input for a specified locale.
- SFSpeechAudioBufferRecognitionRequest: Provides an audio buffer for speech recognition tasks.
- AVAudioEngine: Captures microphone input.
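One step the view model does not cover is permission handling: speech recognition and microphone access each require explicit user authorization, and without them the recognition task produces no results. A minimal sketch of requesting both (where you call this in your app is up to you; the function name is my own):

```swift
import Speech
import AVFoundation

// Hypothetical helper: request speech-recognition authorization first,
// then microphone access, and report the combined outcome on the main queue.
func requestSpeechPermissions(completion: @escaping (Bool) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized else {
            DispatchQueue.main.async { completion(false) }
            return
        }
        AVAudioSession.sharedInstance().requestRecordPermission { granted in
            DispatchQueue.main.async { completion(granted) }
        }
    }
}
```

Remember that both prompts also require usage-description strings in Info.plist: NSSpeechRecognitionUsageDescription and NSMicrophoneUsageDescription.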
Conclusions
I encourage you to download the project from the following GitHub repository and start playing with this great technology.
References
- Speech, Apple Developer Documentation