I'm attempting to sync recorded audio (from an AVAudioEngine inputNode) to an audio file that was playing during the recording process. The result should be like multitrack recording, where each subsequent new track is synced with the previous tracks that were playing at the time of recording.
Because sampleTime differs between the AVAudioEngine's output and input nodes, I use hostTime to determine the offset between the original audio and the input buffers.
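For reference, the secondsToTicks factor referenced in the code further down comes from mach_timebase_info. A minimal sketch of the equivalent conversion, with an illustrative helper name (AVAudioTime.seconds(forHostTime:) should give the same number):

import Darwin

// Illustrative helper: convert a mach_absolute_time() delta into seconds.
// This matches the `hostTimeDiff / secondsToTicks` division in the code below.
func hostTimeDeltaToSeconds(_ delta: UInt64) -> Double {
    var timebase = mach_timebase_info_data_t()
    mach_timebase_info(&timebase)
    let nanos = Double(delta) * Double(timebase.numer) / Double(timebase.denom)
    return nanos / 1_000_000_000.0
}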
On iOS, I would assume that I'd have to use AVAudioSession's various latency properties (inputLatency, outputLatency, ioBufferDuration) to reconcile the tracks, along with the host-time offset, but I haven't figured out the magic combination to make them work. The same goes for the various AVAudioEngine and AVAudioNode properties like latency and presentationLatency.
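For reference, these are the values I've been experimenting with on iOS. The helper below is just an illustrative sketch (how to combine them is exactly what I'm asking), and it assumes the same audioEngine instance used in the code further down:

import AVFoundation

// Illustrative sketch: gather the latency-related values available on iOS.
// How these should be combined with the host-time offset is the open question.
func iosLatencyValues(for engine: AVAudioEngine) -> (input: TimeInterval,
                                                     output: TimeInterval,
                                                     ioBuffer: TimeInterval,
                                                     inputNodePresentation: TimeInterval) {
    let session = AVAudioSession.sharedInstance()
    return (session.inputLatency,
            session.outputLatency,
            session.ioBufferDuration,
            engine.inputNode.presentationLatency)
}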
On macOS, AVAudioSession doesn't exist (outside of Catalyst), meaning I don't have access to those numbers. Meanwhile, the latency/presentationLatency properties on the AVAudioNodes report 0.0 in most circumstances. On macOS, I do have access to AudioObjectGetPropertyData and can ask the system about kAudioDevicePropertyLatency, kAudioDevicePropertyBufferSize, kAudioDevicePropertySafetyOffset, etc., but am again at a bit of a loss as to what the formula is to reconcile all of these.
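This is roughly how I query those Core Audio properties on macOS. readUInt32Property is just an illustrative helper, and inputDeviceID is assumed to have been obtained elsewhere (e.g. from the input node's underlying audio unit):

import CoreAudio

// Illustrative helper: read a UInt32-valued device property for a given scope
// (input vs. output). Uses kAudioObjectPropertyElementMaster
// (kAudioObjectPropertyElementMain on newer SDKs).
func readUInt32Property(_ selector: AudioObjectPropertySelector,
                        device: AudioObjectID,
                        scope: AudioObjectPropertyScope) -> UInt32 {
    var address = AudioObjectPropertyAddress(mSelector: selector,
                                             mScope: scope,
                                             mElement: kAudioObjectPropertyElementMaster)
    var value: UInt32 = 0
    var size = UInt32(MemoryLayout<UInt32>.size)
    let status = AudioObjectGetPropertyData(device, &address, 0, nil, &size, &value)
    return status == noErr ? value : 0
}

// Example usage (inputDeviceID obtained elsewhere):
// let safetyOffset  = readUInt32Property(kAudioDevicePropertySafetyOffset, device: inputDeviceID, scope: kAudioDevicePropertyScopeInput)
// let deviceLatency = readUInt32Property(kAudioDevicePropertyLatency, device: inputDeviceID, scope: kAudioDevicePropertyScopeInput)
// ...and the same for the output device with kAudioDevicePropertyScopeOutput.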
I have a sample project at https://github.com/jnpdx/AudioEngineLoopbackLatencyTest that runs a simple loopback test (on macOS, iOS, or Mac Catalyst) and shows the result. On my Mac, the offset between tracks is ~720 samples. On others' Macs, I've seen as much as 1500 samples offset.
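(For scale, assuming a 44.1 kHz sample rate, 720 samples is roughly 16 ms and 1500 samples roughly 34 ms.)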
On my iPhone, I can get it close to sample-perfect by using AVAudioSession's outputLatency + inputLatency. However, the same formula leaves things misaligned on my iPad.
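Concretely, the adjustment that happens to line up on the iPhone looks roughly like this (the variable names and the choice to subtract rather than add are just how I've sketched it; inputNodeDiffInSamples and inputFileBuffer come from the code below):

//shift the recorded input by the session's reported input + output latency, converted to samples
let session = AVAudioSession.sharedInstance()
let latencyOffsetInSamples = (session.inputLatency + session.outputLatency) * inputFileBuffer.format.sampleRate
let correctedStartSample = AVAudioFramePosition(inputNodeDiffInSamples - latencyOffsetInSamples)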
What's the magic formula for syncing the input and output timestamps on each platform? I know it may be different on each, which is fine, and I know I won't get 100% accuracy, but I would like to get as close as possible before going through my own calibration process.
Here's a sample of my current code (full sync logic can be found at https://github.com/jnpdx/AudioEngineLoopbackLatencyTest/blob/main/AudioEngineLoopbackLatencyTest/AudioManager.swift):
//Schedule playback of original audio during initial playback
let delay = 0.33 * state.secondsToTicks
let audioTime = AVAudioTime(hostTime: mach_absolute_time() + UInt64(delay))
state.audioBuffersScheduledAtHost = audioTime.hostTime

...

//in the inputNode's tap, store the first timestamp
audioEngine.inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (pcmBuffer, timestamp) in
    if self.state.inputNodeTapBeganAtHost == 0 {
        self.state.inputNodeTapBeganAtHost = timestamp.hostTime
    }
}

...

//after playback, attempt to reconcile/sync the timestamps recorded above
let timestampToSyncTo = state.audioBuffersScheduledAtHost
let inputNodeHostTimeDiff = Int64(state.inputNodeTapBeganAtHost) - Int64(timestampToSyncTo)
let inputNodeDiffInSamples = Double(inputNodeHostTimeDiff) / state.secondsToTicks * inputFileBuffer.format.sampleRate //secondsToTicks is calculated using mach_timebase_info

//play the original metronome audio at sample position 0 and try to sync everything else up to it
let originalAudioTime = AVAudioTime(sampleTime: 0, atRate: renderingEngine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
originalAudioPlayerNode.scheduleBuffer(metronomeFileBuffer, at: originalAudioTime, options: []) {
    print("Played original audio")
}

//play the tap of the input node at its determined sync time -- this _does not_ appear to line up in the result file
let inputAudioTime = AVAudioTime(sampleTime: AVAudioFramePosition(inputNodeDiffInSamples), atRate: renderingEngine.mainMixerNode.outputFormat(forBus: 0).sampleRate)
recordedInputNodePlayer.scheduleBuffer(inputFileBuffer, at: inputAudioTime, options: []) {
    print("Input buffer played")
}
When running the sample app, the result file shows the recorded input track offset from the original audio by the amounts described above.