@lambiengcode
Created March 18, 2024 03:50
Virtual Background with WebRTC in iOS

Overview

In today's virtual conferencing landscape, being able to seamlessly swap your real surroundings for a chosen image or video is not just a nice-to-have feature: it helps you keep a professional and engaging presence in remote meetings.

👉 By the end of this wiki, you can expect the virtual background feature to look like this:

Virtual Background on iOS (VisionKit for person segmentation)

Warning

In this document, I use VisionKit to separate the background, but VNGeneratePersonInstanceMaskRequest is only supported on iOS 17+.
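
A minimal sketch (not part of the original gist) of how you might gate the feature at runtime: on devices below iOS 17, forward the captured frame untouched using the same emitFrame call the pipeline below relies on.

func handleCaptured(frame: RTCVideoFrame, source: RTCVideoSource) {
    if #available(iOS 17.0, *) {
        // Run the virtual background pipeline described below.
    } else {
        // Person-instance masks are unavailable: pass the frame through unchanged.
        source.emitFrame(frame)
    }
}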

Common WebRTC terms you should know

  1. VideoFrame: It contains the buffer of the frame captured by the camera device, in I420 format.
  2. VideoSink: It is used to send the frame back to the WebRTC native source.
  3. VideoSource: It reads the camera device, produces VideoFrames, and delivers them to VideoSinks.
  4. VideoProcessor: It is an interface provided by WebRTC for updating the VideoFrames produced by a VideoSource.
  5. MediaStream: It is a WebRTC API that provides support for streaming audio and video data. A stream consists of zero or more MediaStreamTrack objects, each representing an audio or video track.
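
For orientation, here is a minimal sketch of how these objects relate in the WebRTC iOS SDK (the track and stream IDs are placeholders):

import WebRTC

let factory = RTCPeerConnectionFactory()
let videoSource = factory.videoSource()                        // VideoSource: produces VideoFrames
let capturer = RTCCameraVideoCapturer(delegate: videoSource)   // camera device -> VideoFrames -> source
let videoTrack = factory.videoTrack(with: videoSource, trackId: "video0")
let stream = factory.mediaStream(withStreamId: "stream0")      // MediaStream: a bundle of tracks
stream.addVideoTrack(videoTrack)

let renderer = RTCMTLVideoView(frame: .zero)                   // a VideoSink that renders frames
videoTrack.add(renderer)                                       // the track delivers frames to its sinks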

Implementation in code

Create a class that receives VideoFrames and separates the background

import Foundation
import UIKit
import WebRTC

@objc public class RTCVideoPipe: NSObject, RTCVideoCapturerDelegate {
    var virtualBackground: RTCVirtualBackground?
    var videoSource: RTCVideoSource?
    var latestTimestampNs: Int64 = 0
    var frameCount: Int = 0
    var lastProcessedTimestamp: Int64 = 0
    var fpsInterval: Int64 = 1_000_000_000 / 15 // throttle to 15 fps so VNGeneratePersonInstanceMaskRequest can keep up
    var backgroundImage: UIImage?

    @objc public init(videoSource: RTCVideoSource) {
        self.videoSource = videoSource
        self.virtualBackground = RTCVirtualBackground()
        super.init()
    }
    
    @objc public func setBackgroundImage(image: UIImage?) {
        backgroundImage = image
    }

    @objc public func capturer(_ capturer: RTCVideoCapturer, didCapture frame: RTCVideoFrame) {
        let currentTimestamp = frame.timeStampNs

        // Calculate the time since the last processed frame
        let elapsedTimeSinceLastProcessedFrame = currentTimestamp - lastProcessedTimestamp

        if elapsedTimeSinceLastProcessedFrame < fpsInterval {
            // Skip processing the frame if it's too soon
            return
        }
        
        if backgroundImage == nil {
            self.videoSource?.emitFrame(frame)
            return
        }

        virtualBackground?.processForegroundMask(from: frame, backgroundImage: backgroundImage!) { processedFrame, error in
            if let error = error {
                // Handle error
                print("Error processing foreground mask: \(error.localizedDescription)")
            } else if let processedFrame = processedFrame {
                self.lastProcessedTimestamp = currentTimestamp

                if processedFrame.timeStampNs <= self.latestTimestampNs {
                    // Skip emitting frame if its timestamp is not newer than the latest one
                    return
                }

                self.latestTimestampNs = processedFrame.timeStampNs
                self.videoSource?.emitFrame(processedFrame)
            }
        }
    }
}

Set the VideoSource delegate so the pipe receives VideoFrames from WebRTC

videoPipe = [[RTCVideoPipe alloc] initWithVideoSource:videoSource];

[videoSource setDelegate:videoPipe];
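
Note that setDelegate on RTCVideoSource comes from the patched WebRTC build this gist targets; the stock WebRTC iOS SDK does not expose it. A hedged alternative on the stock SDK is to make the pipe the camera capturer's delegate instead, so it still sits between the camera and the source:

let videoPipe = RTCVideoPipe(videoSource: videoSource)
let capturer = RTCCameraVideoCapturer(delegate: videoPipe)

if let device = RTCCameraVideoCapturer.captureDevices().first,
   let format = RTCCameraVideoCapturer.supportedFormats(for: device).last {
    capturer.startCapture(with: device, format: format, fps: 30)
}

On the stock SDK you would also forward processed frames with videoSource.capturer(capturer, didCapture: frame) instead of the emitFrame call used above.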

Create a class that implements VisionKit person segmentation to separate the background

import Foundation
import AVFoundation
import CoreImage
import UIKit
import Vision
import VisionKit
import WebRTC

@available(iOS 17.0, *)
var maskRequest: VNGeneratePersonInstanceMaskRequest?

@objc public class RTCVirtualBackground: NSObject {
    
    public typealias ForegroundMaskCompletion = (RTCVideoFrame?, Error?) -> Void
    
    public override init() {
        if #available(iOS 17.0, *) {
            DispatchQueue.main.async {
                maskRequest = VNGeneratePersonInstanceMaskRequest()
            }
        }
    }
    
    public func processForegroundMask(from videoFrame: RTCVideoFrame, backgroundImage: UIImage, completion: @escaping ForegroundMaskCompletion) {
        guard let pixelBuffer = convertRTCVideoFrameToPixelBuffer(videoFrame) else {
            print("Failed to convert RTCVideoFrame to CVPixelBuffer")
            completion(nil, nil)
            return
        }
        DispatchQueue.main.async {
            guard #available(iOS 17.0, *), let maskRequest = maskRequest else {
                // Person-instance masks require iOS 17+; nothing to composite here.
                completion(nil, nil)
                return
            }

            guard let inputFrameImage = CIImage(cvPixelBuffer: pixelBuffer).resize() else {
                completion(nil, nil)
                return
            }

            let handler = VNImageRequestHandler(ciImage: inputFrameImage, options: [:])
            do {
                try handler.perform([maskRequest])
                guard let observation = maskRequest.results?.first else {
                    // No person detected in this frame.
                    completion(nil, nil)
                    return
                }

                let allInstances = observation.allInstances
                let maskedImage = try observation.generateMaskedImage(ofInstances: allInstances, from: handler, croppedToInstancesExtent: false)

                self.applyForegroundMask(to: maskedImage, backgroundImage: backgroundImage) { maskedPixelBuffer, error in
                    if let maskedPixelBuffer = maskedPixelBuffer {
                        let frameProcessed = self.convertPixelBufferToRTCVideoFrame(maskedPixelBuffer, rotation: videoFrame.rotation, timeStampNs: videoFrame.timeStampNs)
                        completion(frameProcessed, nil)
                    } else {
                        completion(nil, error)
                    }
                }
            } catch {
                print("Failed to perform Vision request: \(error)")
                completion(nil, error)
            }
        }
    }
}
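
The resize() call above is an extension on CIImage that the gist does not include. Purely as a hypothetical placeholder (not the author's implementation), it could simply downscale large frames so Vision has less work to do:

import CoreImage

// Hypothetical stand-in for the missing CIImage.resize() extension:
// downscale frames whose longest side exceeds maxDimension.
extension CIImage {
    func resize(maxDimension: CGFloat = 1280) -> CIImage? {
        let longest = max(extent.width, extent.height)
        guard longest > maxDimension else { return self }
        let scale = maxDimension / longest
        return transformed(by: CGAffineTransform(scaleX: scale, y: scale))
    }
}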

Draw the segmented person and the background on a canvas

func applyForegroundMask(to pixelBuffer: CVPixelBuffer, backgroundImage: UIImage, completion: @escaping (CVPixelBuffer?, Error?) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let maskedUIImage = UIImage(ciImage: CIImage(cvPixelBuffer: pixelBuffer))

        let size = CGSize(width: CGFloat(CVPixelBufferGetWidth(pixelBuffer)), height: CGFloat(CVPixelBufferGetHeight(pixelBuffer)))

        let rotatedBackgroundImage = backgroundImage.rotateImage(orientation: UIImage.Orientation.up)

        // Draw the background first, then the masked person on top of it.
        UIGraphicsBeginImageContextWithOptions(size, false, 0.0)
        rotatedBackgroundImage.draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        maskedUIImage.draw(in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
        let composedImage = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()

        DispatchQueue.main.async {
            guard let composedImage = composedImage,
                  let composedPixelBuffer = self.pixelBufferFromImage(image: composedImage) else {
                completion(nil, nil)
                return
            }

            completion(composedPixelBuffer, nil)
        }
    }
}


@cp-sam-media commented Mar 24, 2024

Hi @lambiengcode, thanks for sharing the code to apply virtual background filters.

Can you please share working code for the convertPixelBufferToRTCVideoFrame and convertRTCVideoFrameToPixelBuffer method definitions? I was finding it difficult to get exact working versions of these methods.

Sharing these two methods would be a great help for making further progress. Awaiting a quick response.

Thanks a ton for giving some idea of how to do this.

@lambiengcode (Author)

Hi @cp-sam-media,

Here are the two functions:

    func convertPixelBufferToRTCVideoFrame(_ pixelBuffer: CVPixelBuffer, rotation: RTCVideoRotation, timeStampNs: Int64) -> RTCVideoFrame? {
        let rtcPixelBuffer = RTCCVPixelBuffer(pixelBuffer: pixelBuffer)
        
        let rtcVideoFrame = RTCVideoFrame(buffer: rtcPixelBuffer, rotation: rotation, timeStampNs: timeStampNs)
        
        return rtcVideoFrame
    }
    
    func convertRTCVideoFrameToPixelBuffer(_ rtcVideoFrame: RTCVideoFrame) -> CVPixelBuffer? {
        if let remotePixelBuffer = rtcVideoFrame.buffer as? RTCCVPixelBuffer {
            let pixelBuffer = remotePixelBuffer.pixelBuffer
            // Now you have access to 'pixelBuffer' for further use
            return pixelBuffer
        } else {
            print("Error: RTCVideoFrame buffer is not of type RTCCVPixelBuffer")
            return nil
        }
    }

if it's helpful, please give me a star ⭐ for this project https://github.com/lambiengcode/waterbus. Thank youuu

@cp-sam-media

Thanks a lot @lambiengcode for the quick response.

I have tried the same code to convert from an image to an RTCVideoFrame, but I'm getting blank video.

Can you please help me with the right code for getting a CVPixelBuffer from a CIImage or UIImage? I think there may be something wrong in getting the CVPixelBuffer from the converted image.

I think I need to look into your pixelBufferFromImage method, which might be causing the issue in my code where I'm trying to convert a UIImage to an RTCVideoFrame.

Can you please help me with that code as well? I would highly appreciate it.

Thanks, buddy!

@lambiengcode (Author)

func applyForegroundMask(to pixelBuffer: CVPixelBuffer, backgroundImage: CIImage, completion: @escaping (CVPixelBuffer?, Error?) -> Void) {
        DispatchQueue.global(qos: .userInitiated).async {
            let ciContext = CIContext()
            
            // Resize background image if necessary
#if os(macOS)
            let size = CGSize(width: 1920, height: 1080)
            
            let rotateBackground = backgroundImage.oriented(.upMirrored)
#elseif os(iOS)
            let size = CGSize(width: CVPixelBufferGetWidth(pixelBuffer), height: CVPixelBufferGetHeight(pixelBuffer))
            
            let rotateBackground = backgroundImage.oriented(.leftMirrored)
#endif
            
            let resizedBackground = rotateBackground.transformed(by: CGAffineTransform(scaleX: size.width / rotateBackground.extent.width, y: size.height / rotateBackground.extent.height))
            
            // Create CIImage from pixelBuffer
            let maskedCIImage = CIImage(cvPixelBuffer: pixelBuffer)
            let resizedMasked = maskedCIImage.transformed(by: CGAffineTransform(scaleX: size.width / maskedCIImage.extent.width, y: size.height / maskedCIImage.extent.height))
            
            // Composite images
            guard let resultImage = ciContext.createCGImage(resizedMasked.composited(over: resizedBackground), from: CGRect(origin: .zero, size: size)) else {
                completion(nil, nil)
                return
            }
            
            // Convert CGImage to CVPixelBuffer
            var composedPixelBuffer: CVPixelBuffer?
            CVPixelBufferCreate(kCFAllocatorDefault, Int(size.width), Int(size.height), kCVPixelFormatType_32ARGB, nil, &composedPixelBuffer)
            guard let composedBuffer = composedPixelBuffer else {
                completion(nil, nil)
                return
            }
            
            CVPixelBufferLockBaseAddress(composedBuffer, CVPixelBufferLockFlags(rawValue: 0))
            let bufferAddress = CVPixelBufferGetBaseAddress(composedBuffer)
            let bytesPerRow = CVPixelBufferGetBytesPerRow(composedBuffer)
            let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
            let context = CGContext(data: bufferAddress, width: Int(size.width), height: Int(size.height), bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: rgbColorSpace, bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue)
            
            context?.draw(resultImage, in: CGRect(x: 0, y: 0, width: size.width, height: size.height))
            
            CVPixelBufferUnlockBaseAddress(composedBuffer, CVPixelBufferLockFlags(rawValue: 0))
            
            completion(composedBuffer, nil)
        }
    }

Let's try the new applyForegroundMask function. I switched the background to a CIImage so it works for both iOS and macOS, and pixelBufferFromImage is no longer needed. @cp-sam-media
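
For context (just a sketch, not code from the thread): the pipeline above stores the background as a UIImage, so convert it once before calling this CIImage-based variant; maskedPixelBuffer here stands for the buffer produced by the Vision mask step.

if let uiBackground = backgroundImage,
   let ciBackground = CIImage(image: uiBackground) {
    applyForegroundMask(to: maskedPixelBuffer, backgroundImage: ciBackground) { buffer, error in
        // Convert the composited buffer back to an RTCVideoFrame and emit it,
        // exactly as in the UIImage-based path above.
    }
}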

@cp-sam-media

Thanks @lambiengcode for the quick reply, I really appreciate your help.

I'm using React Native for my app development and trying to update the npm library react-native-webrtc, the Jitsi-based WebRTC framework written in Objective-C.

I'm using the method below to convert the UIImage I get after applying filters to the RTCVideoFrame, but I'm not able to convert the filtered image back to an RTCVideoFrame. With this method, I'm getting a blank video after sending it back to the video source.

Can you please help me correct the method below, or suggest the changes needed to make it work and give me a non-blank video frame? This help would be much appreciated, as I've been stuck here for a few days and I'm also new to this Objective-C code, so it would be a big help!

- (CVPixelBufferRef *)videoFrameFromImage:(UIImage *)image {
    CGSize frameSize = CGSizeMake(image.size.width, image.size.height);

    // Create a CVPixelBuffer from the UIImage
    NSDictionary *options = @{(NSString *)kCVPixelBufferCGImageCompatibilityKey: @(YES),
                              (NSString *)kCVPixelBufferCGBitmapContextCompatibilityKey: @(YES)};
    CVPixelBufferRef pixelBuffer = NULL;
    CVReturn status = CVPixelBufferCreate(kCFAllocatorDefault,
                                          frameSize.width,
                                          frameSize.height,
                                          kCVPixelFormatType_32BGRA,
                                          (__bridge CFDictionaryRef)options,
                                          &pixelBuffer);
    if (status != kCVReturnSuccess) {
        NSLog(@"Error creating pixel buffer");
        return nil;
    }

    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    void *pixelData = CVPixelBufferGetBaseAddress(pixelBuffer);
    CGColorSpaceRef rgbColorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(pixelData,
                                                 frameSize.width,
                                                 frameSize.height,
                                                 8,
                                                 CVPixelBufferGetBytesPerRow(pixelBuffer),
                                                 rgbColorSpace,
                                                 kCGImageAlphaNoneSkipFirst);

    CGContextDrawImage(context, CGRectMake(0, 0, frameSize.width, frameSize.height), image.CGImage);

    CGContextRelease(context);
    CGColorSpaceRelease(rgbColorSpace);
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);
    RCTLog(@"[VideoCaptureController] videoFrameFromImage pixelBuffer created...");

    // Create RTCVideoFrame
    RTCVideoFrame *videoFrame = [[RTCVideoFrame alloc] initWithPixelBuffer:pixelBuffer
                                                                  rotation:RTCVideoRotation_0
                                                               timeStampNs:0];

    CVPixelBufferRelease(pixelBuffer);

    return videoFrame;
}

@lambiengcode (Author)

Hi @cp-sam-media,

Your code doesn't work correctly because when you create the videoFrame you must pass a timeStampNs that is later than the timeStampNs of the previous frame; it's really important in WebRTC to determine which frame comes first and which comes later. You can get a monotonic timestamp in nanoseconds from the system clock:

uint64_t nanoseconds = clock_gettime_nsec_np(CLOCK_UPTIME_RAW);
// Create RTCVideoFrame
RTCVideoFrame *videoFrame = [[RTCVideoFrame alloc] initWithPixelBuffer:pixelBuffer
                                                              rotation:RTCVideoRotation_0
                                                           timeStampNs:0]; <----- this is the problem

@cp-sam-media

Hi @lambiengcode, thanks for the reply.

My bad, I copied the wrong code; yes, I'm aware of the importance of the frame timestamp. I have been sending the same rotation and timestamp that the original frame provides. Will that also be an issue, or shall I try the current timestamp instead, since the processing takes a little bit of time? Please suggest.

RTCVideoFrame *processedFrame = [[RTCVideoFrame alloc] initWithPixelBuffer:*rtcPixelBuffer
                                                                  rotation:frame.rotation
                                                               timeStampNs:frame.timeStampNs];

@lambiengcode (Author)

Try using the current timestamp first. If the error still occurs, I suggest writing an extension function that saves your UIImage to a folder on the iPhone so you can make sure the image itself is correct.
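
Something along these lines (just a sketch; the name saveToDocuments and the Documents location are only suggestions):

import UIKit

// Debug helper: write a UIImage as PNG into the app's Documents directory
// so you can inspect the processed frame in the Files app or via Xcode.
extension UIImage {
    func saveToDocuments(named fileName: String) {
        guard let data = pngData(),
              let directory = FileManager.default.urls(for: .documentDirectory,
                                                       in: .userDomainMask).first else { return }
        let url = directory.appendingPathComponent(fileName)
        try? data.write(to: url)
        print("Saved debug image to \(url.path)")
    }
}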

@cp-sam-media

Thanks @lambiengcode for the suggestion.

I'm doing the same: converting the frame to a CIImage and storing it in the app documents first, then taking that CIImage, processing it, and trying to convert it to an RTCVideoFrame. I have also tried adding a few milliseconds to the original frame's timestamp. Let me try with the current timestamp and check; my bad, I missed this and should have tried it.


@cp-sam-media

Hi @lambiengcode, hope you are doing well.

I have found a strange value for the timestamp in the captured RTCVideoFrame: I'm getting a timestamp value of 138188369726875, which is the datestamp of 1974-05-19 13:39:29, and the timestamp of the new processed video frame is 1711575017, which is the datestamp of 1974-05-19 13:40:03. So, as you said, the blank screen is coming because of this timestamp difference.

Any idea why I'm getting January 2nd, 1970 as the datetime from my captured frame? Please help me with this if you have some idea or have faced a similar issue earlier; that would be a great help.

Thanks, much appreciated!
