Briefly on WebRTC
WebRTC is a technology for building video chat and conferencing applications. It lets you create a peer-to-peer connection between mobile devices and browsers to transmit media streams. You can find more details on how it works and its general principles in our article about WebRTC in plain language.
2 ways to implement video communication with WebRTC on Android
- The easiest and fastest option is to use one of the many commercial products, such as Twilio or LiveSwitch. They provide their own SDKs for various platforms and offer functionality out of the box, but they have drawbacks: they are paid, and the functionality is limited to the features they ship, not whatever you can think of.
- Another option is to use one of the existing libraries. This approach requires more code but will save you money and give you more flexibility in implementing functionality. In this article, we will take the second route and use Google's native WebRTC library: https://webrtc.github.io/webrtc-org/native-code/android/
Creating a connection
Creating a WebRTC connection consists of two steps:
- Establishing a logical connection – devices must agree on the data format, codecs, etc.
- Establishing a physical connection – devices must know each other’s addresses
To begin with, note that at the initiation of a connection, to exchange data between devices, a signaling mechanism is used. The signaling mechanism can be any channel for transmitting data, such as sockets.
Suppose we want to establish a video connection between two devices. To do this we need to establish a logical connection between them.
A logical connection is established using the Session Description Protocol (SDP). For this, one peer:
- Creates a PeerConnection object.
- Forms an SDP offer object, which contains data about the upcoming session, and sends it to the interlocutor via the signaling mechanism.
In turn, the other peer:
- Also creates a PeerConnection object.
- Receives the SDP offer sent by the first peer via the signaling mechanism and stores it.
- Forms an SDP answer and sends it back, also via the signaling mechanism.
The first peer, having received the SDP answer, stores it as well.
After successful exchange of SessionDescription objects, the logical connection is considered established.
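The offer/answer exchange above can be sketched with the org.webrtc API. This is a minimal sketch, not a complete implementation: `signaling` stands in for whatever signaling channel you use, `peerConnection` is assumed to exist already, and the SdpObserver callbacks are reduced to stubs.

```kotlin
import org.webrtc.MediaConstraints
import org.webrtc.SdpObserver
import org.webrtc.SessionDescription

val constraints = MediaConstraints()

// Caller side: create an SDP offer, set it as the local description,
// and deliver it to the other peer over the signaling channel.
peerConnection.createOffer(object : SdpObserver {
    override fun onCreateSuccess(sdp: SessionDescription) {
        peerConnection.setLocalDescription(this, sdp)
        signaling.send(sdp.description) // hypothetical signaling helper
    }
    override fun onSetSuccess() {}
    override fun onCreateFailure(error: String) {}
    override fun onSetFailure(error: String) {}
}, constraints)

// Callee side: store the received offer as the remote description,
// then create an answer and send it back the same way.
val offer = SessionDescription(SessionDescription.Type.OFFER, receivedOfferSdp)
peerConnection.setRemoteDescription(sdpObserver, offer)
peerConnection.createAnswer(answerObserver, constraints)
```

The same SdpObserver pattern is used on the answering side: in its onCreateSuccess, set the answer as the local description and send it through the signaling channel.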
We now need to establish the physical connection between the devices, which is most often a non-trivial task. Typically, devices on the Internet do not have public addresses, since they are located behind routers and firewalls. To solve this problem WebRTC uses ICE (Interactive Connectivity Establishment) technology.
STUN and TURN servers are an important part of ICE. They serve one purpose: establishing connections between devices that do not have public addresses.
A device makes a request to a STUN server and receives its public address in response. Then, using the signaling mechanism, it sends that address to the interlocutor. Once the interlocutor does the same, the devices know each other's network locations and are ready to transmit data.
In some cases, the router may impose a "Symmetric NAT" restriction, which prevents a direct connection between the devices. In that case a TURN server is used: it acts as an intermediary, and all data passes through it. Read more in Mozilla's WebRTC documentation.
As we have seen, STUN and TURN servers play an important role in establishing a physical connection between devices. That is why, when creating the PeerConnection object, we pass it a list of available ICE servers.
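Creating a PeerConnection with ICE servers might look like this. A minimal sketch: the server URLs and credentials are placeholders, and `peerConnectionFactory` and `observer` (a PeerConnection.Observer implementation) are assumed to exist.

```kotlin
import org.webrtc.PeerConnection

// Placeholder STUN/TURN servers - substitute your own.
val iceServers = listOf(
    PeerConnection.IceServer.builder("stun:stun.l.google.com:19302")
        .createIceServer(),
    PeerConnection.IceServer.builder("turn:turn.example.com:3478")
        .setUsername("user")
        .setPassword("password")
        .createIceServer()
)

val rtcConfig = PeerConnection.RTCConfiguration(iceServers)

// The observer receives connection events such as new ICE candidates
// and incoming media tracks.
val peerConnection = peerConnectionFactory.createPeerConnection(rtcConfig, observer)
```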
To establish a physical connection, one peer generates ICE candidates – objects containing information about how the device can be found on the network – and sends them to the other peer via the signaling mechanism.
The second peer receives the first peer's ICE candidates via the signaling mechanism and stores them. It also generates its own ICE candidates and sends them back.
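The candidate exchange can be sketched as follows. The library surfaces local candidates through the PeerConnection.Observer callback; `signaling` is again a hypothetical stand-in for your own channel.

```kotlin
import org.webrtc.IceCandidate

// Inside your PeerConnection.Observer implementation: the library hands
// you each local candidate, and you forward it to the other peer.
override fun onIceCandidate(candidate: IceCandidate) {
    signaling.send(candidate) // hypothetical signaling helper
}

// On the receiving side, reconstruct the remote candidate from the
// signaled fields and add it to the connection.
val remoteCandidate = IceCandidate(sdpMid, sdpMLineIndex, candidateSdp)
peerConnection.addIceCandidate(remoteCandidate)
```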
Now that the peers have exchanged their addresses, you can start transmitting and receiving data.
After the logical and physical connections with the interlocutor are established, the library calls the onAddTrack callback and passes it a MediaStream object containing the interlocutor's VideoTrack and AudioTrack.
Next, we must retrieve the VideoTrack from the MediaStream and display it on the screen.
To display a VideoTrack, you need to pass it an object that implements the VideoSink interface. For this purpose, the library provides the SurfaceViewRenderer class.
To hear the interlocutor, we don't need to do anything extra – the library handles it for us. Still, if we want to fine-tune the sound, we can get the AudioTrack object and use it to change the audio settings.
For example, we could mute the interlocutor, like this:
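A sketch of muting, assuming `mediaStream` is the remote MediaStream received in onAddTrack:

```kotlin
// Disable the remote audio track to mute the interlocutor;
// call setEnabled(true) to unmute again.
val remoteAudioTrack = mediaStream.audioTracks.firstOrNull()
remoteAudioTrack?.setEnabled(false)
```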
Sending video and audio from your device also begins with creating a PeerConnection object and exchanging SDP packets and ICE candidates through the signaling mechanism. But unlike receiving a stream, where the library hands us the interlocutor's MediaStream, here we must first capture the media from our own device and pass it to the library so it can deliver it to the interlocutor.
Now we need to create a MediaStream object and add the AudioTrack and VideoTrack objects to it.
Receive audio track:
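A sketch of creating the local audio track, assuming a `peerConnectionFactory` created earlier; the track id is an arbitrary string:

```kotlin
import org.webrtc.MediaConstraints

// Create an audio source from the device microphone and wrap it in a track.
val audioSource = peerConnectionFactory.createAudioSource(MediaConstraints())
val localAudioTrack = peerConnectionFactory.createAudioTrack("AUDIO_TRACK", audioSource)
```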
Getting a VideoTrack is a tiny bit more difficult. First, get a list of all the device's cameras.
Next, create a CameraVideoCapturer object, which will capture the image.
Now, after getting CameraVideoCapturer, start capturing the image and add it to the MediaStream.
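The three camera steps above can be sketched in one pass. Assumptions: `context` is your Android Context, `eglBase`, `peerConnectionFactory`, `peerConnection`, and `localAudioTrack` exist already, and the capture resolution and frame rate are illustrative values.

```kotlin
import org.webrtc.Camera2Enumerator
import org.webrtc.SurfaceTextureHelper

// 1. Enumerate the device's cameras and pick the front-facing one.
val enumerator = Camera2Enumerator(context)
val frontCamera = enumerator.deviceNames.firstOrNull { enumerator.isFrontFacing(it) }

// 2. Create a CameraVideoCapturer for the chosen camera.
val capturer = enumerator.createCapturer(frontCamera, null)

// 3. Start capturing and feed the frames into a VideoSource.
val surfaceHelper = SurfaceTextureHelper.create("CaptureThread", eglBase.eglBaseContext)
val videoSource = peerConnectionFactory.createVideoSource(capturer.isScreencast)
capturer.initialize(surfaceHelper, context, videoSource.capturerObserver)
capturer.startCapture(1280, 720, 30) // width, height, fps

val localVideoTrack = peerConnectionFactory.createVideoTrack("VIDEO_TRACK", videoSource)

// Put both tracks into a MediaStream and attach it to the connection.
val mediaStream = peerConnectionFactory.createLocalMediaStream("LOCAL_STREAM")
mediaStream.addTrack(localAudioTrack)
mediaStream.addTrack(localVideoTrack)
peerConnection.addStream(mediaStream)
```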
After the MediaStream is created and added to the PeerConnection, the library forms an SDP offer, and the SDP packet exchange described above takes place through the signaling mechanism. When this process is complete, the interlocutor starts receiving our video stream. Congratulations – the connection is established.
Many to Many
We have considered a one-to-one connection. WebRTC also allows you to create many-to-many connections. In its simplest form, this works exactly like a one-to-one connection, except that the PeerConnection object is created, and the SDP packet and ICE candidate exchange is performed, not once but for each participant. This approach has disadvantages:
- The device is heavily loaded because it needs to send the same data stream to each interlocutor
- The implementation of additional features such as video recording, transcoding, etc. is difficult or even impossible
In this case, WebRTC can be used in conjunction with a media server that takes care of the above tasks. For the client side, the process is exactly the same as for a direct connection to the interlocutors' devices, except that the media stream is sent not to all participants but only to the media server, which retransmits it to the other participants.
We have considered the simplest way to create a WebRTC connection on Android. If after reading this you still don’t understand it, just go through all the steps again and try to implement them yourself – once you have grasped the key points, using this technology in practice will not be a problem.
And there you have it – your video chat! Android also allows you to create a custom call notification. After reading our guide on it, even those who are really new to coding will be able to do it!
Not an Android guy? We’ve got you covered with our WebRTC on iOS guide.
You can also refer to the following resources for a better understanding of WebRTC: