What Exactly Is Early Media in VoIP?
When you dial a number and hear that familiar ring-ring-ring sound, you assume the phone on the other end is ringing. But in VoIP, that sound isn’t always coming from the recipient’s device. It’s often sent by the network itself, before the call is even answered. This is called early media.
Early media is audio (or video) that flows between callers during call setup, before the called party picks up. It includes ringback tones, hold music, automated announcements like "Your call is important to us," or even IVR menus that play while the system routes the call. This isn’t a luxury feature. It’s a core part of how modern VoIP systems keep callers from hanging up out of confusion or impatience.
Before VoIP, landlines handled this automatically. The phone company sent a ringback tone through the network, no matter where the call was going. SIP, the protocol that runs most VoIP calls, keeps signaling and media separate. Its core specification (RFC 3261) defines provisional responses such as 180 Ringing and 183 Session Progress, but it says little about who should generate ringback or how audio sent before answer should be handled. In practice that left room for silence or mismatched tones. That’s why RFC 3960 was published in 2004. It describes how VoIP systems can send media during the "waiting" phase, using SIP responses like 180 Ringing and 183 Session Progress, and when the caller’s device should generate ringback on its own.
How Early Media Works: The SIP Timeline
Here’s what happens step by step when you make a VoIP call with early media enabled:
- You dial a number. Your phone sends an INVITE request to the recipient’s server.
- The recipient’s server replies with a 180 Ringing response. This says, "We got your call, we’re trying to reach them."
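- Now, if early media is configured, the server also describes a media stream in an SDP body inside a provisional response (usually 183 Session Progress, sometimes the 180 itself) and starts sending audio as RTP packets. This audio could be a ringback tone, a recording saying "Please wait," or even music.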
- Meanwhile, the recipient’s phone is ringing. They haven’t answered yet.
- When they do pick up, their phone sends a 200 OK response. The call is now fully connected, and media continues, but now it’s two-way.
The key point: early media happens between the INVITE and the 200 OK. It’s not part of the call. It’s part of the setup. And if the recipient never answers, the media can keep playing until a timer cuts it off.
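As a rough illustration of that timeline, the sketch below takes a list of observed events (SIP responses and RTP packets) and reports the early media window: from the first RTP packet to the first final response. The `SipEvent` type and `early_media_window` function are hypothetical names made up for this example, not part of any SIP library.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SipEvent:
    t: float            # seconds since the INVITE was sent
    kind: str           # "response" or "rtp"
    status: int = 0     # SIP status code (responses only)
    has_sdp: bool = False


def early_media_window(events: List[SipEvent]) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, end) of early media, or None if no audio arrived before answer.

    start is the first RTP packet seen before any final response.
    end is the time of the first final response (status >= 200),
    or None if the call never got one.
    """
    first_rtp = None
    final_time = None
    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == "response" and ev.status >= 200:
            final_time = ev.t
            break  # everything after this is regular call media
        if ev.kind == "rtp" and first_rtp is None:
            first_rtp = ev.t
    if first_rtp is None:
        return None
    return (first_rtp, final_time)


# Example call: 183 with SDP, audio starts, callee answers at 6.0 s.
example_call = [
    SipEvent(0.1, "response", 100),
    SipEvent(0.3, "response", 183, has_sdp=True),
    SipEvent(0.4, "rtp"),
    SipEvent(0.42, "rtp"),
    SipEvent(6.0, "response", 200, has_sdp=True),
    SipEvent(6.02, "rtp"),
]
window = early_media_window(example_call)
```

Any RTP that arrives after the final response is ordinary call audio, which is why the loop stops at the first status of 200 or higher.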
Ringback Tone vs. Early Media: What’s the Difference?
People often use "ringback tone" and "early media" interchangeably. But they’re not the same.
In traditional phone systems, the ringback tone is generated by your own phone company’s network. It’s a fixed, standardized sound.
In VoIP, "ringback" can mean two things:
- Local ringback: Your phone or PBX plays a tone because it hasn’t received media from the other side yet. This is a fallback.
- Early media ringback: The recipient’s system (like their PBX or carrier) sends actual audio back to you before the call is answered.
True early media ringback is rare. Most systems default to local ringback because it’s simpler. But when you hear a custom ringback, like a company’s jingle or a message saying "This call may be monitored," that’s early media in action.
As Ernesto Dos Santos Afonso from SIP Caller points out: "Ringback tone is a bad example for SIP, because it’s generally generated in the calling device." He’s right. The real value of early media isn’t the tone. It’s the ability to send any audio before answer.
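The choice between local ringback and early media can be written as a small decision function on the caller’s side. The sketch below loosely follows the guidance in RFC 3960: never generate local ringing without a 180, generate it when a 180 has arrived but no media is coming in, and play incoming media instead when it is present. The function name and return strings are hypothetical, and playing incoming media when no 180 has arrived is an assumption of this sketch.

```python
def ringback_action(got_180: bool, receiving_media: bool) -> str:
    """Decide what the calling device should play before the call is answered."""
    if receiving_media:
        # Remote audio wins: play what the far end is sending.
        return "play_remote_audio"
    if got_180:
        # The far end says it is ringing but sends no audio: fall back locally.
        return "play_local_ringback"
    # No 180 and no media: do not invent a ring tone.
    return "stay_silent"


decisions = {
    (got_180, media): ringback_action(got_180, media)
    for got_180 in (False, True)
    for media in (False, True)
}
```

A real phone also has to notice when incoming media stops and switch back to local ringback, which this sketch leaves out.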
Why Do Businesses Use Early Media? Real-World Examples
Early media isn’t just about playing a tone. It’s about managing expectations.
Here are real cases where it makes a difference:
- Call centers: A customer dials a support line. Instead of silence, they hear: "Thank you for calling TechSupport. Your call is being routed to the next available agent. Average wait time: 2 minutes." This reduces hang-ups by 30% or more, according to TeleSphere customer feedback.
- Automated routing: A call goes to an IVR. Early media plays the menu options while the system decides where to send the call. No awkward pauses.
- Call transfers: When an agent transfers you to another department, early media can play a gentle hold tone instead of dead air. One user on Reddit spent three days debugging a transfer issue, until they realized the "Transfer progress" setting in PortaOne was set to "No indication."
- International calls: Some carriers replace the local ringback with their own announcement, like "International call in progress." This helps users understand why the call is taking longer.
According to Grand View Research, "call progress tone quality" is the 4th most important factor for enterprise VoIP buyers. That’s not a small thing. If callers think the system is broken because they hear nothing, they’ll hang up, and your business loses.
How Major VoIP Platforms Handle Early Media
Not all systems do early media the same way. Here’s how the big players handle it:
| Platform | How Early Media Is Triggered | Key Limitation |
|---|---|---|
| Asterisk | Requires Progress() in dialplan before Playback() with noanswer flag | One wrong step and the call answers accidentally |
| Cisco UCM | Uses progress_ind dial-peer settings; supports multiple ringback types | Documentation is scattered across product lines |
| FreePBX | Configured via "Early Media" setting in trunk or extension settings | Default timeout often too short (15 sec) |
| PortaOne | Set under "Transfer progress" in service policies: options include "Ringing audio" or "Transferor MOH" | Requires correct policy assignment per account |
| AudioCodes | Forwards early media only for IP Groups that support it | Must verify IP Group compatibility first |
Asterisk is the most flexible, but also the most fragile. If you use Playback() without Progress() and noanswer, the system answers the call immediately. That’s why users on forums say: "I spent hours wondering why my announcement played after the call was answered."
Cisco’s implementation is robust but hard to find. You need to dig into SIP dial-peer settings and use the right progress indicator code (1, 2, or 8). And if you’re connecting to a carrier that doesn’t support early media? You get silence.
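The Asterisk pitfall is mechanical enough to check with a script. The sketch below scans dialplan text and flags Playback() calls that lack the noanswer option or are not preceded by Progress() in the same extension. The `lint_dialplan` function, the sample extensions, and the sound file name are hypothetical. The parser only understands simple `exten =>` and `same =>` lines, so treat it as a starting point rather than a full dialplan parser.

```python
import re

APP_RE = re.compile(r"(\w+)\((.*)\)\s*$")


def lint_dialplan(text):
    """Return a list of (line_number, message) warnings for risky Playback() calls."""
    warnings = []
    progress_seen = False
    answered = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split(";", 1)[0].strip()      # drop ';' comments
        if "=>" not in line:
            continue
        key, rest = (part.strip() for part in line.split("=>", 1))
        if key == "exten":
            progress_seen = answered = False      # new extension, new state
            fields = rest.split(",", 2)           # number, priority, application
        elif key == "same":
            fields = rest.split(",", 1)           # priority, application
        else:
            continue
        match = APP_RE.match(fields[-1].strip())
        if not match:
            continue
        app = match.group(1).lower()
        args = [a.strip().lower() for a in match.group(2).split(",")]
        if app == "progress":
            progress_seen = True
        elif app == "answer":
            answered = True
        elif app == "playback" and not answered:
            if "noanswer" not in args[1:]:
                warnings.append((lineno, "Playback() without noanswer will answer the call"))
            elif not progress_seen:
                warnings.append((lineno, "Playback(noanswer) with no Progress() before it"))
    return warnings


SAMPLE = "\n".join([
    "[inbound]",
    "exten => 100,1,Progress()",
    " same => n,Playback(custom-greeting,noanswer)",
    " same => n,Dial(PJSIP/100,20)",
    "",
    "exten => 200,1,Playback(custom-greeting)",
    " same => n,Dial(PJSIP/200,20)",
])
sample_warnings = lint_dialplan(SAMPLE)
```

In the sample, extension 100 follows the Progress() then Playback(...,noanswer) pattern, while extension 200 is the kind of line that answers the call by accident.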
Why Do Carriers Limit Early Media Time?
Early media is great for users. But for phone companies, it’s a revenue risk.
Imagine a company sets up a VoIP system to play a 2-minute audio ad before connecting a call. The call never gets answered. No one is billed. That’s a problem.
That’s why 78% of Tier 1 carriers limit early media to 15-30 seconds. The VoIP Security Alliance’s 2023 report found the average limit is 22.4 seconds. Some carriers cut it off even sooner.
This is a trade-off. Users want to hear something. Carriers want to make sure they get paid for every second of service. The FCC doesn’t help: it requires ringback to be audible within 1.5 seconds of dialing (FCC Report 22-104). So carriers have to play something fast, but not too long.
If you’re setting up early media, always check your SIP trunk provider’s policy. Some will block it entirely. Others will let you play 30 seconds of music, but only if you’re using a paid service.
Common Problems and How to Fix Them
Early media is one of the most misconfigured features in VoIP. Here are the top issues:
- No sound at all: The most common problem. Check if your PBX is sending 183 Session Progress with SDP (media info). If it’s only sending 180 Ringing, the caller’s device might not know to play audio.
- Call answers too early: Happens in Asterisk if you use Playback() without Progress(). Fix: Always use Progress() before any audio, and pass the noanswer option to Playback().
- Audio cuts off after 10 seconds: Likely a carrier timeout. Contact your SIP provider and ask about early media limits.
- Music plays, but no ringback: You might be playing MOH (music on hold) instead of ringback. Ringback should be a tone or announcement. MOH is for after answer.
- Works on internal calls but not external: External calls go through your SIP trunk. Your trunk provider may block early media. Test with a different provider or check their documentation.
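For the "no sound at all" case, the first thing to confirm is whether the provisional response actually carries SDP. The sketch below parses the raw text of a SIP response and reports whether it is a provisional response (101 to 199) with an SDP body. The function names and sample messages are made up for illustration. The parser ignores folded headers and multipart bodies, so it is only suitable for quick checks against captured messages.

```python
def parse_sip_response(raw: str):
    """Return (status_code, has_sdp) for a raw SIP response string."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line looks like: SIP/2.0 183 Session Progress
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # "c" is the compact form of the Content-Type header in SIP.
    content_type = headers.get("content-type", headers.get("c", ""))
    has_sdp = content_type.lower().startswith("application/sdp") and bool(body.strip())
    return status, has_sdp


def signals_early_media(raw: str) -> bool:
    """True when a provisional response (101-199) carries an SDP body."""
    status, has_sdp = parse_sip_response(raw)
    return 100 < status < 200 and has_sdp


progress_with_sdp = "\r\n".join([
    "SIP/2.0 183 Session Progress",
    "Content-Type: application/sdp",
    "",
    "v=0",
    "m=audio 4000 RTP/AVP 0",
])
bare_ringing = "SIP/2.0 180 Ringing\r\nContent-Length: 0\r\n\r\n"

results = [signals_early_media(progress_with_sdp), signals_early_media(bare_ringing)]
```

If the check fails for responses from your own PBX, the problem is in its configuration. If it passes there but fails on what the caller receives, a trunk or proxy in between is likely stripping the SDP.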
StarTrinity’s VoIP Troubleshooting Book (2023) found that early media issues make up 12.7% of all SIP problems. Most are simple fixes, but only if you know where to look.
What’s Next for Early Media?
Early media isn’t going away. But it’s evolving.
By 2026, Omdia predicts 45% of enterprise UC platforms will use AI-generated early media. That means personalized greetings: "Hi Sarah, this is TechSupport. We see you’re calling about your invoice. Your case number is 7821. We’ll connect you shortly."
Standards are catching up too. RFC 9409 (March 2023) updated early media for WebRTC. 3GPP added enhanced procedures for 5G Voice. The SIP Forum’s Early Media Working Group is targeting a standard by late 2025.
But the biggest challenge remains: balancing user experience with carrier revenue. Until there’s a universal way to bill for early media (like a micro-charge per second), providers will keep short timers.
Final Tip: Test Early Media Like a Pro
If you manage a VoIP system, don’t just enable early media and assume it works. Test it:
- Call your own system from a mobile phone. Listen for audio during the "ringing" phase.
- Use a SIP analyzer (like Wireshark) to check for 183 responses with RTP packets.
- Try a call transfer. Does the person on hold hear music? Or silence?
- Check your provider’s limits. Call and let it ring for 25 seconds. Does the audio stop?
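The last check can be made less subjective by looking at RTP arrival times from a capture instead of listening. The sketch below finds the moment a stream went quiet, given packet timestamps in seconds. The function name, the 1-second gap threshold, and the sample numbers are assumptions for this example. Exporting timestamps from your capture tool is left out.

```python
def find_media_stop(rtp_times, observed_until, max_gap=1.0):
    """Return the time of the last packet before a gap longer than max_gap.

    rtp_times: arrival times (seconds) of RTP packets during the ringing phase.
    observed_until: how long the call was left ringing.
    Returns None if audio never arrived or never paused that long.
    """
    times = sorted(rtp_times)
    if not times:
        return None
    for prev, cur in zip(times, times[1:]):
        if cur - prev > max_gap:
            return prev
    if observed_until - times[-1] > max_gap:
        return times[-1]
    return None


# Simulated capture: one packet every 20 ms from 0.5 s until 22.0 s,
# on a call that was left ringing for 25 s.
simulated = [0.5 + 0.02 * i for i in range(1076)]
stop_time = find_media_stop(simulated, observed_until=25.0)
```

A stop time well short of how long you let the call ring, repeated across test calls, points to a timer somewhere on the path rather than a one-off glitch.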
Early media is invisible when it works. But when it fails, callers notice, and they leave.
Buddy Faith
30 Oct 2025 at 19:35
So let me get this straight: the whole ringback tone thing is just a scam to keep us on the line while they bill us for nothing? I've been hearing those "your call is important" messages for years and now I realize it's just a loophole so carriers don't have to pay for silence. They're not helping us, they're gaslighting us into thinking the system works. Next they'll tell us the hold music is "therapeutic" and charge us for the vibes.