What Media Foundation Is - Why It Starts to Feel Like COM and Windows Media APIs
1. Short version
- Media Foundation is Windows’ media processing platform for video and audio.
- The whole API is not pure COM, but the boundaries between source / transform / sink / callback are expressed as COM interfaces.
- That is why IUnknown, HRESULT, GUIDs, and apartments show up so naturally.
- Start with Source Reader / Sink Writer, and move up to Media Session / MFT only when you need them.
2. What to use for what
| What you want to do | What to use | How COM-heavy |
|---|---|---|
| Pull frames from a file or camera | Source Reader | Medium |
| Write audio/video out to a file | Sink Writer | Medium |
| Handle play/stop/seek and A-V sync | Media Session | High |
| Plug in your own converter (codec component) | MFT (IMFTransform) | High |
| Enumerate candidates and instantiate the one you want | IMFActivate | High |
3. Vocabulary to pin down first
| Term | Meaning |
|---|---|
| Media Source | The entry point that feeds media data into the pipeline (file, network, camera). |
| MFT | Media Foundation Transform. The common model for decoders, encoders, and converters. |
| Media Sink | The destination for media data (display output, audio output, file). |
| Media Session | Manages the flow of the whole pipeline. Handles playback and synchronization. |
| Topology | The wiring diagram of source / transform / sink. |
| Activation Object | A helper that creates the real object later. Represented by IMFActivate. |
| IMFAttributes | A key/value store keyed by GUID. Used everywhere in Media Foundation. |
4. Where the COM face shows up
4.1 CoInitializeEx and MFStartup come as a pair
HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED); // COM init
if (SUCCEEDED(hr)) hr = MFStartup(MF_VERSION); // Media Foundation init (required before most other MF calls)
- COM init alone is not enough. You also need to initialize Media Foundation itself.
- Decide up front which threads will use MF, and whether they will be STA or MTA.
4.2 Objects are passed around as COM interfaces
Most return values and out-parameters in the API are COM interfaces:
- IMFSourceReader, IMFMediaType, IMFTransform, IMFSample, IMFMediaBuffer
- Even type information and configuration objects are expressed as interfaces. A media type, which is pure configuration data, is itself a COM interface (IMFMediaType).
4.3 Activation Objects
Enumeration APIs do not hand you a ready-to-use object. They return an array of IMFActivate*, and you call ActivateObject() only on the ones you actually want.
4.4 Configuration and type info revolve around GUIDs
- IMFAttributes is a key/value store keyed by GUID.
- IMFMediaType inherits from IMFAttributes and stores frame size, FPS, and so on as attributes.
- Typical keys: MF_MT_MAJOR_TYPE (audio vs. video) and MF_MT_SUBTYPE (H.264 / AAC / RGB32 / …).
4.5 Async, callbacks, and threading also feel COM-shaped
- Source Reader synchronous mode: ReadSample blocks.
- Source Reader asynchronous mode: implement IMFSourceReaderCallback and set it via attributes.
- MF work-queue threads are MTA. Keeping the application on MTA as well makes the implementation simpler.
- If you need to update the UI, marshal only the result back to the UI thread.
4.6 But Media Foundation is not just COM
Concepts that are specific to Media Foundation:
- MFStartup / MFShutdown
- Media Session, topology, topology loader, presentation clock
- Source Reader / Sink Writer
In other words, Media Foundation uses COM to express the contracts between components, and runs as a media processing platform on top of that - a two-layer design.
5. Code excerpts: representative patterns
5.1 Using Source Reader in synchronous mode
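// Needs <windows.h>, <mfapi.h>, <mfidl.h>, <mfreadwrite.h>; link mfplat.lib, mfreadwrite.lib, mfuuid.lib.
// Release a COM pointer and null it out.
template <class T> void SafeRelease(T** pp)
{
    if (*pp) { (*pp)->Release(); *pp = nullptr; }
}

HRESULT ReadOneVideoSample(PCWSTR path)
{
    // Declare everything before the first goto so no initialization is skipped.
    IMFSourceReader* pReader = nullptr;
    IMFMediaType* pType = nullptr;
    IMFSample* pSample = nullptr;
    DWORD streamFlags = 0;
    LONGLONG timestamp = 0;

    HRESULT hr = MFCreateSourceReaderFromURL(path, nullptr, &pReader);
    if (FAILED(hr)) goto done;
    hr = MFCreateMediaType(&pType);
    if (FAILED(hr)) goto done;
    hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (FAILED(hr)) goto done;
    hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    if (FAILED(hr)) goto done;
    // Converting decoder output to RGB32 generally requires a reader created with
    // MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING; without it this call can fail.
    hr = pReader->SetCurrentMediaType(MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, pType);
    if (FAILED(hr)) goto done;
    hr = pReader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
                             nullptr, &streamFlags, &timestamp, &pSample);
    if (FAILED(hr)) goto done;
    // pSample can be null even on success (for example at end of stream).
    // Pull IMFMediaBuffer out of pSample and process it

done:
    SafeRelease(&pSample);
    SafeRelease(&pType);
    SafeRelease(&pReader);
    return hr;
}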
Things to notice:
- The reader and the media type are both COM interfaces.
- Configuration is GUID-based.
- The return value is HRESULT.
- In synchronous mode, ReadSample blocks.
5.2 Enumerating and instantiating an MFT
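// Needs <mfapi.h>, <mftransform.h>, <mferror.h>; link mfplat.lib, mfuuid.lib.
HRESULT FindH264Decoder(IMFTransform** ppTransform)
{
    if (!ppTransform) return E_POINTER;
    *ppTransform = nullptr;

    IMFActivate** ppActivate = nullptr;
    UINT32 count = 0;
    MFT_REGISTER_TYPE_INFO inputType = {};
    inputType.guidMajorType = MFMediaType_Video;
    inputType.guidSubtype = MFVideoFormat_H264;

    HRESULT hr = MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER,
                           MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_LOCALMFT,
                           &inputType, nullptr, &ppActivate, &count);
    // Enumeration can succeed and still return zero matches.
    if (SUCCEEDED(hr) && count == 0)
        hr = MF_E_TOPO_CODEC_NOT_FOUND;
    // Instantiate only the first candidate.
    if (SUCCEEDED(hr))
        hr = ppActivate[0]->ActivateObject(__uuidof(IMFTransform),
                                           reinterpret_cast<void**>(ppTransform));

    // Cleanup: release every activation object, then free the array itself
    for (UINT32 i = 0; i < count; ++i) ppActivate[i]->Release();
    CoTaskMemFree(ppActivate);
    return hr;
}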
The enumeration result comes back as IMFActivate**, not as IMFTransform*. You materialize the real object with ActivateObject.
6. Practical checklist
| Item | What to confirm | What goes wrong if you miss it |
|---|---|---|
| Initialization ownership | Who calls CoInitializeEx and MFStartup, and who tears them down | Missing init, confused shutdown order |
| Apartment | STA or MTA for the threads that touch MF | Confusion around callbacks, conflicts with the UI |
| Source Reader mode | Where the choice between sync and async is made | ReadSample blocking when you didn’t expect it |
| Media type negotiation | Specifying the output format explicitly | MF_E_INVALIDMEDIATYPE, getting a format you didn’t want |
| Object lifetime | Clear ownership of Release and ShutdownObject | Memory leaks, inconsistent state at shutdown |
| UI integration | Don’t touch the UI directly from a callback - marshal only the result back | Hangs, races |
7. Wrap-up
- Media Foundation is a media processing platform. COM lives deep inside its boundary surfaces.
- Start with Source Reader / Sink Writer, and move on to Media Session / MFT as you need to.
- Decide your apartment and callback policy up front.
- You need both CoInitializeEx and MFStartup.
- Media Foundation is not COM, but its boundary surfaces are very COM-shaped.