What Media Foundation Is - Why It Starts to Feel Like COM and Windows Media APIs

Media Foundation, COM, C++, Windows Development

1. Short version

  • Media Foundation is Windows’ media processing platform for video and audio.
  • Not every part of the API is COM, but the boundaries between source / transform / sink / callback are expressed as COM interfaces.
  • That is why IUnknown, HRESULT, GUIDs, and apartments show up so naturally.
  • Start with Source Reader / Sink Writer, and move up to Media Session / MFT only when you need them.

2. What to use for what

| What you want to do | What to use | How COM-heavy |
|---|---|---|
| Pull frames from a file or camera | Source Reader | Medium |
| Write audio/video out to a file | Sink Writer | Medium |
| Handle play/stop/seek and A-V sync | Media Session | High |
| Plug in your own converter (codec component) | MFT (IMFTransform) | High |
| Enumerate candidates and instantiate the one you want | IMFActivate | High |

3. Vocabulary to pin down first

| Term | Meaning |
|---|---|
| Media Source | The entry point that feeds media data into the pipeline (file, network, camera). |
| MFT | Media Foundation Transform. The common model for decoders, encoders, and converters. |
| Media Sink | The destination for media data (display output, audio output, file). |
| Media Session | Manages the flow of the whole pipeline. Handles playback and synchronization. |
| Topology | The wiring diagram of source / transform / sink. |
| Activation Object | A helper that creates the real object later. Represented by IMFActivate. |
| IMFAttributes | A key/value store keyed by GUID. Used everywhere in Media Foundation. |

4. Where the COM face shows up

4.1 CoInitializeEx and MFStartup come as a pair

HRESULT hr = CoInitializeEx(nullptr, COINIT_MULTITHREADED);  // COM init for this thread
if (SUCCEEDED(hr))
    hr = MFStartup(MF_VERSION);  // Media Foundation init, required before using the platform
  • COM init alone is not enough. You also need to initialize Media Foundation itself.
  • At teardown, balance each successful call in reverse order: MFShutdown first, then CoUninitialize.
  • Decide up front which threads will use MF, and whether they will be STA or MTA.
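
One way to keep the init/teardown pairing honest is a small scope guard that runs the teardown only when the matching init succeeded. The following is a portable sketch, not Media Foundation code: `HResult`, `ScopedInit`, and `RunWithMedia` are hypothetical names, and the lambdas are stand-ins that only record the call order. In a real application they would wrap CoInitializeEx / CoUninitialize and MFStartup / MFShutdown.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Portable stand-in for HRESULT: a negative value means failure.
using HResult = std::int32_t;

// Hypothetical guard: runs `init` immediately and `teardown` at scope exit,
// but only if `init` reported success.
class ScopedInit {
public:
    ScopedInit(const std::function<HResult()>& init, std::function<void()> teardown)
        : hr_(init()), teardown_(std::move(teardown)) {}
    ~ScopedInit() { if (hr_ >= 0 && teardown_) teardown_(); }
    ScopedInit(const ScopedInit&) = delete;
    ScopedInit& operator=(const ScopedInit&) = delete;
    HResult hr() const { return hr_; }
private:
    HResult hr_;
    std::function<void()> teardown_;
};

// Records the order of calls so the pairing can be inspected.
// The arguments simulate the results of the two init calls.
std::vector<std::string> RunWithMedia(HResult comResult, HResult mfResult) {
    std::vector<std::string> log;
    {
        ScopedInit com([&] { log.push_back("com-init"); return comResult; },
                       [&] { log.push_back("com-uninit"); });
        if (com.hr() >= 0) {
            ScopedInit mf([&] { log.push_back("mf-startup"); return mfResult; },
                          [&] { log.push_back("mf-shutdown"); });
            if (mf.hr() >= 0) log.push_back("work");
        }  // the inner guard is destroyed first, so MF shuts down before COM
    }
    return log;
}
```

Because the guards nest, teardown happens in the reverse order of initialization without any explicit cleanup code on the error paths.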

4.2 Objects are passed around as COM interfaces

Most return values and out-parameters in the API are COM interfaces:

  • IMFSourceReader, IMFMediaType, IMFTransform, IMFSample, IMFMediaBuffer
  • Even type information and configuration data are expressed as interfaces: a media type, for example, is an IMFMediaType.

4.3 Activation Objects

Enumeration APIs do not hand you a ready-to-use object. They return an array of IMFActivate*, and you call ActivateObject() only on the ones you actually want.
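
The idea of "enumerate cheaply, create later" can be shown with a toy model. This is a portable sketch and not the IMFActivate interface itself: `Decoder`, `Activation`, and `EnumerateDecoders` are hypothetical names. The caching in `ActivateObject` mirrors the documented IMFActivate behavior, where repeated calls are expected to return the same instance until ShutdownObject or DetachObject is called.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "real object" that is comparatively expensive to create.
struct Decoder { std::string name; };

// Toy activation object: cheap to enumerate, creates the real object on demand.
class Activation {
public:
    Activation(std::string name, int* createCount)
        : name_(std::move(name)), createCount_(createCount) {}

    const std::string& FriendlyName() const { return name_; }

    // Creates the object on the first call and returns the cached one afterwards.
    std::shared_ptr<Decoder> ActivateObject() {
        if (!object_) {
            object_ = std::make_shared<Decoder>(Decoder{name_});
            ++*createCount_;
        }
        return object_;
    }

    // Drops the cached object; a later ActivateObject creates a new one.
    void ShutdownObject() { object_.reset(); }

private:
    std::string name_;
    int* createCount_;  // demo-only counter of real objects created
    std::shared_ptr<Decoder> object_;
};

// Enumeration returns candidates only; nothing is instantiated yet.
std::vector<Activation> EnumerateDecoders(int* createCount) {
    return { Activation("Decoder A", createCount),
             Activation("Decoder B", createCount) };
}
```

The caller can inspect every candidate's name and pay the creation cost only for the one it picks.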

4.4 Configuration and type info revolve around GUIDs

  • IMFAttributes is a key/value store keyed by GUID.
  • IMFMediaType inherits from IMFAttributes and stores frame size, FPS, and so on as attributes.
  • Typical keys: MF_MT_MAJOR_TYPE selects audio vs. video, and MF_MT_SUBTYPE selects the format (H.264 / AAC / RGB32 / …).
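
Attributes such as MF_MT_FRAME_SIZE and MF_MT_FRAME_RATE store two 32-bit values in one 64-bit attribute, with the first value in the high half. The portable sketch below models that packing together with a toy GUID-keyed store. `Guid`, `AttributeStore`, `Pack2x32`, and the two demo keys are hypothetical stand-ins, not the Windows types; on Windows, helpers such as MFSetAttributeSize and MFGetAttributeSize do the packing for you.

```cpp
#include <array>
#include <cstdint>
#include <map>

// Stand-in for a 128-bit GUID (hypothetical; the real type is the Windows GUID struct).
using Guid = std::array<std::uint8_t, 16>;

// Two 32-bit values in one 64-bit attribute: the first value goes in the high half.
constexpr std::uint64_t Pack2x32(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
constexpr std::uint32_t HighPart(std::uint64_t v) {
    return static_cast<std::uint32_t>(v >> 32);
}
constexpr std::uint32_t LowPart(std::uint64_t v) {
    return static_cast<std::uint32_t>(v & 0xFFFFFFFFu);
}

// Toy GUID-keyed store holding only 64-bit values.
// The real IMFAttributes supports several value types (UINT32, GUID, string, ...).
class AttributeStore {
public:
    void SetUINT64(const Guid& key, std::uint64_t value) { values_[key] = value; }

    // Returns false when the key is absent, similar in spirit to a failed HRESULT.
    bool GetUINT64(const Guid& key, std::uint64_t* value) const {
        auto it = values_.find(key);
        if (it == values_.end()) return false;
        *value = it->second;
        return true;
    }
private:
    std::map<Guid, std::uint64_t> values_;
};

// Demo-only keys standing in for MF_MT_FRAME_SIZE and MF_MT_FRAME_RATE.
const Guid kFrameSizeKey = {{0x01}};
const Guid kFrameRateKey = {{0x02}};

// Describes 1920x1080 video at 30000/1001 frames per second.
AttributeStore MakeDemoVideoType() {
    AttributeStore type;
    type.SetUINT64(kFrameSizeKey, Pack2x32(1920, 1080));   // width, height
    type.SetUINT64(kFrameRateKey, Pack2x32(30000, 1001));  // numerator, denominator
    return type;
}
```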

4.5 Async, callbacks, and threading also feel COM-shaped

  • Source Reader synchronous mode: ReadSample blocks.
  • Source Reader asynchronous mode: implement IMFSourceReaderCallback and pass it via the MF_SOURCE_READER_ASYNC_CALLBACK attribute when you create the reader.
  • MF work-queue threads are MTA. Keeping the application on MTA as well makes the implementation simpler.
  • If you need to update the UI, marshal only the result back to the UI thread.
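
The "marshal only the result" rule can be sketched as a small thread-safe queue: the callback thread pushes a plain data record, and the UI thread drains the queue on its own schedule. This is a portable sketch under assumptions, with `FrameResult` and `ResultQueue` as hypothetical names. A Win32 application would more typically post a window message from the callback and read the record in the message handler.

```cpp
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical result record: plain data only, no UI handles and no COM pointers.
struct FrameResult {
    std::int64_t timestamp100ns;  // sample time in 100-nanosecond units
    std::uint32_t width;
    std::uint32_t height;
};

class ResultQueue {
public:
    // Called from the callback (worker) thread.
    void Push(const FrameResult& r) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(r);
    }

    // Called from the UI thread; takes everything queued so far in one step,
    // so the lock is held only for a swap.
    std::vector<FrameResult> Drain() {
        std::vector<FrameResult> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<FrameResult> pending_;
};
```

The callback never touches a window or control; it only hands over data, which keeps UI state confined to the UI thread.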

4.6 But Media Foundation is not just COM

Concepts that are specific to Media Foundation:

  • MFStartup / MFShutdown
  • Media Session, topology, topology loader, presentation clock
  • Source Reader / Sink Writer

In other words, Media Foundation uses COM to express the contracts between components, and runs as a media processing platform on top of that - a two-layer design.

5. Code excerpts: representative patterns

5.1 Using Source Reader in synchronous mode

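// Assumes COM and Media Foundation are already initialized (CoInitializeEx + MFStartup).
HRESULT ReadOneVideoSample(PCWSTR path)
{
    IMFSourceReader* pReader = nullptr;
    IMFMediaType* pType = nullptr;
    IMFSample* pSample = nullptr;
    DWORD streamFlags = 0;   // declared up front so no goto jumps over an initialization
    LONGLONG timestamp = 0;

    HRESULT hr = MFCreateSourceReaderFromURL(path, nullptr, &pReader);
    if (FAILED(hr)) goto done;
    hr = MFCreateMediaType(&pType);
    if (FAILED(hr)) goto done;
    hr = pType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    if (FAILED(hr)) goto done;
    hr = pType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    if (FAILED(hr)) goto done;
    // Conversion to RGB32 generally needs MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING set in the
    // reader's attributes; with nullptr attributes this call can fail with MF_E_INVALIDMEDIATYPE.
    hr = pReader->SetCurrentMediaType((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, nullptr, pType);
    if (FAILED(hr)) goto done;

    hr = pReader->ReadSample((DWORD)MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
                              nullptr, &streamFlags, &timestamp, &pSample);
    if (FAILED(hr)) goto done;
    // pSample can be null even on success (for example at end of stream); check streamFlags.
    // Pull IMFMediaBuffer out of pSample and process it
done:
    // SafeRelease is the usual helper: Release() if non-null, then set the pointer to nullptr.
    SafeRelease(&pSample);
    SafeRelease(&pType);
    SafeRelease(&pReader);
    return hr;
}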

Things to notice:

  • The reader and the media type are both COM interfaces.
  • Configuration is GUID-based.
  • The return value is HRESULT.
  • In synchronous mode, ReadSample blocks.
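
The excerpt relies on a `SafeRelease` helper that it does not define. A common shape for it is sketched below in portable C++; `FakeUnknown` is a hypothetical stand-in for a COM object, included only so the helper can be exercised without Windows headers.

```cpp
// Release the interface if present, then null the caller's pointer so that
// a second call (or a shared cleanup path) is harmless.
template <class T>
void SafeRelease(T** pp) {
    if (pp != nullptr && *pp != nullptr) {
        (*pp)->Release();
        *pp = nullptr;
    }
}

// Hypothetical stand-in for a reference-counted COM object.
struct FakeUnknown {
    unsigned long refs = 1;
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() { return --refs; }  // a real COM object deletes itself at zero
};
```

Nulling the pointer is what makes a single `done:` cleanup label safe regardless of how far the function got. In newer code, a smart pointer such as Microsoft::WRL::ComPtr or CComPtr achieves the same effect without manual calls.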

5.2 Enumerating and instantiating an MFT

HRESULT FindH264Decoder(IMFTransform** ppTransform)
{
    *ppTransform = nullptr;
    IMFActivate** ppActivate = nullptr;
    UINT32 count = 0;
    MFT_REGISTER_TYPE_INFO inputType = {};
    inputType.guidMajorType = MFMediaType_Video;
    inputType.guidSubtype = MFVideoFormat_H264;

    HRESULT hr = MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER,
        MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_LOCALMFT,
        &inputType, nullptr, &ppActivate, &count);
    if (FAILED(hr)) return hr;

    // Enumeration can succeed with zero matches; never index an empty array.
    hr = (count > 0)
        ? ppActivate[0]->ActivateObject(IID_PPV_ARGS(ppTransform))
        : MF_E_NOT_FOUND;

    // Cleanup: release every activation object, then free the array itself.
    for (UINT32 i = 0; i < count; ++i) ppActivate[i]->Release();
    CoTaskMemFree(ppActivate);
    return hr;
}

The enumeration result comes back as IMFActivate**, not as IMFTransform*. You materialize the real object with ActivateObject.

6. Practical checklist

| Item | What to confirm | What goes wrong if you miss it |
|---|---|---|
| Initialization ownership | Who calls CoInitializeEx and MFStartup, and who tears them down | Missing init, confused shutdown order |
| Apartment | STA or MTA for the threads that touch MF | Confusion around callbacks, conflicts with the UI |
| Source Reader mode | Where the choice between sync and async is made | ReadSample blocking when you didn’t expect it |
| Media type negotiation | Specifying the output format explicitly | MF_E_INVALIDMEDIATYPE, getting a format you didn’t want |
| Object lifetime | Clear ownership of Release and ShutdownObject | Memory leaks, inconsistent state at shutdown |
| UI integration | Don’t touch the UI directly from a callback - marshal only the result back | Hangs, races |

7. Wrap-up

  • Media Foundation is a media processing platform, not a COM library, but its component boundaries are expressed as COM interfaces.
  • Start with Source Reader / Sink Writer, and move on to Media Session / MFT as you need to.
  • Decide your apartment and callback policy up front.
  • You need both CoInitializeEx and MFStartup.
