Burning Images and Text into MP4 Frames with Media Foundation - Source Reader, Drawing, Color Conversion, Sink Writer, and a Single-File C++ Sample
The short version
The basic shape for burning images or text into every frame of an MP4 looks like this:
- Decode with Source Reader -> pull out uncompressed frames
- Composite with a drawing API -> use GDI+ or Direct2D to lay images and text on top
- Color convert if needed -> e.g. RGB32 -> NV12
- Re-encode with Sink Writer -> write a new MP4
Drawing images and text is not Media Foundation’s job. That part belongs to drawing APIs like GDI+ or Direct2D/DirectWrite.
Why it feels complicated
“Putting text on a video” actually mixes four separate concerns.
| Concern | What it means |
|---|---|
| Container vs. codec | mp4 is a container; the payload inside is compressed data like H.264 |
| Decode/encode | You can’t draw on compressed data, so you have to bring it back to uncompressed |
| Drawing | Compositing text and images is the job of GDI+ or Direct2D |
| Color space and pixel format | The format that’s easy to draw on and the format the encoder wants are different (RGB32 vs. NV12) |
The big picture
input.mp4 -> Source Reader -> uncompressed frame (RGB32) -> draw with GDI+ -> BGRA->NV12 conversion -> Sink Writer -> output.mp4
How the pipeline splits up
Input: Source Reader
- Turning on MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING makes it handle YUV->RGB32 conversion and deinterlacing automatically
- Receiving frames in RGB32 is the easiest starting point because it’s draw-friendly
Drawing: GDI+ or Direct2D
- If you just want it working: GDI+ - lightweight to bring in and easy to keep in a single file
- If you care about speed: Direct2D/DirectWrite - better for long videos and high resolutions
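To make the compositing step concrete, here is a minimal sketch of the per-pixel "source over" blend that a drawing API effectively performs when it lays an image on a frame. It is not a substitute for GDI+ or Direct2D (no text rendering, scaling, or antialiasing). The function name BlendOverlayBgra and the assumptions of tightly packed, top-down BGRA buffers with straight (non-premultiplied) alpha are illustrative and do not come from the article's sample.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: blends a straight-alpha BGRA overlay onto a BGRA frame
// at (dstX, dstY). Both buffers are assumed tightly packed and top-down.
// Overlay pixels that fall outside the frame are clipped.
void BlendOverlayBgra(std::uint8_t* frame, int frameW, int frameH,
                      const std::uint8_t* overlay, int ovW, int ovH,
                      int dstX, int dstY)
{
    assert(frame != nullptr && overlay != nullptr);
    for (int oy = 0; oy < ovH; ++oy) {
        const int fy = dstY + oy;
        if (fy < 0 || fy >= frameH) continue;
        for (int ox = 0; ox < ovW; ++ox) {
            const int fx = dstX + ox;
            if (fx < 0 || fx >= frameW) continue;
            const std::uint8_t* s =
                overlay + (static_cast<std::size_t>(oy) * ovW + ox) * 4;
            std::uint8_t* d =
                frame + (static_cast<std::size_t>(fy) * frameW + fx) * 4;
            const int a = s[3]; // overlay alpha, 0..255
            for (int c = 0; c < 3; ++c) {
                // out = src * a + dst * (1 - a), rounded, in 8-bit fixed point
                d[c] = static_cast<std::uint8_t>(
                    (s[c] * a + d[c] * (255 - a) + 127) / 255);
            }
            d[3] = 255; // video frames stay opaque
        }
    }
}
```

A fully opaque overlay pixel replaces the frame pixel and a fully transparent one leaves it untouched. For anything beyond a fixed bitmap, the drawing APIs named above are the better tool.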
Color conversion: RGB32 -> NV12
The H.264 encoder expects YUV formats like NV12. Either use the Video Processor MFT, or convert it yourself.
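If you convert it yourself, the work is a luma pass plus a 2x2-subsampled chroma pass. The sketch below uses the common BT.601 limited-range integer coefficients and assumes a tightly packed, top-down buffer whose bytes are ordered B, G, R, X (the in-memory layout of RGB32). The name BgraToNv12Sketch is hypothetical and this is not the article's BgraToNv12. Whether BT.601 or BT.709 is appropriate depends on your content, which is an assumption to verify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
// Clamp an intermediate value into the 0..255 byte range.
inline std::uint8_t ClampToByte(int v)
{
    return static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
}
} // namespace

// Hypothetical helper: BGRA (stride == width * 4, top-down) -> NV12.
// NV12 layout: width*height bytes of Y, then width*height/2 bytes of
// interleaved U,V at half resolution in both directions.
std::vector<std::uint8_t> BgraToNv12Sketch(const std::uint8_t* bgra,
                                           int width, int height)
{
    assert(bgra != nullptr);
    assert(width > 0 && height > 0 && width % 2 == 0 && height % 2 == 0);

    const std::size_t w = static_cast<std::size_t>(width);
    const std::size_t h = static_cast<std::size_t>(height);
    std::vector<std::uint8_t> nv12(w * h * 3 / 2);
    std::uint8_t* yPlane = nv12.data();
    std::uint8_t* uvPlane = nv12.data() + w * h;

    // Luma: one Y value per pixel.
    for (std::size_t y = 0; y < h; ++y) {
        for (std::size_t x = 0; x < w; ++x) {
            const std::uint8_t* p = bgra + (y * w + x) * 4;
            const int b = p[0], g = p[1], r = p[2];
            yPlane[y * w + x] =
                ClampToByte(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
        }
    }

    // Chroma: average each 2x2 block, then emit one U,V pair for it.
    for (std::size_t y = 0; y < h; y += 2) {
        for (std::size_t x = 0; x < w; x += 2) {
            int sumB = 0, sumG = 0, sumR = 0;
            for (std::size_t dy = 0; dy < 2; ++dy) {
                for (std::size_t dx = 0; dx < 2; ++dx) {
                    const std::uint8_t* p = bgra + ((y + dy) * w + (x + dx)) * 4;
                    sumB += p[0]; sumG += p[1]; sumR += p[2];
                }
            }
            const int b = (sumB + 2) / 4, g = (sumG + 2) / 4, r = (sumR + 2) / 4;
            std::uint8_t* uv = uvPlane + (y / 2) * w + x;
            // The +32768 folds in the +128 offset and keeps the value
            // non-negative before the shift.
            uv[0] = ClampToByte((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
            uv[1] = ClampToByte((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
        }
    }
    return nv12;
}
```

The even width and height requirement shows up directly in the chroma loop, which walks the frame in 2x2 blocks.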
Output: Sink Writer
- Output stream type: the format you want written to disk (e.g. MFVideoFormat_H264)
- Input stream type: the format your app hands over (e.g. MFVideoFormat_NV12)
Audio
In practice it’s much easier to re-encode only the video and pass the audio through still compressed, remuxing it without decoding.
Notes on the sample code
This article includes a single-file sample you can paste straight into a Visual Studio 2022 C++ console app.
OverlayMp4.exe input.mp4 overlay.png output.mp4
What the code assumes
- Targets Windows 10/11, x64 build, no precompiled headers
- The input video’s width and height must be even (because NV12 is 4:2:0)
- The output is a video-only MP4
- The overlay text is fixed in kOverlayText (defaults to HelloWorld)
Flow of the implementation
- Initialize MF and GDI+ via ScopedMf and ScopedGdiplus
- Pull input video info and configure RGB32 reception in ConfigureSourceReader
- Create the output file (H.264/NV12) in CreateSinkWriter
- Loop: ReadSample -> CopySampleToTopDownBgra -> DrawOverlay -> BgraToNv12 -> WriteSample
- Wrap up with Finalize
Things to watch for when reading the code
Normalize stride and orientation early
Video frames don’t always have stride equal to width x 4, and the vertical orientation can be flipped. The code normalizes everything into a top-down BGRA buffer before drawing.
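A minimal sketch of that normalization follows. It assumes you have a pointer to the first byte of the frame in memory and a signed stride, where a negative stride means the rows are stored bottom-up. The name NormalizeToTopDownBgra and its parameters are hypothetical; this is not the article's CopySampleToTopDownBgra, and how you obtain the stride (for example from the media type or a 2D buffer lock) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a 32-bit frame into a tightly packed,
// top-down buffer (row 0 = top of image, stride == width * 4).
//   bufferStart : first byte of the frame in memory
//   stride      : bytes between rows; negative means bottom-up storage
std::vector<std::uint8_t> NormalizeToTopDownBgra(const std::uint8_t* bufferStart,
                                                 int width, int height,
                                                 long stride)
{
    assert(bufferStart != nullptr && width > 0 && height > 0);

    const std::size_t rowBytes = static_cast<std::size_t>(width) * 4;
    const std::size_t pitch = static_cast<std::size_t>(std::labs(stride));
    assert(pitch >= rowBytes); // rows may be padded, never shorter

    std::vector<std::uint8_t> out(rowBytes * static_cast<std::size_t>(height));
    for (int y = 0; y < height; ++y) {
        // For bottom-up storage the top image row is the last row in memory.
        const int srcRow = (stride < 0) ? (height - 1 - y) : y;
        std::memcpy(out.data() + static_cast<std::size_t>(y) * rowBytes,
                    bufferStart + static_cast<std::size_t>(srcRow) * pitch,
                    rowBytes);
    }
    return out;
}
```

Copying row by row drops any padding bytes at the end of each source row, so every later stage can index pixels as (y * width + x) * 4.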
Check both the flags and the sample from ReadSample
ReadSample can return S_OK while sample == nullptr (e.g. STREAMTICK, ENDOFSTREAM). You have to check HRESULT, flags, and inputSample together.
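One way to keep that three-way check readable is to fold it into a single decision, sketched below without any Media Foundation headers. The flag constants are local stand-ins meant to mirror the error, end-of-stream, and stream-tick bits of the Source Reader flags; real code should use the SDK constants from mfreadwrite.h rather than these literals. ClassifyReadResult and ReadOutcome are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the Source Reader flag bits (assumption: real code
// uses the MF_SOURCE_READERF_* constants from <mfreadwrite.h> instead).
constexpr std::uint32_t kFlagError       = 0x1;
constexpr std::uint32_t kFlagEndOfStream = 0x2;
constexpr std::uint32_t kFlagStreamTick  = 0x100;

enum class ReadOutcome { Failed, EndOfStream, SkipNoFrame, HaveFrame };

// Hypothetical helper: decides what the loop should do after one read.
//   hr        : the HRESULT returned by the call (negative means failure)
//   flags     : the stream flags reported by the call
//   hasSample : whether the returned sample pointer is non-null
ReadOutcome ClassifyReadResult(std::int32_t hr, std::uint32_t flags, bool hasSample)
{
    if (hr < 0 || (flags & kFlagError) != 0) {
        return ReadOutcome::Failed;
    }
    if ((flags & kFlagEndOfStream) != 0) {
        return ReadOutcome::EndOfStream; // leave the loop and finalize
    }
    if ((flags & kFlagStreamTick) != 0 || !hasSample) {
        return ReadOutcome::SkipNoFrame; // gap in the stream, read again
    }
    return ReadOutcome::HaveFrame;       // safe to copy, draw, convert, write
}
```

The ordering matters: a successful HRESULT with a null sample is not an error, so it is treated as "nothing to process this iteration" rather than a failure.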
Carry timestamp and duration from the input
Rather than recomputing on the assumption of a fixed fps every iteration, carrying through the input sample’s timestamp/duration as much as possible is more robust.
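As a rough illustration, the sketch below passes the input timestamp through unchanged and only falls back to a frame-rate-derived duration when the input sample does not report one. Times are in 100-nanosecond units, as Media Foundation uses. FrameTiming and ResolveOutputTiming are hypothetical names; in the real loop the values would come from the input sample's time and duration getters and be applied to the output sample with the matching setters.

```cpp
#include <cassert>
#include <cstdint>

// 100-nanosecond units per second.
constexpr std::int64_t kHnsPerSecond = 10000000;

struct FrameTiming {
    std::int64_t time;     // presentation time, 100-ns units
    std::int64_t duration; // frame duration, 100-ns units
};

// Hypothetical helper: keeps the input timing where available.
//   inputTime        : timestamp read from the input sample
//   hasInputDuration : whether the input sample reported a duration
//   inputDuration    : that duration (ignored if not reported or <= 0)
//   fpsNum / fpsDen  : nominal frame rate, used only as a fallback
FrameTiming ResolveOutputTiming(std::int64_t inputTime,
                                bool hasInputDuration,
                                std::int64_t inputDuration,
                                std::uint32_t fpsNum,
                                std::uint32_t fpsDen)
{
    assert(fpsNum > 0 && fpsDen > 0);

    FrameTiming t{};
    t.time = inputTime; // never recomputed as frameIndex * duration
    t.duration = (hasInputDuration && inputDuration > 0)
                     ? inputDuration
                     : (kHnsPerSecond * fpsDen) / fpsNum;
    return t;
}
```

At 30000/1001 fps the fallback duration truncates to 333666 units, which is exactly the kind of rounding that accumulates into drift if timestamps are rebuilt from a frame counter instead of carried over.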
Where to take it for production
- Add audio remux: re-encode video only, pass audio through
- Use the Video Processor MFT: handles color space conversion, resizing, and deinterlacing in one place
- Swap drawing for Direct2D/DirectWrite: better for high resolution and long videos
- Move to D3D11 surfaces: when you want to push work onto the GPU path
- Factor it out as a custom MFT: when you want to reuse the logic across multiple apps
Wrap-up
When you burn images or text into video frames with Media Foundation, splitting the problem into read, draw, convert, write back keeps things tidy. Get something running first with Source Reader -> RGB32 -> GDI+ -> NV12 -> Sink Writer, then layer in improvements as the use case demands - that’s the practical path.