How to Convert YUV Frames to RGB with Media Foundation - Source Reader Auto Conversion and Manual Conversion Patterns
When an application wants to save a frame as PNG, pass an image to WIC or GDI, or display video content in a UI, what it usually wants is an RGB pixel buffer.
But the frames that come out of a Media Foundation decoder are very often YUV-family formats such as NV12 or YUY2. If those raw bytes are treated as if they were already RGB pixels, the result is usually a broken image: wrong colors, stripes, or a suspicious green cast.
The earlier article What Media Foundation Is - Why It Starts to Feel Like COM and Windows Media APIs at the Same Time covered the broader shape of Media Foundation.
How to Extract a Still Image from an MP4 with Media Foundation - A Single .cpp File You Can Paste into a C++ Console App focused on still-image extraction.
This article sits in the middle and focuses on YUV to RGB conversion itself.
There are two practical patterns:
- Pattern A: let IMFSourceReader deliver RGB32 directly
- Pattern B: receive NV12 or YUY2 and convert to RGB in your own code
The goal here is not to memorize API names. The goal is to make the flow clear enough that you can picture where YUV appears in Media Foundation and where RGB enters the story.
Contents
- 1. Short version
- 2. The picture first
- 3. Organizing the YUV/RGB relationship first
- 3.1. “YUV” usually means Y’CbCr in practice
- 3.2. 4:4:4, 4:2:2, and 4:2:0 describe how chroma is shared
- 3.3. YUV-to-RGB is both sampling work and color-space work
- 3.4. Quietly guessing BT.601 versus BT.709 is a color bug waiting to happen
- 3.5. A good first formula is BT.601 limited-range conversion
- 4. Pattern A: let Media Foundation convert automatically
- 4.1. Where it fits well
- 4.2. What produces RGB32
- 4.3. Code
- 4.4. Why this path is attractive
- 4.5. But it still has traps
- 5. Pattern B: write the conversion yourself
- 5.1. Where manual conversion fits
- 5.2. The overall manual flow
- 5.3. Request a YUV output type first
- 5.4. Accept only the color metadata you explicitly support
- 5.5. Read buffers using stride, not assumptions
- 5.6. Encode the per-pixel formula
- 5.7. Convert NV12 to BGRA32
- 5.8. Convert YUY2 to BGRA32
- 5.9. A practical entry point from IMFSample
- 5.10. Where manual conversion lives architecturally
- 6. Which path should you choose?
- 7. Common pitfalls in practice
- 7.1. Assuming RGB32 means fully valid alpha
- 7.2. Assuming stride is width * bytesPerPixel
- 7.3. Mixing up MF_MT_DEFAULT_STRIDE and actual pitch
- 7.4. Quietly guessing 601 versus 709
- 7.5. Splitting the NV12 UV plane at width * height
- 7.6. Processing interlaced video as if it were progressive
- 7.7. Ignoring chroma-upsampling quality
- 8. Wrap-up
- 9. References
- Related KomuraSoft LLC articles
- Microsoft Learn
1. Short version
The practical summary looks like this:
- For small numbers of extracted frames or thumbnail generation, enabling MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING and requesting MFVideoFormat_RGB32 is the easiest path
- That automatic conversion is software processing, so it is not a great answer for real-time playback or high-throughput conversion
- If you write the conversion yourself, understanding NV12 and YUY2 properly is the shortest path
- YUV-to-RGB is not just “apply three coefficients.” In practice it also involves subsampling, nominal range, color matrix, and stride
- In Media Foundation documentation, the word YUV often really means Y’CbCr in practical digital-video terms
- The most common color bugs come from ignoring MF_MT_YUV_MATRIX and MF_MT_VIDEO_NOMINAL_RANGE, or from assuming stride is always width * bytesPerPixel
So the split is simple: if convenience matters most, ask Source Reader for RGB32. If control, scale, or color responsibility matters more, receive YUV and convert it yourself.
2. The picture first
It is easier to start with a picture of where the conversion happens.
flowchart LR
File["MP4 / H.264 / HEVC"] --> Decoder["decoder"]
Decoder --> YUV["NV12 / YUY2 / YV12 and other YUV frames"]
YUV -->|Pattern A| SRVP["Source Reader video processing"]
SRVP --> RGB1["RGB32"]
YUV -->|Pattern B| App["Application-side conversion code"]
App --> RGB2["BGRA / RGB"]
If the source file is compressed video such as H.264 or HEVC, the decoder first produces an uncompressed frame. That frame is often not RGB. In Windows video pipelines, YUV formats are the ordinary case.
So when an application wants RGB, it usually chooses one of two paths:
- Let Media Foundation move the frame all the way to RGB32
- Receive YUV and convert to RGB manually
That choice is the real subject of this article.
3. Organizing the YUV/RGB relationship first
3.1. “YUV” usually means Y’CbCr in practice
Windows APIs and documentation broadly say YUV, but in practical digital-video work, reading U as Cb and V as Cr is usually close enough to keep your mental model straight.
Very roughly:
- Y carries brightness-like information
- U and V carry color-difference information
- RGB stores red, green, and blue directly per pixel
Human vision is more sensitive to detail in brightness than detail in chroma. That is why video formats often keep Y at higher detail and reduce the resolution of U and V.
3.2. 4:4:4, 4:2:2, and 4:2:0 describe how chroma is shared
This is the key idea that makes the formats readable.
| Notation | Meaning | Typical examples |
|---|---|---|
| 4:4:4 | Every pixel has full Y/U/V data | AYUV, I444 |
| 4:2:2 | Two horizontal pixels share chroma | YUY2, UYVY, I422 |
| 4:2:0 | A 2x2 block shares chroma | NV12, YV12, I420 |
The two formats that appear constantly in practice are worth visualizing immediately.
NV12 (4:2:0, planar)
Y plane
Y Y Y Y
Y Y Y Y
Y Y Y Y
Y Y Y Y
UV plane
U V U V
U V U V
In NV12, a 2x2 block of pixels shares one U/V pair.
YUY2 (4:2:2, packed)
bytes:
Y0 U0 Y1 V0 Y2 U1 Y3 V1 ...
In YUY2, two horizontal pixels share one U/V pair.
That already explains why YUV-to-RGB conversion is not a simple one-pixel-to-one-pixel replacement.
You first have to decide which U/V values belong to which pixel.
3.3. YUV-to-RGB is both sampling work and color-space work
If you look at Extended Color Information, a full color pipeline can involve inverse quantization, chroma upsampling, YUV-to-RGB conversion, transfer-function handling, primaries conversion, and output quantization.
For practical 8-bit SDR application code, a simpler three-part mental model is enough:
- restore chroma sampling: expand 4:2:0 or 4:2:2 chroma so each pixel can use it
- restore nominal range: interpret video-range values correctly
- apply the matrix: use BT.601, BT.709, or another correct set of coefficients
In other words, practical YUV-to-RGB code answers two questions:
- which U/V values belong to this pixel?
- which matrix should turn this Y/U/V triplet into RGB?
3.4. Quietly guessing BT.601 versus BT.709 is a color bug waiting to happen
Media Foundation documentation explains BT.601 and BT.709 in a reasonable way, but “it is probably 709 because the resolution is larger” is still a risky habit. Color errors do not crash, so they often make it into production quietly.
At minimum, inspect:
- MF_MT_YUV_MATRIX
- MF_MT_VIDEO_NOMINAL_RANGE
The safer early implementation strategy is to accept only combinations your code explicitly supports.
3.5. A good first formula is BT.601 limited-range conversion
A representative 8-bit BT.601 formula looks like this:
C = Y - 16
D = U - 128
E = V - 128
R = clip(1.164383 * C + 1.596027 * E)
G = clip(1.164383 * C - 0.391762 * D - 0.812968 * E)
B = clip(1.164383 * C + 2.017232 * D)
BT.709 changes the coefficients. The important thing is not the memorization of numbers, but understanding the structure:
- Y is offset from black at 16
- U and V are centered around 128
4. Pattern A: let Media Foundation convert automatically
4.1. Where it fits well
This path is a good match when:
- you want one still frame from MP4
- you need a handful of thumbnails
- you want to hand an RGB image to WIC or GDI
- the workload is offline or tool-like rather than real-time playback
Source Reader supports limited video processing when MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING is enabled, and that processing can produce RGB32 from a YUV decode path.
Microsoft’s own documentation is clear that this is software processing and not optimized for playback. So it is convenient, but it is not the universal answer.
4.2. What produces RGB32
The flow is straightforward:
- create Source Reader attributes with MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING = TRUE
- select the video stream
- request MFMediaType_Video + MFVideoFormat_RGB32
- call ReadSample
That causes the limited video-processing stage behind Source Reader to do the YUV-to-RGB conversion.
4.3. Code
The following assumes CoInitializeEx and MFStartup have already succeeded.
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mferror.h>
#include <wrl/client.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "ole32.lib")
using Microsoft::WRL::ComPtr;
HRESULT CreateSourceReaderWithAutoRgb(
const wchar_t* path,
IMFSourceReader** ppReader)
{
if (!path || !ppReader) return E_POINTER;
*ppReader = nullptr;
ComPtr<IMFAttributes> attrs;
HRESULT hr = MFCreateAttributes(&attrs, 2);
if (FAILED(hr)) return hr;
hr = attrs->SetUINT32(MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING, TRUE);
if (FAILED(hr)) return hr;
hr = MFCreateSourceReaderFromURL(path, attrs.Get(), ppReader);
if (FAILED(hr)) return hr;
hr = (*ppReader)->SetStreamSelection(MF_SOURCE_READER_ALL_STREAMS, FALSE);
if (FAILED(hr)) return hr;
hr = (*ppReader)->SetStreamSelection(MF_SOURCE_READER_FIRST_VIDEO_STREAM, TRUE);
if (FAILED(hr)) return hr;
ComPtr<IMFMediaType> outType;
hr = MFCreateMediaType(&outType);
if (FAILED(hr)) return hr;
hr = outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
if (FAILED(hr)) return hr;
hr = outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
if (FAILED(hr)) return hr;
hr = (*ppReader)->SetCurrentMediaType(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
nullptr,
outType.Get());
if (FAILED(hr)) return hr;
return S_OK;
}
HRESULT ReadOneRgb32Sample(
IMFSourceReader* reader,
IMFSample** ppSample,
LONGLONG* pTimestamp100ns)
{
if (!reader || !ppSample) return E_POINTER;
*ppSample = nullptr;
if (pTimestamp100ns) *pTimestamp100ns = 0;
DWORD streamIndex = 0;
DWORD flags = 0;
LONGLONG timestamp = 0;
HRESULT hr = reader->ReadSample(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
0,
&streamIndex,
&flags,
&timestamp,
ppSample);
if (FAILED(hr)) return hr;
if (flags & MF_SOURCE_READERF_ENDOFSTREAM) return MF_E_END_OF_STREAM;
if (*ppSample == nullptr) return MF_E_INVALID_STREAM_DATA;
if (pTimestamp100ns) *pTimestamp100ns = timestamp;
return S_OK;
}
After that, GetCurrentMediaType can be used to inspect the actual output size and stride.
4.4. Why this path is attractive
The strength of this approach is simple: it gets you to a correct-looking image quickly.
- you do not need to write 4:2:0 or 4:2:2 expansion yourself
- a lot of format detail stays hidden
- the output is easier to pass to WIC or GDI
- for a small number of frames, it is often entirely practical
For still-image extraction tools, this is often the calmest entry point.
4.5. But it still has traps
This automatic path has some important properties:
| Item | Meaning |
|---|---|
| output format | usually RGB32 |
| implementation | software processing |
| good for | small numbers of frames, thumbnails, offline conversion |
| not ideal for | D3D-centric rendering, high-throughput frame processing |
| awkward with | MF_SOURCE_READER_D3D_MANAGER, MF_READWRITE_DISABLE_CONVERTERS |
There is one more detail that matters a lot in practice: the fourth byte of RGB32.
Windows RGB32 is laid out in memory as Blue / Green / Red / Alpha or Don’t Care. It is not a guarantee of “ready-made ARGB32.” If that fourth byte is passed through to a PNG encoder as alpha, the image can become unintentionally transparent. Filling it with 0xFF before writing is usually the safer move.
5. Pattern B: write the conversion yourself
5.1. Where manual conversion fits
Manual conversion is a better fit when:
- you need to process many frames and want control over performance
- you want to keep NV12 on a GPU or SIMD path as long as possible
- you want explicit control over BT.601, BT.709, or nominal range
- you need outputs other than RGB32
- the limited automatic conversion path is not enough
In other words, this path trades convenience for control.
5.2. The overall manual flow
The steps are:
- configure Source Reader to output NV12 or YUY2
- inspect the actual media type with GetCurrentMediaType
- read MF_MT_FRAME_SIZE, MF_MT_DEFAULT_STRIDE, MF_MT_YUV_MATRIX, and MF_MT_VIDEO_NOMINAL_RANGE
- lock the sample buffer
- determine which Y/U/V values each pixel should use
- apply the matrix and write BGRA output
The code in this article intentionally narrows the scope to 8-bit SDR, progressive video, NV12 or YUY2, and limited range. That is not laziness. It is the safer way to avoid quietly incorrect color.
5.3. Request a YUV output type first
The first step is to tell Source Reader to keep the YUV path visible.
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#include <mferror.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;
HRESULT ConfigureSourceReaderForSubtype(
IMFSourceReader* reader,
REFGUID subtype)
{
if (!reader) return E_POINTER;
HRESULT hr = reader->SetStreamSelection(MF_SOURCE_READER_ALL_STREAMS, FALSE);
if (FAILED(hr)) return hr;
hr = reader->SetStreamSelection(MF_SOURCE_READER_FIRST_VIDEO_STREAM, TRUE);
if (FAILED(hr)) return hr;
ComPtr<IMFMediaType> outType;
hr = MFCreateMediaType(&outType);
if (FAILED(hr)) return hr;
hr = outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
if (FAILED(hr)) return hr;
hr = outType->SetGUID(MF_MT_SUBTYPE, subtype);
if (FAILED(hr)) return hr;
hr = reader->SetCurrentMediaType(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
nullptr,
outType.Get());
if (FAILED(hr)) return hr;
return S_OK;
}
Pass MFVideoFormat_NV12 or MFVideoFormat_YUY2 as subtype.
And do not assume the requested subtype is exactly what you will get back. Always confirm the actual type with GetCurrentMediaType.
5.4. Accept only the color metadata you explicitly support
The code below intentionally accepts only:
- NV12 or YUY2
- BT.601 or BT.709
- MFNominalRange_16_235
#include <vector>
struct DecodedFrameInfo
{
GUID subtype = GUID_NULL;
UINT32 width = 0;
UINT32 height = 0;
LONG defaultStride = 0;
MFVideoTransferMatrix matrix = MFVideoTransferMatrix_Unknown;
MFNominalRange nominalRange = MFNominalRange_Unknown;
};
HRESULT GetDefaultStride(
IMFMediaType* pType,
LONG* plStride)
{
if (!pType || !plStride) return E_POINTER;
LONG stride = 0;
HRESULT hr = pType->GetUINT32(
MF_MT_DEFAULT_STRIDE,
reinterpret_cast<UINT32*>(&stride));
if (FAILED(hr))
{
GUID subtype = GUID_NULL;
UINT32 width = 0;
UINT32 height = 0;
hr = pType->GetGUID(MF_MT_SUBTYPE, &subtype);
if (FAILED(hr)) return hr;
hr = MFGetAttributeSize(pType, MF_MT_FRAME_SIZE, &width, &height);
if (FAILED(hr)) return hr;
hr = MFGetStrideForBitmapInfoHeader(subtype.Data1, width, &stride);
if (FAILED(hr)) return hr;
hr = pType->SetUINT32(MF_MT_DEFAULT_STRIDE, static_cast<UINT32>(stride));
if (FAILED(hr)) return hr;
}
*plStride = stride;
return S_OK;
}
HRESULT GetStrictDecodedFrameInfo(
IMFMediaType* pType,
DecodedFrameInfo* pInfo)
{
if (!pType || !pInfo) return E_POINTER;
HRESULT hr = pType->GetGUID(MF_MT_SUBTYPE, &pInfo->subtype);
if (FAILED(hr)) return hr;
if (pInfo->subtype != MFVideoFormat_NV12 &&
pInfo->subtype != MFVideoFormat_YUY2)
{
return MF_E_INVALIDMEDIATYPE;
}
hr = MFGetAttributeSize(pType, MF_MT_FRAME_SIZE, &pInfo->width, &pInfo->height);
if (FAILED(hr)) return hr;
hr = GetDefaultStride(pType, &pInfo->defaultStride);
if (FAILED(hr)) return hr;
UINT32 value = 0;
hr = pType->GetUINT32(MF_MT_YUV_MATRIX, &value);
if (FAILED(hr)) return hr;
pInfo->matrix = static_cast<MFVideoTransferMatrix>(value);
if (pInfo->matrix != MFVideoTransferMatrix_BT601 &&
pInfo->matrix != MFVideoTransferMatrix_BT709)
{
return MF_E_INVALIDMEDIATYPE;
}
hr = pType->GetUINT32(MF_MT_VIDEO_NOMINAL_RANGE, &value);
if (FAILED(hr)) return hr;
pInfo->nominalRange = static_cast<MFNominalRange>(value);
if (pInfo->nominalRange != MFNominalRange_16_235)
{
return MF_E_INVALIDMEDIATYPE;
}
return S_OK;
}
This is intentionally strict. Media Foundation documentation does describe some fallback interpretations, but silently rounding unknown metadata into a default assumption is one of the easiest ways to ship subtle color errors.
5.5. Read buffers using stride, not assumptions
This part matters a lot:
- MF_MT_DEFAULT_STRIDE is the minimum logical stride
- the actual sample buffer may include padding
- if IMF2DBuffer::Lock2D is available, prefer it
This helper is a practical adaptation of the Uncompressed Video Buffers pattern:
class BufferLock
{
public:
explicit BufferLock(IMFMediaBuffer* buffer)
: m_buffer(buffer),
m_2dBuffer(nullptr),
m_locked(false)
{
if (m_buffer)
{
m_buffer->AddRef();
m_buffer->QueryInterface(IID_PPV_ARGS(&m_2dBuffer));
}
}
~BufferLock()
{
Unlock();
if (m_2dBuffer)
{
m_2dBuffer->Release();
m_2dBuffer = nullptr;
}
if (m_buffer)
{
m_buffer->Release();
m_buffer = nullptr;
}
}
HRESULT Lock(
LONG defaultStride,
DWORD heightInPixels,
BYTE** ppScanline0,
LONG* pActualStride)
{
if (!m_buffer || !ppScanline0 || !pActualStride) return E_POINTER;
if (m_locked) return MF_E_INVALIDREQUEST;
if (m_2dBuffer)
{
HRESULT hr = m_2dBuffer->Lock2D(ppScanline0, pActualStride);
if (FAILED(hr)) return hr;
m_locked = true;
return S_OK;
}
BYTE* pData = nullptr;
HRESULT hr = m_buffer->Lock(&pData, nullptr, nullptr);
if (FAILED(hr)) return hr;
*pActualStride = defaultStride;
if (defaultStride < 0)
{
*ppScanline0 =
pData + static_cast<size_t>(-defaultStride) * (heightInPixels - 1);
}
else
{
*ppScanline0 = pData;
}
m_locked = true;
return S_OK;
}
void Unlock()
{
if (!m_locked) return;
if (m_2dBuffer)
{
m_2dBuffer->Unlock2D();
}
else
{
m_buffer->Unlock();
}
m_locked = false;
}
private:
IMFMediaBuffer* m_buffer;
IMF2DBuffer* m_2dBuffer;
bool m_locked;
};
The safe rule is simple: trust the pitch returned by the API, not a width-based guess.
5.6. Encode the per-pixel formula
The example below handles only limited-range BT.601 and BT.709, and writes BGRA32 output.
inline BYTE ClampToByte(double value)
{
if (value <= 0.0) return 0;
if (value >= 255.0) return 255;
return static_cast<BYTE>(value + 0.5);
}
HRESULT ConvertLimitedYuvPixelToBgra(
BYTE y,
BYTE u,
BYTE v,
MFVideoTransferMatrix matrix,
BYTE* dstPixel)
{
if (!dstPixel) return E_POINTER;
const double c = static_cast<double>(y) - 16.0;
const double d = static_cast<double>(u) - 128.0;
const double e = static_cast<double>(v) - 128.0;
double r = 0.0;
double g = 0.0;
double b = 0.0;
switch (matrix)
{
case MFVideoTransferMatrix_BT601:
r = 1.164383 * c + 1.596027 * e;
g = 1.164383 * c - 0.391762 * d - 0.812968 * e;
b = 1.164383 * c + 2.017232 * d;
break;
case MFVideoTransferMatrix_BT709:
r = 1.164383 * c + 1.792741 * e;
g = 1.164383 * c - 0.213249 * d - 0.532909 * e;
b = 1.164383 * c + 2.112402 * d;
break;
default:
return MF_E_INVALIDMEDIATYPE;
}
dstPixel[0] = ClampToByte(b);
dstPixel[1] = ClampToByte(g);
dstPixel[2] = ClampToByte(r);
dstPixel[3] = 255;
return S_OK;
}
The structure is straightforward:
- subtract 16 from Y
- subtract 128 from U and V
- apply the matrix
- clamp to byte range
- force alpha to 255
5.7. Convert NV12 to BGRA32
NV12 is 4:2:0, so each 2x2 block shares one U/V pair.
HRESULT ConvertNv12ToBgra32(
IMFMediaBuffer* buffer,
const DecodedFrameInfo& info,
std::vector<BYTE>& dstBgra)
{
if (!buffer) return E_POINTER;
if (info.subtype != MFVideoFormat_NV12) return MF_E_INVALIDMEDIATYPE;
if ((info.width & 1u) != 0 || (info.height & 1u) != 0)
{
return MF_E_INVALIDMEDIATYPE;
}
dstBgra.resize(static_cast<size_t>(info.width) * info.height * 4);
BufferLock lock(buffer);
BYTE* scanline0 = nullptr;
LONG actualStride = 0;
HRESULT hr = lock.Lock(
info.defaultStride,
info.height,
&scanline0,
&actualStride);
if (FAILED(hr)) return hr;
if (actualStride <= 0)
{
lock.Unlock();
return MF_E_INVALIDMEDIATYPE;
}
const BYTE* yPlane = scanline0;
const BYTE* uvPlane =
scanline0 + static_cast<size_t>(actualStride) * info.height;
for (UINT32 y = 0; y < info.height; ++y)
{
const BYTE* yRow = yPlane + static_cast<size_t>(actualStride) * y;
const BYTE* uvRow = uvPlane + static_cast<size_t>(actualStride) * (y / 2);
BYTE* dstRow =
dstBgra.data() + static_cast<size_t>(info.width) * 4 * y;
for (UINT32 x = 0; x < info.width; ++x)
{
const BYTE Y = yRow[x];
const BYTE U = uvRow[(x / 2) * 2 + 0];
const BYTE V = uvRow[(x / 2) * 2 + 1];
hr = ConvertLimitedYuvPixelToBgra(
Y,
U,
V,
info.matrix,
dstRow + static_cast<size_t>(x) * 4);
if (FAILED(hr))
{
lock.Unlock();
return hr;
}
}
}
lock.Unlock();
return S_OK;
}
This uses a nearest-neighbor-style interpretation of chroma sharing. That is often good enough for tooling and many application paths, but higher-quality upsampling is a separate design choice.
5.8. Convert YUY2 to BGRA32
YUY2 is packed 4:2:2, so two horizontal pixels share one U/V pair.
#include <cstddef>
HRESULT ConvertYuy2ToBgra32(
IMFMediaBuffer* buffer,
const DecodedFrameInfo& info,
std::vector<BYTE>& dstBgra)
{
if (!buffer) return E_POINTER;
if (info.subtype != MFVideoFormat_YUY2) return MF_E_INVALIDMEDIATYPE;
if ((info.width & 1u) != 0) return MF_E_INVALIDMEDIATYPE;
dstBgra.resize(static_cast<size_t>(info.width) * info.height * 4);
BufferLock lock(buffer);
BYTE* scanline0 = nullptr;
LONG actualStride = 0;
HRESULT hr = lock.Lock(
info.defaultStride,
info.height,
&scanline0,
&actualStride);
if (FAILED(hr)) return hr;
for (UINT32 y = 0; y < info.height; ++y)
{
const BYTE* src =
scanline0 +
static_cast<ptrdiff_t>(actualStride) * static_cast<ptrdiff_t>(y);
BYTE* dstRow =
dstBgra.data() + static_cast<size_t>(info.width) * 4 * y;
for (UINT32 x = 0; x < info.width; x += 2)
{
const BYTE Y0 = src[0];
const BYTE U = src[1];
const BYTE Y1 = src[2];
const BYTE V = src[3];
hr = ConvertLimitedYuvPixelToBgra(
Y0,
U,
V,
info.matrix,
dstRow + static_cast<size_t>(x) * 4);
if (FAILED(hr))
{
lock.Unlock();
return hr;
}
hr = ConvertLimitedYuvPixelToBgra(
Y1,
U,
V,
info.matrix,
dstRow + static_cast<size_t>(x + 1) * 4);
if (FAILED(hr))
{
lock.Unlock();
return hr;
}
src += 4;
}
}
lock.Unlock();
return S_OK;
}
Because the bytes are laid out as Y0 U Y1 V, the shared-chroma pattern is visible directly in the code.
5.9. A practical entry point from IMFSample
Once the sample is made contiguous, dispatching by subtype keeps the calling side simple.
HRESULT ConvertSampleToBgra32(
IMFSample* sample,
const DecodedFrameInfo& info,
std::vector<BYTE>& dstBgra)
{
if (!sample) return E_POINTER;
ComPtr<IMFMediaBuffer> buffer;
HRESULT hr = sample->ConvertToContiguousBuffer(&buffer);
if (FAILED(hr)) return hr;
if (info.subtype == MFVideoFormat_NV12)
{
return ConvertNv12ToBgra32(buffer.Get(), info, dstBgra);
}
if (info.subtype == MFVideoFormat_YUY2)
{
return ConvertYuy2ToBgra32(buffer.Get(), info, dstBgra);
}
return MF_E_INVALIDMEDIATYPE;
}
The surrounding flow then becomes:
- create the reader
- request NV12 or YUY2
- build DecodedFrameInfo from GetCurrentMediaType
- call ReadSample
- call ConvertSampleToBgra32
For example:
ComPtr<IMFMediaType> currentType;
HRESULT hr = reader->GetCurrentMediaType(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
&currentType);
if (FAILED(hr)) return hr;
DecodedFrameInfo info;
hr = GetStrictDecodedFrameInfo(currentType.Get(), &info);
if (FAILED(hr)) return hr;
DWORD flags = 0;
LONGLONG timestamp = 0;
ComPtr<IMFSample> sample;
hr = reader->ReadSample(
MF_SOURCE_READER_FIRST_VIDEO_STREAM,
0,
nullptr,
&flags,
&timestamp,
&sample);
if (FAILED(hr)) return hr;
if (flags & MF_SOURCE_READERF_ENDOFSTREAM) return MF_E_END_OF_STREAM;
if (!sample) return MF_E_INVALID_STREAM_DATA;
std::vector<BYTE> bgra;
hr = ConvertSampleToBgra32(sample.Get(), info, bgra);
if (FAILED(hr)) return hr;
// bgra can now be treated as top-down 32bpp BGRA
5.10. Where manual conversion lives architecturally
All of the code above converts after Source Reader, inside the application. That is the clearest place to start.
There are more advanced variants too:
- write a custom MFT
- use Video Processor MFT / XVP
- convert NV12 to RGB on the GPU with shaders
Those are real options, but they shift the subject from “how to understand YUV-to-RGB in Media Foundation” to “where to place the conversion stage in a larger media pipeline.”
6. Which path should you choose?
This table usually makes the decision easier:
| Perspective | Auto conversion (MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING) | Manual conversion |
|---|---|---|
| implementation speed | ◎ | △ |
| extracting a few stills | ◎ | ○ |
| large batch / real-time workloads | △ | ◎ |
| explicit matrix / range control | △ | ◎ |
| D3D / GPU integration | △ | ○ to ◎ |
| output formats beyond RGB32 | △ | ◎ |
| learning the underlying model | ○ | ◎ |
As a practical first strategy:
- if you want something working quickly, use automatic conversion
- if you want full control over color and performance, use manual conversion
It is also very reasonable to use the automatic path first as a correctness baseline, then replace it with a manual path later.
7. Common pitfalls in practice
7.1. Assuming RGB32 means fully valid alpha
RGB32 in memory is B, G, R, Alpha or Don't Care.
If the fourth byte is zero and you hand it directly to a PNG path that treats it as alpha, the image can become transparent. Filling it with 0xFF is the safer default.
7.2. Assuming stride is width * bytesPerPixel
This is one of the most common mistakes.
Real sample buffers may include padding. Always move between rows using the actual stride.
7.3. Mixing up MF_MT_DEFAULT_STRIDE and actual pitch
MF_MT_DEFAULT_STRIDE describes the minimum logical stride for the format.
The actual pitch of a concrete sample should come from the buffer interface, especially IMF2DBuffer::Lock2D when available.
7.4. Quietly guessing 601 versus 709
Color bugs are easy to miss because they do not crash.
At minimum, inspect:
MF_MT_YUV_MATRIXMF_MT_VIDEO_NOMINAL_RANGE
And if a value is outside what your code explicitly supports, failing fast is often better than silently guessing.
7.5. Splitting the NV12 UV plane at width * height
The plane offset depends on actual stride and height, not simply on width * height.
Treating it as a width-based split is an easy way to corrupt color.
7.6. Processing interlaced video as if it were progressive
The manual examples in this article assume progressive video.
If the source is interlaced and you treat it as a single progressive frame, combing artifacts can appear. That is one reason the automatic video-processing path or Video Processor MFT may be more suitable in some pipelines.
7.7. Ignoring chroma-upsampling quality
The NV12 example here is intentionally simple and readable. It uses the shared chroma values directly for each covered pixel. That is often practical, but if image quality matters more than implementation simplicity, chroma-upsampling quality deserves a real design choice.
8. Wrap-up
When you convert YUV to RGB in Media Foundation, a few ideas make the whole topic much easier to reason about:
- after decode, the frame is often NV12 or YUY2, not RGB
- if convenience wins, ask Source Reader for RGB32
- if control wins, receive NV12 or YUY2 and convert to BGRA yourself
- in a manual path, understand sampling, nominal range, matrix, and stride before worrying about formula details
- if BT.601, BT.709, 16..235, 4:2:0, and 4:2:2 are handled vaguely, the result is often a quietly wrong image
YUV-to-RGB looks unfriendly at first. But once the structure becomes clear,
- NV12 means one U/V pair per 2x2 block
- YUY2 means one U/V pair per two horizontal pixels
- the chosen matrix turns those values into RGB
the bytes stop looking mysterious and start looking like a pipeline you can reason about.
9. References
Related KomuraSoft LLC articles
- What Media Foundation Is - Why It Starts to Feel Like COM and Windows Media APIs at the Same Time
- How to Extract a Still Image from an MP4 with Media Foundation - A Single .cpp File You Can Paste into a C++ Console App
Microsoft Learn
- Source Reader
- Using the Source Reader to Process Media Data
- MF_SOURCE_READER_ENABLE_VIDEO_PROCESSING attribute
- IMFSourceReader::SetCurrentMediaType
- Recommended 8-Bit YUV Formats for Video Rendering
- Extended Color Information
- Uncompressed Video Buffers
- IMF2DBuffer::Lock2D
- MF_MT_VIDEO_NOMINAL_RANGE attribute
- MFVideoTransferMatrix enumeration
- Video Processor MFT
- Uncompressed RGB Video Subtypes