Wednesday, May 15, 2019

How to find the owner DLL of an MFC window

Recently I ran into sporadic DLL-unload crashes in our applications, and they were occurring frequently.

This is a typical call stack for the crash:
# Call Site
00 <Unloaded_SampleCOM.dll>+0x2362a4
01 user32!UserCallWinProcCheckWow
02 user32!DispatchMessageWorker
03 mfc140!AfxInternalPumpMessage
04 mfc140!CWinThread::Run
05 mfc140!AfxWinMain
06 MyApp!_security_check_cookie
07 MyApp!_security_check_cookie
08 kernel32!BaseThreadInitThunk
09 ntdll!RtlUserThreadStart

To get a meaningful call stack, I had to reload the unloaded DLL manually:
0:000> .reload /unl SampleCOM.dll
0:000> kpn
# Call Site
00 SampleCOM!AfxWndProcDllStatic(struct HWND__ * hWnd = 0x00000000`0006084c, unsigned int nMsg = 0x31a, unsigned int64 wParam = 0x8c0009, int64 lParam = 0n1)
01 user32!UserCallWinProcCheckWow
02 user32!DispatchMessageWorker
03 mfc140!AfxInternalPumpMessage
04 mfc140!CWinThread::Run
05 mfc140!AfxWinMain
06 MyApp!_security_check_cookie
07 MyApp!_security_check_cookie
08 kernel32!BaseThreadInitThunk
09 ntdll!RtlUserThreadStart

From this call stack I made the following assumption:
An application is sending a WM_THEMECHANGED message (nMsg = 0x31a) to an unknown window that belongs to SampleCOM.dll.

The question was: how can this happen to an unloaded DLL?
The first thing that came to mind was to log window information such as the window handle, title, and class when the DLL unloads. I took the usual approach of enumerating all windows with EnumWindows() and EnumChildWindows(), and logged them from DllCanUnloadNow() in SampleCOM.dll. But that produces a huge number of windows, and I was only interested in the windows created by the unloaded DLL.
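The enumeration described above can be sketched roughly as follows. This is a minimal illustration, not the code from our product: FormatWindowInfo, LogWindowInfo and LogAllWindows are hypothetical names, the log goes to OutputDebugStringA as an assumption, and the Win32 part is guarded so the formatting helper can be compiled and checked on its own.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds one log line for a window: handle, class name and title.
// Kept free of Win32 types so it can be exercised without a window.
std::string FormatWindowInfo(std::uintptr_t handle,
                             const std::string& title,
                             const std::string& cls)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "0x%08llX",
                  static_cast<unsigned long long>(handle));
    return std::string("hwnd=") + buf +
           " class=\"" + cls + "\" title=\"" + title + "\"";
}

#ifdef _WIN32
#include <windows.h>

// Reads the title and class of one window and writes a log line.
static void LogWindowInfo(HWND hwnd)
{
    char title[256] = {};
    char cls[256] = {};
    GetWindowTextA(hwnd, title, static_cast<int>(sizeof(title)));
    GetClassNameA(hwnd, cls, static_cast<int>(sizeof(cls)));
    const std::string line =
        FormatWindowInfo(reinterpret_cast<std::uintptr_t>(hwnd), title, cls) + "\n";
    OutputDebugStringA(line.c_str());
}

// Called for every descendant of a top-level window.
static BOOL CALLBACK EnumChildProc(HWND hwnd, LPARAM)
{
    LogWindowInfo(hwnd);
    return TRUE;  // keep enumerating
}

// Called for every top-level window on the desktop.
static BOOL CALLBACK EnumTopLevelProc(HWND hwnd, LPARAM)
{
    LogWindowInfo(hwnd);
    EnumChildWindows(hwnd, EnumChildProc, 0);
    return TRUE;  // keep enumerating
}

// Hypothetical entry point, e.g. called from DllCanUnloadNow().
void LogAllWindows()
{
    EnumWindows(EnumTopLevelProc, 0);
}
#endif
```

A log built this way lists every window on the desktop, which is exactly why it was too noisy to be useful on its own.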

Was there a way to check whether a window belongs to a particular DLL? Gotcha! There is, if it is an MFC window, thanks to the days I spent debugging MFC source code. MFC keeps a handle map that maps Windows object handles to their corresponding MFC wrapper class pointers. Internally it manages two dictionaries (implemented as CMapPtrToPtr) of handle-pointer pairs, named m_permanentMap and m_temporaryMap. CWnd::FromHandlePermanent() looks only at permanent wrappers, while CWnd::FromHandle() also creates a temporary wrapper when no CWnd is attached to the handle. In our case we need CWnd::FromHandlePermanent(): it returns a valid CWnd only if the window belongs to our DLL. The most important thing to take care of here is the module state, which must be switched with the AFX_MANAGE_STATE() macro before the lookup.
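BOOL CALLBACK CLogWndInfo::EnumThreadWndProc(_In_ HWND hwnd, _In_ LPARAM lParam)
{
    // Switch to this DLL's module state so the lookup uses this DLL's handle map.
    AFX_MANAGE_STATE(AfxGetStaticModuleState());

    // Non-null only if a permanent CWnd for hwnd is registered in the current module.
    CWnd* pWnd = CWnd::FromHandlePermanent(hwnd);
    if (nullptr != pWnd)
    {
        LogWindow(_T("Found"), hwnd);
    }

    // Visit the child windows too.
    EnumChildWindows(hwnd, CLogWndInfo::EnumChildProc, 0);
    return TRUE;
}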

At this point I had the window information I needed. But here is the catch: the window was in another COM DLL. Why?
The real issue was the use of an MFC window in ATL. The ATL coclass has a member (not a pointer) that is an MFC window. The scenario is as follows:

1. SampleCOM.dll requests an ATL object that lives in another DLL.
2. The MFC window object inside the ATL class is constructed. MFC adds the object to its handle map when the CWnd object is created (not during CreateWindow). So it is already in the handle map, but the module state still points to SampleCOM.dll because the ATL class does no module state handling.
Hence, as far as the MFC framework is concerned, SampleCOM.dll is the owner of this MFC window in the ATL class. Because of this wrong ownership, the application tries to invoke the window procedure entry point in SampleCOM.dll whenever a message for that MFC window arrives.

The best fix I found is to invoke AFX_MANAGE_STATE() before the MFC object is created. The MFC objects that were held by value were changed to pointers, so that the module state can be switched before the objects are constructed.
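The ownership mix-up and the fix can be illustrated with a small toy model. This is only a sketch and not MFC source: ToyModuleState, ToyManageState, RegisterWindow and FindOwner are hypothetical names, "AtlObject.dll" is a made-up name for the other DLL, and the model simply assumes that a window is registered with whichever module state is current at that moment.

```cpp
#include <map>
#include <string>

// Toy stand-in for a per-module state holding a permanent handle map.
struct ToyModuleState {
    std::string moduleName;
    std::map<int, std::string> permanentMap;  // window handle -> wrapper description
};

// Stand-in for the "current module state" that the framework tracks.
static ToyModuleState* g_currentState = nullptr;

// RAII guard in the spirit of AFX_MANAGE_STATE: switch on entry, restore on exit.
class ToyManageState {
public:
    explicit ToyManageState(ToyModuleState* next) : m_previous(g_currentState)
    {
        g_currentState = next;
    }
    ~ToyManageState() { g_currentState = m_previous; }
    ToyManageState(const ToyManageState&) = delete;
    ToyManageState& operator=(const ToyManageState&) = delete;

private:
    ToyModuleState* m_previous;
};

// Registers a window wrapper with whichever module state is current.
void RegisterWindow(int handle, const std::string& wrapper)
{
    g_currentState->permanentMap[handle] = wrapper;
}

// Returns the name of the module whose map holds the handle, or "" if none does.
std::string FindOwner(int handle, const ToyModuleState& a, const ToyModuleState& b)
{
    if (a.permanentMap.count(handle) != 0) return a.moduleName;
    if (b.permanentMap.count(handle) != 0) return b.moduleName;
    return "";
}

// Crash scenario: the callee never switches state, so the caller becomes the owner.
std::string OwnerWithoutStateSwitch()
{
    ToyModuleState sampleCom{"SampleCOM.dll", {}};
    ToyModuleState atlDll{"AtlObject.dll", {}};
    ToyManageState caller(&sampleCom);         // caller runs under its own state
    RegisterWindow(0x6084c, "window in ATL class");
    return FindOwner(0x6084c, sampleCom, atlDll);
}

// Fixed scenario: the callee switches to its own state before creating the window.
std::string OwnerWithStateSwitch()
{
    ToyModuleState sampleCom{"SampleCOM.dll", {}};
    ToyModuleState atlDll{"AtlObject.dll", {}};
    ToyManageState caller(&sampleCom);
    {
        ToyManageState callee(&atlDll);        // the equivalent of AFX_MANAGE_STATE()
        RegisterWindow(0x6084c, "window in ATL class");
    }
    return FindOwner(0x6084c, sampleCom, atlDll);
}
```

In the model, a by-value member would be registered before any code in the callee gets a chance to install the guard, which mirrors why the real members had to become pointers.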

Reference
https://docs.microsoft.com/en-us/cpp/mfc/tn003-mapping-of-windows-handles-to-objects?view=vs-2019

Thursday, March 21, 2019

Beware of the AfxGetInstanceHandle() call in an MFC extension DLL

This crash topped the charts in our product's latest release. It came with a mysterious call stack containing only MFC/Win32 calls and a freed (MEM_FREE) address.

 # Call Site
00 0x00000001`1cd15c70
01 user32!DispatchHookA
02 user32!CallHookWithSEH
03 user32!_fnHkINLPMOUSEHOOKSTRUCTEX
04 ntdll!KiUserCallbackDispatcherContinue
05 win32u!NtUserCallNextHookEx
06 user32!CallNextHookEx
07 user32!DispatchHookA
08 user32!CallHookWithSEH
09 user32!_fnHkINLPMOUSEHOOKSTRUCTEX
0a ntdll!KiUserCallbackDispatcherContinue
0b win32u!NtUserCallNextHookEx
0c user32!CallNextHookEx
0d ieframe!TLSMouseHookProc
0e user32!DispatchHookW
0f user32!CallHookWithSEH
10 user32!_fnHkINLPMOUSEHOOKSTRUCTEX
11 ntdll!KiUserCallbackDispatcherContinue
12 win32u!NtUserPeekMessage
13 user32!PeekMessage
14 user32!PeekMessageA
15 mfc140!CWinThread::Run
16 mfc140!AfxWinMain
17 MyApp!_security_check_cookie
18 MyApp!_security_check_cookie
19 kernel32!BaseThreadInitThunk
1a ntdll!RtlUserThreadStart 

It was easy to say how it occurred: something was trying to call a hook procedure at a bogus memory address.

My initial hypothesis was that the hook procedure was in an unloaded DLL. So I did some research and created a sample app to verify it. The result was different from what I expected: in that case the call stack would show <Unloaded_DllName>, which was absent in the crash scenario.

So what happened in this crash?

The reason was strange. A third-party library's initialization needs the instance handle of its caller, which was obtained by calling AfxGetInstanceHandle(). Here is the catch: the caller was an MFC extension DLL. Extension DLLs do not have their own module state; they take on the state of the calling application or DLL. So the call returned the instance handle of the extension DLL's caller.

I don't know the internal implementation of the third-party library. It seems it was using that handle to install a global hook with SetWindowsHookExA(). The global hook procedure is actually part of the extension DLL, since the extension DLL is linked with the third-party library. Because the wrong instance handle was provided, the hook procedure was invoked as if it were in the caller DLL, and so it crashed.

To fix this, the DLL instance handle is now obtained from DllMain.
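A minimal sketch of that fix, assuming the handle is simply remembered in a global when the DLL is loaded. RememberDllInstance and GetThisDllInstance are hypothetical helper names, and the DllMain shown here leaves out whatever else the real extension DLL does during attach.

```cpp
// Opaque module handle; on Windows an HINSTANCE converts to this implicitly.
using ModuleHandle = void*;

// Instance handle of this DLL, captured at load time.
static ModuleHandle g_thisDllInstance = nullptr;

// Stores the handle that the loader passed to this DLL.
void RememberDllInstance(ModuleHandle instance)
{
    g_thisDllInstance = instance;
}

// Returns this DLL's own handle, independent of the current MFC module state.
ModuleHandle GetThisDllInstance()
{
    return g_thisDllInstance;
}

#ifdef _WIN32
#include <windows.h>

// The loader passes the DLL's own instance handle to DllMain.
extern "C" BOOL WINAPI DllMain(HINSTANCE hInstance, DWORD reason, LPVOID /*reserved*/)
{
    if (reason == DLL_PROCESS_ATTACH)
    {
        RememberDllInstance(hInstance);
    }
    return TRUE;
}
#endif
```

The third-party library would then be initialized with GetThisDllInstance() instead of AfxGetInstanceHandle(), so the handle always refers to the DLL that actually contains the hook procedure.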

Keep an eye on your Native API prototype for interop calls

A few weeks back, we observed an "unexpected process termination" crash in a WPF app. From the crash dump, provides below calls...