Monday, March 11, 2024

Keep an eye on your Native API prototype for interop calls

A few weeks back, we observed an "unexpected process termination" crash in a WPF app. The crash dump gave the call stack below and, unfortunately, it pointed to memory corruption.

00 006fea30 77172e5c     0x770f7000
01 006fea30 56d0df34     ntdll!NtTerminateProcess+0xc
02 006fea30 56d0b9bc     verifier!VerifierStopMessage+0x344
03 006fea9c 56d0bdda     verifier!AVrfpDphReportCorruptedBlock+0x2fc
04 006feaf8 56d0c2d2     verifier!AVrfpDphCheckNormalHeapBlock+0x11a
05 006feb18 56d0ab23     verifier!AVrfpDphNormalHeapFree+0x22
06 006feb3c 771dfa16     verifier!AVrfDebugPageHeapFree+0xe3
07 006feba4 77143d76     ntdll!RtlDebugFreeHeap+0x3e
08 006fed00 77187add     ntdll!RtlpFreeHeap+0xd6
09 006fed5c 77143c46     ntdll!RtlpFreeHeapInternal+0x783
0a 006fed78 754e3320     ntdll!RtlFreeHeap+0x46
0b (Inline) --------     combase!CRetailMalloc_Free+0x16 [onecore\com\combase\class\memapi.cxx @ 656]
0c 006fed90 723cf422     combase!CoTaskMemFree+0x30 [onecore\com\combase\class\memapi.cxx @ 445]
0d 006fedc4 08ef0bc0     mscorlib_ni!System.StubHelpers.CSTRMarshaler.ClearNative+0x2e [f:\dd\ndp\clr\src\BCL\system\stubhelpers.cs @ 125]
0e 006fee6c 08ef0937     0x8ef0bc0
0f 006fee84 08ef08c3     CppAndCs!CppAndCs.Program.GetNativeBuffer+0x4f
10 006fee98 73df0556     CppAndCs!CppAndCs.Program.Main+0x23
11 006feea4 73df373a     clr!CallDescrWorkerInternal+0x34
12 006feef8 73df9adb     clr!CallDescrWorkerWithHandler+0x6b
13 006fef6c 73f6ff6b     clr!MethodDescCallSite::CallTargetWorker+0x16a
14 006ff090 73f7064a     clr!RunMain+0x1b3
15 006ff2fc 73f70577     clr!Assembly::ExecuteMainMethod+0xf7
16 006ff7e0 73f706f8     clr!SystemDomain::ExecuteMainMethod+0x5ef
17 006ff838 73f7081e     clr!ExecuteEXE+0x4c
18 006ff878 73f6c225     clr!_CorExeMainInternal+0xdc
19 006ff8b4 74dffa84     clr!_CorExeMain+0x4d
1a 006ff8ec 74f0e81e     mscoreei!_CorExeMain+0xd6
1b 006ff8fc 74f14338     MSCOREE!ShellShim__CorExeMain+0x9e
1c 006ff914 76cefcc9     MSCOREE!_CorExeMain_Exported+0x8
1d 006ff914 77167c5e     KERNEL32!BaseThreadInitThunk+0x19
1e 006ff970 77167c2e     ntdll!__RtlUserThreadStart+0x2f
1f 006ff980 00000000     ntdll!_RtlUserThreadStart+0x1b

From the call stack, it is clear that the crash happens when the CLR marshaler tries to free native memory that it assumes was temporarily allocated for the call. The methods are defined as shown below.

C#

[DllImport("CppAndCs.dll", EntryPoint = "GetNativeBuffer",
           CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
public static extern string GetNativeBuffer(StringBuilder buffer, IntPtr bufferSize);

C++

void GetNativeBuffer (char *buffer, std::size_t buffer_size);

You may have already figured out the problem. Yes, the return value. Because the C# declaration has a return type of string, the CLR's interop marshaler treats the native return value as a pointer to a string and tries to release it after the call. The native method returns void, so the marshaler picks up whatever value happens to be in the return register, tries to free memory that was never allocated, and boom!

Fix

===

[DllImport("CppAndCs.dll", EntryPoint = "GetNativeBuffer",
           CharSet = CharSet.Ansi, CallingConvention = CallingConvention.Cdecl)]
public static extern void GetNativeBuffer(StringBuilder buffer, IntPtr bufferSize);

If the native method really does need to return a string, the correct way is to allocate it inside the native implementation with CoTaskMemAlloc(). The CLR's interop marshaler later frees it with CoTaskMemFree(), which avoids a memory leak.
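As a rough illustration of that last point, here is a minimal sketch of a native function that returns a string the marshaler can safely free. The names GetNativeString, InteropAlloc and InteropFree are hypothetical and are not part of the original CppAndCs.dll. The malloc/free branch is only there so the sketch compiles off Windows. In a real DLL the function would also need to be exported (for example with __declspec(dllexport)) and linked against ole32.lib.

```cpp
#include <cstddef>
#include <cstring>
#ifdef _WIN32
#include <objbase.h>   // CoTaskMemAlloc, CoTaskMemFree
#else
#include <cstdlib>     // malloc/free stand in so the sketch builds off Windows
#endif

// Hypothetical helper: allocate with the allocator the marshaler will free with.
inline void* InteropAlloc(std::size_t bytes) {
#ifdef _WIN32
    return CoTaskMemAlloc(bytes);
#else
    return std::malloc(bytes);
#endif
}

// Hypothetical helper: only needed when a native caller consumes the string.
inline void InteropFree(void* p) {
#ifdef _WIN32
    CoTaskMemFree(p);
#else
    std::free(p);
#endif
}

// Hypothetical export: returns a freshly allocated, null-terminated ANSI string.
// Ownership passes to the caller (the interop marshaler when called via P/Invoke).
extern "C" char* GetNativeString() {
    static const char text[] = "Hello from native";
    char* result = static_cast<char*>(InteropAlloc(sizeof(text)));
    if (result != nullptr) {
        std::memcpy(result, text, sizeof(text));  // copies the terminating '\0' too
    }
    return result;
}
```

On the C# side, a matching declaration would then use string as the return type, and no manual free is needed because the marshaler releases the buffer.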


References

========

https://learn.microsoft.com/en-us/dotnet/standard/native-interop/best-practices

https://limbioliong.wordpress.com/2011/06/16/returning-strings-from-a-c-api/


Wednesday, May 15, 2019

How to find the owner DLL of an MFC window

Recently I ran into sporadic DLL unload crashes in our applications, and they were occurring frequently.

This is a typical call stack for the crash:
# Call Site
00 <Unloaded_SampleCOM.dll>+0x2362a4
01 user32!UserCallWinProcCheckWow
02 user32!DispatchMessageWorker
03 mfc140!AfxInternalPumpMessage
04 mfc140!CWinThread::Run
05 mfc140!AfxWinMain
06 MyApp!_security_check_cookie
07 MyApp!_security_check_cookie
08 kernel32!BaseThreadInitThunk
09 ntdll!RtlUserThreadStart

To get a meaningful call stack, I had to load the DLL manually.
0:000> .reload /unl SampleCOM.dll
0:000> kpn
# Call Site
00 SampleCOM!AfxWndProcDllStatic(struct HWND__ * hWnd = 0x00000000`0006084c, unsigned int nMsg = 0x31a, unsigned int64 wParam = 0x8c0009, int64 lParam = 0n1)
01 user32!UserCallWinProcCheckWow
02 user32!DispatchMessageWorker
03 mfc140!AfxInternalPumpMessage
04 mfc140!CWinThread::Run
05 mfc140!AfxWinMain
06 MyApp!_security_check_cookie
07 MyApp!_security_check_cookie
08 kernel32!BaseThreadInitThunk
09 ntdll!RtlUserThreadStart

From the call stack above, I made the following assumption:
an application is trying to send a WM_THEMECHANGED message to an unknown window in SampleCOM.dll.

The question was: how can that happen for an unloaded DLL?
The first thing that crossed my mind was to log window information such as the window handle, window title and window class during DLL unload. I took the usual approach of enumerating all windows using EnumWindows() and EnumChildWindows(), and logged them from DllCanUnloadNow() of SampleCOM.dll. But there were a hell of a lot of windows, and I was only interested in the windows created from the unloaded DLL.

I wondered whether there was a way to check if a window belongs to a particular DLL. Gotcha! There is one if it is an MFC window, thanks to those days spent debugging MFC source code. MFC keeps a handle map that maps Windows object handles to their corresponding MFC wrapper class pointers. Internally it manages two dictionaries (implemented as CMapPtrToPtr) that track the handle-pointer pairs, named m_permanentMap and m_temporaryMap. CWnd::FromHandlePermanent() looks only in the permanent map, whereas CWnd::FromHandle() also falls back to the temporary map and creates a temporary CWnd if nothing is found. In our case we need CWnd::FromHandlePermanent(): it returns a valid CWnd only if the window belongs to our DLL. The most important thing to take care of here is the module state, which must be switched using the AFX_MANAGE_STATE() macro.
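BOOL CALLBACK CLogWndInfo::EnumThreadWndProc(_In_ HWND hwnd, _In_ LPARAM lParam)
{
    // Switch to this DLL's module state so that its own handle map is consulted.
    AFX_MANAGE_STATE(AfxGetStaticModuleState());

    // Non-null only if the window is in this module's permanent handle map.
    CWnd* pWnd = CWnd::FromHandlePermanent(hwnd);
    if (nullptr != pWnd)
    {
        LogWindow(_T("Found"), hwnd);
    }

    // Check the child windows as well, then continue the enumeration.
    EnumChildWindows(hwnd, CLogWndInfo::EnumChildProc, 0);
    return TRUE;
}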

At this point I had the window information I needed. But here is the catch: the window was in another COM DLL. Why did that happen?
The real issue was the use of an MFC window in ATL. The ATL coclass has a member (not a pointer) that is an MFC window. The issue scenario is as follows:

1. SampleCOM.dll requests an ATL object that lives in another DLL.
2. The MFC window object inside the ATL class is constructed. MFC adds the object to its handle map during CWnd object creation (not during CreateWindow). So at this point it has already been added to the handle map, but the module state still points to SampleCOM.dll because the ATL class does no module state handling.
Hence, as far as the MFC framework is concerned, SampleCOM.dll is the owner of this MFC window in the ATL class. Because of this wrong ownership, the application tries to invoke the window procedure entry point of SampleCOM.dll whenever a message for the MFC window is received.

The best fix I found is to invoke AFX_MANAGE_STATE() before the MFC object is created. The MFC objects that were held by value were changed to pointers so that the module state can be switched before they are created.

Reference
https://docs.microsoft.com/en-us/cpp/mfc/tn003-mapping-of-windows-handles-to-objects?view=vs-2019

Thursday, March 21, 2019

Beware of AfxGetInstanceHandle() calls in an MFC extension DLL

This crash topped the charts in our product's latest release. It came with a mysterious call stack that contained only MFC/Win32 calls and a freed (MEM_FREE) address.

 # Call Site
00 0x00000001`1cd15c70
01 user32!DispatchHookA
02 user32!CallHookWithSEH
03 user32!_fnHkINLPMOUSEHOOKSTRUCTEX
04 ntdll!KiUserCallbackDispatcherContinue
05 win32u!NtUserCallNextHookEx
06 user32!CallNextHookEx
07 user32!DispatchHookA
08 user32!CallHookWithSEH
09 user32!_fnHkINLPMOUSEHOOKSTRUCTEX
0a ntdll!KiUserCallbackDispatcherContinue
0b win32u!NtUserCallNextHookEx
0c user32!CallNextHookEx
0d ieframe!TLSMouseHookProc
0e user32!DispatchHookW
0f user32!CallHookWithSEH
10 user32!_fnHkINLPMOUSEHOOKSTRUCTEX
11 ntdll!KiUserCallbackDispatcherContinue
12 win32u!NtUserPeekMessage
13 user32!PeekMessage
14 user32!PeekMessageA
15 mfc140!CWinThread::Run
16 mfc140!AfxWinMain
17 MyApp!_security_check_cookie
18 MyApp!_security_check_cookie
19 kernel32!BaseThreadInitThunk
1a ntdll!RtlUserThreadStart 

It was pretty simple to say how it occurred: something was trying to call a hook procedure at a bogus memory address.

My initial hypothesis was that the hook procedure was in an unloaded DLL. So I did some research and created a sample app to verify it. The result was not what I expected: in that case the call stack must contain <Unloaded_DllName>, which was not present in the crash scenario.

So what happened in this crash?

The reason was strange. Initializing a third-party library requires the instance handle of the caller, which was obtained by calling AfxGetInstanceHandle(). Here is the catch: the caller was an MFC extension DLL. Extension DLLs do not have their own module state; they take on the state of the calling application or DLL. So the call returned the instance handle of whoever called the extension DLL.

I don't know the internal implementation of the third-party library. It appears to have used that handle to install a global hook with SetWindowsHookExA(). The hook procedure is actually part of the extension DLL, since the extension DLL is linked with the third-party library. Because the wrong instance handle was provided, the hook procedure was invoked as if it lived in the caller's DLL, hence the crash.

To fix this, the DLL instance handle is now taken from DllMain().

Wednesday, December 16, 2015

Limiting COM outproc server instances

A common approach to limiting an application to a single instance is to create a named kernel object (mutex, semaphore, etc.) from main(). This is not always the best method, and for an outproc server it may freeze the client application. A typical case is activation of the outproc server by different users, or by users with different privileges.

A basic feature of a COM server is that it is launched on the very first CoCreateInstance() call, and subsequent calls connect to the existing server. If client calls come from different privilege levels (for example standard and administrator), COM tries to launch multiple server instances as a security measure. With a single-instance check in place the additional server cannot start, so the second and subsequent calls are blocked for 120 seconds, which freezes the application, and then return failure.

There is another way to limit a COM server to one instance: change its COM identity (in the .rgs file). By default, every COM server runs with the Launching User identity unless specified otherwise. This can be changed to This User with explicit user credentials, which is the advisable option for retaining security. Another option is The Interactive User, but it will still create multiple instances if the client is not in an interactive user session (for example, a call from a Windows service).
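As a rough illustration, the identity can be set through the RunAs value under the server's AppID key. The fragment below is a sketch modelled on the .rgs script the ATL wizard generates; '%APPID%' and 'SampleServer' are placeholders, not values from the original project. It shows The Interactive User; for This User, RunAs holds the account name instead, and the password is normally entered through dcomcnfg rather than stored in the script.

```
HKCR
{
    NoRemove AppID
    {
        '%APPID%' = s 'SampleServer'
        {
            val RunAs = s 'Interactive User'
        }
    }
}
```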

Wednesday, August 12, 2015

Missing ON_WM_MEASUREITEM_REFLECT

Last week I came across a strange issue in a customized CListCtrl. A dialog had two of these list controls, and row height changes were not being reflected in the second control when the font size changed.

Why was it happening for the second control? Was it happening only for the second object?
I swapped the two to confirm. Surprisingly, the result was the same. So the issue was not with the object; it could have been the tab order. But the control does not have any tab-order-specific implementation.

To get to the bottom of this, I went through the MFC source code for CWnd::OnMeasureItem(), since it is responsible for row height and width manipulation. As per the implementation of this method, it looks up the list controls in the dialog using CWnd::GetDescendantWindow() and invokes the virtual method CWnd::MeasureItem(). CWnd::GetDescendantWindow() uses the control ID to uniquely identify each control. In our scenario the control ID was the same for both list controls, so CWnd::MeasureItem() was invoked only for the first one.

The same can happen for other handlers too, so be careful when setting control IDs.
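To make the effect concrete, here is a small simulation of an ID-based lookup. It is not MFC code: Control, FindFirstById and HasDuplicateIds are hypothetical names used only to show why a lookup by ID can never reach a second control that shares the same ID, plus a simple check that could flag such duplicates.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a child control: only an ID and a label.
struct Control {
    int id;
    std::string name;
};

// Mimics an ID-based lookup: returns the first control with a matching ID,
// or nullptr if there is none. Later controls with the same ID are never seen.
const Control* FindFirstById(const std::vector<Control>& children, int id) {
    for (const Control& child : children) {
        if (child.id == id) {
            return &child;
        }
    }
    return nullptr;
}

// Returns true if any two controls in the list share an ID.
bool HasDuplicateIds(const std::vector<Control>& children) {
    for (std::size_t i = 0; i < children.size(); ++i) {
        for (std::size_t j = i + 1; j < children.size(); ++j) {
            if (children[i].id == children[j].id) {
                return true;
            }
        }
    }
    return false;
}
```

With two entries that both use ID 1001, FindFirstById() always returns the first one, which mirrors the symptom seen in the dialog.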




Tuesday, September 30, 2014

Display Message From Service/Session 0

The conventional way to show a message box is MessageBox()/AfxMessageBox(). But is that useful if we want to display messages from a service or an application running in session 0? Yes, it can be done, but it is not convenient: the developer has to enable interactive services.
An alternative is to use WTSSendMessage().

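// Requires <windows.h> and <wtsapi32.h>; link with Wtsapi32.lib.
DWORD dwSession = WTSGetActiveConsoleSessionId();
DWORD dwResponse = 0;
wchar_t szTitle[] = L"";
wchar_t szMessage[] = L"Hello VC++ From Session 0";
// The title and message lengths are given in bytes.
WTSSendMessageW(WTS_CURRENT_SERVER_HANDLE, dwSession, szTitle, 0, szMessage,
    static_cast<DWORD>(wcslen(szMessage) * sizeof(wchar_t)), MB_OK, 0, &dwResponse, FALSE);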

Sunday, September 7, 2014

Never initialize UI thread as MTA

Recently we faced an occasional freeze in an application. I should share some background on the implementation before jumping straight to the issue. It is an outproc STA server that works much like mobile app notifications. Client applications or DLLs can call it to show error/warning messages to the end user. The notification window is always top-most in z-order. To achieve that, the server iterates through all windows (including the client window) and invokes SetWindowPos().


I created a sample app to replicate the issue and found that SetWindowPos() was freezing during the iteration. It was due to a deadlock between the server and the client application.

To understand the deadlock, we need to know how COM communicates internally. Below is a basic description of STA and MTA communication.
An STA uses window messages for COM communication. When a client thread is initialized as STA, it internally pumps messages while waiting for the server's response. An MTA thread waits using WaitForSingleObject(), and there is no message pumping.

In the issue scenario, the client UI thread was initialized as MTA and, as part of the iteration to set the z-order, the server invoked SetWindowPos() on the client window. The client could not process the SetWindowPos() call because it was blocked in WaitForSingleObject(), which led to the deadlock. Pretty intriguing, right?

The fix is obvious from the title of this post.

Reference
========
https://devblogs.microsoft.com/oldnewthing/20080424-00/?p=22603
