When you are debugging an application, there are many tools and techniques you can use, like logs, memory dumps and Event Tracing for Windows (ETW). In this post, we will talk about Time Travel Debugging, a tool used by Microsoft Support and product teams and more advanced users, but I encourage everyone to try this approach when diagnosing hard to find bugs.
Time Travel Debugging
Time Travel Debugging or TTD, is the process of recording and then replay the execution of a process, both forwards and backward, to understand what is happening during the execution. It is vital to fixing bugs when the root cause is not clear, and the symptoms appear moments later when the source of the problem is gone. In a way it's similar to Intellitrace (available in Visual Studio 2017 Enterprise), but while Intellitrace records specific events and associated data - call stack, function parameters and return value -, TTD embraces a more general approach and let you move at a single instruction level and access any process data (heap, registers, stack).
The same way you can debug a native or managed process, you can also use TTD in both cases, including .NET Core, but it is limited to Windows.
In the following sections, we'll describe particularities when debugging a managed process.
It also allows you to rewind and play how many times you want, helping you to isolate the problem when it happened, and setting breakpoints just as you usually do in Windbg. Even when you don't need to rewind, TTD has advantages over Live Debugging, since it doesn't interrupt the process, you can create a trace and analyze it offline, but be aware that TTD is very intrusive and ideally you shouldn't use it for more than a few minutes or the traces files can become very large (>5GB).
The demo application is a simple Windows Forms bugged applications that writes a log file to disk. You can download the code from GitHub and compile it, or download the binaries. The machine where we'll record the traces must be a Windows 10 version 1809, Windows Server 2019 or newer.
Open the application and click on Save several times and after some seconds it doesn't work anymore:
Recording a trace
You can use WinDbg Preview to record the trace, but for this Lab, we'll use the built-in TTTracer.exe available on Windows 10 (1809 or newer) and Windows Server 2019, because sometimes it's impossible to install tools in a production environment.
1. Open the LabWindbgTTD.exe in the target machine and take a note of its PID:
2. Open a Command Line as admin and enter the following command, replacing <PID> with the PID of the process:
TTTracer -attach <PID>
Now, the TTD is recording the process execution.
3. Go to the LabWindbgTTD application and click on Save several times until you receive the error. After the error appears, in a different Command Prompt (with admin privileges), execute the following command to stop the trace:
TTTracer -stop <PID>
4. You are going to be notified about the trace stop in the first Command Prompt:
5. TTTracer will generate two files: LabWindbgTTD01.out and LabWindbgTTD01.run
Replay and Analyze
Copy both files to your machine, you can use WinDbg or WinDbg Preview to analyze it and don't forget to set up the symbols to be able to resolve the function names. We'll use WinDbg Preview, but the steps are similar in WinDbg, click File -> Open Trace File and select the ".run" file:
When debugging a TTD file, you can Step Into (t), Step Out and Step Over (p) like you do when Live Debugging:
You can see the current Time Travel Position when stepping through the application, like in the image above.
Using the !tt <POSITION> command you can navigate to a specific position:
The !positions command show the positions for all threads:
But it gets interesting when using the Step command contrariwise, instead of p, t, and g, you can execute p- (Step Back), t- (Trace Back), and g- (Go Back):
Loading Data Access Component and SOS debugging extension
When debugging .NET applications in a different machine from where the dump or trace was created, you need to copy the Data Access and SOS dlls:
- For .NET Framework: mscordacwks.dll and SOS.dll in C:\Windows\Microsoft.NET\Framework\v4.0.30319\ (or C:\Windows\Microsoft.NET\Framework64\v4.0.30319\ for 64 bits processes)
- For .NET Core: mscordaccore.dll and SOS.dll in C:\Program Files\dotnet\shared\Microsoft.NETCore.App\X.X.X\
And execute this command in WinDbg:
.cordll -ve -u -lp <PATH>
Finding the problem
Now we know how to navigate into the TTD file, how can we find the bug?
Let's try to stop when the error occurs, to do this you could use "sxe clr;g" like you would do in a Live Debug, but the TTD extends the Session and Process data model objects, exposing events like Exceptions. To see them all, execute "dx @$curprocess.TTD" and "dx @$cursession.TTD":
We can take advantage of this feature and filter the Exception events, "dx -r2 @$curprocess.TTD.Events.Where(t => t.Type == "Exception").Select(e => e.Exception)":
Click on [Time Travel] to navigate to the moment when the Exception was thrown and execute !pe to see the System.ObjectDisposedException.
You can now execute !clrstack to see exactly which method is throwing the Exception, but it isn't helpful since the BtnSave_Click just call StreamWriter.WriteAsync and is not disposing the object.
In this case, a log containing the Stack Trace, or a Dump file wouldn't help. The application is small and simple enough that looking the code you would easily find the problem, but let's continue the analysis using the WinDbg.
Execute !dso (or !DumpStackObjects) to see the objects in the current stack and click on StreamWriter address.
Click the address to execute !do (or !DumpObj) that shows the details of the object, including its fields, where we can see the stream is null, which means it is disposed.
We know that at this point the StreamWrite.Dispose method has been called and we need to find out who called it. Set a breakpoint to this method and continue the execution in reverse, "!bpmd mscorlib.dll System.IO.StreamWriter.Dispose;g-":
You'll stop at Dispose method of StreamWriter. Execute !dso again, we can see a StreamWriter in the same address as before, let's inspect the object and the underlying stream to find more details about it.
The object address may be different because a Garbage Collection happened or just because we are looking at a different instance of StreamWriter. In this case, you would need to check if the object is the same.
Another option, to see if the object is the same, is to use !GCRoot to find references to the object, this way we can see if in both moments the StreamWriter object is the LogFile field in Form1.
If it is not the object you are looking for, execute g- again until you find it, then execute !clrstack to show the Stack Trace and find the method that is Disposing the StreamWriter:
TTD makes it viable to analyze many scenarios that would be extremely difficult to reproduce or to collect the right data. The possibility of going back and forth is powerful and has the potential to greatly reduce troubleshooting time.
Microsoft Docs has many more details about TTTracer and Windbg Preview.