Leveraging MSBuild’s BuildInParallel (with caution)


UPDATE: 12 April 2011 – added /nodeReuse info.

A few months ago I wrote a blog post on Running Targets in Parallel in MSBuild. This presented a simple sample which illustrated how to run a target in parallel over some files. I recently looked at whether an MSBuild automated development build process could be made faster by executing parts of it in parallel.

The build process consisted of around 85 targets in total, with various combinations which could be run together (via MSBuild Explorer) to accomplish different goals. Although a full build was rarely required, it would take around 30 minutes. The goal was to reduce this time and, in doing so, the time of the partial builds.

The following process was implemented:

  • Define buckets of targets which could be executed in parallel and buckets which could not
  • Create temporary sets of projects to execute
  • Execute the projects in the applicable order, choosing whether to execute in parallel or not
  • Repeat for additional top level targets.

In code, this would look something like this:

<Target Name="TheWork">
    <!-- Do your real work on each target here. -->
    <CallTarget Targets="$(TargetToCall)"/>
</Target>
<Target Name="Target1">
    <!-- Define sets of parallel and non parallel targets -->
    <ItemGroup>
        <ParallelTargets01 Include="TargetA;TargetB"/>
        <Targets01 Include="TargetF"/>
        <ParallelTargets10 Include="TargetG;TargetC;TargetD;TargetE"/>
        <Targets10 Include="TargetM"/>
    </ItemGroup>

    <!-- Create an itemgroup of temporary projects using the current MSBuild file and passing in the target to call -->
    <ItemGroup>
        <ParallelProjectSet01 Include="$(MSBuildProjectFile)" >
            <Properties>TargetToCall=%(ParallelTargets01.Identity)</Properties>
        </ParallelProjectSet01>
        <ParallelProjectSet10 Include="$(MSBuildProjectFile)" >
            <Properties>TargetToCall=%(ParallelTargets10.Identity)</Properties>
        </ParallelProjectSet10>
        <ProjectSet01 Include="$(MSBuildProjectFile)" >
            <Properties>TargetToCall=%(Targets01.Identity)</Properties>
        </ProjectSet01>
        <ProjectSet10 Include="$(MSBuildProjectFile)" >
            <Properties>TargetToCall=%(Targets10.Identity)</Properties>
        </ProjectSet10>
    </ItemGroup>

    <!-- Execute the temporary projects in the applicable order -->
    <MSBuild Projects="@(ParallelProjectSet01)" BuildInParallel="true" Targets="TheWork" />
    <MSBuild Projects="@(ProjectSet01)" BuildInParallel="false" Targets="TheWork" />
    <MSBuild Projects="@(ProjectSet10)" BuildInParallel="false" Targets="TheWork" />
    <MSBuild Projects="@(ParallelProjectSet10)" BuildInParallel="true" Targets="TheWork" />
</Target>

Remember, you must use the /m switch when calling MSBuild.exe for parallelism to be enabled.
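
The snippet above omits the Project element and the targets it calls. For experimenting, a minimal self-contained variant might look like the following sketch. The file name Parallel.proj and the Message-only stub targets are assumptions made for illustration; the structure otherwise mirrors Target1 above, reduced to one parallel and one sequential bucket.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Parallel.proj (hypothetical file name). Run with:
       msbuild Parallel.proj /t:Target1 /m
     Omit /m to compare against a single-node run. -->
<Project ToolsVersion="4.0" DefaultTargets="Target1" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">

    <!-- Stub targets standing in for the real work (assumed for this sketch) -->
    <Target Name="TargetA">
        <Message Importance="high" Text="TargetA running" />
    </Target>
    <Target Name="TargetB">
        <Message Importance="high" Text="TargetB running" />
    </Target>
    <Target Name="TargetF">
        <Message Importance="high" Text="TargetF running" />
    </Target>

    <!-- Entry point for each temporary project; TargetToCall is passed in as a property -->
    <Target Name="TheWork">
        <CallTarget Targets="$(TargetToCall)" />
    </Target>

    <Target Name="Target1">
        <!-- One parallel bucket and one sequential bucket -->
        <ItemGroup>
            <ParallelTargets01 Include="TargetA;TargetB" />
            <Targets01 Include="TargetF" />
        </ItemGroup>

        <!-- One temporary project per target, each pointing back at this file -->
        <ItemGroup>
            <ParallelProjectSet01 Include="$(MSBuildProjectFile)">
                <Properties>TargetToCall=%(ParallelTargets01.Identity)</Properties>
            </ParallelProjectSet01>
            <ProjectSet01 Include="$(MSBuildProjectFile)">
                <Properties>TargetToCall=%(Targets01.Identity)</Properties>
            </ProjectSet01>
        </ItemGroup>

        <!-- The parallel bucket runs first, then the sequential one -->
        <MSBuild Projects="@(ParallelProjectSet01)" BuildInParallel="true" Targets="TheWork" />
        <MSBuild Projects="@(ProjectSet01)" BuildInParallel="false" Targets="TheWork" />
    </Target>
</Project>
```

With stub targets this small there is nothing to gain from parallelism, so the sample is only useful for checking that the plumbing works before real targets are dropped in.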

So that sounds easy enough. The build process has now been optimised to run in parallel and all is good.

Almost…

If you have a fairly basic build, with few dependencies and large bottlenecks which could run in parallel, then you may see a big improvement in your build times. If you have a more complex scenario with multiple buckets, you may not see a considerable improvement. Remember that the buckets themselves are still executed one after another; only the targets inside a parallel bucket run concurrently. You may want to take this a level further and identify whole buckets which could execute in parallel with each other. However, I would urge caution, as even this first level of parallelism may yield unexpected results.

Here are some potential downsides to using BuildInParallel in this way:

  • Your computer may not cope – depending on what products you are using, you may hit race conditions or simply max out your system. A typical error would be deadlocked transactions in SQL Server.
  • Delayed failure – if one target fails, all those in the bucket will continue to run and MSBuild will only exit once it has completed the directing target. If you are watching the console you could always terminate it manually, but be aware.
  • Locked files – by design, multiple MSBuild processes hang about for re-use. I found that these locked various files during builds which caused the builds to fail. If you experience this, try setting /nodereuse:false as covered in Node Reuse in MultiProc MSBuild.
  • Varying workstation specification – if you don’t max out your workstation, you may max out a colleague’s. It’s typical to have varying machine specifications. If you are going to do this, try it on the slowest spec you need to support.
  • Complex logging – if you are used to reading a nice sequential log file, then the format of the parallel logging may surprise you. It’s not a big issue, but be aware that the log is not as easy to follow as a sequential log.
  • Maintainability, Supportability, Complexity – it may take someone a few minutes to understand what is actually going on in the scripts. If you add new targets to your solution, it may also not be immediately clear where they should go. Be sure to document your process.
  • Random failures – this is the worst downside. Unexplainable random failures. It works, then it doesn’t, then it does. There may be an explanation, but do you have the time and resource to investigate and resolve it?
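
Some of these downsides can be softened, though not removed, in the script and on the command line. The sketch below is one possible variation on the execution step of Target1, not something taken from the original build. RunBucketsInParallel is a hypothetical property name and Build.proj is a placeholder file name. StopOnFirstFailure is a real parameter of the MSBuild task, but its documentation notes that it is not supported when building in parallel, so it is only applied to the sequential buckets here, where it matters once a bucket holds more than one target.

```xml
<!-- Possible invocation (Build.proj is a placeholder file name):
       msbuild Build.proj /t:Target1 /m:2 /nodeReuse:false /distributedFileLogger
     /m:2 caps the number of nodes, /nodeReuse:false stops nodes lingering after the build,
     and /distributedFileLogger writes a separate log file per node. -->
<Target Name="Target1">
    <!-- The two ItemGroups from the original Target1 go here unchanged. -->

    <PropertyGroup>
        <!-- Hypothetical switch: parallel by default, override with /p:RunBucketsInParallel=false -->
        <RunBucketsInParallel Condition="'$(RunBucketsInParallel)' == ''">true</RunBucketsInParallel>
    </PropertyGroup>

    <!-- Parallel buckets follow the switch, so a slower machine can fall back to sequential runs -->
    <MSBuild Projects="@(ParallelProjectSet01)" BuildInParallel="$(RunBucketsInParallel)" Targets="TheWork" />

    <!-- Sequential buckets stop at the first failing project instead of running the rest -->
    <MSBuild Projects="@(ProjectSet01)" BuildInParallel="false" StopOnFirstFailure="true" Targets="TheWork" />
    <MSBuild Projects="@(ProjectSet10)" BuildInParallel="false" StopOnFirstFailure="true" Targets="TheWork" />

    <MSBuild Projects="@(ParallelProjectSet10)" BuildInParallel="$(RunBucketsInParallel)" Targets="TheWork" />
</Target>
```

Running the same script with /p:RunBucketsInParallel=false gives a sequential baseline, which may help when trying to decide whether a random failure is caused by the parallelism or by something else.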

Here are some potential upsides to using BuildInParallel in this way:

  • Speed – pure and simple speed. You may be able to save a lot of time on MSBuild automated systems by using this process. It’s definitely worth investigating.

In the end this approach was not adopted and the synchronous process was maintained. The main driver was:

  • Reliability – over 50 developers and testers working together need a stable and reliable build process. The build engineering team will typically be a small outfit with limited resource. Because parts of the build could be executed quickly (via MSBuild Explorer), the full build time is not such a major issue.

I’d be interested in hearing about your experiences. Do you have a different pattern which is working well for you?

Mike

3 thoughts on “Leveraging MSBuild’s BuildInParallel (with caution)”

  1. Hi Mike,

    At HP within our lab we use the BuildProjectsInParallel capability of msbuild (4.0) to attempt to utilize our 24 core machines more fully and reduce a 2000 project 45 minute build to 15 minutes.

    However, we’ve encountered what I would label a defect in the msbuild implementation. Specifically, the method Microsoft.Build.Backend.SchedulableRequest.DetectIndirectCircularDependency is a poor implementation that causes msbuild parallelism to not work well on large real world builds.

    MSBuild works great for 75 projects in parallel, however when 100 or more projects are thrown at msbuild (each with their internal DependsUponTargets which invoke msbuild on any of the 100 they depend on) msbuild begins to choke.

    In the sweet spot in terms of project count the build is able to complete an order of magnitude faster due to the parallelism, however with 150 projects it can take more than an order of magnitude longer than a non parallel build.

    Here’s some data working with a set of 150 projects:
    Time to build without /m on the command line: 27 seconds
    Time to build with /m on the command line: over 10 minutes

    In fact analysis shows that almost all of the time is spent spinning inside the method mentioned above. As the number of projects grows that method bogs down even further turning a 30 minute build into an overnight endeavor.

    Roger

  2. Hi Mike,

    I’ve attempted to follow a similar approach to the one described here. I’ve re-worked our deployment script (which is implemented using MSBuild) to break it up into a set of project files and run them in parallel and it looks like there’s a lot of potential for increases in speed (the deployment is run on a server with 8 cores).

    However, I’m having a strange issue with the MSBuild ExtensionPack SqlCmd task. I’ve described the problem on StackOverflow (I’ve described using a new custom task but the problem is related to the spawning of new processes, first encountered when using the SqlCmd task): http://stackoverflow.com/questions/7085185/msbuild-buildinparallel-custom-task-spawning-process-that-fails-to-run

    I wonder if you have seen anything similar or whether you might be able to shed any light on what’s going on here?

    Thanks,
    Tom
