Devart Blog

XML Structure Comparison Explained

Posted on May 22nd, 2013

This article describes the benefits and limitations of structural comparison of XML files. This new feature of Code Compare 3.0 extends structural code comparison to the XML language. In this post we provide examples that show when the feature is useful and when it is not.

The morning after the release of Code Compare 3.0, we received feedback from one of our users. He tried out the new feature and was puzzled by how it worked.

Here’s what he wrote:

I downloaded Code Compare, did structural comparison of XML files, but this didn’t seem to work. Two XML files – file 1, and file 2 which was mostly the same but (a) had no newlines and (b) had the order of attributes changed, neither of which change the semantics of the file, so I’d expect them to compare the same. (Except for any actual differences.)

We answered this user by email. However, we decided it would also be worth explaining how the new feature works on our blog, to avoid unfounded expectations.

We will consider:

  • 3 examples where structural comparison of XML is useful.
  • 2 scenarios of file comparison where this feature is useless and even confusing.

Benefits of structural XML comparison

In a regular text comparison, files are treated as plain sequences of characters in which the application looks for differences. The logical structure behind these differences is largely ignored.

The structural code comparison algorithm (for XML in particular) works differently. Files are first divided into structural elements (methods or nodes), and these elements are then compared as small sequences of text.
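The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Code Compare’s actual algorithm: the sample files and the helper names `text_diff` and `structural_diff` are made up for this example, and nodes are identified by their tag path, so sibling elements with the same tag would collide.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical sample files: the properties are swapped and one value changes.
OLD = """<Project>
  <PropertyGroup>
    <ProductVersion>1.0</ProductVersion>
    <AssemblyName>Demo</AssemblyName>
  </PropertyGroup>
</Project>"""

NEW = """<Project>
  <PropertyGroup>
    <AssemblyName>Demo</AssemblyName>
    <ProductVersion>2.0</ProductVersion>
  </PropertyGroup>
</Project>"""


def text_diff(old, new):
    """Regular comparison: both files are just sequences of lines."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("+ ", "- "))]


def structural_diff(old, new):
    """Split each file into leaf nodes first, then compare node by node."""

    def leaves(xml_text):
        result = {}

        def walk(node, path):
            path = f"{path}/{node.tag}"
            children = list(node)
            if not children:
                # Simplification: the tag path is used as the node identity.
                result[path] = (node.text or "").strip()
            for child in children:
                walk(child, path)

        walk(ET.fromstring(xml_text), "")
        return result

    a, b = leaves(old), leaves(new)
    return {
        path: (a.get(path), b.get(path))
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }


if __name__ == "__main__":
    print(text_diff(OLD, NEW))        # line-level noise from the reordering
    print(structural_diff(OLD, NEW))  # only the node whose value changed
```

The line-based diff reports the moved line as one removal and one addition, while the node-based comparison reports only the element whose content differs.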

The examples below demonstrate how you can benefit from turning on the structural comparison mode.

Example 1 – Comparing MSBuild projects

Figure 1 shows a comparison of two small XML files. As can be seen, the differences are considerable, or at least they seem so at first sight.

Figure 1: Two versions of an MSBuild project – Structural Comparison is OFF.

Now we’ll turn on the structural code comparison mode and see the result. To do this, open the Comparison menu and select Structural Code Comparison. Figure 2 shows the same files. If we look closely, we’ll see that there is only one actual difference: the product version was changed. Apart from that, one of the developers merely swapped the order of the properties.

Figure 2: Two versions of an MSBuild project – Structural Comparison is ON.

Example 2 – Comparing C# project files

The second example is the analysis of a modified .csproj file.
The file contains 3 types of changes:

  • file path modifications;
  • changes in the order of project files;
  • added files.

Figure 3 shows the comparison result with structural comparison disabled. As can be seen, all the changes are mixed together, which hinders the analysis.

Figure 3: Two versions of a C# project file – Structural Comparison is OFF.

After enabling structural comparison (Figure 4), it is clear which files have been added. This was not so obvious in the previous case.

Figure 4: Two versions of a C# project file – Structural Comparison is ON.
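The three kinds of change in this example (path modification, reordering, added files) can also be told apart programmatically once the file is treated as a set of nodes. The following Python sketch assumes a simplified project file without the MSBuild XML namespace. The sample data and the function name `summarize_compile_items` are hypothetical, and a file is assumed to have "moved" when its name reappears under a different path.

```python
import xml.etree.ElementTree as ET
from pathlib import PureWindowsPath

# Hypothetical, simplified project files (no XML namespace).
OLD = r"""<Project>
  <ItemGroup>
    <Compile Include="A.cs" />
    <Compile Include="B.cs" />
    <Compile Include="Helpers\C.cs" />
  </ItemGroup>
</Project>"""

NEW = r"""<Project>
  <ItemGroup>
    <Compile Include="B.cs" />
    <Compile Include="A.cs" />
    <Compile Include="Common\C.cs" />
    <Compile Include="D.cs" />
  </ItemGroup>
</Project>"""


def summarize_compile_items(old_xml, new_xml):
    """Classify Compile items as added, moved to another path, or reordered."""

    def includes(xml_text):
        root = ET.fromstring(xml_text)
        return [item.get("Include") for item in root.iter("Compile")]

    old, new = includes(old_xml), includes(new_xml)
    old_by_name = {PureWindowsPath(path).name: path for path in old}

    added, moved = [], []
    for path in new:
        if path in old:
            continue  # unchanged item, possibly at a different position
        name = PureWindowsPath(path).name
        if name in old_by_name and old_by_name[name] not in new:
            moved.append((old_by_name[name], path))  # same file name, new path
        else:
            added.append(path)

    # Items present in both files, taken in the order each file lists them.
    reordered = [p for p in old if p in new] != [p for p in new if p in old]
    return {"added": added, "moved": moved, "reordered": reordered}


if __name__ == "__main__":
    print(summarize_compile_items(OLD, NEW))
```

Real .csproj files usually declare an XML namespace on the root element, so the `iter("Compile")` lookup would need to be adapted for them.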

Example 3 – Comparing .xaml files

The last example of structural XML comparison is based on comparing .xaml files. When customizing how controls are displayed in a WPF application, their properties are often stacked, one attribute per line.

Again, we can change attribute values, and add, delete, or reorder attributes. Figure 5 illustrates an example of such changes.

Figure 5: Two versions of a .xaml file – Structural Comparison is OFF.

And this is how these changes look when structural comparison is turned on (Figure 6).

Figure 6: Two versions of a .xaml file – Structural Comparison is ON.
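Comparing attributes individually, as in the .xaml example, amounts to treating the attributes of an element as a set of name/value pairs rather than as a run of text. Here is a minimal Python sketch of that idea. The element, its attributes, and the function name `attribute_changes` are invented for illustration, and namespace-prefixed attributes such as those common in real XAML are left out to keep the input parseable on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical control markup: attributes are reordered, one value changes,
# one attribute is removed and one is added.
OLD = '<Button Width="100" Height="30" Content="OK" Margin="4" />'
NEW = """<Button Content="OK"
        Height="32"
        Width="100"
        IsDefault="True" />"""


def attribute_changes(old_element, new_element):
    """Compare two elements attribute by attribute, ignoring attribute order."""
    old = ET.fromstring(old_element).attrib
    new = ET.fromstring(new_element).attrib
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    print(attribute_changes(OLD, NEW))
```

Reordering alone produces no entries in any of the three lists, which matches the intent of the structural view.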

Limitations of structural XML comparison

Now let’s consider the limitations of the new feature. When implementing this functionality, we had to balance complexity of implementation against usefulness. As a result, we chose to ship a feature that is useful, though not in every case, rather than delay the release in order to build a universal solution.

XML file is not re-formatted

Code Compare preserves the original formatting of XML files. It’s up to you to decide whether this is good or bad. Differences in white space and line breaks can be ignored using the Ignore White Space and Ignore Line Breaks comparison options.
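If you need two files to compare as equal regardless of line breaks and attribute order, one option outside Code Compare is to normalize both files before comparing them. The sketch below uses the `canonicalize` function from the Python standard library (available since Python 3.8), which writes XML in canonical form with attributes in a fixed order. The sample files and the helper name `normalized` are assumptions made for this example.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical files: same content, different line breaks and attribute order.
FILE_1 = """<settings>
  <item name="timeout" value="30" />
</settings>"""

FILE_2 = '<settings><item value="30" name="timeout"/></settings>'


def normalized(xml_text):
    """Return canonical XML with surrounding whitespace in text stripped."""
    return canonicalize(xml_text, strip_text=True)


if __name__ == "__main__":
    print(normalized(FILE_1) == normalized(FILE_2))
```

Note that this rewrites the text being compared, so the result no longer reflects the original formatting of either file.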

XML element attributes written in one line are not recognized

In the .xaml comparison example above, the XML element attributes were placed on separate lines. This is what made it possible to compare every attribute individually.

Had the attributes been placed on a single line, they would have been compared as simple character sequences.
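One possible workaround, sketched here in Python, is to rewrite both files with one attribute per line before comparing them. This is an assumption-laden illustration rather than a feature of Code Compare: the function name `one_attribute_per_line` is hypothetical, and the sketch ignores namespaces, comments, and text that follows a child element, so it is not suitable for real XAML without further work.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def one_attribute_per_line(xml_text, indent="    "):
    """Re-serialize XML so that every attribute sits on its own line."""
    lines = []

    def emit(node, depth):
        pad = indent * depth
        attrs = [
            f"{pad}{indent}{name}={quoteattr(value)}"
            for name, value in node.attrib.items()
        ]
        children = list(node)
        text = (node.text or "").strip()
        # Use a self-closing tag when the element has no content at all.
        closing = "/>" if not children and not text else ">"

        if attrs:
            lines.append(f"{pad}<{node.tag}")
            lines.extend(attrs)
            lines[-1] += closing  # close the tag on the last attribute line
        else:
            lines.append(f"{pad}<{node.tag}{closing}")

        if closing == ">":
            if text:
                lines.append(f"{pad}{indent}{escape(text)}")
            for child in children:
                emit(child, depth + 1)
            lines.append(f"{pad}</{node.tag}>")

    emit(ET.fromstring(xml_text), 0)
    return "\n".join(lines)


if __name__ == "__main__":
    print(one_attribute_per_line('<Grid><Button Width="100" Height="30"/></Grid>'))
```

After this step, a line-oriented comparison sees each attribute as a separate line, at the cost of no longer showing the files in their original layout.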

The semantics of compared XML files are not considered

The comparison algorithm tries to correlate XML elements on the basis of matching attributes and child elements. However, all attributes carry the same weight during comparison, which is not always appropriate in a particular domain: one attribute can be more important than the rest.

For example, in an MSBuild file, only one attribute matters for identifying a file precisely (the Compile element): Include. Code Compare, however, takes the other attributes (for example, Condition) into account along with Include. Thus, substantial changes in the XML text may cause the comparison logic to fail, and Code Compare will show more changes than there actually are.

In such cases we advise turning the structural code comparison mode off and comparing the files in the regular mode.
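The effect of giving every attribute the same weight, described above for the Compile element, can be illustrated with a small Python sketch. It deliberately simplifies matters by pairing elements only on exact key equality, which is an assumption made for this example and not how Code Compare correlates elements. The sample data and the names `match_items`, `by_all_attributes`, and `by_include` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical item groups: the same two files, but one gains a Condition.
OLD = """<ItemGroup>
  <Compile Include="Program.cs" />
  <Compile Include="Utils.cs" />
</ItemGroup>"""

NEW = """<ItemGroup>
  <Compile Include="Utils.cs" Condition="'$(Configuration)' == 'Debug'" />
  <Compile Include="Program.cs" />
</ItemGroup>"""


def match_items(old_xml, new_xml, key):
    """Pair up child elements whose key(element) values are equal."""
    old = {key(element) for element in ET.fromstring(old_xml)}
    new = {key(element) for element in ET.fromstring(new_xml)}
    return sorted(old & new), sorted(old - new), sorted(new - old)


def by_all_attributes(element):
    """Every attribute counts equally towards the identity of an element."""
    return (element.tag, tuple(sorted(element.attrib.items())))


def by_include(element):
    """Domain knowledge: only Include identifies a Compile item."""
    return (element.tag, element.get("Include"))


if __name__ == "__main__":
    for key in (by_all_attributes, by_include):
        matched, removed, added = match_items(OLD, NEW, key)
        print(key.__name__, len(matched), len(removed), len(added))
```

With all attributes weighted equally, Utils.cs appears as one removed and one added item. With Include as the only key, both files are matched and the Condition shows up as a change within a matched element.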

Feel free to comment on this post and share what you think.

3 Responses to “XML Structure Comparison Explained”

  1. Tim Vargo Says:

    I see no examples that combine XML “Structured code comparison” AND also “Three-way text comparison and merge”. Is this combination possible? This is what I desperately need.

  2. Alex Serdyuk Says:

    Yes, you can turn ‘Structural Code Comparison’ on while using 3-way text comparison.

  3. Vitaliy Says:

    >> XML element attributes written in one line are not recognized
    This is pity :(
    Could you add support for comparing attributes placed on the same line?
