Strategy for reverse engineering legacy C code (C code)
This topic provides a roadmap to ease the process of reverse engineering legacy code using C Code Reverser, aimed at first-time users of the Reverser. It describes the steps required for the most common usage patterns, and highlights possible pitfalls. It assumes that you are familiar with the UML, C and compiler concepts.
When to reverse engineer code
There are three common scenarios (use cases) for reverse engineering code:
To reverse in existing handwritten code (a.k.a. 'legacy' code) to better visualize, understand and document it than is possible with just code. This will also enable further software development to be done in the more productive environment of Modeler.
To reverse in existing code into a model to generate code using a different language.
To reverse in a library interface.
What the Reverser does when it reverse engineers code
The Reverser does its work in two stages:
It parses the code files selected for reverse engineering and reports any errors. It is important that you correct parsing errors before continuing.
It then performs the actual reverse engineering to update the Class Model from the selected code files.
Important Considerations
The subsections that follow detail the issues – the solutions are in the final section.
The order of your files is important
The Reverser uses a one-pass pre-processor, and this means that the order of the source files to reverse can be significant, in particular when reversing .h files (.c files do not generally include other .c files – just .h files which have to be in the correct order anyway to compile correctly). The order is important because the .h files contain the struct definitions, and structs may reference each other, so we need to avoid forward references. This is exactly the same problem that compilers have when they are following the #include statements in your source files. If you are using a precompiled header file, the code file from which the precompiled header file is built must be first in the list.
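As a minimal sketch of why order matters, the following shows two struct definitions where the second embeds the first by value. The names Point, Rect and rect_width, and the header names in the comments, are hypothetical and are not taken from any real project. If the two definitions were swapped, a compiler would reject Rect because Point would not yet be complete, and a one-pass pre-processor meets the same forward reference.

```c
/* Contents of a hypothetical point.h: must be seen first,
   because struct Rect embeds struct Point by value. */
struct Point {
    int x;
    int y;
};

/* Contents of a hypothetical rect.h: needs the complete
   definition of struct Point to already be known. */
struct Rect {
    struct Point top_left;
    struct Point bottom_right;
};

/* Small helper showing that both definitions are complete here. */
int rect_width(const struct Rect *r)
{
    return r->bottom_right.x - r->top_left.x;
}
```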
It's better to use .c rather than .h files
The Reverser is primarily looking for the struct declarations in the source files. These are (normally) found in '.h' files; however, the easiest way in many projects to include your '.h' files in an acceptable order is to point the Reverser at the '.c' files instead (see above). The Reverser will then follow the #include statements in the '.c' files in order. The other reason it is best to pick the '.c' files is that there is often additional information in them that is not in the header, such as comments and also named parameters (rather than just data-types). The reversal of such information will add useful information to your model.
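The difference in available information can be illustrated with a small sketch. The function clamp is hypothetical; the first declaration is how a prototype might appear in a header (types only), and the definition below it is how the same function might appear in a '.c' file, with a comment and named parameters.

```c
/* As it might appear in a header: data types only, no parameter names. */
int clamp(int, int, int);

/* As it might appear in the .c file: the comment and the parameter
   names (value, low, high) carry meaning the prototype does not. */
/* Restricts value to the inclusive range [low, high]. */
int clamp(int value, int low, int high)
{
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}
```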
What to do with libraries
Your source code will #include headers from the compiler, and other libraries. Do you want these libraries in your model? These libraries are potentially very large – often very much larger than the bespoke part of your application. Normally you don't want these class definitions in your model, as you will not be changing them, merely using them. Importing such definitions will just increase the size and complexity of your model and greatly increase the time taken for the Reverser to run. You have three options for reverse engineering each library:
Do not reverse engineer the library. References to the library are captured as text. This approach minimizes the Model size and the time taken to reverse engineer code.
Reverse engineer used elements of the library only. This approach allows you to see the Library files and used elements in your Model, whilst not wasting time reversing unused library elements.
Reverse engineer the complete library. This approach allows you to see the complete library file in your Model, which can be useful if you want to use currently unused library elements in the future.
Conditional Compilation
Most non-trivial C projects involve #ifdefs that result in conditional compilations. For example there may be multiple definitions of a 'Timer' struct to interface to different hardware or OS combinations - only one is actually compiled based on which of 'OS1' or 'OS2' is defined. Just as a compiler needs to know the specific definitions to compile correctly, so too does the Reverser. The Reverser reverse engineers only one configuration of your code.
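The 'Timer' case described above might look like the following sketch. The field names, the TIMER_VARIANT macro and the timer_variant function are assumptions made for illustration; OS1 is defined by default here only so that the fragment compiles on its own.

```c
/* OS1 and OS2 are the configuration macros named in the text.
   Defaulting to OS1 is an assumption made so the sketch is self-contained. */
#if !defined(OS1) && !defined(OS2)
#define OS1
#endif

#if defined(OS1)
/* Variant compiled (and reversed) when OS1 is defined. */
struct Timer {
    unsigned long ticks;
};
#define TIMER_VARIANT 1
#elif defined(OS2)
/* Variant compiled (and reversed) when only OS2 is defined. */
struct Timer {
    unsigned int seconds;
    unsigned int microseconds;
};
#define TIMER_VARIANT 2
#endif

/* Reports which definition of struct Timer was selected. */
int timer_variant(void)
{
    return TIMER_VARIANT;
}
```

Only one of the two struct Timer definitions survives pre-processing, so only that one can appear in the model.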
Dealing with very large projects
Even once you have excluded standard libraries from your reversing, some projects are still very large. As the Reverser checks each item it is reversing against the existing model (or what it has already reversed in the case of legacy reversal), the speed at which reversing proceeds decreases as model size increases. As a result of this, for very large projects it is often a good idea to split the project up along natural interfaces and place each smaller project in a separate directory.
Each smaller project can then be reverse engineered into a separate package, so the directory mappings for each project can specify what is within the scope of each smaller project, and what is not. If you have co-dependencies, the first package you reverse engineer may need to be reverse engineered again after the subsequent packages are reverse engineered, so that all references are modeled in the model as links.
We recommend that you create a separate Model Settings File for each smaller project in the model.
As the question 'How large is very large (when it comes to reversing)' is affected by so many factors (number of modules, size of modules, connectivity between modules, code style, power of computer and length of wait that is acceptable) it is best determined by experiment and experience – try it on your model and see. As a real 'finger in the air' estimate, though, the upper limit for a manageable reversal is of the order of a few hundred modules or a hundred thousand lines of code.
Do you want to reverse engineer function bodies
Through the 'Reverse Engineer Code Bodies' check box on the Reverse Engineering Options 1 page, you can choose to either reverse engineer function bodies or not.
Shared or personal model settings files
Model Settings Files record Reverser settings used for reverse engineering code. By default, the Model Settings File for a model is saved locally to your Modeler installation folder. You can save a Model Settings File to a shared directory, so that everyone uses the same settings for reverse engineering code. All users that use a shared Model Settings File must use the same paths for #includes, #defines and files parsed.
In addition, you can use Model Settings Files to maintain different configurations, such as, code with and without debug constructs.
Order of Operations for Reversing
Note that there is considerably more information for the tool operations specified below in the help.
1. Decide on your projects and their contents
Consider the earlier subsections 'What to do with libraries' and 'Dealing with very large projects'.
Decide whether to reverse into one or a number of destination packages. If you decide on several then you need to decide how to split up the source code between the projects. After you have decided on the split you will probably need to move files around, to make sure that each project has its own distinct directory or directories, from which it reverses into its Class Model. If you don't do this (that is, if some directories contain class definitions meant to go to different projects) you will end up with classes defined in more than one project, which will cause problems at integration time.
If you split a large project into several projects, create a model settings file for each project.
All following steps are on a 'per project' basis
2. Decide on the root directory
The root directory defines what code the Reverser is treating as application code (code within the root directory) and what code is treated as library and external code (code outside the root directory).
3. Specify the files to reverse
See the earlier subsection 'The order of your files is important'.
If you have a dsp, vcproj or batch file (created through your make utility - not the make file) for your project, on the Select Model page, click the Project File button, and then select the dsp, vcproj or batch file. The dsp or vcproj file is easier to use than the batch file, because you have to specify additional information when using the batch file.
To include the source files in the required order, use one of the following mechanisms. They get more involved as you go down the list, so use the first one you can apply:
If the code is based on Visual Studio, then reference the .dsp or vcproj project file through the Project File button on the Select Model page. The list of files to reverse engineer will be populated in the correct order from information in the dsp or vcproj file. It will also set up the Reverser pre-processor define and include definitions.
If you have a batch file created through your make utility for your project (not the make file), then reference the batch file through the Project File button on the Select Model page. The list of files to reverse engineer will be populated in the correct order from information in the batch file. It will also set up the Reverser pre-processor define and include definitions. By its nature, the make file format is difficult to analyze, so while this approach is well worth trying, the Reverser will not always succeed in finding the correct information. You should cross-check the results it produces.
If you have no dsp, vcproj or batch file (created through your make utility) for your project then you will need to add .c source files (if available) or .h files (if .c files are not available, for example when you are reversing the interface to a precompiled library).
Having specified code files to reverse engineer, you must map those files to target packages in the Model. You do this on the Reverse Engineering Options 3 page by mapping a folder that contains the files selected for reverse engineering (either directly or through any of the folder's subfolders) to a Package in the Model. By default, the Root Directory you specified for your project is mapped to the Root Object you specified in the Model, even though this mapping is not listed as a mapping:
If the Root Directory owns all of the code files selected for reverse engineering (either directly or through any of the Root Directory's subfolders), all the code files will be reverse engineered because of the default Root Directory to Root Object mapping.
If you have selected files (or they get #included) for reverse engineering that are not owned by the root directory or any of its sub folders, you must map folders to Packages in the Model so that each code file selected for reverse engineering is reverse engineered.
If you are using a precompiled header file, the code file from which the precompiled header is built must be first in the list.
Each code file you select for reverse engineering is reverse engineered only if its owning folder or one of the owning folder's parent folders is mapped to a Package (either the Model itself or a Package in the Model).
4. Set up your compiler #defines for the Reverser
On the Reverse Engineering Options 2 page, you must specify the pre-processor variables (equivalent to the #define statements in the code) needed to resolve how to reverse code segments within #ifdef and similar directives, and how to resolve macro substitutions.
If you selected a dsp, vcproj or batch file (created through your make utility) for your project on the Select Model page, the #defines list will be populated from information in the selected file.
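To illustrate why macro definitions matter for parsing, here is a minimal sketch in which a struct and a function signature only make sense once two macros have values. MAX_CHANNELS, API_EXPORT, Mixer and mixer_channel_count are hypothetical names; in a real project such values typically come from compiler options, which is why the equivalent definitions must be given to the Reverser.

```c
/* Hypothetical macros that a build would normally pass on the
   compiler command line; defaults are given so the sketch compiles. */
#ifndef MAX_CHANNELS
#define MAX_CHANNELS 8
#endif
#ifndef API_EXPORT
#define API_EXPORT /* expands to nothing in this configuration */
#endif

/* The array size is unknown until MAX_CHANNELS is substituted. */
struct Mixer {
    int gain[MAX_CHANNELS];
};

/* The declaration cannot be parsed if API_EXPORT is left undefined. */
API_EXPORT int mixer_channel_count(void)
{
    struct Mixer m;
    return (int)(sizeof m.gain / sizeof m.gain[0]);
}
```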
If you have code that causes Reverser parsing errors but is valid for your compiler, you can hide that code from the Reverser parser through the RTS_SYNC_INVOKED #define. RTS_SYNC_INVOKED is automatically defined when the Reverser is running, so you can make the code that causes parsing errors conditional using #ifndef RTS_SYNC_INVOKED so that the Reverser ignores it.
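A minimal sketch of this technique follows. It assumes, purely for illustration, that a compiler-specific packing attribute is the construct the Reverser cannot parse; the PACKED macro, the Header struct and header_payload_length are hypothetical names. Only RTS_SYNC_INVOKED comes from the text above.

```c
#ifndef RTS_SYNC_INVOKED
/* Normal build: the compiler sees the compiler-specific attribute. */
#if defined(__GNUC__)
#define PACKED __attribute__((packed))
#else
#define PACKED
#endif
#else
/* Reverser run: RTS_SYNC_INVOKED is defined, so the attribute is hidden. */
#define PACKED
#endif

/* Hypothetical struct that uses the hidden construct. */
struct Header {
    unsigned char tag;
    unsigned short length;
} PACKED;

/* Returns the length field of a header. */
unsigned header_payload_length(const struct Header *h)
{
    return h->length;
}
```

Keeping the hidden region as small as possible limits how much of your code is missing from the model.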
5. What to do with libraries and other code
The code files you have selected for reverse engineering may be dependent on other code files, such as the MFC library. The Reverser parser needs to know the paths in which dependent code files reside to correctly parse the code files selected for reverse engineering. You determine how libraries and other code files that are referenced through #includes are dealt with on the Reverse Engineering Options 3 page:
If you do not want to reverse engineer a used library, list the #include path but do not map the path to a Package in the Model. Code files that use the library will parse successfully and references to the library will be captured as text.
If you want to reverse engineer only used elements of the used library, list the #include path and map the path to a Package in the Model. The Reverser will reverse engineer only the used elements of the library.
If you want to reverse engineer the complete used library, you must reverse engineer the library as a selected file before reverse engineering your code files. You then reverse engineer your code files, list the #include path and map the path to the Package in which the library resides. For more information about reverse engineering libraries, see the Reverse Engineering Libraries section that follows.
If you do not set a #include path, #include statements will fail and are likely to then cause further parsing errors that may prevent your own code from reverse engineering correctly; however, if you are experiencing memory problems when reverse engineering large quantities of code, not setting a #include path will reduce the amount of memory required by the reverse engineering process.
If you selected a dsp, vcproj or batch file (created through your make utility) for your project on the Select Model page, the #include paths list will be populated from information in the selected file.
If you are working with C code, you must ensure that the #INCLUDE path check box is selected for each path. If the #INCLUDE path check box is cleared and the path is not mapped to a package, the path is ignored.
6. Reverse engineering libraries
If the library you want to reverse engineer has been previously stored in an integrated configuration management tool (CM tool), you can add the appropriate CM tool package to your Model, rather than reverse engineering the library to your Model.
If the library you want to reverse engineer has been previously reverse engineered to another Model, you can export the Package to a directory and then import that Package to your Model, rather than reverse engineering the library to your Model.
Typically you will reverse engineer only the library header files, in which case the order they are reverse engineered may be important. To ensure they are reverse engineered in the correct order, create an 'all.c' file that lists the header files in a compilable order through #includes. Reverse engineer the all.c file to the required Package in the Model.
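The shape of such an 'all.c' file is sketched below. C standard library headers are used only as stand-ins so that the fragment compiles anywhere; they do not actually depend on inclusion order. In practice you would list your library's own headers, with each header placed after the headers it depends on.

```c
/* all.c: lists the headers to reverse engineer in a compilable order.
   The standard headers below are stand-ins for real library headers. */
#include <stddef.h>  /* stand-in for a header with basic types, listed first */
#include <stdint.h>  /* stand-in for a header that builds on those types */
#include <stdio.h>   /* stand-in for a header that uses both of the above */
```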
7. Review parsing errors
On the Parsing Complete page, resolve #include errors first by adding search paths, then repeat the parsing process. Expect to get some of these errors even if you used a project or batch file (created through your make utility) to load your include paths, because some compiler manufacturers hard-code paths to core libraries. Such paths do not need to be specified in the project or batch file, but the Reverser still needs them.
After there are no #include errors, eliminate the pre-processor definition errors by adding or changing the #define definitions, and repeat the parsing process until there are no errors.
8. Reverse engineer the code files
After there are no errors reported on the Parsing Complete page, click Next to reverse engineer the code files.