D issues are now tracked on GitHub. This Bugzilla instance remains as a read-only archive.
Issue 9673 - Add --incremental option to rdmd
Summary: Add --incremental option to rdmd
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: tools
Version: D2
Hardware: All
OS: All
Importance: P4 enhancement
Assignee: No Owner
URL:
Keywords: preapproved
Duplicates: 4686
Depends on:
Blocks:
 
Reported: 2013-03-09 06:47 UTC by Andrei Alexandrescu
Modified: 2022-12-17 10:42 UTC
CC: 5 users

See Also:


Attachments

Description Andrei Alexandrescu 2013-03-09 06:47:47 UTC
Currently rdmd uses the following build process:

1. Fetch the main file (e.g. main.d) from the command line

2. Compute transitive dependencies for main.d and cache them in a main.deps file in a private directory. This computation is redone only when dependencies change and the main.deps file gets out of date.

3. Build the executable by passing main.d and all of its dependencies to dmd in a single command line.
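The staleness check in step 2 and the single-invocation build in step 3 can be modeled as follows. This is a minimal Python sketch over in-memory data; the mtime-based test and the command shape are illustrative assumptions, since real rdmd discovers imports by running the compiler itself:

```python
# Toy model of rdmd's one-shot build: recompute the cached dependency
# list only when it is out of date, then compile everything at once.
# The mtime comparison and command layout are assumptions for
# illustration, not rdmd's actual implementation.

def deps_out_of_date(cache_mtime, source_mtimes):
    """The cached .deps file is stale if it is missing (None) or
    older than any of the sources it was computed from."""
    if cache_mtime is None:
        return True
    return any(m > cache_mtime for m in source_mtimes.values())

def one_shot_command(main, deps):
    """Step 3: one dmd invocation with main.d and all dependencies."""
    return ["dmd", main] + sorted(deps)
```
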

This setup has a number of advantages and disadvantages. For large projects built of relatively independent parts, an --incremental option should allow a different approach to building:

1. Fetch the main file (e.g. main.d) from the command line

2. Compute transitive dependencies for main.d and cache them in a main.deps file in a private directory

3. For each discovered file, compute its own transitive dependencies using a worklist approach, until the dependencies of every file in the project have been computed and cached, one .deps file per .d file in the project. This computation is redone only when dependencies change and the corresponding .deps files get out of date.

4. Invoke dmd once per .d file to produce object files, recompiling only those that are out of date. Invocations should be runnable in parallel, but this may be left as a future enhancement.

5. Invoke dmd once with all object files to link the code.

The added feature should not interfere with the existing setup. Users should be able to compare and contrast the two approaches simply by adding or removing --incremental on the rdmd command line.
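The worklist computation in step 3 and the per-file compile plus final link in steps 4-5 might look roughly like this. This is a Python sketch over an in-memory import map; the module names, `.o` suffixes, and `-ofapp` output flag are illustrative assumptions, not rdmd's actual code:

```python
from collections import deque

def transitive_deps(root, direct_imports):
    """Worklist pass: discover every module reachable from root.
    direct_imports maps a module to the modules it imports directly;
    in rdmd this information would come from the compiler (assumption)."""
    seen = {root}
    work = deque([root])
    while work:
        mod = work.popleft()
        for dep in direct_imports.get(mod, ()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen

def incremental_commands(root, direct_imports):
    """Steps 4-5 as command lines: one `dmd -c` per module, then a
    single link invocation over all the object files (sketch)."""
    modules = sorted(transitive_deps(root, direct_imports))
    compiles = [["dmd", "-c", m + ".d"] for m in modules]
    link = ["dmd", "-ofapp"] + [m + ".o" for m in modules]
    return compiles, link
```

In a real implementation each compile command would be skipped when its object file is newer than the source and its cached .deps entries, which is what makes the scheme incremental.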
Comment 1 Vladimir Panteleev 2013-03-09 21:06:31 UTC
*** Issue 4686 has been marked as a duplicate of this issue. ***
Comment 2 Martin Nowak 2013-03-11 09:16:31 UTC
(In reply to comment #0)
> 4. Invoke dmd once per .d file, producing object files (only for object files
> that are out of date). Invocations should be runnable in parallel, but this may
> be left as a future enhancement.
> 
It should cluster the source files by common dependencies so as to avoid the parsing and semantic-analysis overhead of the blunt parallel approach. I think a simple k-means clustering would suffice for this; k would be the number of parallel jobs.
Comment 3 Vladimir Panteleev 2013-03-11 09:22:20 UTC
How would that matter? With the current limitations, you still need to launch the compiler once per source file.
Comment 4 Martin Nowak 2013-03-11 09:30:48 UTC
You save time by invoking "dmd -c" only k times, once per cluster.
Comment 5 Vladimir Panteleev 2013-03-11 09:35:29 UTC
Martin, I think you're missing some information. Incremental compilation is currently not reliably possible when more than one file is passed to the compiler at a time. Please check the thread on the newsgroup for more discussion on the topic.
Comment 6 Martin Nowak 2013-03-11 09:52:41 UTC
(In reply to comment #5)
We should fix Bug 9571 et al. rather than using them as design constraints.
Of course we'll have to do single invocations as a workaround.
All I want to contribute is an idea for how to optimize rebuilds.
Comment 7 Vladimir Panteleev 2013-03-11 10:14:49 UTC
(In reply to comment #6)
> (In reply to comment #5)
> We should fix Bug 9571 et.al. 

Issue 9571 describes a problem with compiling files one at a time.

> rather than using them as design constraints.
> Of course we'll have to do single invocation as a workaround.

Yes.

> All I want to contribute is an idea how to optimize rebuilds.

I think sorting the file list (incl. path) is a crude but simple approximation of your idea, assuming the project follows sensible conventions for package structure.
Comment 8 Andrei Alexandrescu 2013-03-11 10:19:42 UTC
(In reply to comment #2)
> (In reply to comment #0)
> > 4. Invoke dmd once per .d file, producing object files (only for object files
> > that are out of date). Invocations should be runnable in parallel, but this may
> > be left as a future enhancement.
> > 
> It should cluster the source files by common dependencies so to avoid the
> parsing and semantic analysis overhead of the blunt parallel approach. I think
> a simple k-means clustering would suffice for this, k would be the number of
> parallel jobs.

Great idea, although we'd need to amend things. First, the graph is directed (I'm not sure whether k-means clustering is directly applicable to directed graphs; a cursory search suggests it isn't).

Second, for each node we don't have the edges, but instead all paths (that's what dmd -v generates). So we can take advantage of that information. A simple thought is to cluster based on the maximum symmetric difference between module dependency sets, i.e. separately compile modules that have the most mutually disjoint dependency sets.

Anyhow I wouldn't want to get too bogged down in details at this point - first we need to get the appropriate infrastructure off the ground.
Comment 9 Martin Nowak 2013-03-11 11:55:25 UTC
(In reply to comment #8)
> Great idea, although we'd need to amend things. First, the graph is directed
> (not sure whether k-means clustering is directly applicable to directed graphs,
> a cursory search suggests it doesn't).
> 
I hadn't thought about graph clustering.

> Second, for each node we don't have the edges, but instead all paths (that's
> what dmd -v generates). So we can take advantage of that information. A simple
> thought is to cluster based on the maximum symmetric difference between module
> dependency sets, i.e. separately compile modules that have the most mutually
> disjoint dependency sets.
> 
That's closer to what I had in mind. I'd use k-means to minimize the differences between the dependency sets of each module and the module set of their centroids.

> Anyhow I wouldn't want to get too bogged down into details at this point -
> first we need to get the appropriate infrastructure off the ground.
Right, but I'm happy to experiment with clustering once this is done.
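As a rough illustration of what such an experiment could look like, here is a Lloyd-style k-means over dependency sets, using symmetric-difference size as the distance and a majority-vote module set as the centroid. The initialization and the fixed iteration count are arbitrary assumptions for the sketch:

```python
def symdiff(a, b):
    """Distance between two dependency sets: |A xor B|."""
    return len(a ^ b)

def kmeans_sets(dep_sets, k, iters=10):
    """Lloyd-style k-means over sets. A centroid is the set of modules
    present in at least half of its cluster's members (the
    coordinate-wise median in {0,1}-space). Returns module -> cluster."""
    names = sorted(dep_sets)
    # Naive initialization: spread seeds evenly over the sorted names.
    centroids = [dep_sets[names[i * len(names) // k]] for i in range(k)]
    assignment = {}
    for _ in range(iters):
        # Assign each module to the nearest centroid.
        assignment = {n: min(range(k),
                             key=lambda i: symdiff(dep_sets[n], centroids[i]))
                      for n in names}
        # Recompute each centroid as the majority-vote module set.
        for i in range(k):
            members = [dep_sets[n] for n in names if assignment[n] == i]
            if members:
                universe = set().union(*members)
                centroids[i] = {m for m in universe
                                if sum(m in s for s in members) * 2 >= len(members)}
    return assignment
```
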
Comment 10 Martin Nowak 2013-07-19 16:31:36 UTC
It kind of works, but there aren't many independent clusters in phobos.
https://gist.github.com/dawgfoto/5747405

A better approach might be to optimize for even cluster sizes, e.g. trying to split 100KLOC into 4 independent clusters of 25KLOC each. The line counts here are sources plus imports. The assignment of source files to clusters could then be optimized with simulated annealing or similar.
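A minimal sketch of the size-balancing part alone, as a hypothetical starting point: greedy longest-processing-time assignment, which puts each file (largest first) into the currently smallest cluster. Note that this deliberately ignores import overlap, which is exactly the objective the annealing step would have to optimize on top:

```python
import heapq

def balanced_clusters(line_counts, k):
    """Greedy LPT scheduling: assign each source file, largest first,
    to the cluster with the smallest total line count so far.
    line_counts maps file name -> LOC (sources + imports)."""
    heap = [(0, i) for i in range(k)]   # (cluster size, cluster index)
    heapq.heapify(heap)
    clusters = [[] for _ in range(k)]
    for name, loc in sorted(line_counts.items(), key=lambda kv: -kv[1]):
        size, i = heapq.heappop(heap)
        clusters[i].append(name)
        heapq.heappush(heap, (size + loc, i))
    return clusters
```
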