Issue 6133 - Improvements to RedBlackTree
Summary: Improvements to RedBlackTree
Status: NEW
Alias: None
Product: D
Classification: Unclassified
Component: phobos (show other issues)
Version: D2
Hardware: All All
: P4 enhancement
Assignee: No Owner
URL:
Keywords: bootcamp
Depends on:
Blocks:
 
Reported: 2011-06-09 09:17 UTC by Carlos Ballesteros Velasco
Modified: 2024-12-01 16:14 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description Carlos Ballesteros Velasco 2011-06-09 09:17:18 UTC
I have made some modifications to RedBlackTree to include a new template argument "hasStats" that will include information about the count of the left and right childs.

I have also included some new methods that allow to do in O(log(N)) time the following things:
* Determine the length of a RedBlackTree.Range (previously you had to iterate over all the elements to get that information) the only thing you were able to do in constant time was get the length of the full collection.
* Determine the position of an element in the ordered list in O(log(N)) (that is the number of elements that are lesser than one Node)
* Slice elements from a Range by position (something like SKIP and LIMIT from SQL) in O(log(N))

This is extreme useful to do rankings. For example: I have a tree with users that have a score and I want to know the position of a User ordered by the score. Also I want to get the users nearby and do pagination without having a linear cost.

With hasStats to true, the size of Nodes will be 28 instead of 20 on DMD 2/32bits and will allow counting up to 2^^32 nodes. Inserting and deleting is a bit slower, but allows to do some new operations in O(log(N)).

The class template definition changed from:
class RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false)
to:
class RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false, bool hasStats = false)

RedBlackTree.Range now contain these new methods. All O(log(N)):
- Range skip(int skipCount)
- Range limit(int limitCount)
- int length()

RedBlackTree now contain these new methods. All O(log(N)):
- int countLesser(Node node) and an alias getNodePosition // Obtains the position of a Node.
- Node locateNodeAtPosition(int positionToFind) // Obtains the node that occupies a determinated position
- Range all() // Shortcut to get a range with all elements to querying.

I have made these methods with a fallback when not using hasStats that will perform O(n).

The implementation (with a self-containing a demo comparing times):
http://code.google.com/p/dutils/source/browse/trunk/rbtree_with_stats/rbtree_with_stats.d?r=132
http://dutils.googlecode.com/svn-history/r132/trunk/rbtree_with_stats/rbtree_with_stats.d

This is a sketch and I had to do some modifications: change Range from struct to class, change some fields to public...
I think if will be useful for some people, and since it's a new template option  should not break anything nor being slower or ocupying more memory if not using.

I wanted to share it to see if you think it's useful. If so, I can improve the API, clean things up, make unittesting and create a patch for phobos.

Regards.
Comment 1 Infiltrator 2014-03-19 18:37:36 UTC
Could you please make a github pull request?
Comment 2 Carlos Ballesteros Velasco 2016-12-09 12:18:18 UTC
@Infiltrator Probably it changed to not be compatible. But you can find the original collection in the git history here, if you want to try to merge or PR it:
https://github.com/soywiz/smrr-server/blob/8c89dd5120009880eb82663f161fa9dff0f6a983/server/smr/excollections.d
Comment 3 dlangBugzillaToGithub 2024-12-01 16:14:13 UTC
THIS ISSUE HAS BEEN MOVED TO GITHUB

https://github.com/dlang/phobos/issues/9907

DO NOT COMMENT HERE ANYMORE, NOBODY WILL SEE IT, THIS ISSUE HAS BEEN MOVED TO GITHUB