Composable natural sort

Natural sorting means recognising numbers and putting them in the natural order. For example, invoice 3 and invoice 10 would normally sort like this:

  1. invoice 10
  2. invoice 3

But under a natural sort we get this:

  1. invoice 3
  2. invoice 10

The natural sort comparator in the listing breaks a string apart into numeric and non-numeric parts and compares them using a supplied comparator class — this makes the natural sort composable and its behaviour to be customised for things like the sort order, case sensitivity and the string class.

Simple examples

// An ascending sort on std::string
natural_sort<>
// A descending sort on std::string
natural_sort< std::greater< std::string > >
// An ascending sort on wide character strings
natural_sort< std::less< std::wstring > >

The sort can be used with the standard containers:

std::set< std::string, natural_sort<> >
std::map< std::string, int, natural_sort<> >
std::set< std::wstring, natural_sort< std::greater< std::wstring > > >

Use with custom string types

The regex that is used to split the string comes from a specialisation of natural_sort_regex. A new specialisation will need to be supplied for the string type.

// An ascending sort on a custom string type
natural_sort< std::less< ustring > >

Implementation notes

The supplied comparator is checked using static assertions to ensure that it is type compatible. This has the side effect of giving better error messages if an incompatible comparator is given.

The numeric compare is done through padding the number matches with leading zeros so that the same string comparator can be used. This reduces complications arising from trying to automatically generate a compatible numeric comparator, but also means that the comparator must properly order padded numbers — it's unlikely that this will be a problem for any real world comparators.

The padding length is given as a template parameter and defaults to 10.

Only when the regex gives a number for both left and right at the same position does it use the numeric comparison. When numbers are compared with strings they are compared un-padded and the supplied comparator determines their relative order.

Copies of the regexes are returned from natural_sort_regex rather than references to static locals as it is the simplest way to make them thread safe.

By Kirit with contributions by nobody yet.

Recipe source code

#include <string>
#include <boost/regex.hpp>
#include <boost/type_traits/is_same.hpp>

template< typename S >
struct natural_sort_regex;

template<>
struct natural_sort_regex< std::string > {
    static const boost::regex regex() {
        return boost::regex( "(\\d+)|([^\\d]+)" );
    }
};
template<>
struct natural_sort_regex< std::wstring > {
    static const boost::wregex regex() {
        return boost::wregex( L"(\\d+)|([^\\d]+)" );
    }
};

template< typename C = std::less< std::string >, unsigned int padding_length = 10 >
struct natural_sort : public std::binary_function < typename C::first_argument_type, typename C::first_argument_type, bool > {

    BOOST_STATIC_ASSERT( ( boost::is_same< typename C::first_argument_type, typename C::second_argument_type >::value ) );
    BOOST_STATIC_ASSERT( ( boost::is_same< typename C::result_type, bool >::value ) );

    typedef C comparator_type;
    typedef typename comparator_type::first_argument_type string_type;
    typedef boost::match_results< typename string_type::const_iterator > match_results;
    typedef boost::regex_iterator< typename string_type::const_iterator > regex_iterator;
    typedef boost::basic_regex< typename string_type::value_type > regex_type;

    natural_sort() : regex( natural_sort_regex< string_type >::regex() ) {}

    bool operator()( const string_type &l, const string_type &r ) {
        for( regex_iterator lit( l.begin(), l.end(), regex ), rit( r.begin(), r.end(), regex ), end;
                lit != end && rit != end; ++lit, ++rit )
            if ( compare( *lit, *rit ) )
                return true;
            else if ( compare( *rit, *lit ) )
                return false;
        return false;
    }

private:
    comparator_type comp;
    const regex_type regex;
    bool compare( const match_results &l, const match_results &r ) {
        if ( !l[ 1 ].str().empty() && !r[ 1 ].str().empty() )
            return comp( pad( l.str() ), pad( r.str() ) );
        else
            return comp( l.str(), r.str() );
    }
    string_type pad( const string_type &s ) {
        return string_type( padding_length - s.length(), '0' ) + s;
    }
};


#include <iostream>
#include <set>
#include <boost/lambda/lambda.hpp>

int main() {
    std::list< std::string > items;
    items.push_back( "99 bottles 0f b33r" );
    items.push_back( "Date: 13/10/2008" );
    items.push_back( "100 bottles 0f b33r" );
    items.push_back( "Date: 3/4/2008" );
    items.push_back( "Date: 13/4/2008" );
    items.push_back( "13/4/2008" );

    std::set< std::string, natural_sort<> > ascending( items.begin(), items.end() );
    std::for_each( ascending.begin(), ascending.end(), std::cout << boost::lambda::_1 << '\n' );
    std::cout << "\n";


    std::set< std::string, natural_sort< std::greater< std::string > > > descending( items.begin(), items.end() );
    std::for_each( descending.begin(), descending.end(), std::cout << boost::lambda::_1 << '\n' );
    std::cout << "\n";

    return 0;
}

This is not an official Boost site. For more information on Boost please see Boost.org.