Natural sorting means recognising numbers and putting them in the natural order. For example, invoice 3 and invoice 10 would normally sort like this:
But under a natural sort we get this:
The natural sort comparator in the listing breaks a string apart into numeric and non-numeric parts and compares them using a supplied comparator class — this makes the natural sort composable and its behaviour to be customised for things like the sort order, case sensitivity and the string class.
// An ascending sort on std::string natural_sort<> // A descending sort on std::string natural_sort< std::greater< std::string > > // An ascending sort on wide character strings natural_sort< std::less< std::wstring > >
The sort can be used with the standard containers:
std::set< std::string, natural_sort<> > std::map< std::string, int, natural_sort<> > std::set< std::wstring, natural_sort< std::greater< std::wstring > > >
The regex that is used to split the string comes from a specialisation of natural_sort_regex. A new specialisation will need to be supplied for the string type.
// An ascending sort on a custom string type natural_sort< std::less< ustring > >
The supplied comparator is checked using static assertions to ensure that it is type compatible. This has the side effect of giving better error messages if an incompatible comparator is given.
The numeric compare is done through padding the number matches with leading zeros so that the same string comparator can be used. This reduces complications arising from trying to automatically generate a compatible numeric comparator, but also means that the comparator must properly order padded numbers — it's unlikely that this will be a problem for any real world comparators.
The padding length is given as a template parameter and defaults to 10.
Only when the regex gives a number for both left and right at the same position does it use the numeric comparison. When numbers are compared with strings they are compared un-padded and the supplied comparator determines their relative order.
Copies of the regexes are returned from natural_sort_regex rather than references to static locals as it is the simplest way to make them thread safe.
#include <string>
#include <boost/regex.hpp>
#include <boost/type_traits/is_same.hpp>
template< typename S >
struct natural_sort_regex;
template<>
struct natural_sort_regex< std::string > {
static const boost::regex regex() {
return boost::regex( "(\\d+)|([^\\d]+)" );
}
};
template<>
struct natural_sort_regex< std::wstring > {
static const boost::wregex regex() {
return boost::wregex( L"(\\d+)|([^\\d]+)" );
}
};
template< typename C = std::less< std::string >, unsigned int padding_length = 10 >
struct natural_sort : public std::binary_function < typename C::first_argument_type, typename C::first_argument_type, bool > {
BOOST_STATIC_ASSERT( ( boost::is_same< typename C::first_argument_type, typename C::second_argument_type >::value ) );
BOOST_STATIC_ASSERT( ( boost::is_same< typename C::result_type, bool >::value ) );
typedef C comparator_type;
typedef typename comparator_type::first_argument_type string_type;
typedef boost::match_results< typename string_type::const_iterator > match_results;
typedef boost::regex_iterator< typename string_type::const_iterator > regex_iterator;
typedef boost::basic_regex< typename string_type::value_type > regex_type;
natural_sort() : regex( natural_sort_regex< string_type >::regex() ) {}
bool operator()( const string_type &l, const string_type &r ) {
for( regex_iterator lit( l.begin(), l.end(), regex ), rit( r.begin(), r.end(), regex ), end;
lit != end && rit != end; ++lit, ++rit )
if ( compare( *lit, *rit ) )
return true;
else if ( compare( *rit, *lit ) )
return false;
return false;
}
private:
comparator_type comp;
const regex_type regex;
bool compare( const match_results &l, const match_results &r ) {
if ( !l[ 1 ].str().empty() && !r[ 1 ].str().empty() )
return comp( pad( l.str() ), pad( r.str() ) );
else
return comp( l.str(), r.str() );
}
string_type pad( const string_type &s ) {
return string_type( padding_length - s.length(), '0' ) + s;
}
};
#include <iostream>
#include <set>
#include <boost/lambda/lambda.hpp>
int main() {
std::list< std::string > items;
items.push_back( "99 bottles 0f b33r" );
items.push_back( "Date: 13/10/2008" );
items.push_back( "100 bottles 0f b33r" );
items.push_back( "Date: 3/4/2008" );
items.push_back( "Date: 13/4/2008" );
items.push_back( "13/4/2008" );
std::set< std::string, natural_sort<> > ascending( items.begin(), items.end() );
std::for_each( ascending.begin(), ascending.end(), std::cout << boost::lambda::_1 << '\n' );
std::cout << "\n";
std::set< std::string, natural_sort< std::greater< std::string > > > descending( items.begin(), items.end() );
std::for_each( descending.begin(), descending.end(), std::cout << boost::lambda::_1 << '\n' );
std::cout << "\n";
return 0;
}
This is not an official Boost site. For more information on Boost please see Boost.org.