Sonntag, 14. Februar 2016

Credit the first or last uploader on meta::cpan?

Credit problem


This post is related to the post the "credit the last uploader" problem by RJBS.

A related thread on twitter caused IMHO more misunderstandings than solutions. Twitter is not the best medium to discuss complex problems.

RJBS still did a deep analysis in the above referenced post.

In summary I agree mostly with the opinions. But the suggested solution

MetaCPAN Web should stop showing the name and image of the latest uploader as prominently, and should show the first-come user instead.

isn't IMHO an improvement. It solves a problem and creates a new one.

Persons in the life cycle


At the minimum a distribution on CPAN has an author and an uploader. In many cases the author is the same person as the uploader. Maybe the distribution has users. Also there can be persons submitting issues or patches.

Distributions can also have co-maintainers, persons with the permission to upload new versions of the distribution.

Persons without permission can upload non-authorized versions. They are available but marked as non-authorized.

During the life of a distribution the maintainers (people with upload permission) can change.

In my opinion everybody contributing to a distribution should be attributed, including submitters of issues. Even bad contributions which are removed later improve the quality. In free software with unpaid developers honor and attribution is important for motivation.

Prominent person


The release page of a distribution on meta::cpan shows in the top right corner name and image of one person, the last uploader.

Now RJBS means that the first uploader of a distribution is more important.

Maybe, if the first uploader is still very active. But what does "very active" mean and how cam meta::cpan measure it? It can't.

Imagine distributions first uploaded 15 years ago. Meanwhile maintainers changed, features where added, code completely rewritten. Which person should be displayed?

Only one


If only one can be displayed, who should it be?

In my role as user I prefer the last uploader. For me it is important to get a feeling of quality, which is best associated with a picture and a name of a person. This sounds like pre-justice, and it is. Experience and trust is a main factor of human decisions. The picture and name of a person work like a brand of a computer manufacturer. 

But if person A writes most of the code and person B is the last uploader, shouldn't A be displayed? No, as a user I prefer B, because the last uploader did the last quality check.

Solution


Let's look at very successful solutions like github. Github allows repositories owned by organisations (all members have permissions) or single persons. Organisations are not possible on PAUSE without change.

Another idea inspired from github could be to display miniature pictures of all persons. Github displays all contributors. In case of CPAN this could be all current persons with upload permissions. I prefer this solution.

If there is no better solution, nothing should be changed. The change suggested by RJBS can harm the motivation of maintainers, some could feel it like a slap into the face.

Anyway there should be broad consensus and not a decision by one or a few persons.

Edit: Some hours after finishing this post I detected a similar post by Neil Bowers: It takes a community to raise a CPAN module.

Dienstag, 19. Januar 2016

Porting LCS::BV to Perl 6 - a fast core loop for Algorithm::Diff

Christmas happened and Perl6 is now reasonable stable. It's time to use it for real problems, learn more about pros and cons, detect the elegance.

Porting my CPAN distribution LCS::BV is a nice task. It's short code, has full test coverage and allows benchmarks. Also it could be expected that LCS::BV (a bit-vector implementation of the Longest Common Subsequence algorithm) will be faster than the Perl6 port of Algorithm::Diff and the Perl6 module Algorithm::LCS. LCS::BV is as fast as the XS version of A::D, see my blogpost Tuning Algorithm::Diff.

Test dependencies

LCS algorithms are complex beasts. It's nearly impossible to implement them without good test cases. One problem is, that these algorithms find a sequence of maximum length, but only one of maybe more possible solutions. All possible solutions are provided by the slow CPAN distribution LCS and the method allLCS() as explained in a another blog post. So I need to port this to Perl 6 first.

To compare the results during tests Test::Deep is needed. Should be no problem, sure somebody already ported it. Or not? No, there is no Test::Deep on modules.perl6.org. So I need to port this too, or the part needed. Or not?

This is the relevant part of the tests in Perl 5:


use Test::More;
use Test::Deep;

use LCS;
use LCS::BV;

cmp_deeply(
    LCS::BV->LCS(\@a,\@b),
    any(@{LCS->allLCS(\@a,\@b)} ),
    "$a, $b"
);

done_testing;

Trying to write a subroutine from bottom I got the following solution using any():


# say so $[[2,3],[2,4],] eqv $[[[0,1],[2,3],],[[0,1],[2,4],]].any;
  
my sub any_of ($result, $expected) {
  return so $result eqv $expected.any;
}
  
ok(
  any_of(
    LCS::BV::LCS($A, $B),
    LCS::All::allLCS($A, $B),
  ),
  "'$a', '$b'",
);

That's an example how many features of Perl 6 are worth the half of CPAN.

Length of an integer

In Perl 5 bit operations are limited to length of an integer. In most implementations this is 64 bits, but can be 32 bits. If an array is mapped to bits it needs ceil(array_length/integer_width) integers.

This needs the following code in Perl 5:


our $width = int 0.999+log(~0)/log(2);

use integer;
no warnings 'portable'; # for 0xffffffffffffffff

my $positions;

# $a ... array of elements, e.g. ['f','o','o']

$positions->{$a->[$_]}->[$_ / $width] |= 1 << ($_ % $width) 
    for $amin..$amax;

# gives {'f' => [ 0b001 ], 'o' => [ 0b110 ], }

Doing this the first time in Perl 6, I wondered how long integers are and how to find out. Took me very long with the result: infinite. That's awesome.

This part ported to Perl 6:


my $positions;
$positions{$a[$_]} +|= 1 +< $_  for $amin..$amax;

This is approximately half the code. You can imagine that the rest of the code also gets simpler. It also is the half.

Just 30 lines

For rosettacode.org I adapted the code for strings and removed the usual prefix and suffix optimization, see Bit Vector:


sub lcs(Str $xstr, Str $ystr) {
    my ($a,$b) = ([$xstr.comb],[$ystr.comb]);

    my $positions;
    for $a.kv -> $i,$x { $positions{$x} +|= 1 +< $i };

    my $S = +^0;
    my $Vs = [];
    my ($y,$u);
    for (0..+$b-1) -> $j {
        $y = $positions{$b[$j]} // 0;
        $u = $S +& $y;
        $S = ($S + $u) +| ($S - $u);
        $Vs[$j] = $S;
    }

    my ($i,$j) = (+$a-1, +$b-1);
    my $result = "";
    while ($i >= 0 && $j >= 0) {
        if ($Vs[$j] +& (1 +< $i)) { $i-- }
        else {
            unless ($j && +^$Vs[$j-1] +& (1 +< $i)) {
                $result = $a[$i] ~ $result;
                $i--;
            }
            $j--;
        }
    }
    return $result;
}

You can install LCS::BV and LCS::All via Panda or clone them from github.

Summary

Perl 6 has a lot of convenience, allows short and clear code. It's huge. But learning all the nice features is easier than I thought. Understanding e.g. any() and trying it out in the REPL is a magnitude faster, than searching in CPAN, reading docs and trying out some distributions.