Day 17: New operations for Perl Data Language
If you want to create new PDL operations, it's easy! You can make an ad-hoc one with Inline::Pdlpp, and expand it (or just start) by using pptemplate.
"Genetic engineering?" Santa's chief planner wasn't fully losing her mind, but it was visibly an effort.
"The presents stuff doesn't take up the whole year!" Santa said, defensively. If they were going to analyse genetic data, they might as well do it efficiently. Perhaps PDL could help?
As discussed recently on PerlMonks, processing data in a highly efficient way is already possible using existing PDL operations, and it's easy to engage the automatic POSIX threading too.
Those existing operations are made with PDL::PP, which allows making type-generic, n-dimensional operations for PDL. But taking the first step can feel a bit intimidating. Inline::Pdlpp allows you to make a script with a PP operation in minutes.
Inline::Pdlpp's SYNOPSIS has a working example of making a new PDL operation for processing genetic data:
use strict; use warnings;
use PDL;
use Inline Pdlpp => 'DATA';
# make data with: echo -n 'ATCGZATCG' >input.data
# use it with aa_to_int.pl input.data
my $file; ($file = shift, -f $file) || die "Usage: $0 filename";
my $size = -s $file;
my $pdl = zeroes(byte, $size);
${$pdl->get_dataref} = do { open my $fh, $file or die "$file: $!"; local $/; <$fh> };
$pdl->upd_data;
$pdl->inplace->aa_to_int;
print $pdl, "\n";
__DATA__
__Pdlpp__
pp_def('aa_to_int',
Pars => 'i();[o] o()',
GenericTypes => ['B'],
Inplace => 1,
Code => <<'EOF',
switch($i()) {
case 'A': $o() = 0; break;
case 'T': $o() = 1; break;
case 'C': $o() = 2; break;
case 'G': $o() = 3; break;
default: $o() = 255; break;
}
EOF
Doc => "=for ref\n\nConvert amino acid names to integers.\n",
);
A further improvement to that script might be to mark those unknown letters as bad values:
$pdl->inplace->setvaltobad(255);
Or, the operation itself could set the values bad:
$pdl->badflag(1);
$pdl->inplace->aa_to_int;
# ...
pp_def('aa_to_int',
Pars => 'i();[o] o()',
GenericTypes => ['B'],
Inplace => 1,
HandleBad => 1,
Code => <<'EOF',
switch($i()) {
case 'A': $o() = 0; break;
case 'T': $o() = 1; break;
case 'C': $o() = 2; break;
case 'G': $o() = 3; break;
default: PDL_IF_BAD($SETBAD(o()),$o() = 255); break;
}
EOF
Doc => "=for ref\n\nConvert amino acid names to integers.\n",
);
To turn this into a fully-fledged PDL module, use pptemplate:
$ pptemplate PDL::MyModule
Put your pp_def
code into mymodule.pd
(or, in the new scheme as of
PDL 2.096, lib/PDL/MyModule.pd
), make some tests, write documentation,
and publish to the world on CPAN!
Further resources
Be sure to check the documentation for
PDL::PP, to see further
possibilities. You can supply additional keys to the pp_def
call to control whether to disable POSIX threading (which is
on by default, but some C libraries can't handle that). Or you
can automatically add lvalue processing, as with the slice
operation.
Prometheus Brings Fire to Mankind by Heinrich Füger, circa 1790
Ed J
Ed J (aka "mohawk" on IRC) has been using Perl for a long time. He ported the reference GraphQL implementation from the JavaScript version to Perl. He is currently release manager for PDL.