[EP-tech] Search filters: negation or searching for "either X or NULL"
John Salter
J.Salter at leeds.ac.uk
Fri Feb 17 18:01:28 GMT 2023
Hi Dennis,
Simple-ish sounding question... complex answer (sorry!).
**There may be a 'not equal to' operator that I've overlooked. If so, hopefully someone will jump in!**
I think the OAI-PMH code prevents you from doing this in a simple way: the 'filters' in a set configuration are always joined with an 'AND', so there is no way to say "X OR NULL".
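To make the AND problem concrete, here is a small stand-alone Perl sketch. It uses plain hashes standing in for eprints (this is not EPrints code) and applies the two wanted conditions, "my_field is FALSE" and "my_field is unset", to made-up records, joining them once with AND and once with OR.

```perl
use strict;
use warnings;

# Made-up records: my_field is TRUE, FALSE or unset (undef, i.e. NULL).
my @records = (
    { id => 1, my_field => 'TRUE' },
    { id => 2, my_field => 'FALSE' },
    { id => 3, my_field => undef },
);

# The two conditions we would like to combine.
my @filters = (
    sub { defined $_[0]{my_field} && $_[0]{my_field} eq 'FALSE' },
    sub { !defined $_[0]{my_field} },
);

# AND join: a record must pass every filter.
sub match_all {
    my ( $rec, @f ) = @_;
    for my $f (@f) { return 0 unless $f->($rec) }
    return 1;
}

# OR join: a record must pass at least one filter.
sub match_any {
    my ( $rec, @f ) = @_;
    for my $f (@f) { return 1 if $f->($rec) }
    return 0;
}

my @and_ids = map { $_->{id} } grep { match_all( $_, @filters ) } @records;
my @or_ids  = map { $_->{id} } grep { match_any( $_, @filters ) } @records;

print "AND: (@and_ids)\n";
print "OR:  (@or_ids)\n";
```

No record can be FALSE and unset at the same time, so the AND join selects nothing; only an OR join (or a single pre-calculated condition) gives records 2 and 3, which is what the set should contain.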
To get around this sort of thing, I have previously added a volatile/automatic field to our repository that pre-calculates a useful value, so that the normal OAI 'filters' approach can be used.
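A minimal sketch of the pre-calculated field idea, assuming a hypothetical extra field called my_field_not_true. The helper collapses the three states of my_field into a plain two-state value. In a real repository you would call something like this from your automatic-field code (e.g. the set_eprint_automatic_fields hook in cfg.d) and store the result on the eprint; that wiring is an assumption and is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper for a hypothetical derived field "my_field_not_true":
# collapses the tri-state my_field (TRUE / FALSE / undef) into two states.
sub derive_not_true {
    my ($value) = @_;
    # Only an explicit TRUE counts as TRUE; FALSE and undef (NULL) do not.
    my $is_true = defined $value && $value eq 'TRUE';
    return $is_true ? 'FALSE' : 'TRUE';
}

for my $value ( 'TRUE', 'FALSE', undef ) {
    printf "%-5s -> my_field_not_true = %s\n",
      defined $value ? $value : 'undef', derive_not_true($value);
}
```

With such a field in place, an ordinary set filter like { meta_fields => [ "my_field_not_true" ], value => "TRUE", match => "EX" } should select every record where my_field is FALSE or unset, without needing an OR.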
You can create a set containing only the records where the field is undef (unset) like this:
{
    spec    => "undef-test",
    name    => "Undef Test",
    filters => [
        { meta_fields => [ "my_field" ], value => undef, match => 'EX' }, # this results in a query like 'my_field IS NULL'.
    ],
}
If you're happy hacking about in cgi/oai2, this searches for an unset field:
$searchexp->add_field(
    fields => [
        $eprint_ds->field( 'my_field' )
    ],
    value => undef,
    match => "EX",
);
I think it is possible, with some hacking about in cgi/oai2, to change the join method to an OR, but it might not be a good idea (see: https://www.eprints.org/eptech/msg07402.html )
There is another possible route (it's even more horrible and hacky, but interesting at the same time).
For performance reasons, it might not be the most sensible option for OAI-PMH.
EPrints doesn't seem to ship with any 'NOT' type search conditions.
It does, however, ship with a 'regexp' search condition (EPrints::Search::Condition::Regexp), which isn't well documented and has no easy way to use it. Note: it uses the database's REGEXP operator, whose flavour depends on the server: MySQL before 8.0.4 uses POSIX-style regular expressions, MySQL 8.0.4 and later uses ICU, and MariaDB 10.0.5 and later uses PCRE.
It does seem to work :)
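A quick stand-alone Perl check of the pattern '^[^T]' used in the regexp condition. It only needs a start anchor and a negated bracket expression, which POSIX, ICU and PCRE engines should all interpret the same way, so the regexp flavour shouldn't matter for this particular pattern. An unset field is deliberately not tested: in SQL, NULL REGEXP '^[^T]' evaluates to NULL rather than true, which is why a separate 'is_null' condition is needed alongside it.

```perl
use strict;
use warnings;

# "Does not start with a 'T'": only an anchor and a negated bracket
# expression, so POSIX, ICU and PCRE should agree on its meaning.
my $not_starting_with_t = qr/^[^T]/;

my %result;
for my $value ( 'TRUE', 'FALSE', '' ) {
    $result{$value} = $value =~ $not_starting_with_t ? 1 : 0;
    printf "%-7s %s\n", "'$value'", $result{$value} ? 'matches' : 'no match';
}
```

Note that an empty string does not match either (the bracket expression needs one character), so a field stored as '' rather than NULL would also be left out.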
You can do something like this to get an EPrints::List:
my $ds = $session->dataset( "eprint" );
my @conds = ();

# my_field is null... (a null value wouldn't match the regex clause below)
push @conds, EPrints::Search::Condition->new( 'is_null', $ds, $ds->get_field( "my_field" ) );

# my_field does not start with a 'T'
push @conds, EPrints::Search::Condition->new( 'regexp', $ds, $ds->get_field( "my_field" ), '^[^T]' );

# OR the two conditions above together. Then AND the result with whether the datestamp is set.
my $cond = EPrints::Search::Condition->new(
    "AND",
    EPrints::Search::Condition->new( "OR", @conds ),
    EPrints::Search::Condition->new( 'is_not_null', $ds, $ds->get_field( "datestamp" ) )
);

my $ids = $cond->process(
    session => $session,
    dataset => $ds,
);

my $list = EPrints::List->new(
    session => $session,
    dataset => $ds,
    ids     => $ids,
);
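As a sanity check of the logic in the condition tree (not of the EPrints API), this stand-alone Perl sketch emulates how a SQL database evaluates roughly what the conditions above should translate to, ( my_field IS NULL OR my_field REGEXP '^[^T]' ) AND datestamp IS NOT NULL, under three-valued logic, with undef standing in for NULL/UNKNOWN. The rows are made up.

```perl
use strict;
use warnings;

# SQL three-valued logic: 1 = TRUE, 0 = FALSE, undef = UNKNOWN (NULL).
# Helpers return an explicit undef (not a bare return) so UNKNOWN
# survives being passed along in an argument list.
sub sql_or {
    my ( $x, $y ) = @_;
    return 1 if ( defined $x && $x ) || ( defined $y && $y );
    return 0 if defined $x && defined $y;    # both FALSE
    return undef;                            # FALSE/UNKNOWN mix
}

sub sql_and {
    my ( $x, $y ) = @_;
    return 0 if ( defined $x && !$x ) || ( defined $y && !$y );
    return 1 if defined $x && defined $y;    # both TRUE
    return undef;                            # TRUE/UNKNOWN mix
}

# Predicates as the database evaluates them.
sub is_null     { return defined $_[0] ? 0 : 1 }
sub is_not_null { return defined $_[0] ? 1 : 0 }

sub regexp_match {
    my ( $value, $re ) = @_;
    return undef unless defined $value;      # NULL REGEXP '...' is NULL
    return $value =~ $re ? 1 : 0;
}

my @rows = (
    { id => 1, my_field => 'TRUE',  datestamp => '2023-02-17' },
    { id => 2, my_field => 'FALSE', datestamp => '2023-02-17' },
    { id => 3, my_field => undef,   datestamp => '2023-02-17' },
    { id => 4, my_field => undef,   datestamp => undef },
);

my @selected;
for my $row (@rows) {
    my $cond = sql_and(
        sql_or( is_null( $row->{my_field} ),
                regexp_match( $row->{my_field}, qr/^[^T]/ ) ),
        is_not_null( $row->{datestamp} ),
    );
    # A WHERE clause keeps a row only when the condition is TRUE.
    push @selected, $row->{id} if defined $cond && $cond;
}
print "selected ids: @selected\n";
```

Rows 2 (FALSE) and 3 (unset) are selected; row 1 (TRUE) and row 4 (no datestamp) are not.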
I feel really bad even suggesting the above, as it reaches quite 'deep' into the internals.
I have used the regexp search condition to find specific history items (action='note', details =~ /^Embargo alert/, that sort of thing).
As this question has come up a few times, it might be worth a group EPrints chat about support for:
- complex search config in OAI-PMH (will need consideration about performance)
- inclusion of 'NOT' search operators (might need to include IFNULL logic)
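To illustrate the IFNULL point: in SQL, a naive NOT (my_field = 'TRUE') silently drops unset rows, because NULL = 'TRUE' is UNKNOWN and NOT UNKNOWN is still UNKNOWN. The stand-alone Perl sketch below emulates that logic (undef standing in for NULL) and shows how comparing IFNULL(my_field, '') instead brings the unset rows back. It models SQL semantics only and is not a proposal for how such an operator would be implemented in EPrints.

```perl
use strict;
use warnings;

# undef stands for SQL NULL / UNKNOWN; 1 and 0 for TRUE and FALSE.
sub sql_eq {
    my ( $x, $y ) = @_;
    return undef unless defined $x && defined $y;   # NULL = 'x' is UNKNOWN
    return $x eq $y ? 1 : 0;
}

sub sql_not {
    my ($x) = @_;
    return undef unless defined $x;                 # NOT UNKNOWN is UNKNOWN
    return $x ? 0 : 1;
}

# IFNULL(x, default): replace NULL with a default before comparing.
sub ifnull { return defined $_[0] ? $_[0] : $_[1] }

my @values = ( 'TRUE', 'FALSE', undef );

# WHERE NOT (my_field = 'TRUE')
my @naive =
  grep { my $r = sql_not( sql_eq( $_, 'TRUE' ) ); defined $r && $r } @values;

# WHERE NOT (IFNULL(my_field, '') = 'TRUE')
my @with_ifnull =
  grep { my $r = sql_not( sql_eq( ifnull( $_, '' ), 'TRUE' ) ); defined $r && $r } @values;

printf "naive NOT:   %d row(s)\n", scalar @naive;         # FALSE only
printf "with IFNULL: %d row(s)\n", scalar @with_ifnull;   # FALSE and NULL
```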
Cheers,
John
-----Original Message-----
From: Dennis Müller [mailto:dennis.mueller at uni-mannheim.de]
Sent: 17 February 2023 13:16
To: eprints-tech at ecs.soton.ac.uk
Subject: Search filters: negation or searching for "either X or NULL"
Hi everyone,
I'm having a hard time creating the correct filter for a custom OAI set
and I was hoping for some help here.
The set is based on a field which can be either TRUE, FALSE or NULL. It
should contain only records where the value is not TRUE (alternatively
where it's either FALSE or NULL).
I have not found out how to negate/invert a filter, so I could search
for "not TRUE". Neither have I managed to filter for "has value of FALSE
or has no value at all".
The snippet below excludes all NULLs, presumably because it treats
"NULL" as a string here.
{
    meta_fields => [ "my_field" ],
    value       => "FALSE NULL",
    match       => "IN",
    merge       => "ANY"
}
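The behaviour described above can be reproduced with a tiny stand-alone Perl emulation. It is a simplification, assuming the filter boils down to "my_field equals one of the words in value"; EPrints' real IN matching is not reproduced here. It shows that the word NULL is just a string, so an unset field can never match it.

```perl
use strict;
use warnings;

# Simplified stand-in for value => "FALSE NULL", match => "IN", merge => "ANY":
# split the value into words; a record passes if my_field equals any word.
my @terms = split ' ', 'FALSE NULL';

sub passes {
    my ($field_value) = @_;
    # An unset field (undef, i.e. SQL NULL) never equals any word,
    # including the literal string 'NULL'.
    return 0 unless defined $field_value;
    return ( grep { $_ eq $field_value } @terms ) ? 1 : 0;
}

for my $value ( 'TRUE', 'FALSE', 'NULL', undef ) {
    printf "%-7s %s\n", defined $value ? "'$value'" : 'undef',
      passes($value) ? 'included' : 'excluded';
}
```

Only a field literally containing the text 'NULL' would be picked up by the second word; genuinely unset fields are excluded.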
AFAIK, multiple filters are always combined with logical AND. Is there a
way to change this to "AND NOT" or "OR"?
I'm sure this is possible somehow and I'll feel quite stupid once
someone points it out. :)
Many thanks in advance and best regards
Dennis
--
Dennis Müller, B.A.
Universität Mannheim
University Library
Digital Library Services | Schloss Schneckenhof West | 68131 Mannheim
Tel: +49 621 181-3023
E-Mail:
- dennis.mueller at uni-mannheim.de (personal)
- alma.ub at uni-mannheim.de (Alma, Primo, library systems)
- support.ub at uni-mannheim.de (PC support)
Web: http://www.bib.uni-mannheim.de/