[EP-tech] Search filters: negation or searching for "either X or NULL"

John Salter J.Salter at leeds.ac.uk
Fri Feb 17 18:01:28 GMT 2023

Hi Dennis,
Simple-ish sounding question... complex answer (sorry!).
**There may be a 'not equal to' operator that I've overlooked. If so, hopefully someone will jump in!**

I think the OAI-PMH code prevents you from doing this in a simple way: the entries in the 'filters' configuration are always joined with an 'AND', so an "X or NULL" test can't be expressed there.
I have previously added a volatile/automatic field to our repository to pre-calculate useful values to overcome this sort of thing, and allow use of the normal OAI 'filters' approach.
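
A rough sketch of that pre-calculated field idea, as a config fragment. It assumes EPrints 3.3 or later and a cfg.d file that loads after eprint_fields.pl. The field name 'my_field_not_true' is made up for illustration, and 'my_field' is taken to be a boolean holding "TRUE", "FALSE" or nothing:

```perl
# Hypothetical helper field (the name is invented for this example).
push @{ $c->{fields}->{eprint} }, {
    name     => "my_field_not_true",
    type     => "boolean",
    volatile => 1,    # changes to this field alone don't create a new revision
};

# Recalculate the helper field every time an eprint is committed.
$c->add_dataset_trigger( "eprint", EPrints::Const::EP_TRIGGER_BEFORE_COMMIT, sub {
    my( %args ) = @_;
    my $eprint = $args{dataobj};

    my $value = $eprint->value( "my_field" );

    # "not TRUE" covers both FALSE and unset (NULL).
    my $not_true = ( defined $value && $value eq "TRUE" ) ? "FALSE" : "TRUE";
    $eprint->set_value( "my_field_not_true", $not_true );
}, priority => 100 );
```

With something like this in place, the OAI set could use a single ordinary filter such as { meta_fields => [ "my_field_not_true" ], value => "TRUE", match => "EX" }. The new column would need adding to the database (epadmin update) and existing records recommitting before the set is complete.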

You can create a set with just the undef values like this:
    {
        spec => "undef-test",
        name => "Undef Test",
        filters => [
            { meta_fields => [ "my_field" ], value => undef, match => 'EX' }, # this results in a query like 'my_field IS NULL'
        ],
    },

If you're happy hacking about in cgi/oai2, this searches for an un-set field:
    fields => [
      $eprint_ds->field( 'my_field' )
    ],
    value => undef,
    match => "EX",

I think it is possible, with some hacking about in cgi/oai2, to change the join method to an OR, but that might not be a good idea (see: https://www.eprints.org/eptech/msg07402.html )

There is another possible route (it's even more horrible, hacky, but interesting at the same time).
It might not be the most sensible for OAI-PMH for performance reasons.

EPrints doesn't seem to ship with any 'NOT' type search conditions.
It does, however, ship with a 'regexp' search condition (EPrints::Search::Condition::Regexp), which isn't well documented and has no easy way to use it. Note: it uses the MySQL REGEXP operator, so the syntax might be POSIX rather than PCRE (recent MariaDB might be PCRE..?).

It does seem to work :)

You can do something like this to get an EPrints::List:
my $ds = $session->dataset( "eprint" );
my @conds = ();
# my_field is null... (a null value wouldn't match the regex clause below)
push @conds, EPrints::Search::Condition->new( 'is_null', $ds, $ds->get_field( "my_field" ) );
# my_field does not start with a 'T'
push @conds, EPrints::Search::Condition->new( 'regexp', $ds, $ds->get_field( "my_field" ), '^[^T]' );

# OR the two conditions above together. Then AND the result with whether the datestamp is set.
my $cond = EPrints::Search::Condition->new(
        "AND",
        EPrints::Search::Condition->new( "OR", @conds ),
        EPrints::Search::Condition->new( 'is_not_null', $ds, $ds->get_field( "datestamp" ) )
);

# Run the condition tree; this returns a reference to an array of matching ids.
my $ids = $cond->process(
        session => $session,
        dataset => $ds,
);

my $list = EPrints::List->new(
        session => $session,
        dataset => $ds,
        ids => $ids,
);

I feel really bad even suggesting the above - as it's a bit 'deep'.
I used the regex search condition for finding specific history items (action='note', details =~ /^Embargo alert/ sort of thing).

As this question has come up a few times, it might be worth a group EPrints chat about support for:
- complex search config in OAI-PMH (will need consideration about performance)
- inclusion of 'NOT' search operators (might need to include IFNULL logic)
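
To illustrate why a 'NOT' operator would probably need IFNULL-style handling: in SQL, comparing NULL with anything yields NULL, and a WHERE clause drops such rows. The sketch below only imitates that behaviour in plain Perl. undef stands in for NULL, and sql_ne, ifnull and where are made-up helper names, not EPrints or DBI functions:

```perl
use strict;
use warnings;

# SQL-style "<>": if either side is NULL (undef), the result is NULL.
sub sql_ne {
    my( $left, $right ) = @_;
    return undef if !defined $left || !defined $right;
    return $left ne $right ? 1 : 0;
}

# IFNULL(value, default): substitute a default for NULL.
sub ifnull {
    my( $value, $default ) = @_;
    return defined $value ? $value : $default;
}

# A WHERE clause keeps a row only when the test is true; NULL counts as false.
sub where {
    my( $result ) = @_;
    return ( defined $result && $result ) ? 1 : 0;
}

# One row per possible state of my_field: TRUE, FALSE, NULL.
my @rows = ( "TRUE", "FALSE", undef );

# my_field <> 'TRUE': the NULL row is silently dropped.
my @plain = grep { where( sql_ne( $_, "TRUE" ) ) } @rows;

# IFNULL(my_field, '') <> 'TRUE': the NULL row is kept as well.
my @wrapped = grep { where( sql_ne( ifnull( $_, "" ), "TRUE" ) ) } @rows;

printf "plain NOT: %d row(s), with IFNULL: %d row(s)\n",
    scalar @plain, scalar @wrapped;
```

The plain comparison keeps only the FALSE row, while the IFNULL version keeps both the FALSE and the NULL rows, which is the "not TRUE" set asked about.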


-----Original Message-----
From: Dennis Müller [mailto:dennis.mueller at uni-mannheim.de]
Sent: 17 February 2023 13:16
To: eprints-tech at ecs.soton.ac.uk
Subject: Search filters: negation or searching for "either X or NULL"

Hi everyone,

I'm having a hard time creating the correct filter for a custom OAI set
and I was hoping for some help here.

The set is based on a field which can be either TRUE, FALSE or NULL. It
should contain only records where the value is not TRUE (alternatively
where it's either FALSE or NULL).

I have not found out how to negate/invert a filter, so I could search
for "not TRUE". Neither have I managed to filter for "has value of FALSE
or has no value at all".

The snippet below excludes all NULLs, presumably because it treats
"NULL" as a string here.

   meta_fields => [ "my_field" ],
   value => "FALSE NULL",
   match => "IN",
   merge => "ANY"

AFAIK, multiple filters are always combined with logical AND. Is there a
way to change this to "AND NOT" or "OR"?

I'm sure this is possible somehow and I'll feel quite stupid once
someone points it out. :)

Many thanks in advance and best regards
Dennis Müller, B.A.

Universität Mannheim
Digitale Bibliotheksdienste | Schloss Schneckenhof West | 68131 Mannheim

Tel: +49 621 181-3023
- dennis.mueller at uni-mannheim.de (personal)
- alma.ub at uni-mannheim.de (Alma, Primo, library systems)
- support.ub at uni-mannheim.de (PC-Support)

Web: http://www.bib.uni-mannheim.de/
