[EP-tech] Re: Simple search - length of query

John Salter J.Salter at leeds.ac.uk
Wed Mar 5 12:55:06 GMT 2014


Hi Jan/Seb,
It looks like my indexes are the same as yours (see below).
Which version of MySQL are you running? What is your ' optimizer_search_depth' set to?

Seb,
When I took a snapshot of the process', the query was in the 'statistics' stage - for a long time - so I'll see if tweaking the optimizer_search_depth helps!
Longer term we'll be switching to Xapian :o)

Cheers,
John


mysql> SHOW INDEXES FROM eprint__rindex\G
*************************** 1. row ***************************
       Table: eprint__rindex
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: field
   Collation: A
 Cardinality: 49
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 2. row ***************************
       Table: eprint__rindex
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 2
 Column_name: word
   Collation: A
 Cardinality: 1238267
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 3. row ***************************
       Table: eprint__rindex
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 3
 Column_name: eprintid
   Collation: A
 Cardinality: 14859215
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 4. row ***************************
       Table: eprint__rindex
  Non_unique: 1
    Key_name: eprint__rindex_eprintid_2
Seq_in_index: 1
 Column_name: eprintid
   Collation: A
 Cardinality: 15994
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 5. row ***************************
       Table: eprint__rindex
  Non_unique: 1
    Key_name: eprint__rindex_eprintid_2
Seq_in_index: 2
 Column_name: field
   Collation: A
 Cardinality: 154783
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
5 rows in set (0.00 sec)

-----Original Message-----
From: eprints-tech-bounces at ecs.soton.ac.uk [mailto:eprints-tech-bounces at ecs.soton.ac.uk] On Behalf Of Jan Ploski
Sent: 05 March 2014 12:20
To: eprints-tech at ecs.soton.ac.uk
Cc: John Salter
Subject: [EP-tech] Re: Simple search - length of query

Could it be that you're missing some essential indexes on eprint__rindex?

Your example query is very hard on my 3.0.5 database while it returns
quickly on 3.3.12. The difference is that the eprint__rindex primary key
in 3.3.12 consists of (field, word, eprintid) and there is also a
non-unique index on (eprintid, field). In the 3.0.5 database I only have
a non-unique index on eprintid.

John Salter wrote:
> Hi,
> Yesterday we had a bit of an issue with our repository when someone pasted a full citation into the simple search box.
> This produced an impressive SQL query that locked things up and made users unhappy...
> 
> Is there a way to sanitise what a 'simple' search might try to handle? e.g. would restricting it to a certain number of words be acceptable?
> Would the Xapian search handle a request like the one below any better?
> 
> Details below/attached if you're interested!
> Cheers,
> John
> 
> GET /cgi/search/simple?full=%E2%80%98Families%2C+Domesticity+and+Intimacy%3A+Changing+Relationships+in+Changi
> ng+Times%E2%80%99%2C+in+Richardson%2C+D%2C+and+Robinson%2C+V.+%28eds%29+Introducing+Women%27s+Studies%2C+third+edition.+Basingstoke%3A+Palgrave%2C+2008+pp.+1
> 25-143.+&_action_search=Search&_order=bytitle&basic_srchtype=ALL&_satisfyall=ALL
> 
> searchexp created in cache table:
> 0|1|-date/creators_name/title|archive|-|full:abstract/creators_name/date/documents/title:ALL:IN:▒Families, Domesticity and Intimacy%3A Changing Relationships in Changing Times▒, in Richardson, D, and Robinson, V. (eds) Introducing Women's Studies, third edition. Basingstoke%3A Palgrave, 2008 pp. 125-143. |-|eprint_status:eprint_status:ANY:EQ:archive|metadata_visibility:metadata_visibility:ANY:EQ:show
> 
> The SQL generated by search is attached (get ready for this - it's a thing of beauty ;o) - you can see why it took a while to run!
> 
> 
> 
> 
> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> *** Archive: http://www.eprints.org/tech.php/
> *** EPrints community wiki: http://wiki.eprints.org/
> *** EPrints developers Forum: http://forum.eprints.org/
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/



More information about the Eprints-tech mailing list