Is there any way to disable SQL_CALC_FOUND_ROWS?

softforge

Member
OK, this is a strange one. I have a list which has been growing and growing, and now it times out when trying to view it. This is the query:
Code:
SELECT SQL_CALC_FOUND_ROWS DISTINCT `fabrik_matches`.`id` AS `fabrik_matches___id` , `fabrik_matches`.`id` AS `fabrik_matches___id_raw` , `fabrik_matches`.`group` AS `fabrik_matches___group` , `fabrik_matches`.`group` AS `fabrik_matches___group_raw` , `fabrik_matches`.`mode` AS `fabrik_matches___mode_raw` , `fabrik_requests_1`.`camera` AS `fabrik_matches___mode` , `fabrik_matches`.`type` AS `fabrik_matches___type` , `fabrik_matches`.`type` AS `fabrik_matches___type_raw` , `fabrik_matches`.`request1` AS `fabrik_matches___request1_raw` , `fabrik_requests`.`id` AS `fabrik_matches___request1` , `fabrik_matches`.`user1` AS `fabrik_matches___user1_raw` , `jos_users`.`name` AS `fabrik_matches___user1` , `fabrik_matches`.`user2` AS `fabrik_matches___user2_raw` , `jos_users_0`.`name` AS `fabrik_matches___user2` , `fabrik_matches`.`date_time` AS `fabrik_matches___date_time` , `fabrik_matches`.`date_time` AS `fabrik_matches___date_time_raw` , `fabrik_matches`.`match` AS `fabrik_matches___match` , `fabrik_matches`.`match` AS `fabrik_matches___match_raw` , `fabrik_matches`.`frame` AS `fabrik_matches___frame` , `fabrik_matches`.`frame` AS `fabrik_matches___frame_raw` , `fabrik_matches`.`score0` AS `fabrik_matches___score0` , `fabrik_matches`.`score0` AS `fabrik_matches___score0_raw` , `fabrik_matches`.`score1` AS `fabrik_matches___score1` , `fabrik_matches`.`score1` AS `fabrik_matches___score1_raw` , `fabrik_matches`.`score2` AS `fabrik_matches___score2` , `fabrik_matches`.`score2` AS `fabrik_matches___score2_raw` , `fabrik_matches`.`score3` AS `fabrik_matches___score3` , `fabrik_matches`.`score3` AS `fabrik_matches___score3_raw` , `fabrik_matches`.`score4` AS `fabrik_matches___score4` , `fabrik_matches`.`score4` AS `fabrik_matches___score4_raw` , `fabrik_matches`.`score5` AS `fabrik_matches___score5` , `fabrik_matches`.`score5` AS `fabrik_matches___score5_raw` , `fabrik_matches`.`score6` AS `fabrik_matches___score6` , `fabrik_matches`.`score6` AS `fabrik_matches___score6_raw` , `fabrik_matches`.`score7` AS `fabrik_matches___score7` , `fabrik_matches`.`score7` AS `fabrik_matches___score7_raw` , `fabrik_matches`.`score8` AS `fabrik_matches___score8` , `fabrik_matches`.`score8` AS `fabrik_matches___score8_raw` , `fabrik_matches`.`score9` AS `fabrik_matches___score9` , `fabrik_matches`.`score9` AS `fabrik_matches___score9_raw` , `fabrik_matches`.`id` AS slug, `fabrik_matches`.`id` AS `__pk_val`
FROM `fabrik_matches`
LEFT JOIN `jos_users` AS `jos_users` ON `jos_users`.`id` = `fabrik_matches`.`user1`
LEFT JOIN `jos_users` AS `jos_users_0` ON `jos_users_0`.`id` = `fabrik_matches`.`user2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_0` ON `fabrik_requests_0`.`token` = `fabrik_matches`.`request2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_1` ON `fabrik_requests_1`.`camera` = `fabrik_matches`.`mode`
LIMIT 0 , 100
I have spent some time debugging this and found that the problem is SQL_CALC_FOUND_ROWS. If I remove that and run the query in phpMyAdmin, it takes about 0.15 seconds to execute. If I leave it in, the query currently takes about 45 minutes to complete.

Admittedly it is not the most optimised query in the world, but with such a big difference it looks like something is going seriously wrong. These are the rough numbers of records in each table:

fabrik_matches - 400,000
fabrik_requests - 5,000
jos_users - 100

So the amount of data being processed is big, but not outrageous. However, the joins seem to kill it when it has to count them all before returning the first 100.

After some research I found this: :)

http://bugs.mysql.com/bug.php?id=18454

So I upgraded from MySQL 5.5 to 5.6 but it didn't help. :(

I have spent hours on this and I'm at a loss as to what to try next. I have tried optimising the query manually, but nothing makes a dent in the processing time except removing SQL_CALC_FOUND_ROWS.

To be honest, this list is a tool where users use the filters to drill down to just one request, so counting the total (including the joins) is not useful in this case; pagination is not even necessary.

So, back to the original question: is there any way to disable SQL_CALC_FOUND_ROWS? Or is there another solution anyone can think of? This is crippling, as the matches table continues to grow but the managers currently have no way to view the data.

Many thanks for any help you can give.
 
OK, I don't want to commit this one to github just yet till we've tested it for a while. So you'll need to edit ./components/com_fabrik/models/list.php, and around line 2317 you should find this chunk of code:

PHP:
        // $$$rob added raw as an option to fix issue in saving calendar data
        if (trim($table->db_primary_key) != '' && (in_array($this->outPutFormat, array('raw', 'html', 'feed', 'pdf', 'phocapdf', 'csv'))))
        {
            $sfields .= ', ';
            $strPKey = $pk . ' AS ' . $db->quoteName('__pk_val') . "\n";
            $query = 'SELECT SQL_CALC_FOUND_ROWS DISTINCT ' . $sfields . $strPKey;
        }
        else
        {
            $query = 'SELECT SQL_CALC_FOUND_ROWS DISTINCT ' . trim($sfields, ", \n") . "\n";
        }

... which you'll need to replace with this chunk:

PHP:
        /**
        * Testing potential fix for FOUND_ROWS performance issue on large tables.  If merging,
        * we never do a SELECT FOUND_ROWS(), so no need to use SQL_CALC_FOUND_ROWS.
        */
        $calc_found_rows = $this->mergeJoinedData() ? '' : 'SQL_CALC_FOUND_ROWS';
        // $$$rob added raw as an option to fix issue in saving calendar data
        if (trim($table->db_primary_key) != '' && (in_array($this->outPutFormat, array('raw', 'html', 'feed', 'pdf', 'phocapdf', 'csv'))))
        {
            $sfields .= ', ';
            $strPKey = $pk . ' AS ' . $db->quoteName('__pk_val') . "\n";
            $query = 'SELECT ' . $calc_found_rows . ' DISTINCT ' . $sfields . $strPKey;
        }
        else
        {
            $query = 'SELECT ' . $calc_found_rows . ' DISTINCT ' . trim($sfields, ", \n") . "\n";
        }

Then go and set your "Join Mode" to "Merge rows", under the "Joins" section in the List's "Data" tab.

That should skip the SQL_CALC_FOUND_ROWS in the query, as we use a different method for counting the rows when merging vs not merging.

Let me know how it goes.

-- hugh
 
BTW, if for whatever reason you can't use the 'merge rows' option, then I'm afraid you are sh*t out of luck. We can't avoid using SQL_CALC_FOUND_ROWS when not merging repeated joined data. It's a long and gory story, but it has to do with pagination when you have joined data.

When merging joined data, all the repeated joined data goes in the "main" row, so there's one row per selected main row. Easy to paginate. But when not merging, there can be any number of repeats for each "main" row, one for each "many" join. So if you are trying to display (say) 10 rows per page of the list, knowing how many actual main rows that translates to is much tougher. Might just be one main table row, if it has 10 related repeat join rows. Or could be 10 rows, if none of the joins have more than one "repeat".

That's where SQL_CALC_FOUND_ROWS DISTINCT comes in: it avoids doing two extra, expensive queries to count the rows - one count with all the joins added, and one without the joins. That's what we used to do, but it slowed things down for 90% of normal usage.
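In code terms, the pattern boils down to something like this (just a sketch using the Joomla database API with a cut-down version of your query, not the exact Fabrik code):

PHP:
$db = JFactory::getDbo();

// One data query which also asks MySQL to remember the un-LIMITed row count ...
$db->setQuery('SELECT SQL_CALC_FOUND_ROWS DISTINCT `fabrik_matches`.`id` FROM `fabrik_matches` LIMIT 0, 100');
$rows = $db->loadObjectList();

// ... then a cheap follow-up to read that count back for the pagination.
$db->setQuery('SELECT FOUND_ROWS()');
$total = $db->loadResult();

The old way was that same data query plus two separate COUNT queries, which is what slowed the 90% case down.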

What puzzles me is your observed difference between 0.15 seconds and 45 minutes, which just doesn't sound right to me. I've never heard of anyone having this issue in the three years since we switched to the SQL_CALC_FOUND_ROWS method. I've seen some lists jump from under a second to maybe 10 seconds, but never anything as drastic as 45 minutes, which leads me to believe there may be something else going on here.

-- hugh
 
I've been running this patch for a few days and it seems to work fine, so I went ahead and committed it to github.

-- hugh
 
Thanks very much for this, it did indeed seem to work well! :)

However, I just uploaded a load more data and now I get a white screen because PHP is running out of memory! The exact error is as follows:

Code:
PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 72 bytes) in libraries/joomla/database/database/mysqli.php on line 498

I have tried upping the memory to 1024M (normally it runs at 128M) but it still runs out of memory!!! :eek:

So, I have investigated further and this is the query that kills it:

Code:
SELECT DISTINCT `fabrik_matches`.`id` AS __pk_val0
FROM `fabrik_matches`
LEFT JOIN `jos_users` AS `jos_users` ON `jos_users`.`id` = `fabrik_matches`.`user1`
LEFT JOIN `jos_users` AS `jos_users_0` ON `jos_users_0`.`id` = `fabrik_matches`.`user2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_0` ON `fabrik_requests_0`.`token` = `fabrik_matches`.`request2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_1` ON `fabrik_requests_1`.`camera` = `fabrik_matches`.`mode`

But this returns over a million results, so I can understand why PHP would run out of memory while processing them. Is this query simply there to count the number of records for pagination?

When I run this query in phpMyAdmin, it limits the results to 30 but tells me the total almost instantly. So I figured there must be a more efficient way to get the count and produced a modified version that just returns the count:

Code:
SELECT COUNT(DISTINCT `fabrik_matches`.`id`) AS __pk_val0
FROM `fabrik_matches`
LEFT JOIN `jos_users` AS `jos_users` ON `jos_users`.`id` = `fabrik_matches`.`user1`
LEFT JOIN `jos_users` AS `jos_users_0` ON `jos_users_0`.`id` = `fabrik_matches`.`user2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_0` ON `fabrik_requests_0`.`token` = `fabrik_matches`.`request2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_1` ON `fabrik_requests_1`.`camera` = `fabrik_matches`.`mode`

The problem with this is it takes about 3 minutes to complete. However, the next time it runs, it takes 0.0002 seconds because the result is cached. But as soon as the data or the query is changed, it takes 3 minutes again. :mad:
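(Side note in case anyone else wants to time this themselves: a variant along these lines keeps the query cache from skewing the numbers. SQL_NO_CACHE is a standard MySQL hint; the rest is just throwaway test code.)

PHP:
$db = JFactory::getDbo();

// SQL_NO_CACHE stops MySQL's query cache returning the previous result, so
// repeated runs show the real cost of the COUNT rather than 0.0002 seconds.
$db->setQuery('SELECT SQL_NO_CACHE COUNT(DISTINCT `fabrik_matches`.`id`) AS total FROM `fabrik_matches`');
$total = $db->loadResult();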

So how does phpMyAdmin know the total number of results so quickly? It can't be caching, because even after changing the data or the query it returns the first 30 rows and the total in under 0.002 seconds!

Any ideas on how to solve this one? Thanks for all your help! :)
 
Row count is only available if you use SQL_CALC_FOUND_ROWS - they go hand in hand. It's why we use SQL_CALC_FOUND_ROWS in the first place: to avoid doing two queries, one to get the data and one to get the total.

Have you tried doing an EXPLAIN on the query in phpMyAdmin? It should tell you what the main issues with the SQL are.

Code:
EXPLAIN
SELECT COUNT(DISTINCT `fabrik_matches`.`id`) AS __pk_val0
FROM `fabrik_matches`
LEFT JOIN `jos_users` AS `jos_users` ON `jos_users`.`id` = `fabrik_matches`.`user1`
LEFT JOIN `jos_users` AS `jos_users_0` ON `jos_users_0`.`id` = `fabrik_matches`.`user2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_0` ON `fabrik_requests_0`.`token` = `fabrik_matches`.`request2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_1` ON `fabrik_requests_1`.`camera` = `fabrik_matches`.`mode`
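If it's easier, you can also run it from a quick PHP snippet and dump the plan - this is just a cut-down sketch of your query, the interesting part is what the plan columns mean:

PHP:
$db = JFactory::getDbo();

// In the EXPLAIN output, 'type' = ALL means a full table scan, 'key' shows
// which index (if any) is being used, and 'rows' is MySQL's estimate of how
// many rows it has to examine for each table in the join.
$db->setQuery('EXPLAIN SELECT COUNT(DISTINCT `fabrik_matches`.`id`)
    FROM `fabrik_matches`
    LEFT JOIN `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`');
print_r($db->loadObjectList());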
 
I had a look at EXPLAIN and the indexes are being used, but running the query is still very slow.

I just reread my last post: I actually meant that phpMyAdmin uses FOUND_ROWS() (rather than ROW_COUNT()) to get the total, which does work without SQL_CALC_FOUND_ROWS. I have just written this quick script to prove it:

PHP:
$db = JFactory::getDBO();
$sql = "SELECT DISTINCT `fabrik_matches`.`id` AS __pk_val0
FROM `fabrik_matches`
LEFT JOIN `jos_users` AS `jos_users` ON `jos_users`.`id` = `fabrik_matches`.`user1`
LEFT JOIN `jos_users` AS `jos_users_0` ON `jos_users_0`.`id` = `fabrik_matches`.`user2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_0` ON `fabrik_requests_0`.`token` = `fabrik_matches`.`request2`
LEFT JOIN `fabrik_requests` AS `fabrik_requests_1` ON `fabrik_requests_1`.`camera` = `fabrik_matches`.`mode`";
 
$db->setQuery($sql);
$db->query();
$db->setQuery('SELECT FOUND_ROWS()');
$count = $db->loadResult();
 
echo $count;

This returns the total count almost instantly, and it is not using a cached result, as I changed the data and it still runs just as fast.

This makes me wonder why PHP is running out of memory, but I can only assume it is because PHP is currently doing the counting. If you use the FOUND_ROWS() technique above, then it is MySQL that does the counting and simply returns the count to PHP.

Is this something that could be done in Fabrik (for the merge rows option)?

Many thanks! :)
 
Simple answer, no.

It gets really hard to explain, but when doing row merging, we have to handle the row counting ourselves.

Say we need 10 rows, i.e. that's the list pagination size. When not merging, that's simple. The main SELECT includes all the joins, and that's the number of rows we need to select, so we can just use a normal LIMIT x,y.

When merging, we don't know how many of the SELECT'ed rows are repeats, which are going to get merged down to one row. So we have to manually boil it down.
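To illustrate, here's a much simplified sketch (made-up variable names, not the actual Fabrik code):

PHP:
// The raw result has one row per joined repeat, so a "page" of 10 merged rows
// can correspond to anywhere from 10 to hundreds of raw rows.  The repeats have
// to be folded into their main row before we know where the page boundary falls,
// which is why a plain LIMIT on the SELECT isn't enough.
$limitStart  = 0;    // offset of the page, counted in *merged* rows
$limitLength = 10;   // list pagination size, counted in *merged* rows

$db = JFactory::getDbo();
$db->setQuery('SELECT `fabrik_matches`.`id` AS `__pk_val`, `fabrik_requests`.`camera`
    FROM `fabrik_matches`
    LEFT JOIN `fabrik_requests` ON `fabrik_requests`.`token` = `fabrik_matches`.`request1`');
$rawRows = $db->loadObjectList();

$merged = array();
foreach ($rawRows as $row)
{
    $pk = $row->__pk_val;
    if (!isset($merged[$pk]))
    {
        $merged[$pk] = $row;   // first occurrence becomes the "main" row
    }
    // ... any later row with the same $pk would get merged into $merged[$pk] here
}

// Only now do we know which rows make up the requested page.
$page = array_slice($merged, $limitStart, $limitLength, true);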

-- hugh
 
OK, I understand.

I've had a hack and got my page working for now by changing this:

PHP:
$calc_found_rows = $this->mergeJoinedData() ? '' : 'SQL_CALC_FOUND_ROWS';

To this:

PHP:
$calc_found_rows = JRequest::getInt('Itemid') == 368 ? '' : 'SQL_CALC_FOUND_ROWS';

Where 368 is the Itemid of my problem page. Paging doesn't work, of course, but I don't really need it at the moment as the filters are used to drill down the data.
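(In case it helps anyone else, a slightly more flexible version of the same hack might look like this - the array of menu item IDs is obviously just whatever applies to your own site:)

PHP:
// Skip SQL_CALC_FOUND_ROWS only for the menu items (Itemids) where the total
// row count isn't needed, rather than hard-coding a single id in the condition.
$noCountItemIds  = array(368);
$calc_found_rows = in_array(JRequest::getInt('Itemid'), $noCountItemIds) ? '' : 'SQL_CALC_FOUND_ROWS';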

If I found a way to get paging working without using SQL_CALC_FOUND_ROWS, would you consider adding a list parameter to use this alternative method?

Many thanks :)
 
Sure, if you can cure the issue and not break any existing usage. But that's the biggest problem: testing every conceivable combination of list joining and merging.

-- hugh
 