API Memory Use

So, I’m running a script that loops through the page titles list and gets specific stats using the segmentation API. I’ve noticed that, no matter how much I try to destroy objects, unset variables, or force garbage collection, memory usage increases with every API call until I either hit the PHP memory limit or crash the system by using too much memory.

Thinking that it was the API, I made a point of calling the (undocumented and, as far as I can tell, ineffective) Piwik_FrontController::getInstance()->__destruct(); for every Piwik_FrontController::getInstance()->init();, and of unsetting the piwik variable, but that didn’t do anything. Is there any way I can force the API to release memory after each loop iteration?
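
For reference, each pass of the loop looks roughly like this (a simplified sketch of what I’m doing; Piwik_FrontController and Piwik_API_Request are the real Piwik classes, but the query string is abbreviated and saveToFile() is just my placeholder):

```php
<?php
// Simplified sketch of one pass of the loop; the real script builds a
// much larger segmented request per page title.
define('PIWIK_INCLUDE_PATH', realpath('path/to/piwik'));
define('PIWIK_ENABLE_DISPATCH', false);
define('PIWIK_ENABLE_ERROR_HANDLER', false);
require_once PIWIK_INCLUDE_PATH . '/index.php';
require_once PIWIK_INCLUDE_PATH . '/core/API/Request.php';

foreach ($pageTitles as $title) {
    Piwik_FrontController::getInstance()->init();

    // One API call per page title (query string abbreviated here).
    $request = new Piwik_API_Request(
        'method=VisitsSummary.get&idSite=1&period=month&date=today&format=XML'
    );
    $result = $request->process();

    saveToFile($title, $result); // placeholder for writing the row out

    // Everything I have tried to release memory:
    unset($request, $result);
    Piwik_FrontController::getInstance()->__destruct(); // undocumented, seemingly ineffective
    gc_collect_cycles(); // PHP 5.3+ cycle collector
}
```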

Thanks in advance.

Are you using PHP 5.4?

5.3.2, I think? I’m away from my computer so I can’t verify that at the moment but I’ll double-check when I can.

Make sure you use at least 5.4; it brings massive memory improvements.

So, I updated to 5.4, and memory use went up even faster, until I hit this:

PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 72 bytes) in /home//public_html/piwik/core/DataTable/Manager.php on line 120
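
I can delay the crash by raising the limit at the top of the script (plain PHP, nothing Piwik-specific), but that obviously just postpones the problem:

```php
// Stopgap only: raise the per-script memory limit above the 1 GB
// I'm currently exhausting.
ini_set('memory_limit', '2048M');
```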

By faster, you mean it reached the limit even earlier than before?

How many page views do you get per day on the website?

It reached the limit earlier than before. While I’m debugging, I also have it report memory_get_peak_usage() as it loops, adding the value to each row of the report I’m generating as an element attribute. The kicker is, the site doesn’t even get that much traffic, but the client is insistent about this format.
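
The instrumentation itself is trivial; roughly, on each pass I do this ($row is a node of the SimpleXMLElement report I describe below, and peakMem is just my attribute name):

```php
// Record the memory high-water mark on the current row of the report.
$row->addAttribute('peakMem', (string) memory_get_peak_usage(true));
```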

Basically, the client wanted a very specific format for a report that they would use in Excel. (I have posted about it in the past.) It would show the total number of visits and unique visits for each page, then the breakdown of those numbers across the top four visiting countries, then the number of direct-access visits, the number from the top referring search engine, and the numbers from the top three referring sites.
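
So one row of the report looks conceptually like this (the column names here are my own shorthand, not the client’s):

```php
$row = array(
    'page'          => 'Category :: Author :: Project :: Document',
    'visits'        => 0,
    'unique_visits' => 0,
    // Breakdown of the above across the top four visiting countries:
    'country_1' => 0, 'country_2' => 0, 'country_3' => 0, 'country_4' => 0,
    'direct'            => 0, // direct-access visits
    'top_search_engine' => 0, // visits from the top referring search engine
    // Visits from the top three referring sites:
    'referrer_1' => 0, 'referrer_2' => 0, 'referrer_3' => 0,
);
```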

The site is set up to report the page title as a hierarchy, i.e. Category :: Author :: Project :: Document. The Page Titles report returns this in hierarchical form (XML for now, fed into a SimpleXMLElement, since the PHP data format wasn’t hierarchical; I just realized I could convert the JSON response into arrays, which I’ll experiment with next), and the script loops through that hierarchy.

For each page title it calls a function that assembles a bulk API request covering the segments that correspond to each of the statistics I mentioned above. It makes the API request, which returns all the statistics for that page, saves them to file, clears out all the used variables, and proceeds to the next page title. Every time it hits the API, memory usage jumps quite a bit. I was almost tempted to assemble one monster bulk API request for every single page title and its statistics, in order, and add each statistic to the DOM cell by cell, but that seemed like a bit much to hit the server with in one go.
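
Roughly, the per-page assembly looks like this (a sketch: API.getBulkRequest is the real bulk endpoint and Piwik_API_Request the same request class as above, but the segment list, method, site ID, and date here are placeholders, and I’ve left token_auth out):

```php
// $segments is my list of segment definitions for the current page title
// (top countries, direct access, top search engine, top referrers).
$urls = array();
foreach ($segments as $i => $segment) {
    // Each urls[i] entry is itself a URL-encoded API query string.
    $inner = 'method=VisitsSummary.get&idSite=1&period=month&date=today'
           . '&segment=' . urlencode($segment);
    $urls[] = 'urls[' . $i . ']=' . urlencode($inner);
}

// One in-process bulk call returning every statistic for this page.
$request = new Piwik_API_Request(
    'method=API.getBulkRequest&format=JSON&' . implode('&', $urls)
);
$stats = json_decode($request->process(), true); // nested arrays instead of SimpleXMLElement
```

If the JSON-to-array route pans out, I can drop the XML parsing entirely.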

I understand that this is not an efficient way to get this data, but I’m not really sure how else to approach it without hitting the SQL directly (I made an abortive attempt at that some time ago; it wasn’t pretty). They would only need to generate the report every month or so, and given the specificity of what they want, they’re perfectly happy to accept that the report takes quite a while as it retrieves the stats piece by piece.