The Magento XML sitemap Splitter & Generator is a very useful Extension for both large and small-sized stores. This extension helps in creating and organizing sitemaps by splitting or generating search engine compliant sitemaps.
Since Magento does not currently offer this functionality, the XML sitemap splitter & generator extension has proved to be of great significance. However, to best utilize this, you need to take care of the integration process on your website/ web store and prevent bad usage of memory.
There are two aspects that we should take care of when we code:
A common understanding today is ‘Computers today are powerful, memory cheap, so we don’t need optimization’ (Microsoft developers probably used to think this when they developed Vista).
People relying on the exponential approach don’t realize that you can multiply the power of cpu with the number of times you want but the root square will still be a bad enemy to defeat.
(Internet security and cryptography is based on this idea so if you find a way to crumble it let me know )
In this article we have a bad example of memory usage that we found out while implementing the Magento XML Sitemap Extension. Here are the two files involved. If you don’t have magento installation you can click on the links to see the code)
/app/code/core/Mage/Sitemap/Model/Sitemap.php
/app/code/core/Mage/Sitemap/Model/Resource/Catalog/Product.php
The first file is not very interesting, it just calls a function in product.php (->getCollection($storeId);) and iterates on the returned array, we need it just to understand the scenario.
This is the part that we need:
/** * Generate products sitemap */ $changefreq = (string)Mage::getStoreConfig('sitemap/product/changefreq', $storeId); $priority = (string)Mage::getStoreConfig('sitemap/product/priority', $storeId); $collection = Mage::getResourceModel('sitemap/catalog_product')->getCollection($storeId); foreach ($collection as $item) { $xml = sprintf('<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>', htmlspecialchars($baseUrl . $item->getUrl()), $date, $changefreq, $priority ); $io->streamWrite($xml); } unset($collection);
What is interesting here is the last line of code unset($collection); by this line seems clear that the developer was aware about the memory consumption.
Let us see the second file and the relative issue ..
The insane approach
Product.php is a resource model that executes a custom query (this is good: reduce Mysql load) in the method getCollection($storeId); there is a part that is totally not optimized:
while ($row = $query->fetch()) { $product = $this->_prepareProduct($row); $products[$product->getId()] = $product; } return $products;
This code before give back $products array, it iterates the rows returned by the query and for everyone calls _prepareProduct($row);
/** * Prepare product * * @param array $productRow * @return Varien_Object */ protected function _prepareProduct(array $productRow) { $product = new Varien_Object(); $product->setId($productRow[$this->getIdFieldName()]); $productUrl = !empty($productRow['url']) ? $productRow['url'] : 'catalog/product/view/id/' . $product->getId(); $product->setUrl($productUrl); return $product; }
There are 2 performance issues here:
It does N iteration, just to prepare an array and then it will iterates again other N times ( in Sitemap.php ), so it reads all data 2 times.
It creates a big array with N obj, that is very memory consuming because we are speaking about total products (on a big store this can produce more than 100.000 obj)
The result is that if you launch sitemap generation on a medium/large store, you will get a nice black page and a ‘Allowed Memory Size Exhausted’ message.
This is good as I was not able to try/catch this error and then manage it. –
Now I will add some log to measure the performance and test it on one big store that we manage.
Just remember that Magento by default has access to 256mb of memory. (Look in default .htaccess )
We can note the total memory usage before the execution and after it.
something like this:
/* DEBUG MEMORY */ $baseMemory = memory_get_usage(); $totMem = (int) str_replace('M', '', ini_get('memory_limit')); Mage::log('------------------------------------------------------------------'); Mage::log('Total Php Memory ("memory_limit"):'.$totMem.'M'); Mage::log('FreeMem: ' . ( $totMem - ($baseMemory / 1024 / 1024))); $i = 1; /* DEBUG MEMORY */ while ($row = $query->fetch()) { $product = $this->_prepareProduct($row); $products[$product->getId()] = $product; /* DEBUG MEMORY */ if (($i % 10000) == 0) { $mem = (memory_get_usage() - $baseMemory); $media = $mem / $i; Mage::log($i . " - FreeMem: " . ( $totMem - ($mem / 1024 / 1024)) . ' --- ' . $media); } $i++; /* DEBUG MEMORY */ } $i % 10000 condition means every 10000 cycle (products/row).
Now we’ll carry out the execution of the script and will look in /var/log/system.log for result:
2012-05-11T20:44:00+00:00 DEBUG (7): ------------------------------------------------------------------ 2012-05-11T20:44:00+00:00 DEBUG (7): Total Php Memory ("memory_limit"):256M 2012-05-11T20:44:00+00:00 DEBUG (7): FreeMem: 229.54512023926 2012-05-11T20:44:01+00:00 DEBUG (7): 10000 - FreeMem: 234.45269012451 --- 2259.3992 2012-05-11T20:44:03+00:00 DEBUG (7): 20000 - FreeMem: 212.66645050049 --- 2271.926 2012-05-11T20:44:05+00:00 DEBUG (7): 30000 - FreeMem: 192.04777526855 --- 2235.2922666667 2012-05-11T20:44:06+00:00 DEBUG (7): 40000 - FreeMem: 169.10470581055 --- 2277.908 2012-05-11T20:44:08+00:00 DEBUG (7): 50000 - FreeMem: 148.45632171631 --- 2255.3544 2012-05-11T20:44:09+00:00 DEBUG (7): 60000 - FreeMem: 127.8698348999 --- 2239.2369333333 2012-05-11T20:44:11+00:00 DEBUG (7): 70000 - FreeMem: 102.72708892822 --- 2295.9756571429 2012-05-11T20:44:13+00:00 DEBUG (7): 80000 - FreeMem: 82.065711975098 --- 2279.7915 2012-05-11T20:44:15+00:00 DEBUG (7): 90000 - FreeMem: 61.429832458496 --- 2266.9067555556 2012-05-11T20:44:18+00:00 DEBUG (7): 100000 - FreeMem: 40.794189453125 --- 2256.59648
You can see at the beginning the we have 229Mb of free ram and after 100000 cycle it is about to finish: every 10000 iteration consume about 20mb and each obj about 2200 bytes
Yes Totally Insane (or just Digging the grave ) at least if your customer has a real shop 🙂
Possible Solution
One solution to avoid this is to pass the obj $query directly to the calling function and manage it there, without iterate 2 times on data and ,for sure, without create any obj for every row.
So getCollection($storeId) code will become something like this:
/** * Get category collection array * * @return array */ public function getCollection($storeId) { $products = array(); $store = Mage::app()->getStore($storeId); /* @var $store Mage_Core_Model_Store */ if (!$store) { return false; } $urCondions = array( 'e.entity_id=ur.product_id', 'ur.category_id IS NULL', $this->_getWriteAdapter()->quoteInto('ur.store_id=?', $store->getId()), $this->_getWriteAdapter()->quoteInto('ur.is_system=?', 1), ); $this->_select = $this->_getWriteAdapter()->select() ->from(array('e' => $this->getMainTable()), array($this->getIdFieldName())) ->join( array('w' => $this->getTable('catalog/product_website')), 'e.entity_id=w.product_id', array() ) ->where('w.website_id=?', $store->getWebsiteId()) ->joinLeft( array('ur' => $this->getTable('core/url_rewrite')), join(' AND ', $urCondions), array('url' => 'request_path') ) ; $this->_addFilter($storeId, 'visibility', Mage::getSingleton('catalog/product_visibility')->getVisibleInSiteIds(), 'in'); $this->_addFilter($storeId, 'status', Mage::getSingleton('catalog/product_status')->getVisibleStatusIds(), 'in'); $query = $this->_getWriteAdapter()->query($this->_select); return $query; }
and the related Product.php part will change in this way:
/** * Generate products sitemap */ $changefreq = (string) $productConf->getChangefreq(); $priority = (string) $productConf->getPriority(); $queryCollection = Mage::getResourceModel('sitemapEnhanced/catalog_product')->getCollection($this->getStoreId(), $catId, $filterOutOfStock, $filterInStock); while ($productRow = $queryCollection->fetch()) { $prodId = $productRow['entity_id']; $url = !empty($productRow['url']) ? $productRow['url'] : 'catalog/product/view/id/' . $prodId; $xml = sprintf('<url><loc>%s</loc><lastmod>%s</lastmod><changefreq>%s</changefreq><priority>%.1f</priority></url>', htmlspecialchars($this->_baseUrl . $url), $this->_date, $changefreq, $priority); $io->streamWrite($xml); } unset($collection);
In this way the memory used is constant and doesn’t get exhausted iteration by iteration: we don’t create and store anything so every cycle php internal engine can keep on cleaning the garbage.
Let us know what’s your way of making good use of storage.
I am a writer, reader, and a part time adventure and travel enthusiast. The other three things that vie for my mind share are dark chocolate, coffee, and photography. I am highly motivated by user perspectives and addressing the common human experience when I write.
Summary: Jiva’s efforts to empower smallholder farmers weren’t gaining the digital traction they deserved. With a strategic overhaul led by …
Continue reading “Jiva’s Organic Traffic Growth: 354% Surge in 6 Months | CueForGood”
At CueForGood (CFG), we’ve embraced a refreshing change: reusable cloth napkins. While the switch may seem minor, it’s rooted in …
Continue reading “What We Learned When We Switched From Disposable Tissues to Reusable Napkins”
On any given day, walking into the Cue For Good office feels like stepping into a space with heart. It’s …
Continue reading “Of Light, Laughter & Transformation: Diwali 2024 at Cue For Good”
At its peak in the early 2010s, PHP powered the majority of websites globally, including major platforms like Facebook and …
Continue reading “Why PHP Still Matters in 2024: A Look at Its Continued Relevance”
This year, Tapcart’s 2024 BFCM Consumer Trends Report suggests that nearly 60% of shoppers kick off their holiday shopping in …
Continue reading “How Meta’s New Holiday Ad Features Can Transform Your Business This Season”
On September 10th and 11th, 2024, I had the incredible experience at the “Louder Than Words” B Corp Festival, held …
Continue reading “Cue For Good’s Journey at the B Corp Festival 2024: Exploring Louder Than Words”