Merging two Piwik Installations

Hi All,

we have two Piwik (2.13.1) installtions.
One with old data (OLD), the other one is productive (PROD).
Both installations have the same configuration of websites, around 30 profiles.

Requestet target is to get all data in one installation, perfect solution for this task seems to be the Site Migration plugin.
Also a nice feature would be to lose the fewest data possible.
http://plugins.piwik.org/SiteMigration

so I installed Piwik a third time (TEMP).

I used the plugin to
import idsite=1 from OLD to TEMP, new data in TEMP got idsite=1 then
import idsite=1 from PROD to TEMP, new data in TEMP got idsite=2, then
updated idsite=2 in TEMP to idsite=1, and then run
console core:archive on TEMP


~/dev/piwik/old
$ ./console migration:site 1 -v --skip-archive-data -H 1.2.3.4 -U user -P password -N Db_TEMP --db-prefix=TEMP_
~/dev/piwik/prod
$ ./console migration:site 1 -v --skip-archive-data -H 1.2.3.4 -U user -P password -N Db_TEMP --db-prefix=TEMP_


UPDATE TEMP_log_visit SET idsite=1 WHERE idsite=2;
UPDATE TEMP_log_link_visit_action SET idsite=1 WHERE idsite=2;

currently PROD is still tracking data for idsite=1

now it’s time to change the tracking code on the website.
or do a redirect in piwik.php.
if it’s idsite=1 reditect to www.piwik.TEMP/piwik.php?idsite=1,
other profiles use www.piwik.PROD/piwik.php?idsite=1
so we lose only the tracked data during the import.

PROD is tracking, all profiles idsite=2 to idsite=30
TEMP is tracking, only idsite=1

then import the missing profiles, after each import add the redirect for the imported profile.
so the TEMP installation becomes more and more the NEWPROD installtion.
after importing all profiles, delete the redirects in piwik.php and change the config.ini from the PROD to use the database from TEMP.
voilà a new installtion with all the historic data!
Thank you Piwik.PRO for this great and helpful plugin!!

big question is now: is it possible to import and track websites at the same time?

Not an answer to your question but I have a question for the process you have followed.

Can you please tell me how did you merge data from different idsite?

as you have told

updated idsite=2 in TEMP to idsite=1, and then run
console core:archive on TEMP (I did not do this step)

I updated the idsite in log_visit and log_link_visit_action even though I see same site on different idsite.

My issue:
I have 2 servers running Piwik and tracking 8 sites. Server 1 has new data (After Oct 13th) and Server 2 has old data (Before Oct 13th).
After SiteMigration plugin, piwik saved old data (from server 2) under siteID 16, where in my server 1 the same website is being tracked under siteID 2, I am trying to merge both data.

Hello

Merging two websites into one is not yet support I believe. Please see feature request: combine site 1 and site2 data? · Issue #10 · PiwikPRO/plugin-SiteMigration · GitHub

Merging to installations is possible!

sorry for my late reply. it’s no easy task and is very time consuming, it’s not implemented but possible :wink:

essentials:

  • every piwik installation must use the same version
  • every piwik installtion has the Site Migration plugin installed
    additional info:
  • the ID of an imported (migrated) site is the next free ID from the table “site”
  • site ID’s are only used in the tables “site”, “log_visit” and “log_link_visit_action”
  • disable browser triggers for Piwik archiving

so you start with 3 installations (in my case, they are all on one webserver):


D:\INETPUB
\---piwik_2
    +---piwik_1
    \---piwik_3

PIWIK_1
5 Profiles (id1, id2, id3, id4, id5)
old data
Url: https://piwik.example.com/piwik_1
Db: piwik_1
Db Prefix: piwik_1

PIWIK_2
5 Profiles (id1, id2, id3, id4, id5),
aktive, production data
Url: https://piwik.example.com/
Db: piwik_2
Db Prefix: piwik_2

PIWIK_3
0 Profiles
should become the new production environment
Url: https://piwik.example.com/piwik_3
Db: piwik_3
Db Prefix: piwik_3

tracking code on all Websites actual points to https://piwik.example.com/piwik.js

step by step description:

1.) migrate idsite=1 from PIWIK_1 to PIWIK_3
navigate on the commandline to d:\inetpub\piwik_2\piwik_1
start here site migration to piwik 3, command should look similar:


d:\inetpub\piwik_2\piwik_1>php.exe .\console migration:site 1 --skip-archive-data -H 127.127.127.127 -U root -P pass -N piwik_3 --db-prefix=piwik_3_

now there is a new entry in Db “piwik_3.piwik_3_site” with ID=1, like in the PIWIK_1 Environment.
autoincrement of “idsite” in “piwik_3.piwik_3_site” is set to 2, any new site migrated to PIWIK_3 gets the ID=2!

2.) on the Website of Profile 1 change the JS Tracking Code
Change that all future requests go to PIWIK_3 (beware of proxies and caching of JS code!!)

example old tracking code:


var _paq = _paq || [];
  _paq.push(["setDocumentTitle", document.domain + "/" + document.title]);
  _paq.push(['setCustomUrl', location.href.toLowerCase()]);
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="//piwik.example.com/";
    _paq.push(['setTrackerUrl', u+'piwik.php']);
    _paq.push(['setSiteId', 1]);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
  })();

new tracking code:


var _paq = _paq || [];
  _paq.push(["setDocumentTitle", document.domain + "/" + document.title]);
  _paq.push(['setCustomUrl', location.href.toLowerCase()]);
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="//piwik.example.com/piwik_3/";
    _paq.push(['setTrackerUrl', u+'piwik.php']);
    _paq.push(['setSiteId', 1]);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.type='text/javascript'; g.async=true; g.defer=true; g.src=u+'piwik.js'; s.parentNode.insertBefore(g,s);
  })();

3.) migrate missing Data from PIWIK_2 to PIWIK_3
if there are no new requests in the PIWIK_2 Db, it’s time to migrate the missing Data.
navigate on the commandline to d:\inetpub\piwik_2\ and start the site Migration, you import now PIWIK_2 ID=1 and the command looks nearly like the command in step one.
please note, the only difference is the working path, destination is the same.


d:\inetpub\piwik_2\>php.exe .\console migration:site 1 --skip-archive-data -H 127.127.127.127 -U root -P pass -N piwik_3 --db-prefix=piwik_3_

now there are two entries in the table site:


mysql> SELECT idsite, name, main_url, ts_created FROM piwik_3.piwik_3_site;
+--------+-------------------------+-------------------------------+---------------------+
| idsite | name                    | main_url                      | ts_created          |
+--------+-------------------------+-------------------------------+---------------------+
|      1 | Example www.example.com | http://www.example.com        | 2013-03-14 16:20:00 |
|      2 | Example www.example.com | http://www.example.com        | 2014-10-30 13:33:37 |
+--------+-------------------------+-------------------------------+---------------------+
2 rows in set (0.00 sec)

4.) merge the profiles
update “idsite” in “piwik_3.piwik_3_log_visit” and “piwik_3.piwik_3_log_link_visit_action”


UPDATE piwik_3.piwik_3_log_visit SET idsite=1 WHERE idsite=2;
UPDATE piwik_3.piwik_3_log_link_visit_action SET idsite=1 WHERE idsite=2;

5.) delete redundant profiles, update autoincrement and run core:archive


DELETE FROM piwik_3.piwik_3_site WHERE  idsite=2;
ALTER TABLE piwik_3.piwik_3_site AUTO_INCREMENT=2;


d:\inetpub\piwik_3\>php.exe .\console core:archive --url=https://piwik.example.com/piwik_3/ --force-idsites=1 

voila. all data in one profile!

PIWIK_1
5 Profiles (id1, id2, id3, id4, id5)
old data

PIWIK_2
5 Profiles (id1, id2, id3, id4, id5),
aktive, production data for id2-id5

PIWIK_3
1 Profile (id1)
aktice only for id1

so back to step one. and proceed with id=2 from PIWIK_1
migrate PIWIK_1(ID=2) to PIWIK_3(ID=2)
change JS Code
migrate PIWIK_2(ID=2) to PIWIK_3(ID=3)
merge PIWIK_3(ID=3) with PIWIK_3(ID=2)
clean DELETE PIWIK_3(ID=3)
repeat.

if every profile is migrated, change the Piwik config.ini of the PIWIK_2 Website to use the PIWIK_3 database.
finaly change the JS tracking code.

hints:
if you have problems with the php “memory_limit”, try to start the script with “php.exe -d Memory_limit=-1” or increase the script execution time
if the processing takes to long, try to split the import with the “–date-from” and “–date-to” Option. or start core:archive with --force-date-range.
be familiar with the options of “core:archive” and “migration:site”

D:\inetpub\piwik>c:\php\php .\console migration:site -h
Usage:
 migration:site [--skip-archive-data] [--skip-log-data] [-H|--db-host="..."] [-U|--db-username="..."] [-P|--db-password="..."] [-N|--db-name="..."] [--db-prefix[="..."]] [--db-port="..."] [-F|--date-from="..."] [-T|--date-to="..."] idSite

Arguments:
 idSite                Site id

Options:
 --skip-archive-data   Skip migration of archived data
 --skip-log-data       Skip migration of log data
 --db-host (-H)        Destination database host
 --db-username (-U)    Destination database username
 --db-password (-P)    Destination database password
 --db-name (-N)        Destination database name
 --db-prefix           Destination database table prefix
 --db-port             Destination database port (default: "3306")
 --date-from (-F)      Start date from which data should be migrated
 --date-to (-T)        Start date from which data should be migrated
 --help (-h)           Display this help message
 --quiet (-q)          Do not output any message
 --verbose (-v|vv|vvv) Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
 --version (-V)        Display this application version
 --ansi                Force ANSI output
 --no-ansi             Disable ANSI output
 --no-interaction (-n) Do not ask any interactive question
 --piwik-domain        Piwik URL (protocol and domain) eg. "http://piwik.example.org"
 --xhprof              Enable profiling with XHProf


D:\inetpub\piwik>c:\php\php .\console core:archive -h
Usage:
 core:archive [--url="..."] [--force-all-websites] [--force-all-periods[="..."]] [--force-timeout-for-periods[="..."]] [--skip-idsites[="..."]] [--force-idsites[="..."]] [--force-periods[="..."]] [--force-date-last-n="..."] [--force-date-range[="..."]] [--concurrent-requests-per-website[="..."]] [--disable-scheduled-tasks] [--accept-invalid-ssl-certificate]

Options:
 --url                              Mandatory option as an alternative to '--piwik-domain'. Must be set to the Piwik base URL.
                                    For example: --url=http://analytics.example.org/ or --url=https://example.org/piwik/
 --force-all-websites               If specified, the script will trigger archiving on all websites.
                                    Use with --force-all-periods=[seconds] to also process those websites that had visits in the last [secon
ds] seconds.
                                    Launching several processes with this option will make them share the list of sites to process.
 --force-all-periods                Limits archiving to websites with some traffic in the last [seconds] seconds.
                                    For example --force-all-periods=86400 will archive websites that had visits in the last 24 hours.
                                    If [seconds] is not specified, all websites with visits in the last 604800 seconds (7 days) will be archived.
 --force-timeout-for-periods        The current week/ current month/ current year will be processed at most every [seconds].
                                    If not specified, defaults to 3600.
 --skip-idsites                     If specified, archiving will be skipped for these websites (in case these website ids would have been archived).
 --force-idsites                    If specified, archiving will be processed only for these Sites Ids (comma separated)
 --force-periods                    If specified, archiving will be processed only for these Periods (comma separated eg. day,week,month)
 --force-date-last-n                This script calls the API with period=lastN. You can force the N in lastN by specifying this value.
 --force-date-range                 If specified, archiving will be processed only for periods included in this date range. Format: YYYY-MM-DD,YYYY-MM-DD
 --concurrent-requests-per-website  When processing a website and its segments, number of requests to process in parallel (default: 3)
 --disable-scheduled-tasks          Skips executing Scheduled tasks (sending scheduled reports, db optimization, etc.).
 --accept-invalid-ssl-certificate   It is _NOT_ recommended to use this argument. Instead, you should use a valid SSL certificate!
                                    It can be useful if you specified --url=https://... or if you are using Piwik with force_ssl=1
 --help (-h)                        Display this help message
 --quiet (-q)                       Do not output any message
 --verbose (-v|vv|vvv)              Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
 --version (-V)                     Display this application version
 --ansi                             Force ANSI output
 --no-ansi                          Disable ANSI output
 --no-interaction (-n)              Do not ask any interactive question
 --piwik-domain                     Piwik URL (protocol and domain) eg. "http://piwik.example.org"
 --xhprof                           Enable profiling with XHProf

Help:
 * It is recommended to run the script with the option --url=[piwik-server-url] only. Other options are not required.
   Try --piwik-domain if --url does not work for you.
 * This script should be executed every hour via crontab, or as a daemon.
 * You can also run it via http:// by specifying the Super User &token_auth=XYZ as a parameter ('Web Cron'),
   but it is recommended to run it via command line/CLI instead.
 * If you have any suggestion about this script, please let the team know at feedback@piwik.org
 * Enjoy!

D:\inetpub\piwik>

WoW!

That is one hell of an answer.

Thank you bbkfp for the whole bunch of information.

I do have some questions,

A) Why are you not migrating Archived data? I do want that data.

B) With your step 5 ‘5.) delete redundant profiles, update autoincrement and run core:archive’, you deleting redundant data, exactly what type of data will be redundant even after changing the both of the idSite’s to ‘1’ (taking your example). Coz, I cannot dare to loose any data and to scroll through my whole dataset to find out which data will be redundant is not an option as I have very huge collection :expressionless:

C) What exactly are you doing here? archiving all the data under idsite = 1 in new Piwik (piwik_3)?? can you shed some more light on this.
d:\inetpub\piwik_3>php.exe .\console core:archive --url=https://piwik.example.com/piwik_3/ --force-idsites=1

I am new to piwik and trying different things with this,

Thank you for your terrific answer.

I would really appreciate your help at this, http://forum.piwik.org/read.php?2,130323, I am stuck at this point for very long and unable to find a solution for it.

Regards,
Shivateja,
Eureka King Inc.,

Hi @matthieu,

I am told the site migration plugin is now deprecated and unsupported. Do you confirm this matches your expectations?

Thanks in advance!

Hi Patryk,

Can you please forward me to matt@piwik.org this information which was sent to you? (as well as a link to this forum post) ?

@matthieu this was me, since not all features are supported by SiteMigration currently (e.g. CustomDimension configuration).