Tutorial: merging data from two site IDs (working method)

In this example we have two site IDs.

Site ID 25 only has data from 1 March 2016 up to 14 September 2022, and site ID 2 has data from 15 September 2022 until today.

The idea is to merge all the data into site ID 2.

You must run this SQL on the database of your Matomo installation, but remember to first change the date range and the IDs of the sites you want to merge.
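
If you are unsure which IDs and date ranges apply to your own installation, you can check them first with something like this (just a quick sketch, assuming the default matomo_ table prefix and the example site IDs 25 and 2):

-- list the sites and their creation timestamps
SELECT idsite, name, ts_created FROM matomo_site;

-- see which date range each site actually has raw data for
SELECT idsite,
       MIN(visit_first_action_time) AS first_visit,
       MAX(visit_last_action_time)  AS last_visit
FROM matomo_log_visit
WHERE idsite IN (25, 2)
GROUP BY idsite;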

I would like to ask the Matomo team to implement this option.

DELIMITER //

CREATE PROCEDURE merge_sites(
    IN id_old INT, 
    IN id_new INT, 
    IN month_start VARCHAR(7), 
    IN month_end VARCHAR(7), 
    IN day_end INT
)
BEGIN
    DECLARE done BOOL DEFAULT FALSE;
    DECLARE tablename CHAR(255);
    DECLARE cur_tablename CURSOR FOR 
        SELECT TABLE_NAME 
        FROM INFORMATION_SCHEMA.TABLES 
        WHERE (
            TABLE_NAME >= CONCAT("matomo_archive_numeric_", REPLACE(month_start, '-', '_'))
            AND TABLE_NAME <= CONCAT("matomo_archive_numeric_", REPLACE(month_end, '-', '_'))
        ) OR (
            TABLE_NAME >= CONCAT("matomo_archive_blob_", REPLACE(month_start, '-', '_'))
            AND TABLE_NAME <= CONCAT("matomo_archive_blob_", REPLACE(month_end, '-', '_'))
        ) 
    ;
    
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
    DECLARE CONTINUE HANDLER FOR SQLEXCEPTION BEGIN END;

    OPEN cur_tablename;
    myloop: WHILE NOT done DO
        FETCH cur_tablename INTO tablename;

        IF NOT done THEN
            -- Limit the move to archives up to the cut-over date (month_end + day_end),
            -- e.g. 2022-09-14 in this example.
            SET @sql = CONCAT(
                'UPDATE ', tablename, 
                ' SET idsite = ', id_new, 
                ' WHERE idsite = ', id_old, 
                ' AND date1 <= \'', month_end, '-', LPAD(day_end, 2, '0'), '\''
            );
        
            PREPARE stmt FROM @sql;
            EXECUTE stmt;
            DEALLOCATE PREPARE stmt;
        END IF;
    END WHILE;

    CLOSE cur_tablename;

    UPDATE matomo_site 
    SET ts_created = TIMESTAMP(CONCAT(month_start, '-01 00:00:00.000')) 
    WHERE idsite = id_new;
    
    UPDATE matomo_log_link_visit_action 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_log_visit 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_log_conversion 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_report 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_report_numeric 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_report_blob 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_sequence 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_session 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_site_setting 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_site_url 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_tracking_goal 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_user 
    SET idsite = id_new
    WHERE idsite = id_old;
    
    UPDATE matomo_user_dashboard 
    SET idsite = id_new
    WHERE idsite = id_old;
END;
//

DELIMITER ;

CALL merge_sites(
    25, -- source site id
    2, -- target site id
    '2016-03', -- start month
    '2022-09', -- end month
    14 -- day_end
);
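
To verify the result, a simple count of raw visits per site ID can be run afterwards (a sketch using the example IDs from the CALL above; after the merge the old ID should not own any visits any more):

SELECT idsite, COUNT(*) AS visits
FROM matomo_log_visit
WHERE idsite IN (25, 2)
GROUP BY idsite;

-- the same check can be repeated on matomo_log_link_visit_action and matomo_log_conversion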

With this query the properties were merged well, so the old ID could now be removed.

But before doing that, the archive_blob and archive_numeric tables would still need to be changed in the database, as they still retain the old ID.

For this we would have to run the same query for every year and month, except for September 2022, since in our case half of that month belonged to one site ID and the other half to the other one (a generated variant is sketched after the queries below).

This is the query for that year:

UPDATE matomo_archive_blob_2022_01 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_02 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_03 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_04 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_05 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_06 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_07 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_08 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_blob_2022_09 SET idsite = 2 WHERE idsite = 25 AND date1 <= '2022-09-14';
UPDATE matomo_archive_numeric_2022_01 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_02 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_03 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_04 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_05 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_06 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_07 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_08 SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_archive_numeric_2022_09 SET idsite = 2 WHERE idsite = 25 AND date1 <= '2022-09-14';
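
Instead of typing one UPDATE per month by hand, the statements for all remaining archive tables can also be generated from INFORMATION_SCHEMA and then reviewed and executed manually. This is only a sketch for the example IDs 25 -> 2 and the default matomo_ prefix; the cut-over month (2022_09 here) still needs the extra date1 condition shown above:

-- generate one UPDATE statement per archive table
SELECT CONCAT('UPDATE ', TABLE_NAME, ' SET idsite = 2 WHERE idsite = 25;') AS stmt
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND (TABLE_NAME LIKE 'matomo_archive_blob_%'
       OR TABLE_NAME LIKE 'matomo_archive_numeric_%')
ORDER BY TABLE_NAME;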

If you find a better way to achieve this, I would appreciate your comments.

Hi @Iago_de_la_Vega1

There is already such a feature request:

Add a comment to this ticket (and/or try to create a PR to solve this :wink: )

I am currently working on a Matomo plugin that would execute the query you provided. After running some tests, I discovered that the SQL query is not perfect for merging subsequent periods of the two sites. Specifically, the statistics from the source site are not being properly transferred to the target site.

But when I use the following procedure, it works better, though with some issues shown in the screenshots below:

USE matomo;

drop procedure if exists merge_sites;

-- change the delimiter so the semicolons inside the body do not end the CREATE PROCEDURE early
DELIMITER //

create procedure merge_sites(
    IN  id_old INT,
    IN  id_new INT,
    IN  month_start VARCHAR(7),
    IN  month_end VARCHAR(7)
)
begin
    DECLARE done BOOL default false;
    DECLARE tablename CHAR(255);

    DECLARE cur1 cursor for
        SELECT TABLE_NAME
        FROM INFORMATION_SCHEMA.TABLES
        WHERE (
                    TABLE_NAME >= CONCAT("matomo_archive_numeric_", REPLACE( month_start, '-', '_' ) )
                AND TABLE_NAME <= CONCAT("matomo_archive_numeric_", REPLACE( month_end, '-', '_' ) )
            ) OR (
                    TABLE_NAME >= CONCAT("matomo_archive_blob_", REPLACE( month_start, '-', '_' ) )
                AND TABLE_NAME <= CONCAT("matomo_archive_blob_", REPLACE( month_end, '-', '_' ) )
            );

    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;

    open cur1;

    myloop: loop
        fetch cur1 into tablename;
        if done then
            leave myloop;
        end if;
        -- SELECT tablename;

        set @sql = CONCAT('UPDATE ', tablename, ' SET idsite=', id_new, ' WHERE idsite=', id_old );
        -- SELECT @sql;
        prepare stmt from @sql;
        execute stmt;
        drop prepare stmt;

    end loop;

    close cur1;

    UPDATE matomo_site
    SET ts_created=TIMESTAMP( CONCAT( month_start, '-01 00:00:00.000' ) )
    WHERE idsite = id_new ;

END;
//

DELIMITER ;
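
For reference, the procedure can then be called like the first one, only without the day_end parameter (hypothetical example, merging site 25 into site 2 for archives between March 2016 and September 2022):

CALL merge_sites(25, 2, '2016-03', '2022-09');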

Could you please take a look at the query and suggest any modifications that may be necessary to achieve a complete and accurate merge of the two sites’ statistics?
Here are some screenshots of the source site, the target site, and the merged result.

Take a look at the 25-31 January period for each site:
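
One possible explanation for the mismatch in overlapping periods such as 25-31 January is that both sites had already archived the same period, so after moving everything to one idsite the target site ends up with two competing archive rows for that period, and Matomo may only read one of them. A quick way to check for this (a sketch against a single, hypothetical month table and target idsite 2):

SELECT date1, date2, period, name, COUNT(*) AS copies
FROM matomo_archive_numeric_2023_01
WHERE idsite = 2
GROUP BY date1, date2, period, name
HAVING COUNT(*) > 1;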

Hey ben.

Have you been able to solve your issue with the visits not being visible in the new site when just changing idsite in the matomo_archive_{numeric,blob} tables?

I could run the procedure, but no change appears on the graphs…
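
To rule out the database side, the archive rows can be checked directly (a sketch against one of the affected months, e.g. 2022_08, with the example IDs 25 and 2; after the procedure, all rows should belong to the new site):

SELECT idsite, COUNT(*) AS archive_rows
FROM matomo_archive_numeric_2022_08
WHERE idsite IN (25, 2)
GROUP BY idsite;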

I have come up with a work-around to ensure consistent merging of sites. In order to achieve this, I adjust the start and end dates of existing statistics in consecutive sites. Let’s consider an example:

If we have statistics for site1 starting on 2023-01-01 and ending on 2023-03-25, and site2 starting on 2023-03-22 and ending on 2023-05-30, then, since these are the same site, we can ignore and remove the statistics for site2 from 2023-03-22 to 2023-03-25.
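
On the database level, that trimming could look roughly like this (only a sketch of the idea, with a hypothetical idsite 2 standing for site2 and period = 1 selecting the day archives; overlapping week/month archives would need the same treatment or a re-archive afterwards):

DELETE FROM matomo_archive_numeric_2023_03
WHERE idsite = 2 AND period = 1
  AND date1 BETWEEN '2023-03-22' AND '2023-03-25';

DELETE FROM matomo_archive_blob_2023_03
WHERE idsite = 2 AND period = 1
  AND date1 BETWEEN '2023-03-22' AND '2023-03-25';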

As mentioned in my previous message, this use case ideally shouldn't occur because site1 and site2 are the same site. However, in my specific case, it did happen for a few reasons.

To address this, I have integrated the procedure into a Matomo Plugin, making it accessible as a Matomo console command.



Hi @ben_sghaier_samir
Please note that your script / plugin works only on archived data, not on raw data.
This is very good in case you don't have the raw data anymore for these periods.
But if you still have access to the raw data, it is better to merge the raw data first, then use the standard Matomo archiving process to rebuild the archives (and then the reports).
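
For reference, a minimal sketch of that raw-data approach, with the example IDs 25 -> 2 and the default matomo_ prefix, could look like this; afterwards the archives for the affected period still have to be invalidated and rebuilt through the normal Matomo archiving process:

UPDATE matomo_log_visit             SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_log_link_visit_action SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_log_conversion        SET idsite = 2 WHERE idsite = 25;
UPDATE matomo_log_conversion_item   SET idsite = 2 WHERE idsite = 25;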

Thank you, @heurteph-ei, for providing this detailed information. Do you have any ready-made SQL script or any ideas on how to merge RAW data from different site IDs? Is this approach valuable?

My motivation for creating a plugin to merge sites by their ID stems from the fact that when I use the Google Analytics importer module, it can create different sites each time I rerun the import command for the same property but with different date ranges or views. This plugin is a crucial part of our process to retrieve all historical Google Analytics data from GA3 in preparation for migrating our sites to the new GA4.

Hi @ben_sghaier_samir
About raw data, there is a table description here:
https://developer.matomo.org/guides/database-schema
Most of the time, if you need to copy some lines, you just have to update the idsite…

In this case, your only choice will be to copy the reports, as the Google import module doesn't copy raw data, only aggregated data… :frowning: