Tracking Behaviors

claytondaley · October 25, 2014, 4:00pm

I’ve noticed two tracking behaviors that don’t strike me as intuitive:
[ol]
[li] A visitor’s IP address can change during a visit, but the database masks the change: only the first IP address is kept, on the visit object, and it isn’t even available for update.
[/li][li] If you send a PHP tracking request with a bad token_auth, a visit is still logged with “erroneous” data.
[/li][/ol]
I believe this first item belongs under a broader principle, “we should maintain a full and accurate accounting of submitted data”, consistent with my previous post about actual page referrers. We can still make simplifying assumptions for display purposes, but the database should “know the difference”. Without digging into the codebase, I believe this would include:
[ul]
[li] Moving the IP address (and anything else that isn’t tightly coupled to the computer holding the cookie) to the visit_action
[/li][li] Changing the referrer logic to default to the submitted referrer and fall back on the last page visited (with a flag distinguishing the two).
[/li][/ul]
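A minimal Python sketch of the two proposals above, assuming a hypothetical in-memory model rather than Piwik’s actual schema: the IP address is stored on each action instead of on the visit, and the referrer defaults to the submitted value and falls back on the previous action’s URL, with a flag recording which case applied. The names `Visit`, `Action`, `record`, and `referrer_submitted` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Action:
    """One tracked action (hypothetical shape of a visit_action row)."""
    url: str
    ip: str                   # stored per action, so mid-visit IP changes survive
    referrer: Optional[str]   # effective referrer used for display
    referrer_submitted: bool  # True if the client sent it, False if inferred


@dataclass
class Visit:
    """A visit: only cookie-bound data lives here (hypothetical)."""
    visitor_id: str
    actions: list[Action] = field(default_factory=list)

    def record(self, url: str, ip: str,
               submitted_referrer: Optional[str] = None) -> Action:
        if submitted_referrer:
            # Default: trust the referrer the client actually submitted.
            referrer, submitted = submitted_referrer, True
        elif self.actions:
            # Fallback: the last page visited in this visit, flagged as inferred.
            referrer, submitted = self.actions[-1].url, False
        else:
            referrer, submitted = None, False
        action = Action(url=url, ip=ip, referrer=referrer,
                        referrer_submitted=submitted)
        self.actions.append(action)
        return action

    def first_ip(self) -> Optional[str]:
        # Display simplification: show the first IP, while every IP stays stored.
        return self.actions[0].ip if self.actions else None
```

For example, if a visit’s second request arrives from a new IP without a referrer, both IPs are kept and the second action’s referrer is inferred from the first URL with `referrer_submitted` set to `False`. Reports can still show `first_ip()` as “the” visit IP; the underlying data simply keeps the difference.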
The token_auth item (2) is more complicated.
[ul]
[li] Certainly, one way to view this (paraphrasing current behavior) is that the caller is making a tracking API call and all token_auth does is allow the caller to submit certain “extra” (different) details.
[/li][li] Practically speaking, I prefer to think of including token_auth as, first and foremost, an attempt to authenticate (that it happens through the tracking API is irrelevant). You can (and should) log a failed authentication (with both the real and the reported details), but a failed authentication is fundamentally not a tracked page view.
[/li][li] A “compromise” (helpful for debugging) would be to keep the submitted data as if the token were correct (not the current strategy) and instead flag the visit/actions as “bad auth”. Obviously, the “bad auth” report should include information about the submitter (for debugging and blacklisting).
[/li][/ul]
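The three ways of treating a bad token_auth can be compared in a small Python sketch. It assumes a hypothetical request handler rather than Piwik’s real tracker code, and uses the `cip` (visitor IP override) parameter as the example of a detail that should only be honored with a valid token; `Strategy`, `Outcome`, and `handle` are illustrative names. A request with no token at all is treated as an ordinary tracking call under every strategy.

```python
import hmac
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):
    DROP_EXTRAS = "current"  # ignore privileged params, still track, no marker
    REJECT = "reject"        # log the failed authentication, do not track
    FLAG = "flag"            # keep submitted data as if valid, flag as bad auth


@dataclass
class Outcome:
    tracked: bool          # does this count as a tracked page view?
    ip: Optional[str]      # IP attributed to the visitor
    bad_auth: bool         # True if a token was sent and failed
    submitter_ip: str      # who actually sent the request (debugging, blacklisting)


def handle(params: dict[str, str], submitter_ip: str,
           valid_token: str, strategy: Strategy) -> Outcome:
    token = params.get("token_auth")
    override_ip = params.get("cip")  # assumed: honored only with a valid token

    if token is None:
        # No authentication attempted: an ordinary tracking request.
        return Outcome(True, submitter_ip, False, submitter_ip)

    # Constant-time comparison so the check does not leak the token via timing.
    if hmac.compare_digest(token.encode(), valid_token.encode()):
        return Outcome(True, override_ip or submitter_ip, False, submitter_ip)

    if strategy is Strategy.DROP_EXTRAS:
        # Current behavior: the override is discarded and nothing marks the row.
        return Outcome(True, submitter_ip, False, submitter_ip)
    if strategy is Strategy.REJECT:
        # Keep both reported and real details, but this is not a page view.
        return Outcome(False, override_ip, True, submitter_ip)
    # FLAG: keep the submitted data, mark it, and remember the submitter.
    return Outcome(True, override_ip or submitter_ip, True, submitter_ip)
```

The design choice that matters is recording `submitter_ip` separately from the attributed `ip`. When the request comes from a server-side client, the current behavior collapses both into the server’s address and leaves the row unmarked, while the flag strategy keeps both for debugging and blacklisting.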
In principle (but probably not in practice), an operating site would hide bad-auth data in all reports. When debugging, the “bad auth” flag is a good way to distinguish a wrong IP caused by a proxy from one caused by a bad token_auth. Finally, the second and third strategies could let an admin “approve” old reports. In the current model, bad reports are unmarked, irrevocably tied to the server’s IP address, and the (real) user info is lost.
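Hiding flagged data and letting an admin “approve” it later could look like the following hypothetical Python sketch, which assumes each stored action carries a `bad_auth` flag (not a real Piwik column); `ActionRecord`, `report_rows`, and `approve` are illustrative names. Approval simply clears the flag, which only works because the submitted data was kept rather than replaced with the server’s details.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ActionRecord:
    url: str
    ip: str
    bad_auth: bool = False  # hypothetical flag set when token_auth failed


def report_rows(records: list[ActionRecord],
                include_bad_auth: bool = False) -> list[ActionRecord]:
    """Rows for a report: flagged rows are hidden unless debugging."""
    return [r for r in records if include_bad_auth or not r.bad_auth]


def approve(records: list[ActionRecord],
            predicate: Callable[[ActionRecord], bool]) -> list[ActionRecord]:
    """Admin approval: clear the flag on matching records (returns new records)."""
    return [replace(r, bad_auth=False) if r.bad_auth and predicate(r) else r
            for r in records]
```

Returning new records instead of mutating in place keeps an audit trail possible: the flagged originals can be stored alongside the approved copies.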