[mythtv-users] Commercial Flagging Idea - Distributed Commercial Flagging (long)

Thu Apr 14 19:33:06 UTC 2005

I'm starting this thread so I don't pollute the discussion of fingerprinting 
for commercial flagging (which is a brilliant but separate idea).
 This thread is meant to discuss the idea of distributed commercial 
flagging (DCF) via existing algorithms to reduce load and increase accuracy.
 In brief, here's the basic idea:
 1) Collect commercial flagging information from participating users at a 
central server (hopefully, this isn't a DMCA violation).
2) Analyze the data to determine groups of users who have performed 
duplicate work.
3) Analyze the data to predict groups of users who will be performing 
duplicate work.
4) Distribute the future duplicate work among the users to reduce each 
user's individual load.
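
As a rough illustration of step 2, here is a minimal sketch of how the server might spot duplicate work. The tuple layout (user, provider, show) and the function name find_duplicate_work are my own assumptions for illustration; nothing like this exists in MythTV today.

```python
from collections import defaultdict


def find_duplicate_work(submissions, min_users=2):
    """Group submissions by (provider, show) and keep groups with enough users.

    Each submission is a hypothetical (user_id, provider, show) tuple.
    """
    groups = defaultdict(set)
    for user_id, provider, show in submissions:
        groups[(provider, show)].add(user_id)
    # Only groups with at least `min_users` distinct users count as duplicate work.
    return {
        key: sorted(users)
        for key, users in groups.items()
        if len(users) >= min_users
    }


PROVIDER = "Comcast Analog Basic, Portland, OR"
submissions = [
    ("u1", PROVIDER, "Lost"),
    ("u2", PROVIDER, "Lost"),
    ("u3", PROVIDER, "Alias"),
    ("u1", PROVIDER, "Alias"),
    ("u4", PROVIDER, "24"),  # only one user, so no duplicate work here
]
candidates = find_duplicate_work(submissions)
```

Each key in candidates is a possible partnership; a real server would also have to compare the flag data itself before inviting anyone.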
  Here's an example using _Lost_ and _Alias_ (chosen for their short names):
 Givens:
- 23 users with Comcast Analog Basic Cable service in Portland, OR record 
and flag new episodes of _Lost_ and _Alias_ each week.
- These users use a variety of commercial flagging methods.
- The machines have a variety of available CPU power.
 Scenario:
- 23 users submit data to the DCF server via a secured and anonymous 
interface. This data includes which shows they flagged and the start and 
end times of each commercial segment. All times are synchronized to the DCF 
server's highly accurate clock (more on this later).
 - After submitting each show's data, the DCF server indicates to the client 
whether the client can join a "partnership".
 - "Partnerships" are created when the DCF server determines that 2 or more 
users are performing (and will perform) duplicate work (with similar output) 
for 1 or more shows.
 - 10 users are invited to join the new partnership for new episodes of 
_Lost_ and _Alias_ on Comcast Analog Basic Cable, Portland, OR. These ten 
users are invited because their machines are of similar power (i.e. 
commercial flagging occurs after a similar delay and in a similar amount of 
time). These users are now "Partners" within the "Partnership".
 - At first, none of the partners are "trusted" or have earned any "credits" 
within the partnership. As partners submit more data they earn more credits. 
The exact amount they earn per submission is weighted by how much they are 
trusted (their "fidelity" factor) and the accuracy of the submitted data 
(how similar it is to other data). Once partners have earned enough credits, 
they can "purchase" data from the partnership.
 - After N weeks, only 7 users have earned enough credits to share data.
 - 3 partners are selected to flag next week's episode of _Lost_.
 - 4 partners are selected to flag next week's episode of _Alias_.
 - Of the 3 partners selected to flag _Lost_, all do so and submit their 
data.
 - Of the 4 partners selected to flag _Alias_, only 3 do so and submit their 
data. The 1 user who did not submit data has lessened his "fidelity" factor.
 - The 6 partners who submitted data earn credits and increase their 
fidelity factor.
 - The 3 partners who *do not* have _Lost_ flag data spend credits to 
receive this data (at a discounted cost, because of their increased 
fidelity).
 - The 3 partners who *do not* have _Alias_ flag data spend credits to 
receive this data (again, at the discounted cost).
 - The flag data is not perfect: clocks, settings, reception, etc. vary. The 
partners use the "purchased" flag data to limit their own commercial 
flagging to those suspect times within the shows (with perhaps a 1 minute 
margin before and after). The results of these "verification" flag jobs are 
submitted back to the server.
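
The "verification" step in the last item could look something like the sketch below: pad each purchased segment by a margin, clamp to the show, and merge overlaps, so the local flagger only scans those windows. Times are seconds from the start of the recording; the function name and the 60-second default are assumptions for illustration.

```python
def verification_windows(segments, show_length, margin=60):
    """Return merged (start, end) windows that a verification job should scan.

    `segments` are purchased (start, end) commercial times in seconds.
    Each one is widened by `margin` seconds on both sides, clamped to
    [0, show_length], and overlapping windows are merged.
    """
    padded = sorted(
        (max(0, start - margin), min(show_length, end + margin))
        for start, end in segments
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical purchased data for a one-hour recording.
windows = verification_windows(
    [(30, 150), (200, 320), (900, 1080)], show_length=3600
)
seconds_to_scan = sum(end - start for start, end in windows)
```

In this made-up example the partner scans 680 seconds instead of the full 3600.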
 Summary:
So, now 7 users have formed a partnership to share the load of flagging 
_Alias_ and _Lost_.
6 of them have significantly reduced their flagging load for these two shows.
1 partner needs to regain the trust of the partnership by submitting data in 
a timely manner.
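
To make the credit and fidelity bookkeeping from the scenario a bit more concrete, here is a minimal sketch. Every constant and formula in it (the 0.0 to 1.0 fidelity scale, the starting value, the step size, the discount) is a placeholder I made up; the real weighting factors are still being defined.

```python
from dataclasses import dataclass

BASE_REWARD = 10.0    # hypothetical credits for a perfect, fully trusted submission
BASE_PRICE = 10.0     # hypothetical undiscounted price of one show's flag data
FIDELITY_STEP = 0.1   # hypothetical change in fidelity per event
MAX_DISCOUNT = 0.5    # hypothetical discount at fidelity 1.0


@dataclass
class Partner:
    name: str
    fidelity: float = 0.1  # assumed: new partners start barely trusted
    credits: float = 0.0


def record_submission(partner, accuracy):
    """Pay credits weighted by fidelity and accuracy (0.0..1.0), then raise fidelity."""
    partner.credits += BASE_REWARD * partner.fidelity * accuracy
    partner.fidelity = min(1.0, partner.fidelity + FIDELITY_STEP)


def record_missed_assignment(partner):
    """Lower fidelity when an assigned show was not flagged and submitted."""
    partner.fidelity = max(0.0, partner.fidelity - FIDELITY_STEP)


def purchase(partner):
    """Try to buy flag data at a fidelity-discounted price; return True on success."""
    price = BASE_PRICE * (1.0 - MAX_DISCOUNT * partner.fidelity)
    if partner.credits < price:
        return False
    partner.credits -= price
    return True
```

With these placeholder numbers a new partner has to submit several times before the first purchase succeeds, which matches the "after N weeks" part of the scenario.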
 One can easily imagine a greatly expanded model, where a particular user 
could belong to dozens of partnerships. Each partnership could have hundreds 
of users, and dozens of shows. As a result of participating in partnerships, 
the user may only be required to flag a few shows (in their entirety) each 
week.
 Benefits:
- Reduced commercial flagging for individual partners.
- Increased accuracy of commercial flagging (via consensus).
- "Leeching" is not allowed.
- Negative effects of poisoning are reduced through "fidelity" factors and 
credits.
- New methods of commercial flagging (either local or distributed) can be 
seamlessly incorporated.
- The available CPU power could be used for new, extremely 
processor-intensive flagging methods.
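
For the "accuracy via consensus" benefit, one simple approach (an assumption on my part, not a settled design) is a per-second majority vote over the partners' submissions. The sketch below treats segments as half-open (start, end) ranges in seconds and assumes the segments within one submission do not overlap each other.

```python
def consensus_segments(submissions, show_length):
    """Return segments flagged as commercial by a strict majority of partners.

    `submissions` is a list with one entry per partner; each entry is a list
    of (start, end) commercial segments in seconds, end exclusive.
    """
    votes = [0] * show_length
    for segments in submissions:
        for start, end in segments:
            for t in range(max(0, start), min(show_length, end)):
                votes[t] += 1

    needed = len(submissions) // 2 + 1  # strict majority
    result = []
    current_start = None
    for t, count in enumerate(votes):
        if count >= needed and current_start is None:
            current_start = t                  # a consensus segment begins
        elif count < needed and current_start is not None:
            result.append((current_start, t))  # a consensus segment ends
            current_start = None
    if current_start is not None:
        result.append((current_start, show_length))
    return result


# Three hypothetical partners; the third reports a stray extra segment.
agreed = consensus_segments(
    [[(100, 200)], [(105, 205)], [(95, 190), (500, 520)]],
    show_length=600,
)
```

The stray segment reported by only one partner is dropped, which is also how a single poisoned submission would be outvoted.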
 Drawbacks:
- Requires central server.
- Requires many participants.
- Requires frequent communications with the server (albeit, not much data is 
transferred).
- Requires changing commercial flagging to acquire partnership data.
- Requires changing commercial flagging to allow for flagging just parts of 
the show.
- Requires interface changes to alert users when they are about to "fail in 
their partnership duties" by not recording and flagging a show.
- The central DCF server stores recording habits of users. It's anonymous, 
but still concerning.
- Requires similar "content streams". Anecdotal experience (hearing the same 
commercials over the phone with friends) makes me suspect that commercial 
*times* don't vary within the same Service Provider. Analysis of submitted 
data will be the acid test. If the server never finds suitable 
partnerships, then everybody's content streams must be different, and the 
whole project is a failure.
 - If the project is successful, content providers will further vary the 
content streams.
 Progress:
- I have built a local database to store the DCF data.
- I am building a SQL script to populate the DCF database from mythconverg.
- I will be collecting data (via emailed output of the SQL script) from 
other users.
 - I have outlined a solution for time synchronization. Basically, partners 
submit the machine's local time with every transaction.
- I am defining a secure and anonymous interface for the DCF server.
- I am defining factors which I believe should affect the "fidelity" of data 
submitted.
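
A minimal sketch of the time synchronization item above, assuming the server simply compares the submitted local time with its own clock at the moment of receipt and shifts the partner's timestamps by the difference. It ignores network latency, and the function names are hypothetical.

```python
from datetime import datetime, timedelta


def clock_offset(client_local_time, server_receive_time):
    """Offset to add to a client timestamp to express it in server time."""
    return server_receive_time - client_local_time


def to_server_time(client_timestamp, offset):
    """Shift a client-side timestamp onto the server's clock."""
    return client_timestamp + offset


# Made-up transaction: the client's clock runs 42 seconds behind the server.
client_now = datetime(2005, 4, 14, 19, 33, 6)
server_now = datetime(2005, 4, 14, 19, 33, 48)
offset = clock_offset(client_now, server_now)

# A commercial start time as recorded by the client, converted to server time.
segment_start = to_server_time(datetime(2005, 4, 13, 20, 12, 30), offset)
```

Because the offset is recomputed on every transaction, a drifting client clock would be corrected as often as the client talks to the server.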
 Ideas, questions, comments, criticisms are welcome.
  -- 
Christopher David Petersen
Member of PoORMUG http://poormug.bitbucket.com/