[mythtv-users] Commercial Flagging Idea - Distributed Commercial Flagging (long)

Jomama tirebiter at gmail.com
Fri Apr 15 07:13:02 UTC 2005


I will be glad to help out with this very daring pilot program.



On 4/14/05, Christopher David Petersen
<christopher.david.petersen at gmail.com> wrote:
> I'm starting this thread so I don't pollute the discussion of fingerprinting
> for commercial flagging (which is a brilliant, but separate idea). 
>   
> This thread is meant to discuss that idea of distributed commercial flagging
> (DCF) via existing algorithms to reduce load and increase accuracy. 
>   
> In brief, here's the basic idea: 
>   
> 1) Collect commercial flagging information from participating users at a
> central server (hopefully, this isn't a DMCA violation). 
> 2) Analyze the data to determine groups of users who have performed
> duplicate work. 
> 3) Analyze the data to predict groups of users who will be performing
> duplicate work. 
> 4) Distribute the future duplicate work among the users to reduce each
> user's individual load.
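Steps 2 and 3 above amount to grouping users by the feed they record and the show they flag. Below is a minimal Python sketch. The field names (user_id, provider, market, show) are hypothetical, chosen only to illustrate what a client might submit. Step 3 could reuse the same function on users' upcoming recording schedules instead of completed jobs.

```python
from collections import defaultdict
from typing import NamedTuple


class Submission(NamedTuple):
    user_id: str   # anonymous user token (hypothetical)
    provider: str  # e.g. "Comcast Analog Basic Cable"
    market: str    # e.g. "Portland, OR"
    show: str      # e.g. "Lost"


def find_duplicate_work(submissions):
    """Group users who flagged the same show on the same feed (step 2)."""
    groups = defaultdict(set)
    for s in submissions:
        groups[(s.provider, s.market, s.show)].add(s.user_id)
    # Only feeds with two or more users represent duplicated effort.
    return {key: users for key, users in groups.items() if len(users) >= 2}
```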
>   
>   
> Here's an example using _Lost_ and _Alias_ (chosen for their short names): 
>   
> Givens: 
> - 23 users with Comcast Analog Basic Cable service in Portland, OR record
> and flag new episodes of _Lost_ and _Alias_ each week. 
> - These users use a variety of commercial flagging methods. 
> - The machines have a variety of available CPU power. 
>   
> Scenario: 
> - 23 users submit data to the DCF server via a secure and anonymous
> interface. This data includes which shows they flagged and the start and
> end times of each commercial segment. All times are synchronized to the DCF
> server's highly accurate clock (more on this later).
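A submission as described (which show, plus the start and end times of each commercial segment) might look like the following. This is a hypothetical format, not an existing MythTV or DCF structure. FlagReport, air_start, and the convention that segments are seconds from the start of the recording are all assumptions.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlagReport:
    """One user's commercial flags for one recording (hypothetical format)."""
    show: str
    air_start: float  # UTC seconds, already on the server's clock
    # [(start, end), ...] in seconds from the start of the recording
    segments: list = field(default_factory=list)

    def validate(self):
        prev_end = 0.0
        for start, end in self.segments:
            if start >= end:
                raise ValueError(f"empty or reversed segment {(start, end)}")
            if start < prev_end:
                raise ValueError("segments must be sorted, non-overlapping and non-negative")
            prev_end = end

    def to_json(self):
        """Validate, then serialize for upload (tuples become JSON arrays)."""
        self.validate()
        return json.dumps(asdict(self))
```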
>   
> - After submitting each show's data, the DCF server indicates to the client
> whether the client can join a "partnership". 
>   
> - "Partnerships" are created when the DCF server determines that 2 or more
> users are performing (and will perform) duplicate work (with similar output)
> for 1 or more shows. 
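Before the server can form partnerships, "similar output" needs a measurable definition. One possible choice, assumed here rather than taken from the proposal, is the Jaccard similarity of two users' flagged time: the seconds both users flagged as commercial, divided by the seconds either user flagged. The sketch below assumes each flag list is sorted and non-overlapping.

```python
def total_length(segments):
    """Total seconds covered by a list of (start, end) segments."""
    return sum(end - start for start, end in segments)


def overlap_length(a, b):
    """Seconds flagged as commercial by both users (two-pointer sweep)."""
    i = j = 0
    shared = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            shared += hi - lo
        # Advance whichever segment ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def flag_similarity(a, b):
    """Jaccard similarity: shared seconds / seconds flagged by either user."""
    shared = overlap_length(a, b)
    union = total_length(a) + total_length(b) - shared
    return shared / union if union > 0 else 1.0
```

A server might then only propose a partnership when every pair of candidates scores above some threshold, such as 0.9. That cutoff is arbitrary and would have to be tuned against real submissions.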
>   
> - 10 users are invited to join the new partnership for new episodes of
> _Lost_ and _Alias_ on Comcast Analog Basic Cable, Portland, OR. These ten
> users are invited because their machines are of similar power (i.e.
> commercial flagging occurs after a similar delay and in a similar amount of
> time). These users are now "Partners" within the "Partnership". 
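Machines of "similar power" could be grouped greedily on the two measurements mentioned: the delay before flagging starts and how long flagging takes. The following is a minimal sketch. group_by_speed, the (delay, duration) tuple, and the 25% tolerance are assumptions for illustration.

```python
def similar(a, b, tolerance=0.25):
    """True if delay and duration each differ by at most `tolerance` (a fraction)."""
    return all(abs(x - y) <= tolerance * max(x, y) for x, y in zip(a, b))


def group_by_speed(stats, tolerance=0.25):
    """Greedily bucket users by flagging speed.

    stats: {user_id: (delay_seconds, duration_seconds)}
    Users are visited fastest first; each joins the current bucket if it is
    similar to that bucket's first member, otherwise it starts a new bucket.
    """
    ordered = sorted(stats.items(), key=lambda kv: (kv[1][1], kv[1][0]))
    groups = []
    for user, timing in ordered:
        if groups and similar(stats[groups[-1][0]], timing, tolerance):
            groups[-1].append(user)
        else:
            groups.append([user])
    return groups
```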
>   
> - At first, none of the partners are "trusted" or have earned any "credits"
> within the partnership. As partners submit more data they earn more credits.
> The exact amount they earn per submission is weighted by how much they are
> trusted (their "fidelity" factor) and the accuracy of the submitted data
> (how similar it is to other data). Once partners have earned enough credits,
> they can "purchase" data from the partnership. 
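One way to express "weighted by fidelity and accuracy" is a simple product. However, if credits were just base * fidelity * accuracy, a new partner starting at zero fidelity could never earn anything, so this sketch adds a small floor. The constants and the function name are hypothetical.

```python
BASE_CREDITS = 10.0  # hypothetical credits for a trusted, fully accurate submission
TRUST_FLOOR = 0.1    # hypothetical: even a brand-new partner earns 10% of that


def credits_earned(fidelity, accuracy, base=BASE_CREDITS, floor=TRUST_FLOOR):
    """Credits for one submission.

    fidelity: 0.0 (untrusted) .. 1.0 (fully trusted)
    accuracy: 0.0 .. 1.0, e.g. agreement with the other partners' data
    """
    for name, value in (("fidelity", fidelity), ("accuracy", accuracy)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Scale trust linearly from `floor` (new partner) up to 1.0 (fully trusted).
    trust_weight = floor + (1.0 - floor) * fidelity
    return base * trust_weight * accuracy
```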
>   
> - After N weeks, only 7 users have earned enough credits to share data. 
>   
> - 3 partners are selected to flag next week's episode of _Lost_. 
>   
> - 4 partners are selected to flag next week's episode of _Alias_. 
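The proposal doesn't say how the server picks who flags which show. A simple policy, assumed here, is to spread the work so that as few partners as possible flag more than one show, preferring more trusted partners. select_flaggers below is a hypothetical sketch of that policy, not part of the original design.

```python
def select_flaggers(partners, shows, per_show):
    """Assign each show to `per_show[show]` partners, spreading the load.

    partners: {user_id: fidelity}
    shows: show names, assigned in this order
    Partners with the fewest assignments so far are preferred, then higher
    fidelity, then user_id as a deterministic tie-breaker.
    """
    load = {p: 0 for p in partners}
    assignment = {}
    for show in shows:
        ranked = sorted(partners, key=lambda p: (load[p], -partners[p], p))
        chosen = ranked[:per_show[show]]
        for p in chosen:
            load[p] += 1
        assignment[show] = chosen
    return assignment
```

With 7 partners and the counts above (3 for _Lost_, 4 for _Alias_), this assigns each partner exactly one show.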
>   
> - Of the 3 partners selected to flag _Lost_, all do so and submit their
> data. 
>   
> - Of the 4 partners selected to flag _Alias_, only 3 do so and submit their
> data. The 1 user who did not submit data has lowered his "fidelity" factor.
>   
> - The 6 partners who submitted data earn credits and increase their
> fidelity factor. 
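Fidelity could be kept as an exponential moving average between 0 and 1. Each assignment pulls it toward that submission's accuracy, or toward 0 if nothing was submitted. The moving-average rule and the 0.2 rate are assumptions, not part of the proposal.

```python
ALPHA = 0.2  # hypothetical: how strongly one week's outcome moves the score


def update_fidelity(fidelity, submitted, accuracy=1.0, alpha=ALPHA):
    """Move fidelity toward this assignment's outcome.

    A missed assignment counts as an outcome of 0; a submitted one counts
    as its accuracy (0..1). The result stays in [0, 1] if the inputs do.
    """
    outcome = accuracy if submitted else 0.0
    return (1.0 - alpha) * fidelity + alpha * outcome
```

A low alpha makes trust slow to earn and slow to lose. A high alpha reacts strongly to a single missed week.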
>   
> - The 3 partners who *do not* have _Lost_ flag data spend credits to receive
> this data (at a discounted cost, because of their increased fidelity). 
>   
> - The 3 partners who *do not* have _Alias_ flag data spend credits to
> receive this data (again, at the discounted cost).
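The discounted cost could be a linear function of fidelity. In this sketch, BASE_PRICE, MAX_DISCOUNT, purchase_price, and purchase are hypothetical. A fully trusted partner pays half price and an untrusted one pays full price.

```python
BASE_PRICE = 8.0    # hypothetical credits for one show's flag data
MAX_DISCOUNT = 0.5  # hypothetical: a fully trusted partner pays half price


def purchase_price(fidelity, base=BASE_PRICE, max_discount=MAX_DISCOUNT):
    """Price falls linearly from `base` (fidelity 0) to base * (1 - max_discount)."""
    return base * (1.0 - max_discount * fidelity)


def purchase(balance, fidelity):
    """Return the new credit balance, or raise if the partner can't afford the data."""
    price = purchase_price(fidelity)
    if balance < price:
        raise ValueError(f"need {price:.1f} credits, have {balance:.1f}")
    return balance - price
```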
>   
> - The flag data is not perfect: clocks, settings, reception, etc. vary. The
> partners use the "purchased" flag data to limit their own commercial
> flagging to those suspect times within the shows (with perhaps a 1 minute
> margin before and after). The results of these "verification" flag jobs are
> submitted back to the server. 
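Limiting local flagging to the suspect times comes down to three steps: pad each purchased segment by the margin (1 minute in the example), clamp the result to the recording, and merge windows that overlap. The sketch below assumes times are seconds from the start of the recording; verification_windows is a hypothetical name.

```python
def verification_windows(segments, show_length, margin=60.0):
    """Regions to re-flag locally.

    Each purchased (start, end) segment is padded by `margin` seconds on both
    sides and clamped to [0, show_length]; overlapping windows are merged.
    """
    padded = sorted((max(0.0, s - margin), min(show_length, e + margin))
                    for s, e in segments)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, segments (30, 120), (150, 200) and (900, 1000) in a 1020-second recording yield the windows (0, 260) and (840, 1020). Only 440 of the 1020 seconds would then need re-flagging.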
>   
> Summary: 
> So, now 7 users have formed a partnership to share the load of flagging
> _Alias_ and _Lost_. 
> 6 of them have significantly reduced their flagging load for these two shows.
> 1 partner needs to regain the trust of the partnership by submitting data in
> a timely manner. 
>   
> One can easily imagine a greatly expanded model, where a particular user
> could belong to dozens of partnerships. Each partnership could have hundreds
> of users, and dozens of shows. As a result of participating in partnerships,
> the user may only be required to flag a few shows (in their entirety) each
> week. 
>   
> Benefits: 
> - Reduced commercial flagging for individual partners. 
> - Increased accuracy of commercial flagging (via consensus). 
> - "Leeching" is not allowed.
> - Negative effects of poisoning are reduced through "fidelity" factors and
> credits. 
> - New methods of commercial flagging (either local or distributed) can be
> seamlessly incorporated. 
> - The freed-up CPU power could be used for new, extremely
> processor-intensive flagging methods.
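The "consensus" in the second benefit could be as simple as a per-second majority vote across the partners' reports. This is one possible rule, assumed here. consensus_flags, the 1-second step, and the strict-majority quorum are illustrative choices.

```python
def consensus_flags(reports, show_length, step=1.0, quorum=0.5):
    """Majority vote over partners' flag lists.

    reports: one list of (start, end) segments per partner.
    A time slice counts as commercial when more than `quorum` of the reports
    cover its midpoint; contiguous commercial slices become one segment.
    """
    n_slices = int(show_length // step)
    segments = []
    current_start = None
    for k in range(n_slices):
        t = k * step + step / 2  # midpoint of this slice
        votes = sum(any(s <= t < e for s, e in r) for r in reports)
        is_commercial = votes > quorum * len(reports)
        if is_commercial and current_start is None:
            current_start = k * step
        elif not is_commercial and current_start is not None:
            segments.append((current_start, k * step))
            current_start = None
    if current_start is not None:
        segments.append((current_start, n_slices * step))
    return segments
```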
>   
> Drawbacks: 
> - Requires central server. 
> - Requires many participants. 
> - Requires frequent communications with the server (although not much data is
> transferred). 
> - Requires changing commercial flagging to acquire partnership data. 
> - Requires changing commercial flagging to allow for flagging just parts of
> the show. 
> - Requires interface changes to alert users when they are about to "fail in
> their partnership duties" by not recording and flagging a show. 
> - The central DCF server stores recording habits of users. It's anonymous,
> but still concerning. 
> - Requires similar "content streams". Anecdotal experience (hearing the same
> commercials over the phone with friends) makes me suspect that commercial
> *times* don't vary within the same Service Provider. Analysis of submitted
> data will be the acid test. If the server never finds suitable
> partnerships, then everybody's content streams must be different, and the
> whole project is a failure.
>   
> - If the project is successful, content providers will further vary the
> content streams. 
>   
> Progress: 
> - I have built a local database to store the DCF data. 
> - I am building an SQL script to populate the DCF database from mythconverg.
> - I will be collecting data (via emailed output of the SQL script) from
> other users. 
>  
> - I have outlined a solution for time synchronization. Basically, partners
> submit the machine's local time with every transaction.
> - I am defining a secure and anonymous interface for the DCF server.
> - I am defining factors which I believe should affect the "fidelity" of data
> submitted. 
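If the machine's local time is submitted with every transaction, the server can estimate each client's clock offset and shift that client's timestamps onto the server clock. Below is a minimal sketch, assuming network delay is small enough to ignore. clock_offset, estimate_offset, and to_server_time are hypothetical names.

```python
import statistics
import time


def clock_offset(client_time, server_time=None):
    """Seconds to add to the client's clock to match the server's.

    Ignores network delay, so it is only as accurate as the transit time.
    """
    if server_time is None:
        server_time = time.time()  # moment the server received the request
    return server_time - client_time


def estimate_offset(samples):
    """Median offset over several (client_time, server_time) pairs; the median
    damps out transactions that were delayed in transit."""
    return statistics.median(clock_offset(c, s) for c, s in samples)


def to_server_time(client_timestamps, offset):
    """Shift a client's commercial start/end times onto the server clock."""
    return [t + offset for t in client_timestamps]
```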
>   
> Ideas, questions, comments, criticisms are welcome. 
>   
>   
> -- 
> Christopher David Petersen 
> Member of PoORMUG http://poormug.bitbucket.com/ 

