[mythtv-users] Commercial Flagging Idea - Distributed Commercial Flagging (long)

Mike Benoit ipso at snappymail.ca
Thu Apr 14 19:56:45 UTC 2005


It all sounds waaay more complicated than it needs to be. In my
experience keeping it as simple as possible for the first few versions
is a much better approach.

Why not:

When commercial flagging is done, it uploads its data to the central
server. If/When a manual cut list is made, it uploads that separately.

When commercial flagging starts, it checks the central server to see if
there are any available "recommended cut lists". If so, it downloads
them and uses them.
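
The simpler scheme above comes down to two operations on the central server: store what clients upload, and return any "recommended cut lists" when flagging starts. The following is a minimal in-memory sketch. It assumes a recording is identified by a hypothetical (provider, channel, scheduled start) key and that a cut list is a list of (start, end) second pairs. CutListServer, start_flagging and the preference for manual lists are all illustrative; none of this is an existing MythTV interface.

```python
from collections import defaultdict

# Hypothetical key for one airing: (provider/region, channel, scheduled start in UTC).
AiringKey = tuple[str, str, str]
# A cut list is a list of (start, end) commercial segments in seconds.
CutList = list[tuple[float, float]]


class CutListServer:
    """In-memory stand-in for the central server (illustrative only)."""

    def __init__(self) -> None:
        # Automatic flagging results and manual cut lists are stored separately.
        self.auto: dict[AiringKey, list[CutList]] = defaultdict(list)
        self.manual: dict[AiringKey, list[CutList]] = defaultdict(list)

    def upload_flags(self, key: AiringKey, cutlist: CutList) -> None:
        self.auto[key].append(cutlist)

    def upload_manual_cutlist(self, key: AiringKey, cutlist: CutList) -> None:
        self.manual[key].append(cutlist)

    def recommended_cutlists(self, key: AiringKey) -> list[CutList]:
        # Assumption: hand-made cut lists are recommended ahead of automatic ones.
        return self.manual.get(key) or self.auto.get(key) or []


def start_flagging(server: CutListServer, key: AiringKey, run_local_flagger) -> CutList:
    """Use a recommended cut list if one exists; otherwise flag locally and upload."""
    recommended = server.recommended_cutlists(key)
    if recommended:
        return recommended[0]
    cutlist = run_local_flagger()
    server.upload_flags(key, cutlist)
    return cutlist
```

Automatic and manual uploads are kept in separate stores, as proposed, so that the two kinds of data can be compared later.
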

I don't see any point in going beyond that until it is proven to work,
and actually turns out to be useful. Doing so is probably going to be a
waste of time, and complicate things so much that people are turned off.

As I mentioned in another thread, collecting and distributing the data
is the easy part. The problem is figuring out exactly what data we're
gonna distribute and then modifying the commercial flagging code to work
with said data efficiently.

Has anyone who is familiar with the commflag code and willing to dedicate
time to this project stepped up to the plate?


On Thu, 2005-04-14 at 12:33 -0700, Christopher David Petersen wrote:
> I'm starting this thread so I don't pollute the discussion of
> fingerprinting for commercial flagging (which is a brilliant, but
> separate idea).
>  
> This thread is meant to discuss the idea of distributed commercial
> flagging (DCF) via existing algorithms to reduce load and increase
> accuracy.
>  
> In brief, here's the basic idea:
>  
> 1) Collect commercial flagging information from participating users at
> a central server (hopefully, this isn't a DMCA violation).
> 2) Analyze the data to determine groups of users who have performed
> duplicate work.
> 3) Analyze the data to predict groups of users who will be performing
> duplicate work.
> 4) Distribute the future duplicate work among the users to reduce each
> user's individual load.
>  
>  
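
Steps 2 and 3 above can be sketched as a grouping problem. This sketch assumes two users did "duplicate work" when they flagged the same show on the same provider for the same airing. It also assumes (the simplest possible prediction) that users who keep turning up in the same duplicate groups will continue to do so. Submission, duplicate_work, predicted_duplicates and the min_airings threshold are hypothetical names:

```python
from collections import defaultdict
from typing import NamedTuple


class Submission(NamedTuple):
    user_id: str   # anonymous user identifier
    provider: str  # e.g. "Comcast Analog Basic Cable, Portland, OR"
    show: str
    airing: str    # identifies one airing, e.g. its scheduled start time


def duplicate_work(subs: list[Submission]) -> dict[tuple[str, str, str], set[str]]:
    """Step 2: airings flagged by two or more users, keyed by (provider, show, airing)."""
    groups: dict[tuple[str, str, str], set[str]] = defaultdict(set)
    for s in subs:
        groups[(s.provider, s.show, s.airing)].add(s.user_id)
    return {k: users for k, users in groups.items() if len(users) >= 2}


def predicted_duplicates(subs: list[Submission], min_airings: int = 2) -> dict[tuple[str, str], set[str]]:
    """Step 3: users expected to duplicate work on a (provider, show) in future,
    because they were in a duplicate group for at least `min_airings` past airings."""
    seen: dict[tuple[str, str], dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for (provider, show, _airing), users in duplicate_work(subs).items():
        for u in users:
            seen[(provider, show)][u] += 1
    result = {}
    for key, counts in seen.items():
        regulars = {u for u, n in counts.items() if n >= min_airings}
        if len(regulars) >= 2:
            result[key] = regulars
    return result
```
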
> Here's an example using _Lost_ and _Alias_ (chosen for their short
> names):
>  
> Givens:
> - 23 users with Comcast Analog Basic Cable service in Portland, OR
> record and flag new episodes of _Lost_ and _Alias_ each week.
> - These users use a variety of commercial flagging methods.
> - The machines have a variety of available CPU power.
>  
> Scenario:
> - 23 users submit data to the DCF server via a secured and anonymous
> interface. This data includes which shows they flagged and the start
> and end times of each commercial segment. All times are synchronized
> to the DCF server's highly accurate clock (more on this later).
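
The following sketches one possible shape for a single submission. It makes several assumptions: segments are sent as absolute epoch seconds on the client's clock, the client's current time rides along so the server can correct them, and a random per-installation token stands in for an anonymous identity. The field names and JSON encoding are hypothetical. The "secured" part (for example, sending over an encrypted connection) is not shown.

```python
import json
import secrets
from dataclasses import asdict, dataclass, field


@dataclass
class FlagSubmission:
    provider: str      # e.g. "Comcast Analog Basic Cable, Portland, OR"
    show: str
    airing_start: str  # scheduled start, e.g. "2005-04-13T02:00:00Z"
    # Commercial segments as (start, end) epoch seconds on the *client's* clock;
    # the server shifts them onto its own clock using client_time.
    segments: list[tuple[float, float]]
    client_time: float  # client's local clock when the message is sent
    # Assumption: a random token links one partner's submissions without naming the user.
    client_token: str = field(default_factory=lambda: secrets.token_hex(16))

    def to_json(self) -> str:
        # json.dumps turns the (start, end) tuples into two-element lists.
        return json.dumps(asdict(self))
```
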
>  
> - After submitting each show's data, the DCF server indicates to the
> client whether the client can join a "partnership".
>  
> - "Partnerships" are created when the DCF server determines that 2 or
> more users are performing (and will perform) duplicate work (with
> similar output) for 1 or more shows.
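
"Similar output" needs a concrete measure before the server can form partnerships. One assumption-laden choice is the Jaccard similarity of two cut lists measured in seconds: overlapping commercial time divided by total commercial time. The names and the 0.9 threshold below are hypothetical:

```python
Segment = tuple[float, float]


def _merge(segs: list[Segment]) -> list[Segment]:
    """Sort segments and merge any that overlap."""
    out: list[Segment] = []
    for s, e in sorted(segs):
        if out and s <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], e))
        else:
            out.append((s, e))
    return out


def cutlist_similarity(a: list[Segment], b: list[Segment]) -> float:
    """Seconds flagged by both lists / seconds flagged by either (1.0 = identical)."""
    a, b = _merge(a), _merge(b)
    inter = 0.0
    i = j = 0
    while i < len(a) and j < len(b):  # sweep both sorted lists once
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            inter += hi - lo
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    union = sum(e - s for s, e in a) + sum(e - s for s, e in b) - inter
    return inter / union if union > 0 else 1.0  # two empty lists count as identical


def similar_output(a: list[Segment], b: list[Segment], threshold: float = 0.9) -> bool:
    return cutlist_similarity(a, b) >= threshold
```
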
>  
> - 10 users are invited to join the new partnership for new episodes of
> _Lost_ and _Alias_ on Comcast Analog Basic Cable, Portland, OR. These
> ten users are invited because their machines are of similar power
> (i.e. commercial flagging occurs after a similar delay and in a
> similar amount of time). These users are now "Partners" within the
> "Partnership".
>  
> - At first, none of the partners are "trusted" or have earned any
> "credits" within the partnership. As partners submit more data they
> earn more credits. The exact amount they earn per submission is
> weighted by how much they are trusted (their "fidelity" factor) and
> the accuracy of the submitted data (how similar it is to other data).
> Once partners have earned enough credits, they can "purchase" data
> from the partnership.
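
The credit rule is not specified beyond "weighted by fidelity and accuracy". The following sketch assumes it is a simple product: base credits times fidelity times accuracy, with accuracy in [0, 1] (for example, similarity to the other submissions). It also assumes a fixed balance is needed before buying. New partners start at a neutral fidelity of 0.5. That is an assumption too, since a starting fidelity of zero would mean nobody could ever earn credits under this formula. BASE_CREDITS, PURCHASE_THRESHOLD and Partner are hypothetical:

```python
from dataclasses import dataclass

BASE_CREDITS = 10.0        # hypothetical credits for one fully trusted, fully accurate job
PURCHASE_THRESHOLD = 30.0  # hypothetical balance required before buying data


@dataclass
class Partner:
    name: str
    fidelity: float = 0.5  # trust in [0, 1]; assumed neutral starting value
    credits: float = 0.0

    def earn(self, accuracy: float) -> float:
        """Credit one submission; accuracy in [0, 1] is agreement with other data."""
        earned = BASE_CREDITS * self.fidelity * accuracy
        self.credits += earned
        return earned

    def can_purchase(self) -> bool:
        return self.credits >= PURCHASE_THRESHOLD
```
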
>  
> - After N weeks, only 7 users have earned enough credits to share
> data.
>  
> - 3 partners are selected to flag next week's episode of _Lost_.
>  
> - 4 partners are selected to flag next week's episode of _Alias_.
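
How the server picks the flaggers is left open. One plausible rule, assumed here, is a greedy load balance: each show goes to the partners with the fewest assignments so far, and ties go to the higher-fidelity partner. With 7 partners and 3 + 4 jobs, this gives every partner exactly one show, as in the example. assign_flaggers is a hypothetical name:

```python
def assign_flaggers(partners: dict[str, float], needed: dict[str, int]) -> dict[str, list[str]]:
    """partners maps name -> fidelity; needed maps show -> number of flaggers.

    Greedy, load-balanced assignment: each show goes to the partners with the
    fewest assignments so far, preferring higher fidelity, then name, on ties.
    """
    load = {name: 0 for name in partners}
    plan: dict[str, list[str]] = {}
    for show, k in needed.items():
        ranked = sorted(partners, key=lambda n: (load[n], -partners[n], n))
        chosen = ranked[:k]
        for n in chosen:
            load[n] += 1
        plan[show] = chosen
    return plan
```
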
>  
> - Of the 3 partners selected to flag _Lost_, all do so and submit
> their data.
>  
> - Of the 4 partners selected to flag _Alias_, only 3 do so and submit
> their data. The 1 user who did not submit data has his "fidelity"
> factor lowered.
>  
> - The 6 partners who submitted data earn credits and increase their
> fidelity factor.
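
The fidelity changes in the steps above could be modelled as an exponential moving average, which is an assumption and not part of the proposal. Each delivered job moves fidelity a fraction of the way toward 1.0, and each missed job moves it toward 0.0. As a result, recent behaviour counts most and the value stays within [0, 1]. update_fidelity and the 0.2 rate are hypothetical:

```python
def update_fidelity(fidelity: float, submitted: bool, rate: float = 0.2) -> float:
    """Move fidelity toward 1.0 after a delivered job and toward 0.0 after a
    missed one (exponential moving average; `rate` is a tuning constant)."""
    target = 1.0 if submitted else 0.0
    return fidelity + rate * (target - fidelity)
```

For example, a partner at 0.9 who misses three jobs in a row drops below 0.5 (0.72, then 0.576, then about 0.46).
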
>  
> - The 3 _Alias_ partners who submitted data, and so *do not* have
> _Lost_ flag data, spend credits to receive this data (at a discounted
> cost, because of their increased fidelity).
>  
> - The 3 _Lost_ partners, who *do not* have _Alias_ flag data, spend
> credits to receive this data (again, at the discounted cost).
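
The "discounted cost" could be as simple as a price that falls linearly with fidelity, which is an assumption here. In the following sketch, a fully trusted partner pays half price, and a purchase is refused if the balance cannot cover it. FULL_PRICE, MAX_DISCOUNT and both functions are hypothetical:

```python
FULL_PRICE = 10.0    # hypothetical credit cost of one show's flag data
MAX_DISCOUNT = 0.5   # hypothetical: fidelity 1.0 pays half price


def purchase_price(fidelity: float) -> float:
    """Price falls linearly as fidelity (clamped to [0, 1]) rises."""
    f = min(max(fidelity, 0.0), 1.0)
    return FULL_PRICE * (1.0 - MAX_DISCOUNT * f)


def purchase(balance: float, fidelity: float) -> float:
    """Return the balance after buying one show's data, or raise if unaffordable."""
    price = purchase_price(fidelity)
    if balance < price:
        raise ValueError("not enough credits")
    return balance - price
```
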
>  
> - The flag data is not perfect: clocks, settings, reception, etc.
> vary. The partners use the "purchased" flag data to limit their own
> commercial flagging to those suspect times within the shows (with
> perhaps a 1 minute margin before and after). The results of these
> "verification" flag jobs are submitted back to the server.
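
The verification step is where the load reduction comes from. Each purchased break is widened by about a minute on each side and clipped to the recording. Overlapping windows are merged, and only those windows are scanned locally. A minimal sketch, assuming times are in seconds from the start of the recording (the function names are hypothetical):

```python
def verification_windows(breaks: list[tuple[float, float]], show_length: float,
                         margin: float = 60.0) -> list[tuple[float, float]]:
    """Regions a partner still flags locally: each purchased break widened by
    `margin` seconds on both sides, clipped to the recording, overlaps merged."""
    windows: list[tuple[float, float]] = []
    for start, end in sorted(breaks):
        lo = max(0.0, start - margin)
        hi = min(show_length, end + margin)
        if windows and lo <= windows[-1][1]:
            windows[-1] = (windows[-1][0], max(windows[-1][1], hi))
        else:
            windows.append((lo, hi))
    return windows


def scanned_fraction(windows: list[tuple[float, float]], show_length: float) -> float:
    return sum(hi - lo for lo, hi in windows) / show_length
```

For a one-hour recording with breaks at 30-150 s, 180-300 s and 1500-1620 s, the windows are (0, 360) and (1440, 1680). That is one sixth of the recording instead of all of it.
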
>  
> Summary:
> So, now 7 users have formed a partnership to share the load of
> flagging _Alias_ and _Lost_.
> 6 of them have significantly reduced their flagging load for these two
> shows.
> 1 partner needs to regain the trust of the partnership by submitting
> data in a timely manner.
>  
> One can easily imagine a greatly expanded model, where a particular
> user could belong to dozens of partnerships. Each partnership could
> have hundreds of users, and dozens of shows. As a result of
> participating in partnerships, the user may only be required to flag a
> few shows (in their entirety) each week.
>  
> Benefits:
> - Reduced commercial flagging for individual partners.
> - Increased accuracy of commercial flagging (via consensus).
> - "Leeching" is not allowed.
> - Negative effects of poisoning are reduced through "fidelity" factors
> and credits.
> - New methods of commercial flagging (either local or distributed) can
> be seamlessly incorporated.
> - The freed-up CPU power could be used for new, extremely
> processor-intensive flagging methods.
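
The "consensus" in the accuracy benefit is not defined. A plain majority vote over time slices is one assumed rule: a slice counts as commercial when more than half of the submitted cut lists cover it. A fidelity-weighted vote would be a natural extension. consensus and its one-second step are hypothetical:

```python
def consensus(cutlists: list[list[tuple[float, float]]], length: float,
              step: float = 1.0) -> list[tuple[float, float]]:
    """Majority-vote cut list: a slice of `step` seconds is commercial when more
    than half of the submitted cut lists cover its midpoint."""
    n_slices = int(length // step)
    votes = [0] * n_slices
    for segs in cutlists:
        for i in range(n_slices):
            t = (i + 0.5) * step  # midpoint of slice i
            if any(start <= t < end for start, end in segs):
                votes[i] += 1
    result: list[tuple[float, float]] = []
    for i, v in enumerate(votes):
        if 2 * v > len(cutlists):
            lo, hi = i * step, (i + 1) * step
            if result and result[-1][1] == lo:
                result[-1] = (result[-1][0], hi)  # extend the current segment
            else:
                result.append((lo, hi))
    return result
```

With the three lists [(60, 180)], [(62, 180)] and [(600, 700)], the result is [(62.0, 180.0)]. The outlying third list is outvoted.
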
>  
> Drawbacks:
> - Requires central server.
> - Requires many participants.
> - Requires frequent communications with the server (although not
> much data is transferred).
> - Requires changing commercial flagging to acquire partnership data.
> - Requires changing commercial flagging to allow for flagging just
> parts of the show.
> - Requires interface changes to alert users when they are about to
> "fail in their partnership duties" by not recording and flagging a
> show.
> - The central DCF server stores recording habits of users. It's
> anonymous, but still concerning.
> - Requires similar "content streams". Anecdotal experience (hearing
> the same commercials over the phone with friends) makes me suspect that
> commercial *times* don't vary within the same Service Provider.
> Analysis of submitted data will be the acid test. If the server never
> finds suitable partnerships, then everybody's content streams must be
> different, and the whole project is a failure.
>  
> - If the project is successful, content providers will further vary
> the content streams.
>  
> Progress:
> - I have built a local database to store the DCF data.
> - I am building an SQL script to populate the DCF database from
> mythconverg.
> - I will be collecting data (via emailed output of the SQL script)
> from other users.
> - I have outlined a solution for time synchronization. Basically,
> partners submit the machine's local time with every transaction.
> - I am defining a secure and anonymous interface for the DCF server.
> - I am defining factors which I believe should affect the "fidelity"
> of data submitted.
>  
> Ideas, questions, comments, criticisms are welcome.
>  
>  
> -- 
> Christopher David Petersen
> Member of PoORMUG http://poormug.bitbucket.com/
-- 
Mike Benoit <ipso at snappymail.ca>