[mythtv-users] Commercial Flagging Idea - Distributed Commercial
Flagging (long)
Christopher David Petersen
christopher.david.petersen at gmail.com
Thu Apr 14 19:33:06 UTC 2005
I'm starting this thread so I don't pollute the discussion of fingerprinting
for commercial flagging (which is a brilliant, but separate idea).
This thread is meant to discuss the idea of distributed commercial
flagging (DCF) via existing algorithms to reduce load and increase accuracy.
In brief, here's the basic idea:
1) Collect commercial flagging information from participating users at a
central server (hopefully, this isn't a DMCA violation).
2) Analyze the data to determine groups of users who have performed
duplicate work.
3) Analyze the data to predict groups of users who will be performing
duplicate work.
4) Distribute the future duplicate work among the users to reduce each user's
individual load.
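
As a rough sketch of step 2: assuming each submission can be reduced to a
(user, provider, show) record, finding duplicate work could be as simple as
grouping by provider and show. The function name, the record layout, and the
sample values below are all hypothetical; no submission format has been
defined yet.

```python
from collections import defaultdict

def find_duplicate_work(submissions, min_users=2):
    """Group users who flagged the same show on the same content stream.

    `submissions` is an iterable of (user_id, provider, show) tuples
    (a hypothetical record layout). Returns a dict mapping
    (provider, show) to a sorted list of user ids, keeping only groups
    with at least `min_users` distinct users.
    """
    groups = defaultdict(set)
    for user_id, provider, show in submissions:
        groups[(provider, show)].add(user_id)
    return {key: sorted(users) for key, users in groups.items()
            if len(users) >= min_users}

# Made-up sample data for illustration only.
PORTLAND = "Comcast Analog Basic, Portland, OR"
submissions = [
    ("u1", PORTLAND, "Lost"),
    ("u2", PORTLAND, "Lost"),
    ("u2", PORTLAND, "Alias"),
    ("u3", "Some Other Provider", "Lost"),
]
duplicates = find_duplicate_work(submissions)
print(duplicates)
```

Only u1 and u2 end up grouped, for _Lost_ on the Portland stream. Step 3
(prediction) would need several weeks of such groupings and is not covered
here.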
Here's an example using _Lost_ and _Alias_ (chosen for their short names):
Givens:
- 23 users with Comcast Analog Basic Cable service in Portland, OR record
and flag new episodes of _Lost_ and _Alias_ each week.
- These users use a variety of commercial flagging methods.
- The machines have a variety of available CPU power.
Scenario:
- 23 users submit data to the DCF server via a secured and anonymous
interface. This data includes which shows they flagged and the start and
end times of each commercial segment. All times are synchronized to the DCF
server's highly accurate clock (more on this later).
- After submitting each show's data, the DCF server indicates to the client
whether the client can join a "partnership".
- "Partnerships" are created when the DCF server determines that 2 or more
users are performing (and will perform) duplicate work (with similar output)
for 1 or more shows.
- 10 users are invited to join the new partnership for new episodes of
_Lost_ and _Alias_ on Comcast Analog Basic Cable, Portland, OR. These ten
users are invited because their machines are of similar power (i.e.
commercial flagging occurs after a similar delay and in a similar amount of
time). These users are now "Partners" within the "Partnership".
- At first, none of the partners are "trusted" or have earned any "credits"
within the partnership. As partners submit more data they earn more credits.
The exact amount they earn per submission is weighted by how much they are
trusted (their "fidelity" factor) and the accuracy of the submitted data
(how similar it is to other data). Once partners have earned enough credits,
they can "purchase" data from the partnership.
- After N weeks, only 7 users have earned enough credits to share data.
- 3 partners are selected to flag next week's episode of _Lost_.
- 4 partners are selected to flag next week's episode of _Alias_.
- Of the 3 partners selected to flag _Lost_, all do so and submit their
data.
- Of the 4 partners selected to flag _Alias_, only 3 do so and submit their
data. The 1 user who did not submit data has lessened his "fidelity" factor.
- The 6 partners who submitted data earn credits and increase their
fidelity factor.
- The 3 partners who *do not* have _Lost_ flag data spend credits to
receive this data (at a discounted cost, because of their increased
fidelity).
- The 3 partners who *do not* have _Alias_ flag data spend credits to
receive this data (again, at the discounted cost).
- The flag data is not perfect: clocks, settings, reception, etc. vary. The
partners use the "purchased" flag data to limit their own commercial
flagging to those suspect times within the shows (with perhaps a 1-minute
margin before and after). The results of these "verification" flag jobs are
submitted back to the server.
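
The "verification" step in the last item could look roughly like this. It is
only a sketch: it assumes flag data is a list of (start, end) offsets in
seconds from the start of the recording, and the function name and sample
numbers are made up.

```python
def verification_windows(segments, show_length, margin=60):
    """Return merged (start, end) windows, in seconds, to re-flag locally.

    Each purchased commercial segment is widened by `margin` seconds on
    both sides, clamped to the recording, and merged with any window it
    overlaps.
    """
    windows = []
    for start, end in sorted(segments):
        lo = max(0, start - margin)
        hi = min(show_length, end + margin)
        if windows and lo <= windows[-1][1]:
            # Overlaps the previous window, so extend that one instead.
            windows[-1] = (windows[-1][0], max(windows[-1][1], hi))
        else:
            windows.append((lo, hi))
    return windows

# Made-up purchased flag data for a one-hour recording.
purchased = [(600, 780), (800, 900), (1800, 1980)]
windows = verification_windows(purchased, show_length=3600)
print(windows)
```

With these numbers the local flagger would only scan two windows (540-960 and
1740-2040 seconds) instead of the full hour.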
Summary:
So, now 7 users have formed a partnership to share the load of flagging
_Alias_ and _Lost_.
6 of them have significantly reduced their flagging load for these two shows.
1 partner needs to regain the trust of the partnership by submitting data in
a timely manner.
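
To make the credit and fidelity bookkeeping concrete, here is a minimal
sketch. Every formula and constant in it (the reward weighting, the fidelity
steps, the discount) is an assumption chosen for illustration; the real
factors are still undecided.

```python
from dataclasses import dataclass

@dataclass
class Partner:
    name: str
    fidelity: float = 0.0   # assumed scale: 0.0 (untrusted) to 1.0 (trusted)
    credits: float = 0.0

    def submit(self, accuracy, base_reward=10.0, step=0.1):
        """Reward a submission; `accuracy` (0.0-1.0) is agreement with other data."""
        # Assumed weighting: even an untrusted partner earns half the reward.
        self.credits += base_reward * accuracy * (0.5 + 0.5 * self.fidelity)
        self.fidelity = min(1.0, self.fidelity + step)

    def miss(self, step=0.2):
        """Penalize an assigned flagging job that was never submitted."""
        self.fidelity = max(0.0, self.fidelity - step)

    def purchase(self, base_cost=5.0):
        """Spend credits on flag data; higher fidelity means a lower price."""
        cost = base_cost * (1.0 - 0.5 * self.fidelity)
        if self.credits < cost:
            return False
        self.credits -= cost
        return True

reliable = Partner("reliable")
reliable.submit(accuracy=1.0)
reliable.submit(accuracy=1.0)
bought = reliable.purchase()

flaky = Partner("flaky", fidelity=0.2)
flaky.miss()
print(reliable, bought, flaky)
```

A partner who misses an assignment loses fidelity, so later submissions earn
less and purchases cost more until the trust is rebuilt.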
One can easily imagine a greatly expanded model, where a particular user
could belong to dozens of partnerships. Each partnership could have hundreds
of users, and dozens of shows. As a result of participating in partnerships,
the user may only be required to flag a few shows (in their entirety) each
week.
Benefits:
- Reduced commercial flagging for individual partners.
- Increased accuracy of commercial flagging (via consensus).
- "Leeching" is not allowed.
- Negative effects of poisoning are reduced through "fidelity" factors and
credits.
- New methods of commercial flagging (either local or distributed) can be
seamlessly incorporated.
- The available CPU power could be used for new, extremely
processor-intensive flagging methods.
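
One simple way the "consensus" benefit might work is to take the median of
each segment boundary across partners. This is a sketch under a strong
simplifying assumption: reports whose segment count differs from the most
common count are simply dropped. The function name and sample data are
hypothetical.

```python
from statistics import median

def consensus_segments(reports):
    """Combine several partners' flag lists into one by taking medians.

    `reports` is a list of flag lists; each flag list is a list of
    (start, end) pairs in seconds. Reports whose number of segments
    differs from the most common number are ignored.
    """
    counts = [len(r) for r in reports]
    typical = max(set(counts), key=counts.count)
    usable = [sorted(r) for r in reports if len(r) == typical]
    return [
        (median(r[i][0] for r in usable), median(r[i][1] for r in usable))
        for i in range(typical)
    ]

# Three roughly agreeing reports and one bogus ("poisoned") report.
reports = [
    [(600, 780), (1800, 1980)],
    [(602, 779), (1801, 1985)],
    [(598, 781), (1799, 1979)],
    [(0, 3600)],
]
agreed = consensus_segments(reports)
print(agreed)
```

The bogus report has the wrong number of segments and is ignored, and the
medians smooth out the small clock and reception differences in the rest.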
Drawbacks:
- Requires central server.
- Requires many participants.
- Requires frequent communications with the server (although not much data is
transferred).
- Requires changing commercial flagging to acquire partnership data.
- Requires changing commercial flagging to allow for flagging just parts of
the show.
- Requires interface changes to alert users when they are about to "fail in
their partnership duties" by not recording and flagging a show.
- The central DCF server stores recording habits of users. It's anonymous,
but still concerning.
- Requires similar "content streams". Anecdotal experience (hearing the same
commercials over the phone with friends) makes me suspect that commercial
*times* don't vary within the same Service Provider. Analysis of submitted
data will be the acid test. If the server never finds suitable
partnerships, then everybody's content streams must be different, and the
whole project is a failure.
- If the project is successful, content providers will further vary the
content streams.
Progress:
- I have built a local database to store the DCF data.
- I am building a sql script to populate the DCF database from mythconverg.
- I will be collecting data (via emailed output of the sql script) from
other users.
- I have outlined a solution for time synchronization. Basically, partners
submit the machine's local time with every transaction.
- I am defining a secure and anonymous interface for the DCF server.
- I am defining factors which I believe should affect the "fidelity" of data
submitted.
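
The time synchronization outline above could be sketched as follows. It
assumes the server compares the submitted local time with its own clock at
the moment the transaction arrives and ignores network latency, which should
be small next to a one-minute verification margin. The function names and
timestamps are made up.

```python
from datetime import datetime, timedelta

def clock_offset(client_now, server_now):
    """Offset to add to a client timestamp to express it in server time.

    Network latency is ignored (an assumption of this sketch).
    """
    return server_now - client_now

def to_server_time(segments, offset):
    """Shift (start, end) datetime pairs from client time to server time."""
    return [(start + offset, end + offset) for start, end in segments]

# Made-up example: the client's clock runs 42 seconds slow.
client_now = datetime(2005, 4, 14, 19, 33, 6)
server_now = client_now + timedelta(seconds=42)
offset = clock_offset(client_now, server_now)

segments = [(datetime(2005, 4, 13, 20, 10, 0),
             datetime(2005, 4, 13, 20, 13, 0))]
normalized = to_server_time(segments, offset)
print(offset, normalized)
```

Because the offset is recomputed on every transaction, a drifting client
clock is corrected each time the partner submits or purchases data.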
Ideas, questions, comments, criticisms are welcome.
--
Christopher David Petersen
Member of PoORMUG http://poormug.bitbucket.com/