当前位置:首页 > Ben-Zrihem - Approximate - Nearest - Neighbor - 2015 - CVPR - paper - 图文
ApproximateNearestNeighborFieldsinVideo
NirBen-Zrihem
DepartmentofElectricalEngineering
Technion,Israel
bentzinir@gmail.com
LihiZelnik-Manor
DepartmentofElectricalEngineering
Technion,Israel
lihi@ee.technion.ac.il
Abstract
WeintroduceRIANN(RingIntersectionApproximateNearestNeighborsearch),analgorithmformatchingpatchesofavideotoasetofreferencepatchesinreal-time.Foreachquery,RIANN?ndspotentialmatchesbyintersect-ingringsaroundkeypointsinappearancespace.Itssearchcomplexityisreverselycorrelatedtotheamountoftempo-ralchange,makingitagood?tforvideos,wheretypicallymostpatcheschangeslowlywithtime.ExperimentsshowthatRIANNisuptotwoordersofmagnitudefasterthanpreviousANNmethods,andistheonlysolutionthatoper-atesinreal-time.WefurtherdemonstratehowRIANNcanbeusedforreal-timevideoprocessingandprovideexam-plesforarangeofreal-timevideoapplications,includingcolorization,denoising,andseveralartisticeffects.
Figure1.VideoANNFields:(a)Input:Livestreamofvideoframes.(b)Areferencesetofimagepatches.(c)Foreachframe,adenseANNFieldisproducedbymatchingpatchesfromtheref-erenceset.Thisisdoneinrealtime.
1.Introduction
TheApproximateNearestNeighbor(ANN)problemcouldbede?nedasfollows:givenasetofreferencepointsandincomingquerypoints,quicklyreportthereferencepointclosesttoeachquery.Approximatesolutionsperformthetaskfast,butdonotguaranteetheexactnearestneigh-borwillbefound.Thecommonsolutionsarebasedondatastructuresthatenablefastsearchsuchasrandomprojec-tions[20,19],orkd-trees[3,27].
Whenthesetofqueriesconsistsofallpatchesofanim-agetheresultisanANN-Field(ANNF).Barnesetal.[4]de-velopedthePatchMatchapproachforANNF,whichshowsthatspatialcoherencycanbeharnessedtoobtainfastcom-putation.AlgorithmsthatintegratetraditionalANNmeth-odswithPatchMatch’spatialcoherencyyieldevenfasterruntimes[16,21].Itisthusnotsurprisingthatpatch-matchingmethodshavebecomeverypopularandnowlieintheheartofmanycomputervisionapplications,e.g.,texturesynthesis[12],imagedenoising[9],super-resolution[13],andimageediting[4,5,28]tonameafew.
Inthispaperwepresentanef?cientANNFalgorithmforvideo.Theproblemsetupweaddressismatchingall
patchesofavideotoasetofreferencepatches,inreal-time,asillustratedinFigure1.Inourformulationthereferencesetis?xedfortheentirevideo.Infact,wecouldusethesamereferencesetfordifferentvideos.Additionally,ourreferencesetofpatchesisnotrestrictedtooriginatefromasingleimageoravideoframe,asiscommonlyassumedforimagePatchMatching.Instead,itconsistsofanon-orderedcollectionofpatches,e.g.,adictionary[2,15].Thissetupenablesreal-timecomputationoftheANNFieldsforvideo.Weshowempirically,thatitdoesnotharmaccuracy.Ouralgorithmisdesignedtoachievelowrun-timebyre-lyingontwokeyideas.First,weleveragethefactthatthereferencesetis?xed,andhencemoderatepre-processingisacceptable.Atpre-processingweconstructadatastructureofthereferenceset,thatenablesef?cientindexingduringrun-time.Second,werelyontemporal-coherencytoadaptourhashingfunctionstothedata.Thehashingweuseisnot?xeda-prioriasinpreviousworks[4,21,18,16],butratheritistunedperquerypatchduringruntime.Inregionswithhightemporalchangethehashingistunedtobecoarse,whichleadstohighercomputationtimes(largerbinstrans-latetocheckingoutmorecandidatematches).Onthecon-trary,inregionswithlowchange,thehashingis?nerandhencethecomputationtimeislower.Werefertothisap-proachas“query-sensitivehashing”.Ourhashingisgenerictoanydistancemetric.Weshowexamplesforworking
with2Dspatialpatches,however,ouralgorithmiseasilyextendedtoworkwith3Dspatio-temporalpatchesaswell.Tocon?rmtheusefulnessoftheproposedapproachwefurtherdiscusshowitcanbeadoptedforreal-timevideoprocessing.Weshowthatabroadrangeofpatch-basedim-agetransformationscanbeapproximatedusingournearestneighbormatching.Speci?callyweprovideexamplesofdenoising,colorizationandseveralstylingeffects.
Therestofthispaperisorganizedasfollows.WestartbyreviewingrelevantpreviousworkonimageandvideoANNinSection2.Wethenpresentourhashingtechniqueanditsintegrationintoavideo-ANNFieldssolutioninSection3.WecomparetheperformanceofouralgorithmtopreviousworkinSection4andsuggestseveralapplicationsinSec-tion5.FurtherdiscussionisprovidedinSection6andourconclusionsarelaidoutinSection7.
2.RelatedWork
ThegeneralproblemofApproximateNearestNeighbormatchingreceivedseveralexcellentsolutionsthathavebe-comehighlypopular[20,19,22,3,32].Noneofthese,however,reachreal-timecomputationofANNFieldsinvideo.Image-speci?cmethodsforcomputingtheANNFieldbetweenapairofimagesachieveshorterrun-timesbyfurtherexploitingpropertiesofnaturalimages[4,5,21,16,18,27].Inparticular,theyrelyonspatialcoherencyinimagestopropagategoodmatchesbetweenneighbor-ingpatchesintheimageplane.Whilesuf?cientlyfastformostinteractiveimage-editingapplications,thesemethodsarefarfromrunningatconventionalvideoframerates.Itisonlyfairtosaythatthesemethodswerenotdesignedforvideoanddonotleveragestatisticalpropertiesofvideo.AnextensionfromimagestovideowasproposedbyLiu&Freeman[24]fortheproposeofvideodenoisingthroughnon-local-means.ForeachpatchinthevideotheysearchforkApproximateNearestNeighborswithinthesameframeorinnearbyframes.Thisisdonebypropagat-ingcandidatematchesbothtemporally,usingoptical?ow,andspatiallyinasimilarmannertoPatchMatch[4].Onecanthinkofthisproblemsetupassimilartoours,butwithavaryingreferenceset.Whilewekeepa?xedreferencesetfortheentirevideo,[24]useasdifferentsetofreferencepatchesforeachvideoframe.
Anothergroupofworksthat?ndmatchesbetweenpatchesinvideoarethosethatestimatetheoptical?ow?eld[35,30,17,6].Severaloftheseachievereal-timeperformance,oftenviaGPUimplementation.However,theproblemde?nitionforoptical?owisdifferentfromtheonewepose.Theoptical?ow?eldaimstocapturethemotioninavideosequence.Assuch,matchesarecomputedonlybetweenconsecutivepairsofframes.Inaddition,smalldis-placementsandsmoothmotionareusuallyassumed.So-lutionsforlarge-displacementoptical-?owhavealsobeen
proposed[8,34,10,31].Themethodsof[8,34,31]inte-gratekeypointdetectionandmatchingintotheoptical?owestimationtoaddresslargedisplacements.Chenetal.[10]initiatethe?owestimationwithANN?eldsobtainedbyCSH[21].Noneofthesemethodsarenearreal-timeperfor-mance,withthefastestalgorithmsrunningatafewsecondsperframe.Furthermore,whileallowingforlargedisplace-ments,theirgoalisstillcomputingasmoothmotion?eld,whileoursisobtainingsimilaritybasedmatches.
InSection5weshowhowourANNFframeworkcanbeusedforvideoprocessing.TheideaofusingANNFforvideoprocessinghasbeenproposedbefore,andseveralworksmakeuseofit.Sun&Liu[29]suggestanapproachtovideodeblockingthatconsidersbothoptical?owesti-mationandANNsfoundusingakd-tree.Anapproachthatutilizestemporalpropagationforvideosuper-resolutionisproposedin[25].Theyaswellrelyonoptical?owestima-tionforthis.Thequalityoftheresultsobtainedbythesemethodsishighbutthiscomesatthepriceofverylongrun-times,ofteninthehours.
Ourworkisalsosomewhatrelatedtomethodsforhighdimensional?ltering,whichcanberunonpatchesinvideoforcomputingnon-local-means.Broxetal.[7]speed-uptraditionalnon-local-meansbyclusteringthepatchesinatreestructure.Adamsetal.[1]andGastaletal.[14]proposeef?cientsolutionsforhigh-dimensional?lteringbasedonGaussiankd-trees.Noneofthesemethodsprovidereal-timeperformancewhenthe?lterisnon-local-meansonpatches(unlessharshdimensionalityreductionisapplied).
3.RingIntersectionHashing
CurrentsolutionstocomputingtheANNFieldbetweenapairofimagesutilizetwomechanismsto?ndcandidatematchesforeachquerypatch:spatialcoherency,andap-pearancebasedindexing.Somealgorithmsrelyonlyonthe?rst,whileothersintegrateboth.Toallowforagenericref-erenceset(ratherthananimage)weavoidrelianceonspa-tialcoherencyaltogether.Instead,werelyonanimportantcharacteristicofvideosequences:thereisastrongtemporalcoherencebetweenconsecutivevideoframes.Weharnessthisforef?cientappearance-basedindexing.
3.1.TemporalCoherency
Letqx,y,tdenotethequerypatchqatpositionx,yinframetoftheinputvideo.Our?rstobservationisthatthereisstrongtemporalcoherencybetweenconsecutivepatchesqx,y,t?1,qx,y,t.Inotherwords,mostpatcheschangeslowlywithtime,andhencepatchqx,y,t?1isoftensimilartopatchqx,y,t.Oursecondobservationisthattemporalcoherencyimpliescoherencyinappearancespace:ifpatchqx,y,t?1wasmatchedtopatchriofthereferenceset,thenpatchqx,y,tishighlylikelytobematchedtopatchessimilartoriinappearance.ThisisillustratedvisuallyinFigure2.(a).
(a)(b)
Figure2.Coherencyinappearance:(a)Whenconsecutivequerypatchesqx,y,t?1andqx,y,taresimilar,theirexactNNsintheref-erencesetriandrjarealsosimilar.(b)Thehistogramdepictsthedistributionofdistancesbetweensuchpairsofreferencepatchesriandrj.Itshowsthatformostqueriesqx,y,t,theexactNNliesinad=1radiusballaroundtheNNofthepredecessorpatchqx,y,t?1.
Toevaluatethestrengthofcoherencyinappearanceweperformedthefollowingexperiment.Foreach8×8patchinagiventargetvideowe?nditsexactnearestneighborinareferencesetofpatches.Forthisexperimentthereferencesetconsistsofallpatchesofarandomframeofthevideo.Letpatchesri,rjbetheexactNNsofpatchesqx,y,t?1andqx,y,t,respectively.Wethencomputethedistanceinap-pearancebetweenthematches:d=??ri?rj??2.Thiswasre-peatedoverpairsofconsecutivepatchesfrom20randomlychosenvideosfromtheHollywood2[26]data-set.Thedis-tributionofdistancesdispresentedinFigure2.(b),afterex-cludingpatcheswhereri=rj(effectivelyexcludingstaticbackgroundpatches).Ascanbeseen,for~85%ofthepatchesd≤1.ThisimpliesthatcoherencyinappearanceisastrongcuethatcanbeutilizedforvideoANN.
ri,theringradiuswillbesmallandwillincludeonlyafewcandidatereferencepatches.Thewidthofthering,denotedby2ε,couldalsobetunedtoadaptthesearchspace.Wetakethewidthεtobeproportionaltothera-diusε=α·dist(ri,qx,y,t).Inourimplementationandallexperimentsα=0.25.Ourringsarethuswiderinareasoflargechangesandnarrowerinregionsoflittlechange.
AscanbeseeninFigure4.(a),theringaroundriin-cludestheneighborsofqx,y,t,butitalsoincludesrefer-encepatchesthatareveryfarfromqx,y,t,e.g.,ontheothersideofthering.Toexcludethesepatchesfromthesetofcandidatesweemployfurtherconstraints.Notethatourobservationsregardingriaretruealsoforanyotherpointinthereferenceset.Thatis,ifdist(rj,qx,y,t)issmall,thendist(rk,rj)≈dist(rk,qx,y,t)foranypatchrkofthereferenceset.Therefore,wefurtherdrawringsofradiusdist(rk,qx,y,t)±ε,aroundpointsrkselectedatrandomfromthecurrentsetofcandidates.The?nalcandidateNNsarethosethatlieintheintersectionofallrings,asillus-tratedinFigure4.(b).ForeachquerywecontinuetoaddringsuntilthenumberofcandidateNNsisbelowagiventhresholdL.Inallourexperimentsthethresholdis?xedtoL=20candidates.The?nalmatchistheoneclosesttoqx,y,tamongthecandidateNNs.
3.3.ComputationalComplexity
Asalaststepofourconstruction,weneedtomakesurethattheringsandtheirintersectionscanbecomputedef-?cientlyduringruntime.Therefore,atthepre-processingstagewecomputeforeachreferencepatchasortedlistofitsdistancesfromallotherreferencepatches.Thecompu-tationtimeofthislistandthespaceittakesarebothO(n2),wherenisthesizeofthereferenceset.
Havingthesesortedlistsavailable,signi?cantlyspeedsupthecomputationatruntime.Foreachquerypatchqx,y,t,wecomputethedistanced=dist(qx,y,t,ri),whereriisthelastknownmatch.Weaddallreferencepointsofdistanced±εfromritothesetofcandidateNN.Thus,thecomputationcomplexityforeachringincludesonedis-tancecalculation,andtwobinarysearchesinasortedarray,O(logn).Wecontinuetoaddringsandcomputetheinter-sectionbetweenthem,untilthenumberofcandidates≤L.Figure5exploresempiricallytherelationbetweenthenumberofringsandtheamountoftemporalchange.Itshowsthatthelargerthechange,themoreringsareneeded,andthehigherthecomputationtimeis.AswasshowninFigure2.(b),formostpatchesthechangeisverysmall,andhencetheoverallcomputationtimeissmall.
3.2.AdaptiveHashing
Theseobservationsimplythatcandidatematchesforpatchqx,y,tshouldbesearchednearri,thematchofqx,y,t?1.Furthermore,thesearchregionshouldbeadaptedperquery.Inareasoflowtemporalchange,alocalsearchnearrishouldsuf?ce,whereas,inareasofhightemporalchangethesearchshouldconsiderabroaderrange.
So,howdowedeterminethesearchareaforeachquery?Theanswerrequiresonefurtherobservation.Iftherefer-encesetisadequatethenqx,y,tanditsNNrjareveryclosetoeachotherinappearancespace.Thissuggeststhatthedistancebetweenriandrjisclosetothedistancebetweenriandqx,y,t,i.e.,dist(ri,rj)≈dist(ri,qx,y,t).Thisisil-lustratedandveri?edempiricallyinFigure3,forthreesizesofreferencesets.Ascanbeseen,forsetsizeof900patchesormore,thisassertionholdswithveryhighprobability.Basedontheabove,to?ndthematchrjweshouldsearchthrougha“fat”ringofradius≈dist(ri,qx,y,t),aroundri,asillustratedinFigure4.(a).Inareaswithsig-ni?cantchangeqx,y,twillbefarfromri,theringradiuswillbelargeandwillincludemanyreferencepatches.Onthecontrary,inareaswithlittlechangeqx,y,twillbenear
3.4.RIANN-OurAlgorithm
WenamethisapproachRIANN(RingIntersectionANN).RIANNisoutlinedinAlgorithm1.Viewingthisprocessasahashingscheme,wesaythatRIANNisaquery-
|dist(ri,rj)?dist(ri,qx,y,t)|
Figure3.Predictingsearchradius:(Left)Thedistancedist(ri,qx,y,t)isagoodpredictorforthedistancedist(ri,rj).(Right)Histogramsof|dist(ri,rj)?dist(ri,qx,y,t)|forthreesizesofreferencesets.Thissuggeststhatdist(ri,qx,y,t)isastrongpredictorfordist(ri,rj).Thecorrelationbecomesstrongerasweusealargerreferenceset.Toexcludestaticbackgroundfromouranalysisweincludeonlyquerieswhereri=rj.Statisticswerecomputedover20videosfromtheHollywood2database.
Figure4.RingIntersectionHashing:(a)To?ndcandidateneigh-borsforqx,y,twedrawaringofradiusd=dist(ri,qx,y,t)aroundri.Hereriisthematchfoundforqx,y,t?1.(b)Toexcludecan-didatesthatarefarfromqx,y,t,wedrawanotherring,thistimearoundrk,oneofthecurrentcandidates.Wecontinuetoaddrings,andleaveinthecandidatesetonlythoseintheintersection.
Figure5.Searchcomplexitydeterminedbytemporalchange:(Top)Patch-levelanalysis:Themeannumberofringintersectionsasafunctionofthetemporalchange??qt?1?qt??.(Bottom)Frame-levelanalysis:Theredcurveindicatesthetotalnumberofringintersectionsperformedoverallqueriesineachframe.Thebluecurverepresentstheoveralltemporaldifferencebetweenconsecu-tiveframes.Thisshowsaclearcorrelationbetweenthenumberofringsandthetemporaldifferencebetweenframes.
sensitivehashingsinceitbuildsbinsaroundthequeries,asopposedtonon-sensitivehashingthatcreatesthebinsbe-forehand.Query-sensitivehashingavoidsissueswhereaqueryliesataboundaryofabin,thusreducingthechanceofbettercandidateslyinginneighborbins.Temporalco-herencyofadjacentframesleadstomostquerieslyinginthevicinityofthecurrentbestmatch.Here,fewintersec-tionsareoftenenoughtoprovideveryfewcandidates.
RIANNcan?ndANNsforeachquerypatch,giventheANNofitspredecessorpatchandareferenceset.Hence,tocompleteoursolutionforcomputingadenseANN?eldforvideo,weneedtode?netwomorecomponents:(i)howthereferencesetisconstructed,and(ii)whatwedotoinitializethe?rstframe.
Buildingareferencemodel:Westartbycollectingalargesetofpatches.Tobuildaglobalreferenceset,thatcanbeusedformanytargetvideos,thepatchesareextractedran-domlyfromanumberofnaturalimages.Sincenaturalim-agesexhibithighredundancy,thecollectedsetofpatchesislikelytoincludemanysimilarities.Inspiredbydictionaryconstructionmethodssuchas[2,15]weseekamorecom-pactsetthatrepresentswellthesepatches.Todilutepatches
withhighresemblanceweclusterthepatchesusingahighdimensionalregressiontree[15].Wethentakethemedianofeachclusterastheclusterrepresentativeandnormalizeittobeoflength1.Our?nalreferencesetconsistsofthenormalizedrepresentatives(querypatchesarenormalizedatruntime).Last,wecalculatethepairwisedistancematrixandsortitsuchthatcolumnicontainsanascendingorderofdistancesfrompatchi.
Insomeapplicationsonemightwanttouseareferencesetthatconsistsonlyofpatchestakenfromthequeryvideoitself.Forexample,formethodsinspiredbyNon-Local-Means.Whenthisisthecase,werandomlyselectasingleframefromthevideoandcollectallitspatches.Wethenagainclusterthesepatchesusingahighdimensional
共分享92篇相关文档