Hindsight Experience Replay | Two Minute Papers #192

Two Minute Papers - 2017-09-27

The paper "Hindsight Experience Replay" is available here:
https://arxiv.org/pdf/1707.01495.pdf

Our Patreon page with the details:
https://www.patreon.com/TwoMinutePapers

Recommended for you:
Deep Reinforcement Terrain Learning - https://www.youtube.com/watch?v=wBrwN4dS-DA&t=109s
Digital Creatures Learn To Walk - https://www.youtube.com/watch?v=kQ2bqz3HPJE
Task-based Animation of Virtual Characters - https://www.youtube.com/watch?v=ZHoNpxUHewQ
Real-Time Character Control With Phase-Functioned Neural Networks - https://www.youtube.com/watch?v=wlndIQHtiFw
DeepMind's AI Learns Locomotion From Scratch - https://www.youtube.com/watch?v=14zkfDTN_qo

We would like to thank our generous Patreon supporters who make Two Minute Papers possible:
Andrew Melnychuk, Brian Gilman, Dave Rushton-Smith, Dennis Abts, Esa Turkulainen, Evan Breznyik, Kaben Gabriel Nanlohy, Michael Albrecht, Michael Jensen, Michael Orenstein, Steef, Sunil Kim, Torsten Reil.

Two Minute Papers Merch:
US: http://twominutepapers.com/
EU/Worldwide: https://shop.spreadshirt.net/TwoMinutePapers/

Music: Antarctica by Audionautix is licensed under a Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/)
Artist: http://audionautix.com/ 

Thumbnail background image credit: https://pixabay.com/photo-1193318/
Splash screen/thumbnail design: Felícia Fehér - http://felicia.hu

Károly Zsolnai-Fehér's links:
Facebook: https://www.facebook.com/TwoMinutePapers/
Twitter: https://twitter.com/karoly_zsolnai
Web: https://cg.tuwien.ac.at/~zsolnai/

Two Minute Papers - 2017-09-26

https://media.npr.org/assets/img/2013/12/16/her-fp-0880_wide-a2357d8b67f51632428366d65194a1ee59272dbc.jpg?s=1400

force_inline - 2017-09-27

HER

F Eryth - 2017-09-28

my god, Karoly, we're both thinking about her too, haha.

Reflections Observer - 2017-09-27

Fascinating! I was quite hopeful about the future of AI before. It wasn't hard if you consider exponential growth etc.
But it was difficult for me to imagine all of these intermediary steps towards that future. These videos proved beyond a shadow of a doubt how this is happening. Truly amazing!

RazorbackPT - 2017-09-27

I'm so glad this channel exists. Keep up the great work.

Two Minute Papers - 2017-09-27

Thank you so much for the kind words. Happy to have you in our growing club of Fellow Scholars! :)

Firefox Metzger - 2017-09-27

Lovely =) The video footage at the end reminds me of OpenAI's Physical Spam Detection though. If you haven't seen it yet:
https://www.youtube.com/watch?v=k21Wtjp074c (it's not listed in the search)
Scientists do have a sense of humor...

Two Minute Papers - 2017-09-27

Very fun indeed. Haven't seen this one yet, thank you!

Roger AB - 2017-09-27

it's the same room and robot, must be from the same people.

Donaldo - 2017-09-27

I like the longer videos with more information like this one

Two Minute Papers - 2017-09-28

Noted. Thanks for the feedback! :)

Fred Keebox - 2017-10-01

More detail please! I know you're keeping it broad for laymen, but, could you spend just a /few/ seconds more on what the contribution of the paper is and details? Anyway, this channel is an awesome idea; I've patreoned.

Jiansen Zheng - 2017-10-14

This is truly amazing, gonna donate 10 dollars.

조성현 - 2017-09-28

Thank you so much for your effort. I'm not very fluent in English, but I can understand your mentions easily!

Two Minute Papers - 2017-09-28

Thanks for watching, happy to have you in our growing club of Fellow Scholars! :)

Ahmed Kachkach - 2017-09-29

Unfortunately, this video does not even start to explain how a binary reward is used to find an optimal strategy, other than saying that it does different runs with different goals.
I love these videos, they are *awesome*, but it would be great if a simple explanation of what the paper does would be included, without necessarily having to explain everything from scratch (reinforcement learning, etc.) each time.

Jiansen Zheng - 2017-10-14

Ahmed Kachkach Well, they provided the link, which is already enough.

Billy Monday - 2017-10-01

Isn't that technique just a generalized reward function again?

douglas fairmeadow - 2017-09-29

@2.34 "Very apt" - lol.

Think you laboured the careless teacher metaphor. We get it - it's about contextual learning as opposed to useless pass/fail feedback. Also would be great if you had gone into the mechanics of HER and what you meant by sparseness. Didn't walk away with enough air to blow through my didgeridoo there.

But great and thoroughly illuminating! Really, always turn my mind on to new things :D

Haku Sansaku - 2017-10-02

I think you could add a paper focused more on the economic side of things. Like impact or cost reduction or faster assembly times.

Haku Sansaku - 2017-10-02

During university I had the opportunity to work with industrial robots for 2 weeks. The task was to assemble a USB stick.
The biggest problem was that the system did not improve at all, it just did what you programmed it to do. With this AI tech, companies like KUKA can create robots which are light-years ahead of what's currently available on the market.

SECONDQUEST - 2017-09-27

The weirdly Assassin's Creed Yellow Man returns!

Mr.sunflower - 2017-12-21

love u

Bronn - 2017-09-27

This is what I've been waiting for someone to develop. The most revolutionary breakthrough of them all, though it might not seem as something so special at first

Two Minute Papers - 2017-09-28

I love this too. A piece of gem, even if the presentation doesn't contain the "usual" visual fireworks.

Tuc - 2017-09-27

I realized, the moment a bot in Dota 2 was juking, baiting, and anticipating human behaviors to gain an advantage, that things we hold unique and special to life aren't really that unique and special. Jokes, jealousy, affection, love, hate, everything can be replicated in a neural network to the point that the physical output and written words become indistinguishable from natural human behavior, as long as the behavior conforms to the AI's fitness function.

It's sad for me to look around at all the people that don't understand the potential for AI applications in the future, or what machine learning really is. I feel there were too many cries of wolf in the past. This is breakthrough technology on a completely different level than hard-coded "smart" software and applications.

Ivan Lovell - 2017-09-28

You seem to neglect the possibility that these things are unique to life. And, by extension, a computer which can achieve them could be alive.

douglas fairmeadow - 2017-09-29

@Tuc You android?

Why place so much meaning on the public and private record of words and emoti(c)ons expressed?

Maybe me, but always found ze eternal why of things fascinating and the carrot that algos were chasing like track dogs. Ie, human activity is propelled by a deep concern for the whys and hows of an event, rather than what happens.

Shit, cow just got loose..

Ethan9750 - 2017-09-29

Thiago Braga, actually no, coz the bot communicates with the machine interface directly, sending data, while humans not only have to physically move to respond but also wait for their peripherals to communicate for them, which in itself adds input lag. The bot also applies edge detection and spatial mapping algorithms directly to the data it receives, while humans also have to wait for monitor response times and refresh rates. All of this can make a lot of steps, especially juking, easier for bots due to the fluidity of communication. I often find that on lower refresh rate and slightly laggier setups it's much harder to juke than on faster setups. Not to mention that juking's effectiveness also depends on how obvious the animation is: Shadow Fiend, for example, raises his hand quite obviously to attack in the demo game, compared to a character like Sniper.

TacticalmanDK Ns - 2017-09-27

After watching the Dota 2 video, I have been thinking about what happens if you take an AI and put it in a human Stone Age simulation, i.e. a game like Minecraft or Medieval Engineers. Many AI devs talk about how an AI, in order to become human-like, needs to have the same inputs as a human: it needs to see, hear, smell, feel. A Dota 2-like bot IRL would still take years to learn how to become human, but what if we made a game, a survival game with tribesmen, where the goal for the Dota-like AI is to survive and procreate? The game has stuff like food needs, shelter, animals, etc. Just run hundreds of thousands of these games. After a while, will it not act like a human? Or what if it was supposed to act like an animal?

TacticalmanDK Ns - 2017-09-27

If only I was rich I would work on this. But can some AI devs at least try to put an AI into a survival game? It would be so interesting.

TacticalmanDK Ns - 2017-09-27

My thoughts yesterday in bed really made me start thinking about so many things. I really think we can learn a ton about humans and how we're made if we succeed at doing this stuff too. Thoughts like why do we have desires, needs. Why don't we just all work together etc. There are all kinds of reasons, and research like this can explain some of it.

SierraSierraFoxtrot - 2017-09-27

This is straight out of The Hitchhiker's Guide to the Galaxy.
A computer answers a question that is not properly understood or formulated by the operators.

ProCactus - 2017-09-28

Do not play cup and balls with the Terminator

logosfabula - 2017-09-27

How does it know that, as in the example, the puck had to be shot more to the right if the feedback is binary?

SECONDQUEST - 2017-09-27

logosfabula The goals are changed to various situations and previous actions are compared to new situations.
Don't quote me, I'm a fish.

brianorca - 2017-09-28

After a failed simulation run, it checks that same run against alternative goals. If some of those goals have a positive result, then that combination can be added to the learning without running the whole simulation again.
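
The relabeling step described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the grid positions, the action name "push", the 0/1 reward, and the function names binary_reward and replay_with_hindsight are all assumptions made for the example, and only the final state of the run is tried as an alternative goal.

```python
from typing import List, Tuple

State = Tuple[int, int]          # hypothetical: puck position on a small grid
Step = Tuple[State, str, State]  # (state, action, next_state)
Transition = Tuple[State, str, State, State, int]  # (state, action, next_state, goal, reward)


def binary_reward(achieved: State, goal: State) -> int:
    """Sparse pass/fail signal (assumed 0/1 here): 1 if the goal was reached."""
    return 1 if achieved == goal else 0


def replay_with_hindsight(episode: List[Step], goal: State) -> List[Transition]:
    """Store every step twice: once scored against the real goal, and once
    scored against the goal the run actually achieved (its final state)."""
    hindsight_goal = episode[-1][2]  # where the puck really ended up
    buffer: List[Transition] = []
    for state, action, next_state in episode:
        for g in (goal, hindsight_goal):
            buffer.append((state, action, next_state, g, binary_reward(next_state, g)))
    return buffer


# A failed run: the puck was meant to reach (3, 0) but stopped at (2, 1).
episode = [((0, 0), "push", (1, 0)), ((1, 0), "push", (2, 1))]
buffer = replay_with_hindsight(episode, goal=(3, 0))

# Rewards of the stored transitions, grouped by the goal they were scored against.
rewards_for_real_goal = [t[4] for t in buffer if t[3] == (3, 0)]
rewards_for_hindsight_goal = [t[4] for t in buffer if t[3] == (2, 1)]
```

Scored against the real goal, every stored step gets a reward of 0, so the run teaches nothing on its own. Scored against the position the puck actually reached, the last step gets a reward of 1, which gives the learner a useful signal without running the simulation again.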

Kunj Patel - 2017-09-27

SIR ALL THESE VIDEOS ARE AWESOME BUT PLEASE TELL WHERE TO START IN LEARNING MACHINE LEARNING....... WHICH LANGUAGE TO LEARN BEFORE ???? WHICH SOFTWARE WE WILL NEED ????? WHERE ARE SOME GOOD MACHINE LEARNING COURSES ????? SIR PROVIDE THESE DETAILS WHICH MAY HELP THOUSANDS OF YOUR USERS TO CREATE SUCH MACHINE LEARNING PROJECTS

EctoMorpheus - 2017-09-27

kunj patel the powers of machine learning are only available to those who know how to turn off caps lock. Furthermore, learn python. I'd start with neural networks as that opens up a lot of possibilities. I'm pretty sure that there is a really nice tutorial if you google 'python neural network tutorial', can't remember the name though...

F Eryth - 2017-09-28

Learn some basic python, then look up Siraj Raval on YT.

D Vee - 2017-09-28

You should know some math and basic linear algebra. It's not necessary if you just want to implement algorithms, but if you want to understand how they work then you need math, and further down the line you'll need probability. I'd recommend Andrew Ng's deep learning course on Coursera; you can take it for free by auditing it.

Locut0s - 2017-09-28

IMHO, while I don't envision sentience and true artificial consciousness any time soon, the pace of AI is so fast that it's only a matter of time till we start to see some potentially terrifying consequences. I'm not talking about Skynet. I'm talking about the same kind of problems we have today with software bugs, some of which can be life-threatening. Only instead of a bug in the code, with smart AIs it becomes a miscommunication between what the AI understands and what the programmer thinks the AI understands. For example, if you put all of air traffic control on the planet into the hands of an AI right now, it would probably vastly improve flight scheduling, routes, ways to board and load planes etc. But suppose you didn't realize that the AI also thought that the really long-term end goal was to crash all the planes and maximize human casualties. And trust me, this wouldn't be an obvious "oh, we forgot to close a parenthesis" programming error. Indeed, in hindsight, if you knew what the program was "thinking", you too would agree the end result flowed naturally.