ArtAbstractor

Mood and Environment Dependent Art Generator

I began the project by sketching some action examples that I wanted to implement. My original idea was to use a camera or set of cameras positioned above me so that they observe me from a bird's-eye view. I would walk left to indicate leftward motion, walk right to indicate rightward motion, and jump or crouch to indicate those two actions, etc. I drew a small red circle to mark the boundary of the visual information the cameras would use. This could be a literal physical marker that I segment from the rest of the video frames, or it could be the region of the image where the multiple cameras' views intersect. Either way, I figured this setup would be good at differentiating actions like walking left vs. right, but it might struggle to distinguish standing still, jumping, and crouching, since those depend on differences in my vertical position, which the cameras might not pick up well from a bird's-eye angle where lateral movements are more discernible. With enough samples this likely would not be an issue, because slight differences in my position would be learned as different actions, especially if I used multiple cameras in slightly different positions.

To begin the engineering part of the project, I started by getting a simple version of the system working in which the software could open the game, navigate the menu, and start a level. I accomplished this using a Python keyboard input module that can simply trigger different keyboard events, and I used emulation software to run a version of New Super Mario Bros. for the Nintendo DS.

After automating the process of opening the game with Python keyboard events, I moved on to the computer vision segment. I decided to use the existing MediaPipe computer vision library developed by Google for this task. It has an existing framework for pose and gesture estimation using a landmark system: it attaches points to parts of the body that it recognizes and then normalizes them relative to each other, so that those parts are recognizable regardless of the background and the body's position and scale. With these landmarks, you can then train a neural network to recognize and classify distinct poses and gestures. It was at this point that I ran into my first major problem. Pose estimation using the full-body landmarks was quite hard to implement correctly. MediaPipe at the time did not offer a full-release version of its pose estimation, so more issues tended to come up, and they were harder to look into than with the older frameworks (like image classification and hand landmarks). The normalization did not work as expected, so I opted to use the hand landmark version instead and base my inputs on hand gestures alone. While this does not incorporate the full-body actuation I originally had in mind, hand gestures still provide a robust way of using intuitive physical movements, unrelated to button pressing, to move the game character.

At this point I had implemented my commands and the trained model and got my first semblance of controlling the character via hand gestures. The character was moving far too rapidly, so I added a small lag between every action so that I could reliably change gestures before another game action was made. If this lag was too long, though, the character would not respond to hand gestures quickly enough to avoid obstacles, so I manually tuned the value to find a workable middle ground.
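For a rough idea of how the pieces fit together, here is a minimal sketch of the gesture-to-keyboard loop. It assumes OpenCV for camera capture, the legacy MediaPipe Hands solution, and pyautogui for the key events; the classifier is only a stand-in for the landmark-trained model, and the gesture names, key bindings, and lag value are illustrative rather than the project's actual settings.

```python
import time

import cv2
import mediapipe as mp
import pyautogui

ACTION_LAG = 0.25  # seconds between game actions (placeholder; tuned by hand in practice)

# Hypothetical gesture labels mapped to emulator key presses.
GESTURE_TO_KEY = {
    "point_left": "left",
    "point_right": "right",
    "fist": "x",          # jump
    "open_palm": "down",  # crouch
}

def classify_gesture(landmarks):
    """Stand-in for the neural network trained on normalized hand landmarks."""
    # landmarks is a list of 21 normalized (x, y, z) hand landmark points.
    return None  # replace with the trained classifier's prediction

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.6)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB frames; OpenCV captures BGR.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        gesture = classify_gesture(results.multi_hand_landmarks[0].landmark)
        key = GESTURE_TO_KEY.get(gesture)
        if key:
            # Tap the key that the emulator has bound to this game action.
            pyautogui.keyDown(key)
            pyautogui.keyUp(key)
        # Pause so the player can switch gestures before the next action fires.
        time.sleep(ACTION_LAG)

cap.release()
hands.close()
```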

At this point I had a decently working version of the input system, but given my laptop's low processing power, the communication between the model and the game inputs was a bit slower than I would like. While I did not implement these changes, I could try lowering the amount of image data processed at every step to make the communication faster.
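One way to do that, sketched below but not something I implemented, would be to downscale each captured frame before it reaches the landmark model; the scale factor here is an arbitrary example.

```python
import cv2

def shrink(frame, scale=0.5):
    """Downscale a captured frame before passing it to the landmark model."""
    h, w = frame.shape[:2]
    return cv2.resize(frame, (int(w * scale), int(h * scale)),
                      interpolation=cv2.INTER_AREA)

# In the main loop, the smaller frame would replace the full-resolution one:
# results = hands.process(cv2.cvtColor(shrink(frame), cv2.COLOR_BGR2RGB))
```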

I just needed to implement the output portion of the system, which would communicate to the hand via electrical muscle stimulation (EMS). This part was pretty difficult because the particular EMS system I was using (Neuralaxy's Neurostimduino) was one I had not worked with before, so when stimulation did not occur, I was not sure whether the issue was with the hardware or the software. After a few weeks of troubleshooting email chains, sharing images of my hardware and its problems, and making slight adjustments, we finally figured out that the main issue was that some of the soldered pins on the EMS circuit board were making direct contact with the USB port on the Arduino board directly below it. By shaving the pins down a bit and putting some insulating tape in the space where they made contact, the EMS worked amazingly.

This was great because both systems worked separately; I just needed to enable their communication. This part proved troublesome as well, because the Neurostimduino is driven by I2C commands sent directly over its serial interface, and I could not find an effective way to get Python to communicate with it directly through its own serial communication libraries.
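For reference, the kind of bridge I was attempting looked roughly like the sketch below: Python writes short command strings over the Arduino's USB serial port with pyserial, and a companion Arduino sketch (not shown) would translate them into the I2C commands the Neurostimduino expects. The port name and command strings are placeholders, and I never got this path working end to end.

```python
import serial  # pyserial

def send_stim_command(command, port="/dev/ttyUSB0", baud=115200):
    """Send a placeholder command string over the Arduino's USB serial port."""
    with serial.Serial(port, baud, timeout=1) as conn:
        conn.write((command + "\n").encode("ascii"))
        # Read back whatever acknowledgement the Arduino sketch chooses to print.
        return conn.readline().decode("ascii", errors="ignore").strip()

# Hypothetical usage: fire a stimulation pulse when the character takes damage.
# send_stim_command("STIM 1")
```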

This meant that while I could get the EMS to work, it was not triggered based on the game state of the character. I could trigger it manually, though, which I did for demonstration purposes. This project was one of my most intensive explorations of computer vision, gesture recognition and input, and input/output loops that do not require manual keyboard typing. It was an absolute blast to work on and I learned a lot about the separate systems. I would like to return to this project in the future, though, to implement full-body pose recognition and game-state-based electrical alerts.
