Preface

MediaPipe is an open-source framework from Google that gives developers a simple yet powerful toolkit for building vision and perception applications. It bundles a set of pretrained machine-learning models with tools for processing multimedia data, and covers tasks such as pose estimation, hand tracking, face detection and tracking, face landmarks, object detection, image segmentation, and language detection.

MediaPipe is cross-platform: it can be deployed on mobile (Android, iOS), web, desktop, edge devices, and IoT, and it supports C++, Python, Java, Swift, Objective-C, and JavaScript.

This article uses Python to apply MediaPipe to two areas: pose estimation and hand tracking.

GitHub: https://github.com/google/mediapipe

1. Pose Estimation

1.1 Pose landmarks

| Index | Body part | PoseLandmark |
|---|---|---|
| 0 | Nose | PoseLandmark.NOSE |
| 1 | Left eye (inner) | PoseLandmark.LEFT_EYE_INNER |
| 2 | Left eye | PoseLandmark.LEFT_EYE |
| 3 | Left eye (outer) | PoseLandmark.LEFT_EYE_OUTER |
| 4 | Right eye (inner) | PoseLandmark.RIGHT_EYE_INNER |
| 5 | Right eye | PoseLandmark.RIGHT_EYE |
| 6 | Right eye (outer) | PoseLandmark.RIGHT_EYE_OUTER |
| 7 | Left ear | PoseLandmark.LEFT_EAR |
| 8 | Right ear | PoseLandmark.RIGHT_EAR |
| 9 | Mouth (left) | PoseLandmark.MOUTH_LEFT |
| 10 | Mouth (right) | PoseLandmark.MOUTH_RIGHT |
| 11 | Left shoulder | PoseLandmark.LEFT_SHOULDER |
| 12 | Right shoulder | PoseLandmark.RIGHT_SHOULDER |
| 13 | Left elbow | PoseLandmark.LEFT_ELBOW |
| 14 | Right elbow | PoseLandmark.RIGHT_ELBOW |
| 15 | Left wrist | PoseLandmark.LEFT_WRIST |
| 16 | Right wrist | PoseLandmark.RIGHT_WRIST |
| 17 | Left pinky | PoseLandmark.LEFT_PINKY |
| 18 | Right pinky | PoseLandmark.RIGHT_PINKY |
| 19 | Left index finger | PoseLandmark.LEFT_INDEX |
| 20 | Right index finger | PoseLandmark.RIGHT_INDEX |
| 21 | Left thumb | PoseLandmark.LEFT_THUMB |
| 22 | Right thumb | PoseLandmark.RIGHT_THUMB |
| 23 | Left hip | PoseLandmark.LEFT_HIP |
| 24 | Right hip | PoseLandmark.RIGHT_HIP |
| 25 | Left knee | PoseLandmark.LEFT_KNEE |
| 26 | Right knee | PoseLandmark.RIGHT_KNEE |
| 27 | Left ankle | PoseLandmark.LEFT_ANKLE |
| 28 | Right ankle | PoseLandmark.RIGHT_ANKLE |
| 29 | Left heel | PoseLandmark.LEFT_HEEL |
| 30 | Right heel | PoseLandmark.RIGHT_HEEL |
| 31 | Left foot index (toe) | PoseLandmark.LEFT_FOOT_INDEX |
| 32 | Right foot index (toe) | PoseLandmark.RIGHT_FOOT_INDEX |

1.2 Legacy Solution API

MediaPipe provides a Solution API for quick detection. It stopped receiving updates on May 10, 2023, but it still works for now. With mediapipe imported as mp, it is exposed as mp.solutions.pose.Pose and takes the following options:

| Option | Meaning | Value range | Default |
|---|---|---|---|
| static_image_mode | If False, the input images are treated as a video stream. The solution tries to detect the most prominent person in the first images and, once detection succeeds, localizes the pose landmarks. In subsequent images it simply tracks those landmarks without running detection again until tracking is lost, which reduces computation and latency. If True, person detection runs on every input image, which suits batches of static, possibly unrelated images. | Boolean | False |
| model_complexity | Complexity of the landmark model. Accuracy and inference latency generally both increase with complexity. | {0, 1, 2} | 1 |
| smooth_landmarks | If True, the solution filters landmarks across input images to reduce jitter. Ignored if static_image_mode is also True. | Boolean | True |
| enable_segmentation | If True, a segmentation mask is generated in addition to the pose landmarks. | Boolean | False |
| smooth_segmentation | If True, the solution filters segmentation masks across input images to reduce jitter. Ignored if enable_segmentation is False or static_image_mode is True. | Boolean | True |
| min_detection_confidence | Minimum confidence from the person-detection model for a detection to be considered successful. | Float [0.0, 1.0] | 0.5 |
| min_tracking_confidence | Minimum confidence from the landmark-tracking model for the pose landmarks to be considered tracked successfully; otherwise person detection is invoked automatically on the next input image. A higher value makes the solution more robust at the cost of higher latency. Ignored if static_image_mode is True, where person detection simply runs on every image. | Float [0.0, 1.0] | 0.5 |

Image:

    import cv2
    import mediapipe as mp


    def main():
        FILE_PATH = 'data/1.png'
        img = cv2.imread(FILE_PATH)
        mp_pose = mp.solutions.pose
        pose = mp_pose.Pose(static_image_mode=True,
                            min_detection_confidence=0.5,
                            min_tracking_confidence=0.5)

        # MediaPipe expects RGB input, while OpenCV loads images as BGR.
        res = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        img_copy = img.copy()
        if res.pose_landmarks is not None:
            mp_drawing = mp.solutions.drawing_utils
            mp_drawing.draw_landmarks(
                img_copy,
                res.pose_landmarks,
                mp_pose.POSE_CONNECTIONS,  # frozenset of landmark pairs to connect
                mp_drawing.DrawingSpec(color=(255, 255, 255),  # landmark style
                                       thickness=2, circle_radius=2),
                mp_drawing.DrawingSpec(color=(174, 139, 45),  # connection style
                                       thickness=2, circle_radius=2),
            )
        cv2.imshow('MediaPipe Pose Estimation', img_copy)
        cv2.waitKey(0)


    if __name__ == '__main__':
        main()

Video:

    import cv2
    import mediapipe as mp


    def video():
        # To read from a camera instead: cap = cv2.VideoCapture(0)
        cap = cv2.VideoCapture('data/1.mp4')
        mp_pose = mp.solutions.pose
        mp_drawing = mp.solutions.drawing_utils
        pose = mp_pose.Pose(static_image_mode=False,
                            min_detection_confidence=0.5,
                            min_tracking_confidence=0.5)

        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break  # for a camera, use `continue` to skip dropped frames

            # Convert the BGR frame to RGB before running pose estimation.
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            results = pose.process(rgb_frame)

            if results.pose_landmarks is not None:
                # Draw the landmarks and their connections.
                mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                          mp_pose.POSE_CONNECTIONS)

            cv2.imshow('MediaPipe Pose Estimation', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release resources.
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        video()

1.3 New Solution API

The legacy API cannot detect more than one pose per image. The new API (mp.tasks.vision.PoseLandmarker) can.

| Option | Meaning | Value range | Default |
|---|---|---|---|
| running_mode | Running mode of the task. IMAGE: single image input. VIDEO: decoded video frames. LIVE_STREAM: a live stream of input data, for example from a camera; in this mode result_callback must be set so that a listener receives the results asynchronously. | {IMAGE, VIDEO, LIVE_STREAM} | IMAGE |
| num_poses | Maximum number of poses the landmarker can detect. | Integer > 0 | 1 |
| min_pose_detection_confidence | Minimum confidence score for a pose detection to be considered successful. | Float [0.0, 1.0] | 0.5 |
| min_pose_presence_confidence | Minimum confidence score for the pose-presence score in pose landmark detection. | Float [0.0, 1.0] | 0.5 |
| min_tracking_confidence | Minimum confidence score for pose tracking to be considered successful. | Float [0.0, 1.0] | 0.5 |
| output_segmentation_masks | Whether to output a segmentation mask for each detected pose. | Boolean | False |
| result_callback | Sets the listener that receives landmark results asynchronously while the pose landmarker runs in LIVE_STREAM mode. Usable only when running_mode is LIVE_STREAM. | ResultListener | N/A |

    import cv2
    import numpy as np
    import mediapipe as mp
    from mediapipe import solutions
    from mediapipe.framework.formats import landmark_pb2


    def draw_landmarks_on_image(image, detection_result):
        pose_landmarks_list = detection_result.pose_landmarks
        annotated_image = np.copy(image)

        # Loop through the detected poses to visualize.
        for pose_landmarks in pose_landmarks_list:
            # Convert the task result into the proto that drawing_utils expects.
            pose_landmarks_proto = landmark_pb2.NormalizedLandmarkList()
            pose_landmarks_proto.landmark.extend([
                landmark_pb2.NormalizedLandmark(x=lm.x, y=lm.y, z=lm.z)
                for lm in pose_landmarks
            ])
            solutions.drawing_utils.draw_landmarks(
                annotated_image,
                pose_landmarks_proto,
                solutions.pose.POSE_CONNECTIONS,
                solutions.drawing_styles.get_default_pose_landmarks_style())
        return annotated_image


    def newSolution():
        BaseOptions = mp.tasks.BaseOptions
        PoseLandmarker = mp.tasks.vision.PoseLandmarker
        PoseLandmarkerOptions = mp.tasks.vision.PoseLandmarkerOptions
        VisionRunningMode = mp.tasks.vision.RunningMode

        # The model bundle must be downloaded separately and placed at this path.
        model_path = 'data/pose_landmarker_heavy.task'
        options = PoseLandmarkerOptions(
            base_options=BaseOptions(model_asset_path=model_path),
            running_mode=VisionRunningMode.IMAGE,
            num_poses=10)

        FILE_PATH = 'data/4.jpg'
        image = cv2.imread(FILE_PATH)               # BGR copy, used for drawing
        img = mp.Image.create_from_file(FILE_PATH)  # mp.Image, used for detection

        with PoseLandmarker.create_from_options(options) as detector:
            res = detector.detect(img)
            image = draw_landmarks_on_image(image, res)
            cv2.imshow('MediaPipe Pose Estimation', image)
            cv2.waitKey(0)


    if __name__ == '__main__':
        newSolution()

1.4 Push-up counter

The counter computes the bend angle of the arm at the elbow, uses it to decide whether the body is up or down, and counts the push-ups. Part of the original listing (the end of calculate_angle and most of count_push_up) was lost when the page was extracted; those parts are reconstructed below and marked as assumptions.

    import cv2
    import mediapipe as mp
    import numpy as np

    mp_drawing = mp.solutions.drawing_utils
    mp_pose = mp.solutions.pose

    UP_ANGLE = 160   # from the original listing: arms count as extended above this
    DOWN_ANGLE = 90  # ASSUMPTION: the original lower threshold was lost


    def calculate_angle(a, b, c):
        """Angle at joint b (in degrees) formed by the landmarks a-b-c."""
        radians = np.arctan2(c.y - b.y, c.x - b.x) - \
            np.arctan2(a.y - b.y, a.x - b.x)
        angle = np.abs(np.degrees(radians))
        # ASSUMPTION: the tail of this expression was lost; folding the angle
        # into [0, 180] is the usual convention.
        return angle if angle <= 180 else 360 - angle


    def count_push_up(landmarks, counter, status):
        # ASSUMPTION: reconstructed body. Only the final "> 160: status = True"
        # branch and the return statement survive in the original.
        lm = mp_pose.PoseLandmark
        left = calculate_angle(landmarks[lm.LEFT_SHOULDER],
                               landmarks[lm.LEFT_ELBOW],
                               landmarks[lm.LEFT_WRIST])
        right = calculate_angle(landmarks[lm.RIGHT_SHOULDER],
                                landmarks[lm.RIGHT_ELBOW],
                                landmarks[lm.RIGHT_WRIST])
        angle = (left + right) / 2

        if status:
            # Arms were extended; a deep enough bend completes one push-up.
            if angle < DOWN_ANGLE:
                counter += 1
                status = False
        elif angle > UP_ANGLE:
            status = True
        return counter, status


    def main():
        cap = cv2.VideoCapture('data/test.mp4')
        counter = 0
        status = False
        with mp_pose.Pose(min_detection_confidence=0.7,
                          min_tracking_confidence=0.7) as pose:
            while cap.isOpened():
                success, image = cap.read()
                if not success:
                    print("empty camera")
                    break

                # MediaPipe expects RGB input; OpenCV frames are BGR.
                result = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if result.pose_landmarks:
                    mp_drawing.draw_landmarks(image, result.pose_landmarks,
                                              mp_pose.POSE_CONNECTIONS)
                    counter, status = count_push_up(
                        result.pose_landmarks.landmark, counter, status)

                cv2.putText(image, text=str(counter), org=(100, 100),
                            fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=4,
                            color=(255, 255, 255), thickness=2,
                            lineType=cv2.LINE_AA)
                cv2.imshow("push-up counter", image)
                if cv2.waitKey(1) == ord('q'):
                    break
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        main()

2. Hand Tracking

2.1 Hand landmarks

Each detected hand is described by 21 landmarks, enumerated in mp.solutions.hands.HandLandmark. Index 0 is the wrist and index 9 is MIDDLE_FINGER_MCP, which the gesture code in section 2.3 uses as the hand center.

2.2 API usage

| Option | Meaning | Value range | Default |
|---|---|---|---|
| static_image_mode | If False, the input images are treated as a video stream. The solution tries to detect hands in the first input images and, once detection succeeds, localizes the hand landmarks. In subsequent images, once all max_num_hands hands are detected and their landmarks localized, it simply tracks those landmarks without running detection again until it loses track of any hand. This reduces latency and is ideal for video frames. If True, hand detection runs on every input image, which suits batches of static, possibly unrelated images. | Boolean | False |
| max_num_hands | Maximum number of hands to detect. | Integer | 2 |
| model_complexity | Complexity of the hand landmark model. Accuracy and inference latency generally both increase with complexity. | {0, 1} | 1 |
| min_detection_confidence | Minimum confidence from the hand-detection model for a detection to be considered successful. | Float [0.0, 1.0] | 0.5 |
| min_tracking_confidence | Minimum confidence from the landmark-tracking model for the hand landmarks to be considered tracked successfully; otherwise hand detection is invoked automatically on the next input image. A higher value makes the solution more robust at the cost of higher latency. Ignored if static_image_mode is True, where hand detection simply runs on every image. | Float [0.0, 1.0] | 0.5 |

Image:

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils


    def main():
        cv2.namedWindow("MediaPipe Hand", cv2.WINDOW_NORMAL)
        # A single still image, so run detection on it directly.
        hands = mp_hands.Hands(static_image_mode=True,
                               max_num_hands=2,
                               min_detection_confidence=0.5,
                               min_tracking_confidence=0.5)

        img = cv2.imread('data/finger/1.jpg')
        rgb_frame = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        # Run hand tracking.
        results = hands.process(rgb_frame)

        if results.multi_hand_landmarks:
            # Draw the landmarks and connections of every detected hand.
            for hand_landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(img, hand_landmarks,
                                          mp_hands.HAND_CONNECTIONS)

        cv2.imshow('MediaPipe Hand', img)
        cv2.waitKey(0)


    if __name__ == '__main__':
        main()

Video:

    import cv2
    import mediapipe as mp

    mp_hands = mp.solutions.hands
    mp_drawing = mp.solutions.drawing_utils


    def video():
        hands = mp_hands.Hands(static_image_mode=False,
                               max_num_hands=2,
                               min_detection_confidence=0.4,
                               min_tracking_confidence=0.4)

        cap = cv2.VideoCapture('data/hand.mp4')
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break

            # Convert the BGR frame to RGB before running hand tracking.
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            results = hands.process(rgb_frame)

            if results.multi_hand_landmarks:
                # Draw the landmarks and connections of every detected hand.
                for hand_landmarks in results.multi_hand_landmarks:
                    mp_drawing.draw_landmarks(frame, hand_landmarks,
                                              mp_hands.HAND_CONNECTIONS)

            cv2.imshow('MediaPipe Hand Tracking', frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        # Release resources.
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        video()

2.3 Recognizing gestures

A KNN classifier predicts the gesture from an embedding of the hand landmarks. The classifier is fitted on a precomputed dataset, data/dataset_embedded.npz, which holds the embeddings X and the labels y.

    import cv2
    import mediapipe as mp
    import numpy as np
    from mediapipe.framework.formats.landmark_pb2 import NormalizedLandmarkList
    from sklearn.neighbors import KNeighborsClassifier

    mp_drawing = mp.solutions.drawing_utils
    mp_drawing_styles = mp.solutions.drawing_styles
    mp_hands = mp.solutions.hands


    class Embedder(object):
        """Compresses the 21 hand landmarks into a 5-value embedding."""

        def __init__(self):
            self._landmark_names = mp.solutions.hands.HandLandmark

        def __call__(self, landmarks):
            # Handles both a whole dataset (3-dim array or list of results)
            # and a single inference result.
            if isinstance(landmarks, np.ndarray):
                if landmarks.ndim == 3:  # dataset
                    return np.array([self.__call__(lmks) for lmks in landmarks])
                elif landmarks.ndim == 2:  # single inference
                    assert landmarks.shape[0] == len(list(self._landmark_names)), \
                        'Unexpected number of landmarks: {}'.format(landmarks.shape[0])
                    landmarks = self._normalize_landmarks(landmarks)
                    return self._get_embedding(landmarks)
                else:
                    print('ERROR: cannot embed the data you provided!')
            else:
                if isinstance(landmarks, list):  # dataset
                    return np.array([self.__call__(lmks) for lmks in landmarks])
                elif isinstance(landmarks, NormalizedLandmarkList):  # single inference
                    landmarks = np.array(
                        [[lmk.x, lmk.y, lmk.z] for lmk in landmarks.landmark],
                        dtype=np.float32)
                    assert landmarks.shape[0] == len(list(self._landmark_names)), \
                        'Unexpected number of landmarks: {}'.format(landmarks.shape[0])
                    landmarks = self._normalize_landmarks(landmarks)
                    return self._get_embedding(landmarks)
                else:
                    print('ERROR: cannot embed the data you provided!')

        def _get_center(self, landmarks):
            # MIDDLE_FINGER_MCP: 9
            return landmarks[9]

        def _get_size(self, landmarks):
            # Hand size: twice the largest 2D distance from the center.
            landmarks = landmarks[:, :2]
            max_dist = np.max(np.linalg.norm(
                landmarks - self._get_center(landmarks), axis=1))
            return max_dist * 2

        def _normalize_landmarks(self, landmarks):
            # Center on MIDDLE_FINGER_MCP and scale by the hand size.
            landmarks = np.copy(landmarks)
            center = self._get_center(landmarks)
            size = self._get_size(landmarks)
            landmarks = (landmarks - center) / size
            landmarks *= 100  # optional, but makes debugging easier
            return landmarks

        def _get_embedding(self, landmarks):
            # One bend feature per finger; features can be added or removed.
            return np.array([
                np.dot(landmarks[2] - landmarks[0], landmarks[3] - landmarks[4]),      # thumb
                np.dot(landmarks[5] - landmarks[0], landmarks[6] - landmarks[7]),      # index
                np.dot(landmarks[9] - landmarks[0], landmarks[10] - landmarks[11]),    # middle
                np.dot(landmarks[13] - landmarks[0], landmarks[14] - landmarks[15]),   # ring
                np.dot(landmarks[17] - landmarks[0], landmarks[18] - landmarks[19]),   # pinky
            ]).flatten()


    def init_knn(file='data/dataset_embedded.npz'):
        npzfile = np.load(file)
        X = npzfile['X']
        y = npzfile['y']
        neigh = KNeighborsClassifier(n_neighbors=5)
        neigh.fit(X, y)
        return neigh


    def hand_pose_recognition(stream_img):
        # Treats every input as a static image.
        stream_img = cv2.cvtColor(stream_img, cv2.COLOR_BGR2RGB)
        embedder = Embedder()
        neighbors = init_knn()

        with mp_hands.Hands(static_image_mode=True,
                            max_num_hands=2,
                            min_detection_confidence=0.5) as hands:
            results = hands.process(stream_img)
            if not results.multi_hand_landmarks:
                return ['no_gesture'], stream_img

            annotated_image = stream_img.copy()
            multi_landmarks = results.multi_hand_landmarks

            # KNN inference
            embeddings = embedder(multi_landmarks)
            hand_class = neighbors.predict(embeddings)
            # hand_class_prob = neighbors.predict_proba(embeddings)

            for landmarks in results.multi_hand_landmarks:
                mp_drawing.draw_landmarks(
                    annotated_image,
                    landmarks,
                    mp_hands.HAND_CONNECTIONS,
                    mp_drawing_styles.get_default_hand_landmarks_style(),
                    mp_drawing_styles.get_default_hand_connections_style())
            return hand_class, annotated_image


    # There are 10 gesture classes: 8 digit signs (1 to 10, without 7 and 9),
    # plus the OK sign and the Spider-Man sign:
    # `eight_sign`, `five_sign`, `four_sign`, `ok`, `one_sign`, `six_sign`,
    # `spider`, `ten_sign`, `three_sign`, `two_sign`
    def image():
        FILE_PATH = 'data/ok.png'
        img = cv2.imread(FILE_PATH)
        handclass, img_final = hand_pose_recognition(img)
        cv2.putText(img_final, text=handclass[0], org=(200, 50),
                    fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=2,
                    color=(255, 255, 255), thickness=2, lineType=cv2.LINE_AA)
        cv2.imshow('test', cv2.cvtColor(img_final, cv2.COLOR_RGB2BGR))
        cv2.waitKey(0)


    def video():
        cap = cv2.VideoCapture('data/ok.mp4')
        while cap.isOpened():
            ret, frame = cap.read()
            if not ret:
                break
            handclass, img_final = hand_pose_recognition(frame)
            cv2.putText(img_final, text=handclass[0], org=(50, 50),
                        fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=2,
                        color=(255, 0, 0), thickness=2, lineType=cv2.LINE_AA)
            cv2.imshow('test', cv2.cvtColor(img_final, cv2.COLOR_RGB2BGR))
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
        cap.release()
        cv2.destroyAllWindows()


    if __name__ == '__main__':
        video()
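As a rough illustration of the idea behind the gesture classifier above, the sketch below computes one dot-product bend feature per finger from 21 (x, y, z) points and classifies a query by majority vote among its k nearest training embeddings. It uses only the Python standard library. The helper names (make_hand, bend_features, knn_predict, build_toy_training_set), the toy hand shapes, and the "fist" label are hypothetical; they stand in for the article's Embedder and scikit-learn's KNeighborsClassifier, and the normalization step is omitted.

```python
import math
from collections import Counter

# (base, joint_a, joint_b) triples mirroring the index pairs used in the
# article's _get_embedding: dot(lm[base] - lm[0], lm[joint_a] - lm[joint_b]).
FINGER_INDICES = [(2, 3, 4), (5, 6, 7), (9, 10, 11), (13, 14, 15), (17, 18, 19)]


def make_hand(extended, scale=1.0):
    """Build a toy 21-point hand; extended[i] says whether finger i is straight."""
    points = [(0.0, 0.0, 0.0)]  # index 0: wrist
    for finger, straight in enumerate(extended):
        x = 0.2 * (finger - 2)
        step = 0.5 if straight else -0.3  # straight fingers go up, curled ones fold back
        for joint in range(4):
            points.append((scale * x, scale * (1.0 + step * joint), 0.0))
    return points


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def bend_features(landmarks):
    """One value per finger: negative when straight, positive when curled (toy hands)."""
    wrist = landmarks[0]
    return [_dot(_sub(landmarks[base], wrist), _sub(landmarks[a], landmarks[b]))
            for base, a, b in FINGER_INDICES]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training embeddings closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def build_toy_training_set():
    """Two samples (different hand sizes) for each of three hypothetical classes."""
    shapes = {
        "five_sign": [True, True, True, True, True],
        "one_sign": [False, True, False, False, False],
        "fist": [False, False, False, False, False],
    }
    train_x, train_y = [], []
    for label, extended in shapes.items():
        for scale in (1.0, 1.1):
            train_x.append(bend_features(make_hand(extended, scale)))
            train_y.append(label)
    return train_x, train_y


if __name__ == "__main__":
    X, y = build_toy_training_set()
    query = bend_features(make_hand([False, True, False, False, False], scale=1.05))
    print(knn_predict(X, y, query))
```

Because the features here are not normalized, they grow with the square of the hand size; the article's Embedder avoids this by centering on landmark 9 and dividing by the hand size before computing the dot products.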
References

https://developers.google.cn/mediapipe/solutions/
https://github.com/googlesamples/mediapipe
https://github.com/Furkan-Gulsen/Sport-With-AI
https://github.com/Chuanfang-Neptune/DLAV-G9