opencorpora-tools Ranking and Summary
- License:
- MIT/X Consortium Lic...
- Publisher name:
- Mikhail Korobov
- Publisher website:
- http://bitbucket.org/kmike/
opencorpora-tools Tags
opencorpora-tools Description
opencorpora-tools is a module that provides a Python interface to http://opencorpora.org/.

Installation

    pip install opencorpora-tools

If you have Python < 2.7, the argparse and ordereddict packages are also required:

    pip install argparse
    pip install ordereddict

Usage

Obtaining corpora

opencorpora-tools works with the XML from http://opencorpora.org/. You can download and unpack the XML manually (from the 'Downloads' page) or use the provided command-line utility:

    opencorpora download

Run opencorpora download --help for more options.

Using corpora

Initialize by importing opencorpora and creating an opencorpora.Corpora object (called corpus below) from the XML file.

Get a list of documents:

    >>> catalog = corpus.catalog()

Each catalog entry unpacks into a document id and a document title, for example:

    >>> print doc_id
    1610
    >>> print doc_title
    24105 Герман Греф советует россиянам не суетиться с валютой

Work with documents. For a document doc obtained from the corpus:

    >>> print doc.title()
    24105 Герман Греф советует россиянам не суетиться с валютой

Individual items of doc.words(), doc.sents() and doc.paras() print as plain text. In this document one of the words is "Сбербанка", one of the sentences is "Герман Греф советует россиянам не суетиться с валютой", and the paragraph text reads "Герман Греф советует россиянам не суетиться с валютой Президент Сбербанка уверен, что в ближайшее время на валютных рынках сохранится высокая волатильность и шараханье."

The Corpora, Document, Paragraph and Sentence classes support the following methods where they make sense (for example, a sentence has no paragraphs):

- words() - returns a list of words and other tokens;
- sents() - returns a list of Sentence instances;
- paras() - returns a list of Paragraph instances;
- documents() - returns a list of Document instances (this is a memory hog!);
- tagged_words() - returns a list of (str, str) tuples;
- tagged_sents() - returns a list of lists of (str, str) tuples;
- tagged_paras() - returns a list of lists of lists of (str, str) tuples;
- iterwords(), itersents(), iterparas(), iterdocuments(), iter_tagged_words(), iter_tagged_sents(), iter_tagged_paras() - return iterators over words, sentences, paragraphs or documents.

It is also possible to iterate over Corpora, Document, Paragraph and Sentence instances directly; this yields documents, paragraphs, sentences and words respectively. For example, for a sentence sent taken from doc.sents():

    >>> for word in sent:
    ...     print word
    ...
    Герман
    Греф
    советует
    россиянам
    не
    суетиться
    с
    валютой

The API is modelled on NLTK's CorpusReader API. It is not exactly the same, but it is very similar. For example, sents() in opencorpora-tools returns a list of Sentence instances, while sents() in NLTK returns a list of lists of strings. Because Sentence instances behave like lists (they support indexing, iteration and so on), the opencorpora.Corpora API can be seen as a superset of the NLTK CorpusReader API.
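The claim that list-like sentence objects make the API a superset of a "list of lists of strings" API can be illustrated with a small sketch. The Sentence class and the longest_word helper below are hypothetical stand-ins written for this illustration; they are not taken from opencorpora-tools or NLTK, and the real classes may be implemented quite differently.

```python
from collections.abc import Sequence


class Sentence(Sequence):
    """Hypothetical list-like sentence object (not the library's own class)."""

    def __init__(self, tokens):
        self._tokens = list(tokens)

    def __getitem__(self, index):
        # Integer indexing and slicing are delegated to the underlying list.
        return self._tokens[index]

    def __len__(self):
        return len(self._tokens)

    def words(self):
        # An extra method on top of the plain-list behaviour.
        return list(self._tokens)


def longest_word(sents):
    """Hypothetical helper written against the list-of-lists-of-strings shape."""
    return max((word for sent in sents for word in sent), key=len)


tokens = ["Герман", "Греф", "советует"]
nltk_style = [tokens]               # list of lists of strings
object_style = [Sentence(tokens)]   # list of list-like objects

# The same helper accepts both shapes, because Sentence supports
# iteration, indexing and len() just like a list does.
print(longest_word(nltk_style))
print(longest_word(object_style))
```

Deriving from collections.abc.Sequence means that only __getitem__ and __len__ have to be written; iteration and membership tests come from the base class. Code that only indexes and iterates keeps working, while the object is free to offer extra methods such as words().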
opencorpora-tools Related Software