Natural Language Toolkit: a suite of Python libraries and programs for symbolic and statistical natural language processing.
Natural Language Toolkit description
Natural Language Toolkit is a suite of Python libraries and programs for symbolic and statistical natural language processing. NLTK includes graphical demonstrations and sample data. It is accompanied by extensive documentation, including tutorials that explain the basic concepts behind the language processing tasks supported by the toolkit.

A substantial amount of documentation on using NLTK is available from the NLTK home page. In particular, the home page offers three types of documentation:
· Tutorials teach students how to use the toolkit in the context of performing specific tasks. They are suitable for anyone who wants to learn how to use the toolkit.
· The reference documentation describes every module, interface, class, method, function, and variable in the toolkit. It should be useful to both users and developers.
· A number of technical reports are available. These reports explain and justify the toolkit's design and implementation. The toolkit developers use them to guide and document the construction of the toolkit, and students can consult them for more detail on how the toolkit was designed and why.

New in this release:

NLTK:
- Expanded semantics package for first-order logic, linear logic, glue semantics, DRT, and LFG (Dan Garrette)
- New WordSense class for WordNet, supporting access to synsets from sense keys and access to sense counts (Joel Nothman)
- New nltk.tag.crf module
- Miscellaneous bug fixes, including fixes to synset and maxent
- Improved support for chunkers, including a flexible chunk corpus reader and a new rule type, ChunkRuleWithContext
- POS-tagged concordancing (nltk.draw.pos_concordance)
- New GUI for developing regexp chunkers (nltk.draw.rechunkparser)
- Added bio_sents() and bio_words() methods to ConllChunkCorpusReader in conll.py, for reading (word, tag, chunk_typ) tuples from the CoNLL-2000 corpus. ConllChunkCorpusView was also modified to support these changes.
- Parented trees that automatically maintain parent pointers (nltk.tree.ParentedTree and nltk.tree.MultiParentedTree) (Jussi Salmela, Paul Bone)
- Improved support for lazy sequences
- Flexible parser for converting bracketed strings into trees
- Docstring updates (work in progress)
- New NLG package, FUF/SURGE (Petro Verkhogliad)
- New dependency parser package (Jason Narad)
- New coreference package, with ACE-2, MUC-6, and MUC-7 corpora (Joseph Frazee)
- CCG parser (Graeme Gange)
- First-order resolution theorem prover (Dan Garrette)

Data:
- New NPS Chat Corpus and corpus reader (nltk.corpus.nps_chat)
- ConllCorpusReader can now be used to read the CoNLL 2004 and 2005 corpora
- HMM-based Treebank POS tagger and phrase chunker for nltk_contrib.coref, implemented in api.py. Pickled versions of these objects are checked in under data/taggers and data/chunkers.

Book:
- Revisions in response to feedback on the new features in this release

Other changes:
· This version finalizes NLTK's API ahead of the 2.0 release and the publication of the NLTK book.
· There have been dozens of minor enhancements and bug fixes.
· Many names of the form nltk.foo.bar are now available as nltk.bar.
· There is expanded functionality in the decision tree, collocations, and Toolbox modules.
· A new translation toy, nltk.misc.babelfish, has been added.
· A new module, nltk.help, gives access to tagset documentation.
· Imports were fixed so that NLTK can be built and installed without Tkinter (for running on servers).
· New data includes a maximum entropy chunker model and updated grammars.
· NLTK Contrib includes updates to the coreference package (Joseph Frazee) and the ISRI Arabic stemmer (Hosam Algasaier).
· The book has undergone substantial editorial revision ahead of final publication.
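The bio_sents() and bio_words() item in the release notes concerns reading (word, tag, chunk_type) tuples from the CoNLL-2000 corpus. To make that data format concrete, here is a minimal pure-Python sketch; it is not NLTK's implementation. It assumes the usual CoNLL-2000 layout (one token per line as word, POS tag, and IOB chunk tag, with a blank line between sentences), and the function names read_conll2000 and iob_chunks are hypothetical.

```python
from typing import List, Tuple

# Hypothetical helpers for illustration only; they are not part of NLTK's API.

# (word, POS tag, IOB chunk tag), e.g. ("deficit", "NN", "I-NP")
Triple = Tuple[str, str, str]


def read_conll2000(text: str) -> List[List[Triple]]:
    """Split CoNLL-2000-style text into sentences of (word, tag, chunk) triples.

    Assumes one token per line with three whitespace-separated fields
    (word, POS tag, IOB chunk tag) and a blank line after each sentence.
    """
    sentences: List[List[Triple]] = []
    current: List[Triple] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        word, tag, chunk = fields
        current.append((word, tag, chunk))
    if current:  # the text may not end with a blank line
        sentences.append(current)
    return sentences


def iob_chunks(sentence: List[Triple]) -> List[Tuple[str, List[str]]]:
    """Group IOB-tagged tokens into (chunk_type, words) pairs; O tokens are skipped."""
    chunks: List[Tuple[str, List[str]]] = []
    open_type = None  # type of the chunk currently being extended, if any
    for word, _tag, chunk in sentence:
        if chunk == "O":
            open_type = None
            continue
        prefix, _, chunk_type = chunk.partition("-")
        if prefix == "B" or chunk_type != open_type:
            # B- always starts a chunk; an I- with no matching open chunk does too
            chunks.append((chunk_type, [word]))
            open_type = chunk_type
        else:
            chunks[-1][1].append(word)
    return chunks


if __name__ == "__main__":
    sample = (
        "He PRP B-NP\nreckons VBZ B-VP\nthe DT B-NP\n"
        "current JJ I-NP\naccount NN I-NP\ndeficit NN I-NP\n. . O\n"
    )
    print(iob_chunks(read_conll2000(sample)[0]))
    # [('NP', ['He']), ('VP', ['reckons']), ('NP', ['the', 'current', 'account', 'deficit'])]
```

Treating an I- tag with no open chunk of the same type as the start of a new chunk is one common convention for repairing ill-formed IOB sequences; other readers may choose differently.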
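Two of the release-note items, parented trees that maintain parent pointers and the flexible parser for bracketed strings, describe ideas that are easy to show in miniature. The sketch below is a plain-Python illustration, not the nltk.tree code: Node and parse_bracketed are hypothetical names, and the parser assumes every bracket carries a label, as in (S (NP the cat) (VP sat)).

```python
import re
from typing import List, Optional, Union

# Hypothetical classes and functions for illustration; not NLTK's nltk.tree API.


class Node:
    """A tree node whose parent pointer is set when it is attached to a parent."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.children: List[Union["Node", str]] = []
        self.parent: Optional["Node"] = None

    def add(self, child: Union["Node", str]) -> None:
        # Keep the parent pointer in sync with the children list
        if isinstance(child, Node):
            child.parent = self
        self.children.append(child)

    def __repr__(self) -> str:
        parts = [repr(c) if isinstance(c, Node) else c for c in self.children]
        return "(" + " ".join([self.label] + parts) + ")"


# Tokens are "(", ")", or any run of characters that is neither whitespace nor a bracket
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse_bracketed(s: str) -> Node:
    """Parse a string like '(S (NP the cat) (VP sat))' into a tree of Node objects."""
    tokens = TOKEN.findall(s)
    stack: List[Node] = []
    root: Optional[Node] = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            # Reject a second root, or an opening bracket with no label after it
            if root is not None or i + 1 >= len(tokens) or tokens[i + 1] in ("(", ")"):
                raise ValueError("malformed bracketed string")
            node = Node(tokens[i + 1])  # the token after "(" is the label
            if stack:
                stack[-1].add(node)
            stack.append(node)
            i += 2
            continue
        if not stack:
            raise ValueError("token outside any bracket: " + tok)
        if tok == ")":
            node = stack.pop()
            if not stack:
                root = node  # the outermost bracket has closed
        else:
            stack[-1].add(tok)  # a leaf, i.e. a word
        i += 1
    if root is None or stack:
        raise ValueError("unbalanced brackets")
    return root


if __name__ == "__main__":
    tree = parse_bracketed("(S (NP the cat) (VP sat))")
    np = tree.children[0]
    print(np, np.parent.label)  # (NP the cat) S
```

Setting the parent pointer inside add() means every attachment updates both directions of the link in one place, which is the kind of bookkeeping the parented-tree item describes.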