Text classification : Target 4294967295 is out of bounds (CPU) #2369
Comments
It looks like we've gotten two of these - #2368. @LittleLittleCloud any ideas about possible root cause? |
We seem to be hitting max int. @AlbelTec Can you give more details about your dataset? What is the size? |
Hi @beccamc Sorry for the delay, I was off. The dataset is very simple: financial emails (text and label) for multiclass classification. It contains 2588 text/label pairs, and each text can be longer than 2000 characters. |
@JakeRadMSFT Thoughts? This doesn't sound like a very large dataset causing the problem. |
@beccamc I ran a small test with a very small dataset (16 rows) and I'm not getting the error. I'm wondering whether long texts (more than 2000 characters) could trigger the issue. |
@AlbelTec Are you able to share your dataset? |
@beccamc Unfortunately not as it contains sensitive data. Actually, texts are emails and aren't cleaned up for purpose (signatures are included as well as all replies / forwards). |
@beccamc Same error. I can share mine :) |
@v-Hailishi Can you try to repro with Soarc's dataset? |
@beccamc By using Soarc's dataset test-data.csv, I can repro this issue on the latest main build 16.14.1.2262701 |
@LittleLittleCloud Can you take a look at this? |
4294967295 = 2^32 - 1, so this is probably caused by a wrong type cast? After examining the code base, this place looks really suspicious. when @michaelgsharp Would you take a closer look at this issue, especially verify if Update
The root cause is that MapValueToKey produces a key whose value is 0 when the value is 'NaN' or does not exist in the term map. Currently TextClassification breaks on any dataset that contains a 0-valued key as a label. I created an issue in the ML.NET repo; in the meantime, a temporary fix in Model Builder is to filter out rows where the label is NaN/empty. |
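To illustrate how a 0-valued key could surface as target 4294967295, here is a minimal Python sketch. It is not ML.NET code: `map_value_to_key`, `key_to_class_index_uint32`, `has_usable_label`, the term map, and the sample rows are all hypothetical, and the assumption that the trainer turns a 1-based key into a 0-based class index with unsigned 32-bit arithmetic is a guess at the mechanism, not something confirmed in this thread. The last step shows the temporary workaround of dropping rows whose label is NaN/empty.

```python
import math

UINT32_MASK = 0xFFFFFFFF  # 2^32 - 1


def map_value_to_key(label, term_map):
    """Hypothetical stand-in for a value-to-key mapping.

    Known labels get 1-based keys; NaN, missing, or unknown labels get 0.
    """
    if label is None:
        return 0
    if isinstance(label, float) and math.isnan(label):
        return 0
    return term_map.get(label, 0)


def key_to_class_index_uint32(key):
    """Assumed conversion: 1-based key -> 0-based index in unsigned 32-bit math."""
    return (key - 1) & UINT32_MASK


def has_usable_label(label):
    """True when the label is neither missing, NaN, nor blank."""
    if label is None:
        return False
    if isinstance(label, float):
        return not math.isnan(label)
    return str(label).strip() != ""


# Made-up sample rows: (text, label).
rows = [
    ("Invoice attached", "billing"),
    ("Re: meeting", ""),
    ("Fwd: statement", float("nan")),
    ("Payment overdue", "billing"),
    ("Reset my password", "support"),
]
term_map = {"billing": 1, "support": 2}

# Without filtering, the empty and NaN labels map to key 0 and wrap around.
targets_raw = [
    key_to_class_index_uint32(map_value_to_key(label, term_map))
    for _, label in rows
]

# Workaround: drop rows with NaN/empty labels before mapping.
clean_rows = [(text, label) for text, label in rows if has_usable_label(label)]
targets_clean = [
    key_to_class_index_uint32(map_value_to_key(label, term_map))
    for _, label in clean_rows
]

print(targets_raw)
print(targets_clean)
```

Under these assumptions, any row with an empty or NaN label yields an index far outside the valid class range, which would match the "out of bounds" message, while the filtered rows only produce indices below the number of classes.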
Scale up this issue to Priority:0 as this issue might affect all text-classification scenarios. |
Can we get a repro with just ML.NET? |
Closing this issue since it should be resolved in the framework. Tracking issue dotnet/machinelearning#6534 |
@luisquintanilla - I'm confused by the comment stating that this "should be resolved in the framework". I am encountering this issue today. Can you clarify the fix? Thank you. |
@scottyboiler Model Builder is tooling for the ML.NET Framework (Microsoft.ML set of NuGet packages). The issue is at the framework level, not tooling. Therefore, fixing it there would also fix it for Model Builder. I hope that clarifies it. |
@luisquintanilla clarifying ML.NET fw vs. .NET fw is a helpful distinction. Thanks for the quick response. |
to be linked to : #2369 (comment)
System Information (please complete the following information):
Describe the bug
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen and what is causing this error